So, a thread to share your BFCM woes: the stretch when all of our e-commerce sites (and related services) get kicked in the teeth day after day.
So mine actually started on the 27th. We got a massive traffic spike after noon. The system was holding fine, some events were backing up in RabbitMQ, but that's its job, it'll be fine.
Then RabbitMQ's hard drive filled up and all hell broke loose. Yeah, we shoved 64GB of overflow messages into the thing in under 2 hours -- annoying but an easy fix. Once that was done, everything has been relatively happy.
At one point we were at 36 million items in RabbitMQ ready for pickup (I only got it at 30 million).
Some of our jobs were doing this bullshit:
I had been fine-tuning our workers to try to get them processing consistently; I've pushed 17 releases to production since Wednesday.
It's hard to explain: the queue may be able to process a bajillion messages, but that doesn't matter if the consumers are bottlenecked by a not-so-great query.
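For anyone who wants the back-of-the-envelope version of that, here's a rough sketch in Python. Everything in it except the 36 million backlog is made up for illustration: the worker count, the per-message times, and the function name `drain_seconds` are all assumptions, not our real numbers.

```python
from typing import Optional


def drain_seconds(
    backlog: int,
    publish_rate: float,
    consumers: int,
    seconds_per_message: float,
) -> Optional[float]:
    """Estimate how long a backlog takes to drain.

    Returns None when consumers cannot keep up with incoming messages,
    meaning the backlog only grows. All inputs are hypothetical estimates.
    """
    # Each consumer handles one message at a time, so total throughput is
    # bounded by how long the slowest step (e.g. a query) takes per message.
    consume_rate = consumers / seconds_per_message
    net_rate = consume_rate - publish_rate
    if net_rate <= 0:
        return None
    return backlog / net_rate


# Assumed numbers: 20 workers, no new traffic, 36 million messages waiting.
fast_query = drain_seconds(36_000_000, 0, 20, 0.05)  # assumed 50 ms per message
slow_query = drain_seconds(36_000_000, 0, 20, 0.5)   # assumed 500 ms per message
print(f"fast query: {fast_query / 3600:.1f} hours")
print(f"slow query: {slow_query / 3600:.1f} hours")
```

Point being, the broker's own throughput never shows up in that math. Only the number of consumers and the time each message takes do, which is why shaving the query matters more than anything on the queue side.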
Meanwhile the CEO was showing off how many people were having meltdowns about other platforms blowing up. Funnily enough, ours blew up the day prior, so we got ahead of all the traffic, and it was a quick enough fix that no one actually complained.