Rant: Azure (and Their Support) is Garbage

StrangeWill

Administrator
Staff member
Twice, twice this week I've had a primary PostgreSQL database just blip offline for hours on Azure.

Connections:
1718590840040.png


CPU/RAM:
1718590849714.png

Database shows online though (this is a massive lie):

1718590880439.png

Under both circumstances our entire app goes down, it takes hours to come back up, and we immediately contact support. Microsoft has this amazing thing: first, to hit their 1-hour SLA, they'll have a tech call you and tell you they're escalating it. After that, expect to wait 3-7 hours for a response from someone who isn't just reading documentation at you, and during that time you're basically wondering if your app will ever come up, because they don't provide you with updates.
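Since the portal's "Available" status can't be trusted, we ended up probing the database ourselves. Here's a minimal sketch of an independent liveness check (host/port are placeholders, not our actual setup); note a real probe should also run `SELECT 1` through your driver, since the TCP port can accept connections while the server is wedged:

```python
import socket

def db_reachable(host: str, port: int = 5432, timeout: float = 5.0) -> bool:
    """TCP-level liveness check. Returns False on refusal, timeout,
    or DNS failure -- all of which surface as OSError subclasses."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run this from a box outside Azure on a short interval and alert on it yourself; waiting for the portal to admit the database is down costs you hours.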


About 3 hours into our downtime the database finally reports down:
1718590905431.png


Thanks Microshit.

BTW, get used to this message: rebooting the instance doesn't fix it, it just gets stuck in a rebooting state. We've had this issue happen 4 times in the past 2 months.
1718590924331.png

My favorite part was getting an e-mail from an engineer whose signature read "[FirstName] [Lastname] (MINDTREE LIMITED)"


(they're LTIMindtree now though)

Microsoft is fucking outsourcing support for their cloud system? That explains why one of their engineers got upset that @Jim O Thy gave them a 4/5 instead of a 5/5 rating; probably has to do with metrics from Microsoft. I would have given them a fucking zero, because this experience for the past 60 days has been a fucking nightmare.

I fucking hate getting a phone call from Microsoft to tell me to prepare for a Teams meeting in 30 minutes: one, just fucking send me a Teams invite; two, Teams is fucking garbage (along with Microsoft SSO in general); three, don't fucking call me to tell me nothing just to hit your SLA targets, that's such bullshit (SLAs are garbage to begin with, but that's a whole other rant).

I did get to spend an hour arguing with a first-level tech who complained that we dared to restart our server (the first time it hung) and explained that we were "overusing" our system. We were using 30% of our paid-for throughput, but because we'd sometimes hit 100%, he blamed our downtime on that. I cursed him out to stop wasting my time with that bullshit and get me an engineer to actually fucking look at the server (the problem wasn't caused by any of that, but by a failure to come online due to a >30TB log file).


But Will, enable HA
Okay, one, we're doing that. But two, that's doubling our bill on our most expensive server on an already tight budget. On top of that, the single-node server HAS AN SLA of 99.95%, which they're missing massively. AND on top of that, some of the reported causes HA may have made WORSE.

What the hell are you doing?!
Abusing a PostgreSQL server nearly to death. Yeah, I know it's less than ideal, but this shaky-ass foundation isn't great either.

That's what you get when you rent...
Yeah, yeah, I know, not my call. I've proposed we run our own, but the customer immediately started concern-trolling me, so I bailed on that path.