Digital Ocean needs to start showing outages on their status page
I recently signed up with Digital Ocean on their $5/month plan to rent an instance (aka droplet) for hosting a couple of websites. I’d heard good things about them, and the price and specs were right.
For the past two weeks I’ve been rather pleased with the product. It has a great UI and my instance seemed to be performing just fine, that is, until this morning.
Today I woke up to discover that my sites were down and I couldn’t ssh into my instance. I first assumed I must’ve done something wrong to my instance the night before, so I went to restart it via the web portal, but the restart request just stayed there, spinning.
The next step I took was to check Digital Ocean’s status page. Whenever I have an issue with any other service, such as GitHub, I check their status page, since it tells me whether the problem is on their end and I should just wait patiently. But alas, Digital Ocean’s status page said all was good.
So I filed a support ticket. Unsurprisingly, it turns out there’s a problem with my “physical node” and they’re working to fix it. In other words, the real machine my virtual instance is running on has issues.
OK then. While they’re fixing it and my web app is having some serious downtime (it’s been a few hours, and it’s still down), I asked if they could update their status page to inform other customers of the issue.
This is the response I got:
This appears to be an issue with the hypervisor that your droplet is hosted on, so we will not be updating our status page for this droplet.
I will escalate this to my engineering team to resolve for you.
I take that to mean that if the problem only affects a few customers (the number affected isn’t given) then they don’t update their status page. I think this is a mistake.
IMO a status page should be a public record of every time your service has experienced a catastrophic failure, even for a small number of customers, if not also small hiccups like packet loss or lag. It tells your customers that you’re serious about reliability and performance. It lets potential customers judge you fairly and see for themselves whether your claims of reliability and performance are true. It also saves affected customers time, since they don’t need to open support tickets if they can see that the service is experiencing issues. If the service still wants those tickets opened, they can say so on the status page.
I recommend that Digital Ocean change their policy in this regard, opening up and letting everyone know that they’re serious about reliability.
Update: Yay, my droplet is back online! Unfortunately, their status page still doesn’t record any issues.