Disappointing Azure Cloud Failures
I have been a big fan of the Microsoft Azure Cloud. Here at Falafel we use it on dozens of projects for our valued customers, and internally as well. Unfortunately, Sunday, Feb 19th 2017 was a bad day for the Azure cloud if your VMs, jobs, IoT and other services were running in its “West US 2” region.
One of our products, EventsXD, experienced total failure around 5:00 am PST on Sunday while multiple conferences around the world were using the platform for their events. Our systems immediately notified us that connections to the hosting servers and SQL Servers in Azure were down.
Azure Cloud Failure
I opened a critical support ticket with Microsoft around 6:00 am PST and was glad to receive a reply at 6:15 am (very fast) saying that Engineering was looking into it.
At this time, we started receiving emails and calls from angry customers saying that their events were down and that none of the attendees or organizers could access the events on their mobile devices.
It took 6.5 hours for Azure to restore the services in “West US 2”. By that time, several clients had notified us that they would like a refund for their events, and I am pretty sure they will not be using EventsXD again in the near future.
Lack of Redundancy
EventsXD has over 10,000 events running on its platform worldwide. We thought we were in good hands with Microsoft Azure, with its adequate “Redundancy” and solid infrastructure.
Even though the support was excellent as far as response time goes, they did not provide ANY valuable information about how this could possibly happen on a Cloud platform that prides itself on “Redundancy” and solid infrastructure.
To say the least, we are very concerned about our investment in and reliance on the Azure Cloud after this incident. This is the 3rd time in 18 months that Azure has failed and caused catastrophic results for critical apps running on the platform.