Amazon S3 Outage shows the danger of doing things cheaply.

For a four-hour period, the Amazon Northern Virginia data centre for S3 went offline.

Yet from an IT systems point of view, how come so many companies were crippled by the incident?

Eggs in one basket approach

The Cloud is not a magical fairy land where nothing ever goes wrong. Amazon is the biggest and boldest cloud company on the planet and even they have outages.

Now, Amazon have an additional 13 data centre locations around the world, all of which remained online during the outage, but many businesses had placed their data solely in one physical location.

In simple terms, Amazon S3 is a storage location. Many businesses and services place their data here, and although you never see it directly, you talk to it via a website, an app or another service.

What happened on Tuesday was that a single S3 location went down in spectacular fashion and many public websites suffered issues.

It is very easy to point the finger at Amazon but if you have a business which provides a service to customers, why would you be crazy enough to put all of your eggs in a single basket?

The answer is cost.

An extremely common approach is to have two locations for accessing the data. So if one location goes down, the other is still operational.

But to do this, you need to quite literally double the budget, something many companies are often unwilling to do.
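To make the two-location idea concrete, here is a minimal sketch in Python of a read that falls back to a second location when the first is down. Everything in it is an assumption for illustration: the function names, the "primary" and "replica" labels and the in-memory stand-ins are hypothetical, and a real system would call its storage provider's client library instead.

```python
from typing import Callable, List, Sequence, Tuple


class AllLocationsFailed(Exception):
    """Raised when none of the configured locations could serve the read."""


def read_with_failover(
    key: str,
    locations: Sequence[Tuple[str, Callable[[str], bytes]]],
) -> Tuple[str, bytes]:
    """Try each location in order and return (location name, data) from the first that works."""
    errors: List[str] = []
    for name, fetch in locations:
        try:
            return name, fetch(key)
        except Exception as exc:  # any failure means "try the next location"
            errors.append(f"{name}: {exc}")
    raise AllLocationsFailed("; ".join(errors))


# Hypothetical stand-ins for two physical storage locations.
def fetch_primary(key: str) -> bytes:
    # Simulates the location that went offline.
    raise ConnectionError("primary location is offline")


REPLICA_DATA = {"logo.png": b"image-bytes"}


def fetch_replica(key: str) -> bytes:
    # Simulates the second location holding an exact copy of the data.
    return REPLICA_DATA[key]


served_by, data = read_with_failover(
    "logo.png",
    [("primary", fetch_primary), ("replica", fetch_replica)],
)
print(served_by)
```

The read only succeeds because the second copy exists and is kept in step with the first, which is exactly the part that costs money.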

A ballpark figure for 50TB of storage (just to store, not transfer in and out) comes to $1200 per month.

So to have an exact copy of that data available would bring it up to $2400 a month. When you add the bandwidth fees on top, it can get pretty expensive.
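The arithmetic above can be sketched in a few lines of Python. The $1200 figure is the ballpark quoted in this post, not a published price, and the `transfer_fees` parameter is a hypothetical placeholder because bandwidth charges depend entirely on how much data moves in and out.

```python
STORAGE_TB = 50           # amount of data, from the ballpark above
COST_PER_COPY_USD = 1200  # monthly storage cost for one copy (ballpark, not a quote)


def monthly_storage_cost(copies: int,
                         cost_per_copy: float = COST_PER_COPY_USD,
                         transfer_fees: float = 0.0) -> float:
    """Monthly cost of keeping `copies` full copies, plus any bandwidth fees."""
    if copies < 1:
        raise ValueError("at least one copy of the data is required")
    return copies * cost_per_copy + transfer_fees


per_tb = COST_PER_COPY_USD / STORAGE_TB  # implied storage price per TB per month
single = monthly_storage_cost(1)         # one location
mirrored = monthly_storage_cost(2)       # two locations, before bandwidth fees

print(f"Implied price per TB: ${per_tb:.2f}")
print(f"One location:  ${single:.2f} per month")
print(f"Two locations: ${mirrored:.2f} per month")
```

On these numbers the second location adds $1200 a month, which is the figure a business has to weigh against the cost of being offline.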

Changing a mindset

Many of Amazon's clients have used it to launch some really big apps over the years, but a large proportion of them are designing their systems with a high level of risk.

Amazon have given developers, DevOps teams and network specialists the tools to build extremely reliable systems on a platform which is the world leader for security, performance and reliability.

Yet if a large percentage of those customers only pay the bare minimum and opt for the cheapest/riskiest approach and then blame the provider when things do go wrong, what will that do to the cloud computing market?

Should Amazon now have to raise their prices and build in extra protections as standard for every client? Not at all. It is up to the individual customer to make proper use of the services available to them.

Cloud computing is great IF you are doing these things:

  • Vet the providers you choose carefully. If a provider is offering a service at too-good-to-be-true pricing, ask yourself what corners they are cutting to achieve that.
  • Spread the risk by choosing different physical locations AND spread it between different providers. Why use Amazon solely for the app and data? Do a risk analysis to determine what period of downtime is acceptable.
  • Have money set aside to grow the infrastructure as the demands of the app/website/service increase.
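
As a starting point for the risk analysis mentioned in the list above, it helps to turn downtime into an availability figure and back again. This is a rough sketch only: it uses a 365-day year, and the 99.9% target is a hypothetical example rather than anything a provider promises.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def availability_percent(downtime_hours: float,
                         period_hours: float = HOURS_PER_YEAR) -> float:
    """Percentage of the period during which the service was up."""
    return 100.0 * (1 - downtime_hours / period_hours)


def downtime_budget_hours(target_percent: float,
                          period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime allowed in the period for a given availability target."""
    return period_hours * (1 - target_percent / 100.0)


# A single four-hour outage, like the one described in this post, over one year.
outage_availability = availability_percent(4)

# Hypothetical target: how much downtime does "99.9% available" allow per year?
budget = downtime_budget_hours(99.9)

print(f"A 4-hour outage in a year: {outage_availability:.3f}% availability")
print(f"A 99.9% target allows {budget:.2f} hours of downtime per year")
```

If the figure you can tolerate is smaller than what a single location is likely to deliver, that is the argument for paying for a second one.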

If you are a customer of one of the affected websites, ask them quite pointedly why they have not built extra safety measures into their product. Always remember that Amazon are a supplier. Why rely on only one supplier to serve your customers?

Amazon give their customers the choice to have redundancy and added reliability but bizarrely, they ask you to pay for it.

Or you can live dangerously and pay less, accepting less safety and higher risk.
