The Domino Effect: Why the cloud we rely upon isn’t robust

We’re being ushered into the era of the cloud at an astounding pace. The world we live and breathe in is slowly becoming a smaller place as communicating and connecting become that much easier.

With all the major tech giants migrating their products and services to the cloud to offer a more synchronized and seamless experience for users, cloud computing is definitely a hot topic!

But for most of us it’s just the perfect “Girl next door” kinda thing.

Our world is an interconnected network of services and products that breathes and feeds on data. Cut off one line and you probably end up with a stroke!

Pun aside, let’s get to the point of this post.

Enter Facebook.

A few weeks ago, Facebook suffered two simultaneous outages. This was the first time since its inception. And to no one’s surprise, people took to Twitter with the hashtag #facebookdown.

#ANewHashTagWasBorn -_-

This was the first time in almost 4 years that the site was down for more than 2 hours (2.5 to be precise).

“The key flaw that caused this outage to be so severe was an unfortunate handling of an error condition. An automated system for verifying configuration values ended up causing much more damage than it fixed.”
-Facebook engineering team.

You can read Facebook’s explanation of the problem here.

While this may look like just one website going down, there’s more to it under the hood. Every web service or e-commerce site that uses Facebook Login to spare its users the hassle of creating and remembering yet another password goes down with Facebook. And this is a huge problem.

Think of it this way. It’s not just Facebook that loses big bucks every second it’s down. Every web service that relies on Facebook, be it for user authentication or for serving ads, loses revenue too. Doing the math alone is enough to trigger an anxiety attack.
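One way a dependent site could soften the blow is to treat third-party login as optional and keep a local login path alive. Here is a minimal sketch in Python of that idea. The `check_provider` health check, the method names and the timeout value are all assumptions made up for illustration; this is not Facebook's API or anyone's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def available_login_methods(check_provider, timeout_s=0.5):
    """Return the login methods to offer on the sign-in page.

    check_provider is a hypothetical callable that returns True when the
    third-party login provider is reachable. If it raises or takes longer
    than timeout_s, the social login button is simply left out.
    """
    methods = ["email_password"]  # local login does not depend on a third party
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        # result() raises if the check fails or exceeds the timeout.
        healthy = pool.submit(check_provider).result(timeout=timeout_s)
    except Exception:
        healthy = False
    finally:
        # Do not block the page on a slow check; the worker finishes in the background.
        pool.shutdown(wait=False)
    if healthy:
        methods.append("social_login")
    return methods
```

The point of the timeout is that a slow provider hurts almost as much as a dead one: the page keeps loading while it waits. Dropping the optional button after a short wait keeps the rest of the site usable.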

Here’s a graph showing the rise in page load times of several websites that rely on Facebook elements in some form, be it authentication or embedded posts.

Blink your eye, there goes another $1000

The real deal here is that it’s not just Facebook or Google or any other company that relies heavily on the cloud. It’s the way the cloud operates: the flaw is unseen until lightning strikes. Speaking of lightning, a while back Google’s data centers in Belgium were hit by lightning, TWICE!! This caused a loss of 0.000001% of the data, which may sum up to only a few gigabytes, but the lost data couldn’t be recovered.
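To get a feel for how such a tiny percentage turns into gigabytes, here is a quick back-of-the-envelope calculation in Python. The 100 petabyte total is purely an assumed figure for illustration, not a number published by Google.

```python
def bytes_lost(total_bytes: int, one_in: int = 10**8) -> int:
    """0.000001% is one part in 10**8; integer division avoids float rounding."""
    return total_bytes // one_in


PETABYTE = 10**15
GIGABYTE = 10**9

hypothetical_total = 100 * PETABYTE  # assumed storage size, for illustration only
loss = bytes_lost(hypothetical_total)
print(loss / GIGABYTE, "GB lost")  # prints: 1.0 GB lost
```

In other words, at that loss rate every 100 petabytes stored would give up about one gigabyte for good.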

Facebook outage stats from DownDetector.

Now here’s the thing: each day billions of users rely on the cloud to synchronize their data, or maybe just to keep a backup of important files. But in case of a natural calamity, the loss cannot be reversed.

Having said that, it’s critical to recognize that the power grids which keep our servers and almost all critical businesses running are one of our weak points.

Here’s a statement by the former US Secretary of Defense about how vulnerable the US power grid is to a terrorist attack.

“The possibility of a terrorist attack on the nation’s power grid, an assault that would cause coast-to-coast chaos, is a very real one.”

Time to back up your backups!

Power grids are the lifeline of all cloud-reliant businesses, which makes a stable and reliable grid a necessity.

We need to embrace and adopt Smart Grids, which can intelligently respond to failures and minimize the damage.

We are in the process of building an Internet-dependent world. While we’ve been successful in creating a network that carries traffic from one point to another, a power failure can still bring everything to a halt by taking down the routing systems.

We’re increasingly dependent on the cloud, and as our dependency increases, we become more vulnerable to its losses and failures as well.

Tracing back the line, we find that it’s not just the cloud that is prone to such vulnerabilities and flaws. The Internet as a whole isn’t operating the way it was designed and intended to operate. We still fall back on a few key physical locations where data and network interconnections are concentrated. This means that the failure of one of those could start a Domino Effect.

Take the DNS infrastructure, for example. There are just 13 named root servers (each replicated across many physical machines around the globe), and they sit at the top of the entire web’s address book. A targeted attack on them, or even a technical failure, could have a huge socioeconomic impact.
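To make that concrete, here is a small Python sketch that lists the 13 root server names (a.root-servers.net through m.root-servers.net) and simulates how a resolver might move on to the next one when some are unreachable. The `first_reachable` helper and the simulated outage are assumptions for illustration; no real DNS queries are sent, and real resolvers pick servers in more sophisticated ways.

```python
import string

# The 13 root server names run from a.root-servers.net to m.root-servers.net.
ROOT_SERVERS = [
    f"{letter}.root-servers.net" for letter in string.ascii_lowercase[:13]
]


def first_reachable(servers, is_up):
    """Return the first server for which is_up(server) is true, or None if all are down."""
    for server in servers:
        if is_up(server):
            return server
    return None


# Simulated outage: pretend the first three servers are unreachable.
down = set(ROOT_SERVERS[:3])
print(first_reachable(ROOT_SERVERS, lambda s: s not in down))  # d.root-servers.net
```

As long as one name still answers, lookups keep working. The worry is the case where `first_reachable` returns `None`.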

Bottom line, the migration to the cloud and our reliance on its effectiveness remain questionable until we build a more robust system. And this is not a problem that computer scientists can tackle alone. It’s like Swiss-crafted machinery, where each component needs to be made with utmost precision so that they all fit together and operate seamlessly.

Just like the great Steve Jobs said:

“Technology alone is not enough—it’s technology married with liberal arts, married with the humanities, that yields us the results that make our heart sing.”

On the technical front, we need smarter, more reliable grids that make sure data centers aren’t affected by power failures. We need more robust cloud infrastructure that handles failures better and minimizes data loss.

On the user front, awareness of the situation is just as important. The fact that simply signing in to a device gives users access to data stored on a different device seems magical. It’s true that abstraction is good, and how your data moves from one device to another is not something most people want or need to know. Still, knowing that these systems may fail and have failed, and what the protocol for damage control is, can really come in handy.