The Real Lesson from the AWS Outage
The embarrassing outage of Amazon Web Services this week should open our eyes to a growing problem. Complex systems are difficult to manage on their own, and when they are connected in dependent ways, the result is fragile. Such structures are subject to unexpected malfunctions that can spread quickly. One of the most knowledgeable technology companies on the planet learned just such a lesson this week. Amazon’s star child, its cloud services, suffered a major disruption. It was not a nation-state attack, the work of sophisticated teams of hackers, or even the act of malicious insiders bent on destruction. Nonetheless, the lessons are telling, and the ramifications will be important to all of us.
It was one employee, typing a few wrong commands, who caused a significant outage across major portions of the Internet. Amazon worked furiously to contain and recover from the incident. It will have to rebuild trust with customers who were sold on the resiliency of ‘cloud’ services as a way to avoid such events. Amazon has already stated it will learn from the event and will apply some compartmentalization controls to lessen potential damage in the future. But there is a more significant realization to be made.
The greater lesson for us all is that when hugely sophisticated systems interconnect with each other, there is an exponential increase in complexity. Because these systems rely on, trust, and grant authority to one another, they can fail in spectacular fashion. The AWS example shows how such a situation allows a series of cascading, unintended effects, which could not easily have been predicted, to occur and cause widespread impact. As bad as it may have appeared, it was not too severe. Had it been an intentional attack from a capable, motivated, and sophisticated attacker, I believe the results would have been catastrophic.
With the AWS outage we can see the impact of an unintentional accident and how difficult recovery is even when everyone is working together to resolve the issue. Now imagine what a malicious and focused cyber-threat could do while being stealthy, striving for maximum damage, and actively undermining the countermeasures and recovery actions of response teams.
Had this been a malicious insider or a professional hack, the damage would have been many times worse. We would still be picking up the shattered pieces. There would be tears falling from the AWS cloud.
This week it was cloud storage services making websites unavailable. What happens when it is a fleet of autonomous vehicles, where lives are at risk, or the complex infrastructure of the national power grid?
We must take a fresh look at threats, risks, countermeasures, and protection practices as the individual pieces of the computing world grow more complex and more interconnected. Traditional methods are not sufficient for understanding how chain reactions can occur in the next generation of technologies and services.