Who is your Chaos monkey?

“How can you think yourself a great man, when the first accident that comes along can wipe you out completely?” - Euripides

To innovate, and even to survive over the long run, the key is to move beyond resilience: to manage uncertainty, to work with ambiguity and, going one step further, to use volatility and chaos to actually get better. How is that possible?

That is the concept of antifragility. It comes from Nassim Taleb.

An antifragile way of life is all about finding a way to gain from the inevitable disorder of life. To not only bounce back when things don’t go as planned, but to get stronger, smarter, and better at continuing as a result of running into this disorder.

How can we do that? For today, let's look at one example from Netflix.

Netflix says that the cloud is all about redundancy and fault tolerance. How do we design for that? And how would we know whether a system really is fault-tolerant, especially when once-in-a-blue-moon events don’t come often? In Netflix's own words:

Imagine getting a flat tire. Even if you have a spare tire in your trunk, do you know if it is inflated? Do you have the tools to change it? And, most importantly, do you remember how to do it right? One way to make sure you can deal with a flat tire on the freeway, in the rain, in the middle of the night is to poke a hole in your tire once a week in your driveway on a Sunday afternoon and go through the drill of replacing it. This is expensive and time-consuming in the real world, but can be (almost) free and automated in the cloud.
This was our philosophy when we built Chaos Monkey, a tool that randomly disables our production instances to make sure we can survive this common type of failure without any customer impact. The name comes from the idea of unleashing a wild monkey with a weapon in your data center (or cloud region) to randomly shoot down instances and chew through cables — all the while we continue serving our customers without interruption. By running Chaos Monkey in the middle of a business day, in a carefully monitored environment with engineers standing by to address any problems, we can still learn the lessons about the weaknesses of our system, and build automatic recovery mechanisms to deal with them. So next time an instance fails at 3 am on a Sunday, we won’t even notice.

This is just brilliant: create something that causes failures in your own systems and, in the process, learn, adapt and prepare for the one event that could mean catastrophic failure.
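
To make the idea concrete, here is a toy simulation of the principle. It is only a sketch and not Netflix's actual tool: the names `ServicePool`, `chaos_monkey` and `run_experiment` are made up for illustration, and the "instances" are just strings in memory. Each round, the monkey kills one random instance, we check whether the service can still answer a request, and then an automatic recovery step brings the pool back to its desired size.

```python
import random


class ServicePool:
    """A toy pool of redundant instances (hypothetical, in-memory only)."""

    def __init__(self, desired_size):
        self.desired_size = desired_size
        self.instances = {f"i-{n}" for n in range(desired_size)}
        self._next_id = desired_size

    def handle_request(self):
        # The service stays up as long as at least one instance is alive.
        return len(self.instances) > 0

    def heal(self):
        # Automatic recovery: launch replacements until back at desired size.
        while len(self.instances) < self.desired_size:
            self.instances.add(f"i-{self._next_id}")
            self._next_id += 1


def chaos_monkey(pool, rng):
    """Terminate one randomly chosen instance and return its id (or None)."""
    if not pool.instances:
        return None
    victim = rng.choice(sorted(pool.instances))
    pool.instances.remove(victim)
    return victim


def run_experiment(rounds, pool_size, seed=0):
    """Run the monkey for a number of rounds and count failed requests."""
    rng = random.Random(seed)
    pool = ServicePool(pool_size)
    failed_requests = 0
    killed = []
    for _ in range(rounds):
        killed.append(chaos_monkey(pool, rng))
        if not pool.handle_request():
            failed_requests += 1
        pool.heal()
    return failed_requests, killed, pool
```

With three redundant instances, `run_experiment(10, 3)` reports zero failed requests, because losing one instance never takes the service down. With a single instance, `run_experiment(5, 1)` fails on every round. That is exactly the kind of weakness you would rather discover on a quiet afternoon than at 3 am.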

The recent blackout of the entire state of South Australia is a good example of a system built for resilience rather than antifragility.

Organisations are the same.

What is your chaos monkey?

Business Models Inc is an international strategy & design firm that helps corporates, government, NFPs, entrepreneurs and start-ups innovate their business models and design strategies for the future through co-creation and visualization.

Suhit is a Partner & Strategy Designer at Business Models Inc. He is focussed on impact innovation. He combines design, business, social sciences and entrepreneurship with a focus on social innovation and social entrepreneurship. He works with organisations to create value for customers, employees, stakeholders and society. He calls it “Humanomics”.
