Tools & Tech: Chaos Engineering
Welcome to the first installment of our new monthly series, Tools & Tech. Each month we’ll dive into the technologies and tools that our Froggers are using on their projects. Is there something cool you’ve been wanting to try out? Shoot us a note or leave a comment.
First up: Chaos Engineering. Maybe you’ve heard of it, maybe not, but we can guarantee you’re familiar with the company known for pioneering the practice. The engineers at our beloved Netflix define Chaos Engineering as:
…The discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production.
So what does that mean, exactly? In simple terms, it means purposefully breaking things to see how a system behaves when it faces disastrous conditions in production. First, define the “steady state” of the system, that is, some measurable output that shows how it behaves when running normally. To test how reliable the system is, introduce real-life variables (network failures, crashing servers, etc.) to an experimental group while leaving a control group untouched, then compare the two. The harder it is to disrupt the “steady state” of the experimental group, the more confidence you can have in the system.
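To make the control-versus-experimental comparison concrete, here is a minimal Python sketch. It is a toy simulation, not a real chaos tool: the steady-state metric (request success rate), the function names, the retry model, and the 1% tolerance are all assumptions chosen for illustration.

```python
import random


def success_rate(requests, fault_probability, retries, rng):
    """Return the fraction of simulated requests that succeed.

    Each attempt fails with probability `fault_probability`; a request
    succeeds if any of its (retries + 1) attempts avoids the fault.
    """
    succeeded = 0
    for _ in range(requests):
        for _ in range(retries + 1):
            if rng.random() >= fault_probability:
                succeeded += 1
                break
    return succeeded / requests


def run_experiment(fault_probability, retries, requests=10_000,
                   tolerance=0.01, seed=42):
    """Compare a fault-free control group with a fault-injected group."""
    rng = random.Random(seed)  # seeded so repeated runs are comparable
    control = success_rate(requests, 0.0, retries, rng)
    experimental = success_rate(requests, fault_probability, retries, rng)
    deviation = control - experimental
    return {
        "control": control,
        "experimental": experimental,
        "deviation": deviation,
        # The steady state "holds" if the experimental group stays
        # within the (assumed) tolerance of the control group.
        "steady_state_held": deviation <= tolerance,
    }


if __name__ == "__main__":
    print(run_experiment(fault_probability=0.2, retries=0))
    print(run_experiment(fault_probability=0.2, retries=3))
```

In this toy model, a system with no retries drifts far from its steady state once faults are injected, while the same system with a few retries barely moves. That gap is the kind of evidence a chaos experiment is meant to surface.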
Sounds pretty simple. But wait… how do you introduce these variables? How do you think creatively enough to wreak havoc on your own system? Well, those geniuses over at Netflix thought of this too. Enter the Simian Army. This collection of tools includes Chaos Monkey, which randomly terminates running instances, and Conformity Monkey, which flags instances that don’t follow best practices. Together they let engineers simulate chaotic conditions and detect ways the system could be prone to problems. Engineers can then fix and enhance the system, thereby increasing the odds of continuous service even when problems arise in real life.
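The core idea behind a tool like Chaos Monkey, picking a running instance at random and taking it down, can be sketched in a few lines of Python. This is a hypothetical in-memory illustration only: it does not use Chaos Monkey's actual code or API, and the instance names and the `min_healthy` threshold are made up for the example.

```python
import random


def terminate_random_instance(instances, rng):
    """Remove one randomly chosen instance from the set and return its name."""
    victim = rng.choice(sorted(instances))  # sorted for a reproducible order
    instances.remove(victim)
    return victim


def service_is_up(instances, min_healthy=1):
    """Treat the service as up while enough instances remain (assumed rule)."""
    return len(instances) >= min_healthy


if __name__ == "__main__":
    fleet = {"web-1", "web-2", "web-3"}  # hypothetical instance names
    rng = random.Random(7)
    victim = terminate_random_instance(fleet, rng)
    print(f"terminated {victim}; service up: {service_is_up(fleet, min_healthy=2)}")
```

A real tool would act on actual infrastructure and run on a schedule; the point here is only that losing any single instance should not take the service down.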
*Note that LeapFrog Systems is not formally affiliated with any tool, technology, or process.*