Starting with Turbulence

How to get started with automated chaos when you’re not sure of the hypothesis (yet)

Russ Miles
Chaos Toolkit
May 30, 2019

Getting started with Chaos Engineering can appear daunting at first. Even the Chaos Toolkit, which was originally created to make the journey as simple as possible for those starting to explore the discipline, can seem to ask a lot of you just to get going.

At first glance you need to do the following to get started:

  1. Figure out a steady-state hypothesis, including what probes and tolerances you want to include.
  2. Figure out what probes and actions you want in your method.
  3. Figure out what rollback activities, if any, you want to include to be a good citizen.

And then all of this needs to be captured in the JSON or YAML experiment format that the Chaos Toolkit supports. While all of these sections are important to a full chaos experiment, it would sometimes be great to get started more quickly by having less to think about up front…

Science often doesn’t (really) start with a Hypothesis, and neither does experimental chaos engineering

Chaos engineering is a scientific, empirical discipline. At its heart is the concept of a chaos engineering experiment whose job is to surface evidence of system weaknesses so that those weaknesses can be analysed, prioritised, then accepted or overcome.

Experiments typically start with a hypothesis, which should be a collection of statements of belief that can be proven or disproven empirically. But building an effective hypothesis can be a real challenge when you are still exploring how a system responds to turbulent conditions. Figuring out what your system’s steady-state hypothesis should be, or whether it can be measured through automation yet, can even become a blocker.

If all you want to do is automate the turbulence injection steps (perhaps from a manual Game Day) as a first step towards full chaos engineering experiments, then having to construct a hypothesis first can feel inappropriate. In fact, a lot of science doesn’t start with a hypothesis. In simplistic terms, it starts with poking something and being surprised by how it responds.

At that early point there is no hypothesis, just an understanding of the things you want to do to the system to set up (and possibly tear down) the turbulent conditions that you’re interested in. In other words, you know what you want to poke but not yet what effects that might have, because your hypothesis is still nebulous. The need to come up with a steady-state hypothesis right away can feel like an unnecessary burden.

From 3 to 1: Turbulence-Only in an Experiment with only a Method

There’s good news, though, if you’re automating those turbulence injection steps using the Chaos Toolkit. When you first look at a sample Chaos Toolkit experiment it might seem that there needs to be a steady-state hypothesis, a method, and perhaps even a section for rollbacks. However, a little-known feature of the experiment format lets you start with only one of those three aspects.

That feature is that the steady-state hypothesis is optional.

You can, and many users do, build a perfectly valid chaos engineering experiment that does not have a steady-state hypothesis at first. Let’s look at an example experiment from the Chaos Toolkit Community Playground:

This experiment is used in the forthcoming Learning Chaos Engineering book from O’Reilly to show how you can explore a system for evidence of multi-level (platform and people/process/practices) system weaknesses. When presented as it is above, it looks like the entire experiment came into being fully-formed immediately, but that really isn’t the case.

In fact the experiment started out much simpler as part of some automation for a Game Day. In its first incarnation the experiment was only a method with some description and that was enough to get the ball rolling:
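As a rough illustration only, a method-only experiment might be sketched as follows. The snippet builds the experiment as a Python dictionary and serialises it to JSON, one of the formats the Chaos Toolkit reads. The title, description, activity name and the `echo` command are hypothetical placeholders, not the Game Day experiment described in this post.

```python
import json

# Hypothetical method-only experiment. The title, description, activity name
# and command are placeholders, not the experiment from the Game Day.
experiment = {
    "version": "1.0.0",
    "title": "What happens when we poke one instance of a service?",
    "description": "Inject turbulence only; observations are made manually.",
    "method": [
        {
            "type": "action",
            "name": "poke-the-service",
            "provider": {
                "type": "process",
                "path": "echo",
                "arguments": "replace me with a real turbulence command",
            },
        }
    ],
}


def to_json(exp: dict) -> str:
    """Serialise the experiment dictionary to indented JSON text."""
    return json.dumps(exp, indent=2)


if __name__ == "__main__":
    print(to_json(experiment))
```

Note that there is no steady-state hypothesis and no rollbacks section here, only a description and a method. The printed JSON could be saved to a file (for example `experiment.json`, a hypothetical name) for use with the `chaos` command.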

Even though this experiment is simplified to just its method, it is still a valid Chaos Toolkit automated chaos experiment. You can see that by executing the chaos validate command:

When the Game Day was executed, the system was manually inspected to see how it responded to the experiment method’s turbulent conditions, and the team collaborated to interpret the results. Next, relevant rollbacks can be introduced:
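A minimal sketch of that step, assuming the experiment is held as a Python dictionary: the helper below returns a copy with a rollbacks section added, leaving the method untouched. The helper name `with_rollbacks`, the activity names and the `echo` commands are all hypothetical placeholders.

```python
import copy


def with_rollbacks(experiment: dict, rollbacks: list) -> dict:
    """Return a copy of the experiment with a rollbacks section added."""
    updated = copy.deepcopy(experiment)
    updated["rollbacks"] = list(rollbacks)
    return updated


# Hypothetical method-only starting point (placeholder values throughout).
method_only = {
    "version": "1.0.0",
    "title": "What happens when we poke one instance of a service?",
    "description": "Inject turbulence only; observations are made manually.",
    "method": [
        {
            "type": "action",
            "name": "poke-the-service",
            "provider": {"type": "process", "path": "echo",
                         "arguments": "inject turbulence (placeholder)"},
        }
    ],
}

# Hypothetical rollback action that would undo the turbulence.
restore = {
    "type": "action",
    "name": "restore-the-service",
    "provider": {"type": "process", "path": "echo",
                 "arguments": "undo turbulence (placeholder)"},
}

experiment = with_rollbacks(method_only, [restore])
```

Working on a copy keeps the original method-only experiment available, which can be handy if the Game Day automation still uses it.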

Eventually a steady-state hypothesis, with corresponding probes and tolerances, was created to measure the important business metrics and surface the evidence that had been seen manually when using the method-only experiment in the Game Day.
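That last step might look something like the sketch below, again treating the experiment as a Python dictionary. The helper name `with_hypothesis`, the probe name and the health-check URL are assumptions made for illustration. A real hypothesis would measure whatever business metrics matter for your system, and the tolerance of 200 here simply stands for an expected HTTP status code.

```python
import copy


def with_hypothesis(experiment: dict, title: str, probes: list) -> dict:
    """Return a copy of the experiment with a steady-state hypothesis added."""
    updated = copy.deepcopy(experiment)
    updated["steady-state-hypothesis"] = {"title": title, "probes": list(probes)}
    return updated


# Hypothetical experiment that so far has only a method (placeholder values).
method_only = {
    "version": "1.0.0",
    "title": "What happens when we poke one instance of a service?",
    "description": "Inject turbulence, then check the service still responds.",
    "method": [
        {
            "type": "action",
            "name": "poke-the-service",
            "provider": {"type": "process", "path": "echo",
                         "arguments": "inject turbulence (placeholder)"},
        }
    ],
}

# Hypothetical probe: the URL is a placeholder for a real endpoint.
health_probe = {
    "type": "probe",
    "name": "service-responds-ok",
    "tolerance": 200,
    "provider": {"type": "http", "url": "http://localhost:8080/health"},
}

experiment = with_hypothesis(
    method_only, "The service keeps responding", [health_probe]
)
```

The method is carried over unchanged, so the turbulence that was explored manually is exactly what the new hypothesis is checked against.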

Summary

Coming up with a complete steady-state hypothesis can seem onerous when you just want to “poke” a system with some turbulent conditions to see how it responds. Frequently this is all the automation a manual Game Day needs, or in fact is all that can be automated when you are creating a chaos experiment.

The Chaos Toolkit supports this workflow by making the steady-state hypothesis optional. You can start with just the activities in your experiment’s method that inject the turbulent conditions you want to explore, and then over time you can complete your experiment with a steady-state hypothesis.

Russ Miles
Chaos Toolkit

People, Team and Organizational Developer. Writer, psychologist, speaker and humanistic Head of Engineering. https://twitter.com/russmiles