System Evolution through Chaos Engineering

Introducing “Chaos Toolkit Playgrounds”, a new place for the community to explore Chaos Engineering in action

Grant Tarrant-Fisher
Chaos Toolkit
May 30, 2019


It’s day 1 of chaos engineering for you and your team. You’re excited and you love the power of chaos engineering, but… you’re nervous.

You’ve just finished reading “Chaos Engineering”, maybe you’ve grabbed the Early Release of “Learning Chaos Engineering”, and you’re looking for a place to start. But you’re wary: you don’t want to become one of those cautionary tales on the internet whose first experience of chaos engineering was causing a total system outage… So what do you do?

Wouldn’t it be great if you could explore how a system can evolve in response to chaos engineering before you embark on chaos against your own systems? What if you could see the experiments others write, and how they work with them carefully so that those experiments become forcing factors for system improvement rather than causes of cataclysm? Finally, what if you could see how a system evolves on the basis of chaos engineering, how different solutions are explored, and how new evidence of system weaknesses is continually surfaced from the target system?

And what if you could do all of that safely?

The Chaos Toolkit Community Playground

While writing Learning Chaos Engineering, my colleague Russ Miles was very aware of the jump from learning the discipline and practices of chaos engineering to applying them in the real world. While we were chatting about this over coffee, another colleague, Sylvain, mentioned that he’d love to engage the community in a series of hackathon-style sessions, even though the Chaos Toolkit community could not be more globally distributed.

These ideas grew into the concept for the Chaos Toolkit Community Playground project. The project is unusual in that it isn’t a customization for the Chaos Toolkit, although it may contain many customizations over time. Instead, it contains a number of target systems that will evolve publicly as chaos engineering is applied to them.

The Chaos Toolkit Community Playground’s goal is to provide a set of systems, or playgrounds, each at a different point in its evolution, from a naive first implementation to a multi-cloud production deployment, which the community can grab, run, and explore collaboratively using chaos engineering experiments.

This means the Playground is a great place for you to grab the code, boot a system and start exploring automated chaos engineering experiments immediately. You’ll be able to see and learn from experiments others in the community have written and shared. You’ll see how chaos engineering can be hooked into system operations for concerns such as observability and control.

In a nutshell, the Playground will be a place for everyone to safely learn how to apply chaos engineering in the real world, without the dangers of immediately jumping to their own systems.

Oh, and it will be a fun place to learn too, thanks to the power of the repository’s history…

Learning from a History of System Evolution

The Chaos Toolkit Community Playground is a public repository on GitHub, and that gives us a great platform for sharing not only the experiments and techniques being used against the various target playground systems right now, but also what those systems looked like earlier in their history!

Each of the playgrounds will evolve over time, and the history will be tagged at each point the community deems important. This means you can learn much more about the power of chaos engineering, because you can look at how a system evolved over time with chaos engineering acting as a forcing factor for system improvements.

The Current Playground Systems

The Chaos Toolkit Community Playground can be found here:

The homepage introduces the playground’s goals and provides an index to its rapidly evolving set of playground projects and sample systems.

One of the projects in the Community Playground is the Yummy Noodle Bar Menu & Order Service. The yummy-noodle-sample is based on the Yummy Noodle Bar, a great noodle bar located in the same building as our office in Eastbourne (handy for lunch), so we thought we would build a demo app around it; for the real deal, you can see their website here. This sample system currently has a very simple architecture, summed up in the following diagram:

Yummy Noodle Demo Architecture

The Yummy Noodle demo is one of our more recent demos. It is designed to demonstrate the journey of applying turbulent conditions to a naive implementation and then, in response to that turbulence, improving the system so that it survives those conditions. At this point, all we have is the system itself, which meets two feature use cases:

  • Customers who want to place orders
  • Kitchen staff who will process and complete orders

To explain the architecture in a bit more detail, the Menu Service is a simple web service implemented in Python. It exposes a route that serves menu items from a static JSON file and responds with a JSON array of menu items.
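A service of that shape can be sketched with Python’s standard library alone. This is a minimal sketch, not the Playground’s actual code: the `/menu` route, the `menu.json` file name, and the cross-origin header (assuming the web client is served from a different origin) are all hypothetical choices for illustration.

```python
# A minimal, hypothetical sketch of a menu service: NOT the Playground's actual code.
# Assumptions: the route is /menu and the static file is menu.json.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path
from urllib.parse import urlsplit

MENU_ROUTE = "/menu"           # hypothetical route name
MENU_FILE = Path("menu.json")  # hypothetical static JSON file holding an array of items


def load_menu(path: Path) -> list:
    """Read the static JSON file and return its array of menu items."""
    with path.open(encoding="utf-8") as f:
        items = json.load(f)
    if not isinstance(items, list):
        raise ValueError("menu file must contain a JSON array")
    return items


class MenuHandler(BaseHTTPRequestHandler):
    menu_path = MENU_FILE

    def do_GET(self) -> None:
        # Compare only the path part, ignoring any query string.
        if urlsplit(self.path).path != MENU_ROUTE:
            self.send_error(404, "Not Found")
            return
        # The file is re-read on every request, so edits show up without a restart.
        body = json.dumps(load_menu(self.menu_path)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        # Assumption: the web client lives on another origin, so allow
        # cross-origin Ajax reads of the menu.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Serve the menu until interrupted."""
    HTTPServer((host, port), MenuHandler).serve_forever()
```

Calling `run()` serves the menu at `http://127.0.0.1:8000/menu`. Re-reading the file on each request keeps the sketch naive on purpose: a missing or corrupted `menu.json` breaks every request, which is exactly the kind of weakness a chaos experiment could surface.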

The web client is a static web service that exposes two pages, both rendered using the excellent Datatables.net, which provides a lot of functionality for rendering data tables out of the box and takes care of the Ajax interaction with the backend Menu Service.

From the menu page, a user can open an order pop-up, select their order from the menu items, and submit it. The order is pushed to a Firebase Realtime Database and, through a Firebase on-change listener, is reflected on any web client that has the orders page open. From the orders page, the kitchen staff can progress placed orders and mark them as completed when done. Completed orders can also be deleted as required.
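The order flow above can be modelled as a small state machine with change listeners. This is a hypothetical, in-memory sketch for illustration only: `OrderStore` stands in for the Firebase Realtime Database, `subscribe()` merely mimics an on-change listener rather than using the Firebase API, and the status names and the “placed, in progress, completed” sequence are assumptions.

```python
# Hypothetical in-memory model of the order flow: NOT the Firebase API.
# OrderStore stands in for the Realtime Database; subscribe() mimics an
# on-change listener. Status names are assumptions.
from dataclasses import dataclass, field
from enum import Enum
from itertools import count
from typing import Callable, Dict, List


class Status(Enum):
    PLACED = "placed"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"


# Assumed kitchen workflow: each "progress" moves an order one step forward.
NEXT_STATUS = {Status.PLACED: Status.IN_PROGRESS, Status.IN_PROGRESS: Status.COMPLETED}


@dataclass
class Order:
    order_id: int
    items: List[str]
    status: Status = Status.PLACED


Listener = Callable[[str, Order], None]


@dataclass
class OrderStore:
    orders: Dict[int, Order] = field(default_factory=dict)
    listeners: List[Listener] = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1))

    def subscribe(self, listener: Listener) -> None:
        """Register a callback, like an open orders page listening for changes."""
        self.listeners.append(listener)

    def _notify(self, event: str, order: Order) -> None:
        for listener in self.listeners:
            listener(event, order)

    def place(self, items: List[str]) -> Order:
        """A customer submits an order from the order pop-up."""
        order = Order(next(self._ids), list(items))
        self.orders[order.order_id] = order
        self._notify("added", order)
        return order

    def progress(self, order_id: int) -> Order:
        """The kitchen moves an order one step forward."""
        order = self.orders[order_id]
        if order.status not in NEXT_STATUS:
            raise ValueError(f"order {order_id} is already {order.status.value}")
        order.status = NEXT_STATUS[order.status]
        self._notify("changed", order)
        return order

    def delete(self, order_id: int) -> None:
        """Only completed orders may be deleted."""
        order = self.orders[order_id]
        if order.status is not Status.COMPLETED:
            raise ValueError(f"order {order_id} is not completed yet")
        del self.orders[order_id]
        self._notify("removed", order)
```

Every subscriber sees every change, which is the property that keeps all open orders pages in sync. Refusing to delete an order that is not yet completed encodes the “completed orders can be deleted” rule as an explicit check rather than a convention.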

This is the current, minimal feature implementation of the system. In the future, we are going to explore some of its weaknesses, form hypotheses about how it behaves, and capture them as Chaos Toolkit experiments. Applying turbulence to the system will let us see how we can improve its resilience, evolving the architecture and features as a journey through chaos engineering.
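To make “form a hypothesis and apply turbulence” concrete, here is a hypothetical experiment for the Menu Service, built as a Python dict and written out as JSON in the Chaos Toolkit experiment format (a steady-state hypothesis of probes, a method of actions, and rollbacks). The URL, the `menu.json` file name, and the choice of turbulence (temporarily moving the menu file aside with `mv`) are assumptions, not experiments from the Playground; check the Chaos Toolkit experiment reference for the exact field semantics before running anything.

```python
# Hypothetical Chaos Toolkit experiment for the Menu Service; every value here
# (URL, file name, turbulence) is an illustrative assumption.
import json
from pathlib import Path

MENU_URL = "http://127.0.0.1:8000/menu"  # hypothetical Menu Service route
MENU_FILE = "menu.json"                  # hypothetical static menu file


def build_experiment(menu_url: str = MENU_URL, menu_file: str = MENU_FILE) -> dict:
    """Return an experiment dict in the Chaos Toolkit JSON experiment format."""
    return {
        "version": "1.0.0",
        "title": "The menu stays available if its data file disappears",
        "description": "Move the static menu file aside and check the menu route.",
        "steady-state-hypothesis": {
            "title": "The menu route answers with HTTP 200",
            "probes": [
                {
                    "type": "probe",
                    "name": "menu-route-returns-200",
                    "tolerance": 200,  # expected HTTP status code
                    "provider": {"type": "http", "url": menu_url, "timeout": 3},
                }
            ],
        },
        "method": [
            {
                "type": "action",
                "name": "hide-menu-file",
                "provider": {
                    "type": "process",
                    "path": "mv",
                    "arguments": [menu_file, menu_file + ".bak"],
                },
            }
        ],
        "rollbacks": [
            {
                "type": "action",
                "name": "restore-menu-file",
                "provider": {
                    "type": "process",
                    "path": "mv",
                    "arguments": [menu_file + ".bak", menu_file],
                },
            }
        ],
    }


def write_experiment(path: Path) -> Path:
    """Serialise the experiment so it can be run with `chaos run <path>`."""
    path.write_text(json.dumps(build_experiment(), indent=2), encoding="utf-8")
    return path
```

A naive service that re-reads its menu file on every request would likely violate this hypothesis, and that failed hypothesis is the kind of evidence that drives the next improvement to the system.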

Contributions Welcome

The Chaos Toolkit Community Playground is truly that: a playground where the community can collaborate across organizational boundaries on the practice of chaos engineering. The Chaos Toolkit community is a safe and fun place to share your own ideas and contributions, and right now is a great time to get involved by raising issues or opening PRs for new chaos experiments, new integrations, or even whole new Playground systems of your own making!

This is your playground, and we hope you find it super useful as you take your first steps from learning chaos engineering into applying it in the real world.
