The Chaos Toolkit community project is one year old!

A community project is born

Sylvain Hellegouarch
Chaos Toolkit
Oct 8, 2018


Automation for your Chaos Engineering

About a year ago, Russ Miles and I had the crazy idea that we could provide a simple interface to help the uptake of Chaos Engineering for everyone while being true to its core principles. Our quick review of the open-source tools we could find led us to believe we could benefit from putting the concept of the experiment itself front and center.

What are we talking about here?

Let’s refresh your memory if you aren’t familiar with the context:

  1. Chaos Engineering is a discipline pioneered by Netflix, but applied in various forms at other companies, to stress parts of your system in order to discover how it behaves when conditions are turbulent. The goal is to support teams in becoming proactive in discovering, exploring and potentially overcoming system weaknesses before they are experienced by the system’s users. O’Reilly has an excellent free e-book to get you up to speed on the topic, and it was a great foundation for our early exploration.
  2. A chaos engineering experiment defines a scenario and allows you to collect data that will support the weakness discovery and analysis process.
  3. The Chaos Toolkit is an open-source project that facilitates the declaration and automation of such an experiment. You declare your experiment in a JSON/YAML file, using drivers to cause the turbulent conditions as well as to probe your system. The generated report provides a means to analyse what went on during the execution.
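To make the idea of a declared experiment more concrete, here is a minimal sketch that builds one as a Python dictionary and serializes it to JSON. The key names reflect my understanding of the Chaos Toolkit experiment format and should be checked against the project documentation; the titles, the URL, and the `my_driver.actions` module with its `stop_instance` function are hypothetical placeholders, not real drivers.

```python
import json

# Hypothetical experiment: every title, URL, module and function name below
# is an illustrative placeholder and does not come from the article.
experiment = {
    "version": "1.0.0",
    "title": "The service stays available when one instance is lost",
    "description": "Check that users are still served during the failure.",
    # Probes describing what "normal" looks like for the system.
    "steady-state-hypothesis": {
        "title": "The service responds",
        "probes": [
            {
                "type": "probe",
                "name": "service-must-respond",
                "tolerance": 200,  # expected HTTP status code
                "provider": {
                    "type": "http",
                    "url": "http://localhost:8080/health",  # assumed endpoint
                },
            }
        ],
    },
    # Actions that cause the turbulent conditions.
    "method": [
        {
            "type": "action",
            "name": "stop-one-instance",
            "provider": {
                "type": "python",
                "module": "my_driver.actions",  # hypothetical driver module
                "func": "stop_instance",  # hypothetical driver function
                "arguments": {"name": "service-1"},
            },
        }
    ],
    # Nothing to undo in this sketch.
    "rollbacks": [],
}

# Serialize the declaration so it can be saved as an experiment file.
experiment_json = json.dumps(experiment, indent=2)
print(experiment_json)
```

Saved to a file such as `experiment.json`, a declaration along these lines would typically be executed with the toolkit's `chaos run` command, assuming a real driver replaces the placeholder module.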

In effect, the Chaos Toolkit is an implementation of the Chaos Engineering approach that then lets you decide how to stress your system since, in our opinion, your system context and chaos engineering needs are unique.

A hell of a ride!

Russ and I started the project and focused on keeping the core engine as simple as possible (we strongly believe in simplicity in software design), and we quickly put our focus on Kubernetes as our first target. But, before too long, we started work on other chaos drivers ranging from infrastructure (AWS, Azure, Pivotal Cloud Foundry or Google Cloud), through the platform and even to the application level (Spring). We fully expect to grow this list as the community grows too.

Open Source is at the core of our activities and so early on we decided to reach out to an organization where we could build with the community in a positive way. We therefore contacted the CNCF, who quickly understood the need for Chaos Engineering and planted the seed for a future Working Group on the topic. I was fortunate enough to introduce that effort at KubeCon Europe in Spring 2018 as well as be present at KubeCon North America for a follow-up in December 2018. Thanks so much to the CNCF for those opportunities!

Meanwhile, Russ introduced the Chaos Toolkit to happy souls at various Chaos Engineering or Cloud Native trainings he’s given this year. Not to mention his tour across the USA as part of the Geek on a Harley: Chaos Tour! Lastly, as 2018 moves towards a close, we are planning to present our new product, Chaos Platform, at muCon 2018 in London in November.

Our Community is Awesome and Defines the Project

While we are proud of what we have achieved, we want to thank the awesome community which has formed around the Chaos Toolkit. First, people are friendly and supportive. Personally, I think this is a huge factor in the success of the open source communities that we enjoy being part of: people are civil and kind to each other.

Then, of course, I admire our community because they don’t demand, they suggest and, more often than not, they contribute to the project. This is a big piece of validation for everyone and it shows that our design is, indeed, simple enough but also that folks rely on the project effectively today!

Some drivers have been led entirely by our community. For instance, the AWS driver and the ToxiProxy driver (to trigger faults at the network level of your application) are fully community-driven. New drivers are being proposed too.

The rate of contribution has increased rapidly over the summer and we now have changes propagated to drivers on a weekly basis. Contributors are doing an amazing job and they respect the style of the projects perfectly, which is something we all care about.

Today, this anniversary is theirs as much as it is ours and we are delighted to have such a great crowd! Please join us if you fancy working on something cool with friendly people.

What does the future look like?

Chaos Toolkit is one year old, which means it’s just starting its life. What matters now is that the project matures enough that its core reaches a 1.0 that people can rely on even more than they already do. Then we can move the discussion on to the governance of the project.

Oh and let’s not forget, we’re going to get some swag (stickers, t-shirts…) by popular demand. We love our logo and we want to thank Marc Perrien who created it!

On the ChaosIQ side, we will continue providing commercial support where needed but, more importantly, we will continue contributing to the open-source project itself. We are already working hard to expand this OSS universe with the Chaos Hub, a control plane for your own Chaos Engineering, and engaging with companies that are looking for commercial open source support around the entire Chaos Platform.

Today it’s time to celebrate this first great year of the Chaos Toolkit, with an eye towards all the great ideas we have for features in the Toolkit, Hub and Commercial Platform in the coming year!
