Pumba — Chaos Testing for Docker

Alexei Ledenev
HackerNoon.com
6 min readApr 9, 2016

--

Updated on 28-July-2016: Pumba cli change; support network emulation.

Introduction

The best defense against unexpected failures is to build resilient services. Testing for resiliency enables teams to discover these failures before the customer notices. By intentionally causing failures as part of resiliency testing, you can enforce your policy for building resilient systems. Resilience of the system can be defined as its ability to continue functioning even if some components of the system are failing — ephemerality. The growing popularity of distributed and microservice architecture makes resilience testing critical for applications that now require 24x7x365 operation. Resilience testing is an approach where you intentionally inject different types of failures at the infrastructure level (VM, network, containers, and processes) and let the system try to recover from these unexpected failures that can happen in production. Simulating realistic failures at any time is the best way to enforce highly available and resilient systems.

What is Pumba?

First of all, Pumba (or Pumbaa) is a supporting character from Disney’s animated film The Lion King. In Swahili, pumbaa means “to be foolish, silly, weakminded, careless, negligent”. And this actually reflects the desired behavior of application, I’ve tried to build :-)

Pumba is inspired by highly popular Netfix Chaos Monkey resilience testing tool for AWS cloud. Pumba takes a similar approach, but applies it to container level. It connects to the Docker daemon running on some machine (local or remote) and brings some level of chaos to it: “randomly” killing, stopping and removing running containers.

If your system is designed to be resilient, it should be able to recover from such failures. “Failed” services should be restarted and lost connections should be recovered. This is not as trivial as it sounds. You need to design your services differently. Be aware that a service can fail (for whatever reason) or service it depends on can disappear at any point of time (but can reappear later). Expect the unexpected!

Why to run Pumba?

Failures happen and they inevitably happen when least desired. If your application cannot recover from system failures, you are going to face angry customers and maybe even lose them. If you want to be sure that your system is able to recover from unexpected failures, it would be better to take charge of them and inject failures yourself instead of waiting till they happen. This is not a one time effort. At age of Continuous Delivery, you need to be sure that every change to any one of system services, does not compromise system availability. That’s why you should practice continuous resilience testing. With Docker gaining popularity as people are deploying and running clusters of containers in production. Using a container orchestration network (e.g. Kubernetes, Swarm, CoreOS fleet), it’s possible to restart a “failed” container automatically. How can you be sure that restarted services and other system services can properly recover from failures? If you are not using container orchestration frameworks, life is even harder: you will need to handle container restarts by yourself.

This is where Pumba shines. You can run it on every Docker host, in your cluster, and Pumba will “randomly” stop running containers — matching specified name/s or name patterns. You can even specify the signal that will be sent to “kill” the container.

What Pumba can do?

Pumba can create different failures for your running Docker containers. Pumba can kill, stop or remove running containers. It can also pause all processes within running container for a specified period of time. Pumba can also do network emulation, simulating different network failures, like: delay, packet loss/corruption/reorder, bandwidth limits and more. Disclaimer: netem command is under development and only delay command is supported in Pumba v0.2.0.

You can pass a list of containers to Pumba or just write a regular expression to select matching containers. If you will not specify containers, Pumba will try to disturb all running containers. Use --random option, to randomly select only one target container from the provided list.

How to run Pumba?

There are two ways to run Pumba.

First, you can download Pumba application (single binary file) for your OS from the project release page and run pumba --help to see a list of supported commands and options.

Examples

Running Pumba in Docker Container

The second approach to run it in a Docker container.

In order to give Pumba access to the Docker daemon on the host machine, you will need to mount var/run/docker.sock unix socket.

Pumba will not kill its own container.

Next

I’ve just created the Pumba project and will gladly accept any ideas, Pull Requests, issues and other contributions to the project.

Pumba GitHub Repository

Originally published at blog.terranillius.com.

Hacker Noon is how hackers start their afternoons. We’re a part of the @AMI family. We are now accepting submissions and happy to discuss advertising & sponsorship opportunities.

If you enjoyed this story, we recommend reading our latest tech stories and trending tech stories. Until next time, don’t take the realities of the world for granted!

--

--