Chaos Engineering with Chaos Mesh

Happy devSecOps

(λx.x)eranga
Effectz.AI
5 min read · Nov 8, 2021

Background

In this post I’m going to discuss Chaos Mesh, an open-source Chaos Engineering framework for Kubernetes. All the deployments related to this post are available on GitLab. Please clone the repo and follow along with the post.

Chaos Engineering

Chaos Engineering involves running thoughtful, planned experiments that teach us how our systems behave in the face of failure. It is a disciplined approach to identifying failures before they become outages. By proactively testing how a system responds under stress, you can identify and fix failures before they end up in the news. In Chaos Engineering you start by forming a hypothesis about how a system should behave when something goes wrong. Then, you design the smallest possible experiment to test it in your system. Finally, you measure the impact of the failure at each step, looking for signs of success or failure. When the experiment is over, you have a better understanding of your system’s real-world behavior.

Chaos Engineering began at Netflix, which built Chaos Monkey and released it as open source in 2012. Chaos Monkey was created in response to Netflix’s move from physical infrastructure to cloud infrastructure provided by Amazon Web Services, and the need to be sure that the loss of an Amazon instance wouldn’t affect the Netflix streaming experience. It randomly terminates virtual machine instances and containers that run inside your production environment. With Chaos Monkey, engineers quickly learn whether the services they’re building are robust and resilient enough to tolerate unplanned failures.

Chaos Mesh

Chaos Mesh is an open-source, cloud-native Chaos Engineering platform that orchestrates chaos in Kubernetes environments. It includes fault injection methods for complex systems on Kubernetes and covers faults in Pods, the network, the file system, and even the kernel. It can perform chaos experiments in production environments without modifying the deployment logic of the application. It also makes it easy to orchestrate the behavior of chaos experiments, so users can observe the state of an experiment in real time and quickly roll back any injected failures. Chaos Mesh provides a visualization dashboard where you can design chaos scenarios in a web UI and monitor the status of chaos experiments.

Chaos Mesh is built on Kubernetes Custom Resource Definitions (CRDs). To manage different chaos experiments, Chaos Mesh defines multiple CRD types, such as the PodChaos, StressChaos, and HTTPChaos types used later in this post. These CRDs fall into three main fault categories: basic resource faults, platform faults, and application-layer faults.

Install Chaos Mesh

In this experiment I used a Minikube-based Kubernetes cluster in my dev environment. I installed Chaos Mesh using the method intended for test (dev) environments. The installation creates a Kubernetes namespace named chaos-testing. The Chaos Mesh dashboard runs on port 2333 and is exposed via a NodePort service, so we can access it through the node port that maps to port 2333.

Chaos Mesh Experiments

There are two ways to run Chaos Mesh experiments. One is to create experiments directly in the Chaos Mesh web dashboard. The other is to deploy them as .yaml files. In this scenario I ran four Chaos Mesh experiments: pod-failure, pod-kill, stress, and http-abort. Chaos Mesh offers various other types of experiments, and the Chaos Mesh documentation contains the full set of information about them, along with several interactive tutorials. The experiments were run against a three-node nginx cluster, whose deployment is included in the repo mentioned above.
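
For readers who don't have the repo at hand, here is a minimal sketch of an nginx deployment that the experiments could target. It reads "three node" as three replicas, and the resource name, the app: nginx label, the image tag, and the use of the default namespace are all assumptions of mine rather than values taken from the original deployment.

```yaml
# Sketch of a three-replica nginx Deployment (assumed values, not the repo's file).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx            # assumed name
  labels:
    app: nginx           # assumed label; the chaos experiments select Pods by label
spec:
  replicas: 3            # "three node nginx cluster" interpreted as three replicas
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.21   # assumed image tag
          ports:
            - containerPort: 80
```

Whatever labels the real deployment uses, the chaos experiments need to select on the same ones.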

Pod Failure Experiment

A PodChaos experiment simulates fault scenarios for the specified Pods or containers. Currently, PodChaos supports the pod-failure, pod-kill, and container-kill fault types. In this experiment, Chaos Mesh injects pod-failure into the specified nginx Pod and makes the Pod unavailable for 30 seconds. Pods are selected through the labelSelectors field of the experiment specification.
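
A minimal sketch of what chaos-pod-failure.yaml might contain is shown here. The action and the 30-second duration come from the description above; the metadata name, mode: one, the target namespace, and the app: nginx label are assumptions and should be adjusted to match your own deployment.

```yaml
# Sketch of a pod-failure experiment (assumed name, mode, namespace, and label).
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-failure-nginx      # assumed name
  namespace: chaos-testing     # namespace created by the Chaos Mesh install
spec:
  action: pod-failure          # make the selected Pod unavailable
  mode: one                    # assumed: pick one matching Pod at random
  duration: '30s'              # the Pod stays unavailable for 30 seconds
  selector:
    namespaces:
      - default                # assumed namespace of the nginx Pods
    labelSelectors:
      app: nginx               # assumed label on the nginx Pods
```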

The experiment can be deployed via the kubectl apply -f chaos-pod-failure.yaml command. Once the experiment is deployed, we can view its status in the Chaos Mesh web dashboard.

Pod Kill Experiment

In this experiment, Chaos Mesh injects pod-kill into the specified nginx Pod and kills it.
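
A sketch of what chaos-pod-kill.yaml might look like follows. Only the pod-kill action is taken from the text; the metadata name, mode: one, the target namespace, and the app: nginx label are assumptions. No duration is set here because killing a Pod is a one-off action.

```yaml
# Sketch of a pod-kill experiment (assumed name, mode, namespace, and label).
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-kill-nginx         # assumed name
  namespace: chaos-testing     # namespace created by the Chaos Mesh install
spec:
  action: pod-kill             # kill the selected Pod
  mode: one                    # assumed: pick one matching Pod at random
  selector:
    namespaces:
      - default                # assumed namespace of the nginx Pods
    labelSelectors:
      app: nginx               # assumed label on the nginx Pods
```

If the Pod is managed by a Deployment, Kubernetes should start a replacement, which is exactly the behavior this experiment lets you observe.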

The experiment can be deployed via the kubectl apply -f chaos-pod-kill.yaml command. Once the experiment is deployed, we can view its status in the Chaos Mesh web dashboard.

Stress Experiment

Chaos Mesh provides StressChaos experiments to simulate stress scenarios inside containers. The following experiment creates a process in the selected container (nginx) that continuously allocates, reads, and writes memory, occupying up to 256MB.
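
As a rough sketch, chaos-stress.yaml could look like the following. The 256MB memory size comes from the description above; the metadata name, mode: one, the worker count, the target namespace, and the app: nginx label are assumptions for illustration.

```yaml
# Sketch of a memory stress experiment (assumed name, mode, workers, namespace, label).
apiVersion: chaos-mesh.org/v1alpha1
kind: StressChaos
metadata:
  name: memory-stress-nginx    # assumed name
  namespace: chaos-testing     # namespace created by the Chaos Mesh install
spec:
  mode: one                    # assumed: pick one matching Pod at random
  selector:
    namespaces:
      - default                # assumed namespace of the nginx Pods
    labelSelectors:
      app: nginx               # assumed label on the nginx Pods
  stressors:
    memory:
      workers: 4               # assumed number of stress worker processes
      size: '256MB'            # occupy up to 256MB of memory
```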

The experiment can be deployed via the kubectl apply -f chaos-stress.yaml command. The result of the experiment can then be viewed in the Chaos Mesh web dashboard.

HTTP Abort Experiment

HTTPChaos simulates fault scenarios that occur while an HTTP server processes requests and responses. Currently, HTTPChaos supports four fault types: abort, delay, replace, and patch. In this experiment, every 10 minutes Chaos Mesh injects the abort fault into the specified Pod for 5 minutes. During the fault injection, GET requests sent through port 80 to the /api path of the target Pod are interrupted.
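
Here is a minimal sketch of the abort part of chaos-http-abort.yaml. The port, method, path, and 5-minute duration come from the description above; the metadata name, mode: all, the target namespace, and the app: nginx label are assumptions. The 10-minute recurrence is deliberately left out, since how it is configured depends on the Chaos Mesh version (in Chaos Mesh 2.x, recurring runs are defined through a separate Schedule resource).

```yaml
# Sketch of an HTTP abort experiment (assumed name, mode, namespace, and label).
# The "every 10 minutes" recurrence is not shown; it depends on the Chaos Mesh version.
apiVersion: chaos-mesh.org/v1alpha1
kind: HTTPChaos
metadata:
  name: http-abort-nginx       # assumed name
  namespace: chaos-testing     # namespace created by the Chaos Mesh install
spec:
  mode: all                    # assumed: apply to every matching Pod
  selector:
    namespaces:
      - default                # assumed namespace of the nginx Pods
    labelSelectors:
      app: nginx               # assumed label on the nginx Pods
  target: Request              # inject the fault on the request side
  port: 80                     # port the target service listens on
  method: GET                  # only GET requests are affected
  path: /api                   # only requests to this path are affected
  abort: true                  # interrupt the matching connections
  duration: 5m                 # keep the fault active for 5 minutes
```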

The experiment can be deployed via the kubectl apply -f chaos-http-abort.yaml command. The Chaos Mesh web dashboard will then list the status of the experiment.
