Chaos Engineering with Chaos Mesh
Happy devSecOps
Background
In this post I discuss Chaos Mesh, an open-source Chaos Engineering framework for Kubernetes. All the deployments related to this post are available on GitLab. Please clone the repo to follow along.
Chaos Engineering
Chaos Engineering involves running thoughtful, planned experiments that teach us how our systems behave in the face of failure. It is a disciplined approach to identifying failures before they become outages. By proactively testing how a system responds under stress, you can identify and fix failures before they end up in the news. In Chaos Engineering you start by forming a hypothesis about how a system should behave when something goes wrong. Then, you design the smallest possible experiment to test it in your system. Finally, you measure the impact of the failure at each step, looking for signs of success or failure. When the experiment is over, you have a better understanding of your system’s real-world behavior.
Chaos Engineering began at Netflix. Netflix built Chaos Monkey, which injects various types of faults into infrastructure and business systems, and released it as open source in 2012. Chaos Monkey was created in response to Netflix's move from physical infrastructure to cloud infrastructure provided by Amazon Web Services, and the need to be sure that the loss of an Amazon instance wouldn't affect the Netflix streaming experience. Chaos Monkey randomly terminates virtual machine instances and containers that run inside your production environment. With Chaos Monkey, engineers quickly learn whether the services they're building are robust and resilient enough to tolerate unplanned failures.
Chaos Mesh
Chaos Mesh is an open-source, cloud-native Chaos Engineering platform that orchestrates chaos in Kubernetes environments. It includes fault injection methods for complex systems on Kubernetes and covers faults in Pods, the network, the file system, and even the kernel. It can perform chaos experiments in production environments without modifying the deployment logic of the application. It also makes it easy to orchestrate chaos experiments, letting users observe the state of an experiment in real time and quickly roll back any injected failures. Chaos Mesh provides a dashboard for designing chaos scenarios in a web UI and monitoring the status of chaos experiments.
Chaos Mesh is built on Kubernetes CRDs (Custom Resource Definitions). To manage different chaos experiments, Chaos Mesh defines multiple CRD types, such as PodChaos, StressChaos and HTTPChaos. These CRDs fall into three main fault categories: basic resource faults, platform faults, and application-layer faults.
Install Chaos Mesh
In this experiment I used a Minikube-based Kubernetes cluster in my dev environment. The installation for a test (dev) environment creates a Kubernetes namespace named chaos-testing. The Chaos Mesh dashboard runs on port 2333 and is exposed via a NodePort service, so we can access it through the node port mapped to port 2333.
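One way to get the setup described above is the official Helm chart. The values file below is a sketch, not the exact configuration used in this post: the file name is hypothetical, the runtime and socket path assume a Minikube cluster running on Docker, and the key names assume a recent chaos-mesh chart, so verify them with helm show values before relying on them.

```yaml
# values.yaml (hypothetical file name) for the chaos-mesh Helm chart.
#
# Assumed install steps:
#   helm repo add chaos-mesh https://charts.chaos-mesh.org
#   helm install chaos-mesh chaos-mesh/chaos-mesh \
#     --namespace chaos-testing --create-namespace -f values.yaml
#
# On Minikube, the dashboard URL can then be printed with:
#   minikube service chaos-dashboard -n chaos-testing --url

chaosDaemon:
  # Container runtime of the cluster nodes; assumed to be Docker here.
  runtime: docker
  socketPath: /var/run/docker.sock

dashboard:
  service:
    # Expose the dashboard (container port 2333) through a NodePort service.
    type: NodePort
```

If the cluster uses containerd or CRI-O instead, the runtime and socket path need to change accordingly.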
Chaos Mesh Experiments
There are two ways to run Chaos Mesh experiments. One is creating experiments directly in the Chaos Mesh web dashboard. The other is deploying .yaml files with kubectl. In this scenario I ran four Chaos Mesh experiments: pod-failure, pod-kill, stress and http-abort. Chaos Mesh offers various other experiment types; the Chaos Mesh documentation describes them in full and also includes several interactive tutorials. The experiments were run against a three-node nginx cluster, whose deployment is part of the repo.
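For readers without the repo at hand, a three-replica nginx Deployment along these lines would serve as the target. This is a minimal sketch: the resource name, the default namespace, the app: nginx label and the image tag are all assumptions, and the experiments further down only work if their selectors match whatever labels the real deployment uses.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx            # assumed name
  namespace: default     # assumed namespace
spec:
  replicas: 3            # the "three-node" nginx cluster
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx       # label the chaos experiments select on (assumed)
    spec:
      containers:
        - name: nginx
          image: nginx:stable   # assumed image tag
          ports:
            - containerPort: 80
```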
Pod Failure Experiment
A PodChaos experiment can simulate fault scenarios for the specified Pods or containers. Currently, PodChaos supports the pod-failure, pod-kill and container-kill fault types. In this experiment Chaos Mesh injects pod-failure into the specified nginx Pod and makes the Pod unavailable for 30 seconds. Pod selection happens through the labelSelectors field. The specification of the Pod Failure experiment is in chaos-pod-failure.yaml.
The experiment can be deployed with the kubectl apply -f chaos-pod-failure.yaml command. Once the experiment is deployed, we can view its status in the Chaos Mesh web dashboard.
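A chaos-pod-failure.yaml matching the description above could look roughly like this. It is a sketch rather than the file from the repo: the metadata name, the target namespace and the app: nginx label are assumptions.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-failure-example   # hypothetical name
  namespace: chaos-testing
spec:
  action: pod-failure         # make the Pod unavailable rather than delete it
  mode: one                   # pick one matching Pod at random
  duration: '30s'             # the Pod recovers after 30 seconds
  selector:
    namespaces:
      - default               # assumed namespace of the nginx Pods
    labelSelectors:
      app: nginx              # assumed label on the nginx Pods
```

With mode: one, only a single replica is affected per run, so the other two nginx Pods should keep serving traffic while the fault is active.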
Pod Kill Experiment
In this experiment Chaos Mesh injects pod-kill into the specified nginx Pod and kills the Pod. The specification of the Pod Kill experiment is in chaos-pod-kill.yaml.
The experiment can be deployed with the kubectl apply -f chaos-pod-kill.yaml command. Once the experiment is deployed, we can view its status in the Chaos Mesh web dashboard.
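A chaos-pod-kill.yaml for this scenario might look like the sketch below. As before, the metadata name, the namespace and the label are assumptions, not values taken from the repo.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-kill-example      # hypothetical name
  namespace: chaos-testing
spec:
  action: pod-kill            # kill the selected Pod outright
  mode: one                   # pick one matching Pod at random
  selector:
    namespaces:
      - default               # assumed namespace of the nginx Pods
    labelSelectors:
      app: nginx              # assumed label on the nginx Pods
```

Unlike pod-failure, pod-kill is a one-shot action with no duration. If the Pods are managed by a Deployment, its ReplicaSet should create a replacement Pod, which can be watched with kubectl get pods -w.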
Stress Experiment
Chaos Mesh provides StressChaos experiments to simulate stress scenarios inside containers. This experiment creates a process in the selected container (nginx) that continuously allocates, reads and writes memory, occupying up to 256MB.
The experiment can be deployed with the kubectl apply -f chaos-stress.yaml command. The result of the experiment can be seen in the Chaos Mesh web dashboard.
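A memory stress manifest in the spirit of chaos-stress.yaml could be sketched as follows. The metadata name, namespace, label and worker count are assumptions; only the 256MB size comes from the description above.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: StressChaos
metadata:
  name: memory-stress-example   # hypothetical name
  namespace: chaos-testing
spec:
  mode: one                     # pick one matching Pod at random
  selector:
    namespaces:
      - default                 # assumed namespace of the nginx Pods
    labelSelectors:
      app: nginx                # assumed label on the nginx Pods
  stressors:
    memory:
      workers: 4                # number of stress workers (assumed value)
      size: '256MB'             # upper bound on memory occupied
```

If the container has a memory limit below the stress size, the kubelet may OOM-kill it, so it is worth checking the limits on the target Pod first.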
Http Abort Experiment
HTTPChaos can simulate fault scenarios in an HTTP server during request and response processing. Currently, HTTPChaos supports four fault types: abort, delay, replace and patch. In this experiment, every 10 minutes Chaos Mesh injects the abort fault into the specified Pod for 5 minutes. During the fault injection, GET requests sent through port 80 to the /api path of the target Pod are interrupted.
The experiment can be deployed with the kubectl apply -f chaos-http-abort.yaml command. The Chaos Mesh web dashboard then lists the status of the experiment.
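The recurring abort described above can be sketched with a Schedule resource wrapping an HTTPChaos spec. This assumes Chaos Mesh 2.x, where scheduling moved into the separate Schedule kind; older 1.x releases used a scheduler field inside the experiment itself, so the file in the repo may differ. The metadata name, namespace, label, historyLimit and concurrencyPolicy values are assumptions.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: Schedule
metadata:
  name: http-abort-schedule   # hypothetical name
  namespace: chaos-testing
spec:
  schedule: '@every 10m'      # start a new run every 10 minutes
  type: HTTPChaos
  historyLimit: 2             # keep the last two runs (assumed value)
  concurrencyPolicy: Forbid   # do not overlap runs (assumed value)
  httpChaos:
    mode: all                 # target every matching Pod
    selector:
      namespaces:
        - default             # assumed namespace of the nginx Pods
      labelSelectors:
        app: nginx            # assumed label on the nginx Pods
    target: Request           # inject on the request side
    port: 80
    method: GET
    path: /api
    abort: true               # interrupt matching requests
    duration: 5m              # each run lasts 5 minutes
```

While a run is active, a request such as curl http://<pod-ip>/api from inside the cluster should fail with a connection error, while other paths remain reachable.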