Chaos Engineering: Part I

Riddhi Pandya
4 min read · Jun 25, 2024


Build the system by breaking the system

Have you ever logged onto a website to shop during a Christmas sale? What if the dress you were so eager to buy is right there in your cart, and just as you begin to check out, the site goes down? Damn my luck! Now, what if many users of the application face downtime at exactly the same time?

A big loss for the company!

And that’s where Chaos Engineering helps us.

Chaos Engineering?
It means introducing disaster-like conditions that could happen in reality and checking how the system performs under them.

Possible scenarios that could happen in real life include a network outage, network slowness, a node being drained, or a node running short of CPU.

Before we go further, let’s understand the term ‘Resilience’.

Resilience is “The system’s ability to keep afloat when a fault happens”.
Another definition is “The ability of a system to recover from infrastructure or service disruptions.”

This is one of the most important factors to take care of while building infrastructure. Low resiliency can lead to increased vulnerability, inadequate recovery, limited scalability, dependency risks, longer restoration time and loss of reputation.

That’s exactly where Chaos Engineering will help us.

Chaos Engineering is the practice of deliberately injecting faults into a system to find out how it behaves when those faults happen in reality.

The goal is also to discover weaknesses in a system through controlled experiments that introduce random and unpredictable behavior.

This happens in 4 simple steps:

1) Steady state
2) Hypothesis
3) Experiment
4) Adapt

What is Steady state?
The way your system behaves under normal conditions is its steady state.

Hypothesis?
You build a hypothesis around this steady state. You note down how your system behaves in its steady state, then hypothesize how it might behave in case of an outage.

Experiments?
Based on your hypothesis, you build your experiments. For example, your experiment could simulate a network outage or network slowness.

Adapt?
Based on the results of the experiment, you decide what changes you need to make to the system to make it more resilient.
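To make the four steps concrete, here is a minimal Python sketch of the loop. It is only an illustration: `ToySystem`, its error rates, and the 5% threshold are made-up stand-ins for a real service and real monitoring, not something taken from any chaos tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ExperimentResult:
    name: str
    baseline_error_rate: float  # steady state, measured before the fault
    observed_error_rate: float  # measured while the fault is active
    hypothesis_held: bool


def run_experiment(
    name: str,
    measure_error_rate: Callable[[], float],
    inject_fault: Callable[[], None],
    remove_fault: Callable[[], None],
    max_error_rate: float,
) -> ExperimentResult:
    # 1) Steady state: record how the system behaves under normal conditions.
    baseline = measure_error_rate()

    # 2) Hypothesis: even during the fault, the error rate stays
    #    at or below max_error_rate.
    # 3) Experiment: inject the fault, measure, and always clean up afterwards.
    inject_fault()
    try:
        observed = measure_error_rate()
    finally:
        remove_fault()

    return ExperimentResult(name, baseline, observed, observed <= max_error_rate)


def adapt(result: ExperimentResult) -> str:
    # 4) Adapt: a failed hypothesis points at a weakness to fix before the next run.
    if result.hypothesis_held:
        return f"{result.name}: hypothesis held, no change needed"
    return f"{result.name}: hypothesis failed, harden the system and re-run"


class ToySystem:
    """Hypothetical stand-in for a real service; the numbers are invented."""

    def __init__(self) -> None:
        self.network_loss = False

    def error_rate(self) -> float:
        return 0.30 if self.network_loss else 0.01

    def start_network_loss(self) -> None:
        self.network_loss = True

    def stop_network_loss(self) -> None:
        self.network_loss = False


system = ToySystem()
result = run_experiment(
    "network-loss",
    system.error_rate,
    system.start_network_loss,
    system.stop_network_loss,
    max_error_rate=0.05,
)
print(adapt(result))
```

In this toy run the hypothesis fails, because the error rate during the fault is well above the threshold. That failure is the useful outcome: it tells you where to adapt. The `try`/`finally` is there so the fault is always removed, even if the measurement itself throws an error.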

There are many tools available in the market that help us do chaos testing.
In this blog, we will walk through an example chaos experiment using the LitmusChaos tool.

Experiment: NETWORK LOSS

How and where do you configure your experiment?
All the experiments that can be performed are available within the LitmusChaos tool.

Choose a scenario experiment

Chaos Center provides various options to create experiments.
Option 3, ChaosHubs, has many pre-defined experiments. As a new user, it is best to go with Option 3 so you can view and understand the different experiments.

Workflow settings

Provide a name for your experiment workflow

In this section, you provide a name for your workflow. This can be any name that helps you identify the scenario. Additionally, you can provide a description for the workflow.

Tune workflow

Add your experiments

Now comes the main part of the workflow. Here, you add the experiments you want to perform. For this blog, let us go with the network loss experiment.

Search for network-loss and you will find an experiment called generic/pod-network-loss.

List of experiments-Litmus Chaos Center
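If you are curious what the tool sets up behind the scenes, the sketch below builds a rough ChaosEngine definition for pod-network-loss as a Python dictionary and prints it as JSON. Treat it as an assumption-laden illustration rather than the exact manifest Chaos Center generates: the field names follow the Litmus ChaosEngine format as I understand it, and the namespace, label, engine name, service account, and tunable values are all hypothetical placeholders. Check the Litmus documentation before using anything like this on a real cluster.

```python
import json

# Hypothetical target application; replace with your own namespace and label.
TARGET_NAMESPACE = "default"
TARGET_LABEL = "app=checkout"

chaos_engine = {
    "apiVersion": "litmuschaos.io/v1alpha1",
    "kind": "ChaosEngine",
    "metadata": {
        "name": "checkout-network-loss",  # hypothetical engine name
        "namespace": TARGET_NAMESPACE,
    },
    "spec": {
        "engineState": "active",
        "appinfo": {
            "appns": TARGET_NAMESPACE,
            "applabel": TARGET_LABEL,
            "appkind": "deployment",
        },
        # Assumed service account name; use whichever one your install created.
        "chaosServiceAccount": "litmus-admin",
        "experiments": [
            {
                "name": "pod-network-loss",
                "spec": {
                    "components": {
                        "env": [
                            # How long the fault lasts, in seconds (example value).
                            {"name": "TOTAL_CHAOS_DURATION", "value": "60"},
                            # Share of packets to drop (example value).
                            {"name": "NETWORK_PACKET_LOSS_PERCENTAGE", "value": "100"},
                        ]
                    }
                },
            }
        ],
    },
}

print(json.dumps(chaos_engine, indent=2))
```

The point of the sketch is that an experiment is just a description of a target application plus a fault and its settings. Chaos Center fills in the same kind of information for you through the UI.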

Reliability score

Reliability scores let you assign a score to each experiment you have selected. In the above example, we have selected just one experiment, network loss. But suppose, for example, we have 2 experiments, node-drain and node-cpu-hog; then this is how it would look.
Here, if the node-drain experiment is more critical, you may assign it a higher score, e.g. 10 for node-drain and 8 for node-cpu-hog.
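To see why the scores matter, here is a small Python sketch of how such weights could feed into an overall result. It assumes the overall score is a weighted average of each experiment's success percentage, which is my understanding of how Litmus combines them; treat the formula as an assumption. The success percentages below are invented for illustration.

```python
from typing import Dict


def resilience_score(weights: Dict[str, int], success_pct: Dict[str, float]) -> float:
    """Weighted average of per-experiment success percentages (assumed formula)."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("at least one experiment needs a non-zero weight")
    weighted_sum = sum(weights[name] * success_pct[name] for name in weights)
    return weighted_sum / total_weight


# Scores from the example above.
weights = {"node-drain": 10, "node-cpu-hog": 8}

# Hypothetical outcomes: node-drain passed fully, node-cpu-hog passed half its checks.
success_pct = {"node-drain": 100.0, "node-cpu-hog": 50.0}

score = resilience_score(weights, success_pct)
print(round(score, 1))
```

Because node-drain carries the higher score, its result pulls the overall number more strongly than node-cpu-hog does. Swapping the two success percentages would give a lower overall score with the same weights.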

Assigning a score

Schedule

Here you can select a schedule depending on how frequently you would like to run your tests.

Scheduling a chaos run

Verify and Commit

Once you click on Verify and Commit, you are all set to run your chaos experiment.

Scheduling a run

Awesome! This is how you can create and schedule a run in the LitmusChaos tool.

I hope this blog helps you gain a basic understanding of Chaos Engineering and gives you a quick view of how to create an experiment.

For more information on Chaos Engineering, kindly refer to my next blog: Chaos Engineering Part 2

Thank you!!
