Getting Started with Litmus SDK
In this blog post, I will walk through generating a new, custom chaos experiment with the help of the Litmus SDK. Before jumping in, let’s do a quick recap of Litmus. Litmus is a framework for practicing Chaos Engineering in cloud-native environments. It provides a chaos-operator, a large set of chaos experiments in its hub, detailed documentation, a quick demo, and a friendly community.
What is Litmus SDK?
The Litmus SDK provides a simple way to bootstrap your experiment. It creates the experiment artifacts in the appropriate directory (i.e., as per the chaos-category) based on an attributes file provided as input by the chart-developer. The scaffolded files contain placeholders, which can then be filled in as desired.
It generates custom chaos experiments with some default pre- and post-chaos checks (status checks for the AUT, i.e., the application under test, and for auxiliary applications). If a suitable chaoslib already exists inside the /chaoslib directory, the SDK can reuse it; otherwise it creates a new chaoslib inside the same directory.
Life Cycle of a Chaos Experiment
Each chaos experiment is divided into six main sections:
- Prepare: Contains the initialization steps (fetching the ENVs) and updates the chaos result to mark the beginning of the chaos experiment.
- PreChaosCheck: Contains some default checks (AUT & auxiliary application status checks). Custom checks can be added as well, depending on the experiment, e.g., a liveness check or a data persistence check.
- ChaosInject: Invokes the actual chaos. It contains the main business logic.
- CleanUp: Contains the steps to remove the helper/external pod, if any.
- PostChaosCheck: Contains the same checks as the PreChaosCheck section, to verify resiliency after chaos injection.
- Summary: Updates the verdict (Pass/Fail) inside the chaos result, along with the FailStep if the experiment fails.
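To make the ordering concrete, here is a minimal Go sketch of the six sections. It is not the code the SDK generates: the phase and result types and the functions lifecycle, runExperiment, noop, and alwaysFail are hypothetical names chosen for illustration, and the sketch assumes the run stops at the first failing section.

```go
package main

import (
	"errors"
	"fmt"
)

// phase is one named section of the experiment life cycle.
type phase struct {
	name string
	run  func() error
}

// result holds what the Summary section records: the verdict and, on
// failure, the name of the section that failed.
type result struct {
	Verdict  string
	FailStep string
}

// noop stands in for a section that succeeds.
func noop() error { return nil }

// alwaysFail stands in for a section that fails.
func alwaysFail() error { return errors.New("simulated failure") }

// lifecycle lists the sections in execution order, with the chaos
// injection step supplied by the caller.
func lifecycle(inject func() error) []phase {
	return []phase{
		{"Prepare", noop},
		{"PreChaosCheck", noop},
		{"ChaosInject", inject},
		{"CleanUp", noop},
		{"PostChaosCheck", noop},
	}
}

// runExperiment executes the sections in order, stops at the first failure,
// and returns the Summary (verdict plus fail step).
func runExperiment(phases []phase) result {
	for _, p := range phases {
		if err := p.run(); err != nil {
			return result{Verdict: "Fail", FailStep: p.name}
		}
	}
	return result{Verdict: "Pass"}
}

func main() {
	fmt.Printf("%+v\n", runExperiment(lifecycle(noop)))
	fmt.Printf("%+v\n", runExperiment(lifecycle(alwaysFail)))
}
```

A real experiment would likely still want to run its clean-up steps after a failed injection; the early return here just keeps the sketch short.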
Pre-requisites
Go Experiments
- Go is available & the GOPATH env is configured appropriately
Ansible Experiments
- python3 is available (sudo apt-get install python3.6)
- jinja2 & pyYaml python packages are available (sudo apt-get install python3-pip, pip install jinja2, pip install pyYaml)
Steps to Generate Experiment via Litmus SDK
- Clone the litmus-go repository for Go experiments, or litmus-ansible for Ansible experiments, and navigate to its developer guide folder (contribute/developer-guide in litmus-go, contribute/developer_guide in litmus-ansible).
## for litmus-go
$ git clone https://github.com/litmuschaos/litmus-go.git
$ cd litmus-go/contribute/developer-guide

## for litmus-ansible
$ git clone https://github.com/litmuschaos/litmus-ansible.git
$ cd litmus-ansible/contribute/developer_guide
- Populate the attributes.yaml with details of the chaos experiment (or chart). Use the attributes.yaml.sample as reference.
As an example, let us consider an experiment to kill one of the replicas of an Nginx deployment. The attributes.yaml can be constructed like this:
- Run the following command to generate the necessary artifacts for submitting the sample-category chaos chart with pod-delete experiment.
## litmus-go
$ go run generate_experiment.go -attributes=attributes.yaml -generateType=experiment

## litmus-ansible
$ python3 generate_chart.py --attributes_file=attributes.yaml --generate_type=experiment
- Note: In the --generate_type (litmus-ansible) or -generateType (litmus-go) attribute, select the appropriate type of manifests to be generated, where:
* chart: Just the chaos-chart metadata, i.e., chartserviceversion.yaml
* experiment: Chaos experiment artifacts belonging to an existing or new chart.
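For readers curious how a generator might handle these two flags, the following Go sketch parses -attributes and -generateType and rejects any type other than chart or experiment. It is a hypothetical stand-in, not the contents of generate_experiment.go; the default values and the function names parseArgs and validateGenerateType are assumptions made for the example.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// validateGenerateType accepts only the two manifest types named in the note.
func validateGenerateType(t string) error {
	switch t {
	case "chart", "experiment":
		return nil
	default:
		return fmt.Errorf("unsupported generate type %q: want \"chart\" or \"experiment\"", t)
	}
}

// parseArgs parses the two flags from args (without the program name).
// The defaults used here are assumptions for this sketch.
func parseArgs(args []string) (attributes, generateType string, err error) {
	fs := flag.NewFlagSet("generate", flag.ContinueOnError)
	fs.StringVar(&attributes, "attributes", "attributes.yaml", "path to the attributes file")
	fs.StringVar(&generateType, "generateType", "experiment", "manifests to generate: chart or experiment")
	if err = fs.Parse(args); err != nil {
		return "", "", err
	}
	return attributes, generateType, validateGenerateType(generateType)
}

func main() {
	attrs, genType, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("would generate %s artifacts from %s\n", genType, attrs)
}
```

Validating the type up front means a typo in the flag value fails immediately with a clear message instead of producing a partial set of files.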
Verify the Generated Files
- Proceed with construction of the business logic inside the pod-delete.go file in litmus-go, or pod-delete-ansible-logic.yml in litmus-ansible, by making appropriate modifications to the items listed below to achieve the desired effect:
* variables
* entry & exit criteria checks for the experiment
* helper utils in either pkg or new base chaos libraries
- Update the experiment.yaml with the right chaos params in spec.definition.env, along with their default values.
- Update the chaoslib/litmus/pod-delete/pod-delete.go chaoslib to achieve the desired effect or reuse the existing chaoslib.
- Create an experiment README explaining, briefly, the what, why & how of the experiment to aid users of this experiment.
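Since the chaos params reach the experiment as ENVs with default values, the variables part of the business logic often boils down to reading them with fallbacks. The Go sketch below shows one way to do that. The ExperimentDetails struct, the helper names, the default values, and the choice of ENV names (TOTAL_CHAOS_DURATION, CHAOS_INTERVAL, APP_NAMESPACE) are assumptions for illustration and may not match the generated code.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// ExperimentDetails holds illustrative tunables for an experiment.
type ExperimentDetails struct {
	ChaosDuration int    // total chaos duration in seconds
	ChaosInterval int    // gap between successive injections in seconds
	AppNamespace  string // namespace of the application under test
}

// lookupFunc matches the signature of os.LookupEnv so tests can swap it out.
type lookupFunc func(key string) (string, bool)

// getString returns the value for key, or fallback when it is unset or empty.
func getString(lookup lookupFunc, key, fallback string) string {
	if v, ok := lookup(key); ok && v != "" {
		return v
	}
	return fallback
}

// getInt is like getString but requires the value to parse as an integer.
func getInt(lookup lookupFunc, key string, fallback int) (int, error) {
	raw := getString(lookup, key, strconv.Itoa(fallback))
	n, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("%s must be an integer, got %q", key, raw)
	}
	return n, nil
}

// loadDetails reads the tunables, applying hypothetical defaults.
func loadDetails(lookup lookupFunc) (ExperimentDetails, error) {
	duration, err := getInt(lookup, "TOTAL_CHAOS_DURATION", 30)
	if err != nil {
		return ExperimentDetails{}, err
	}
	interval, err := getInt(lookup, "CHAOS_INTERVAL", 10)
	if err != nil {
		return ExperimentDetails{}, err
	}
	return ExperimentDetails{
		ChaosDuration: duration,
		ChaosInterval: interval,
		AppNamespace:  getString(lookup, "APP_NAMESPACE", "default"),
	}, nil
}

func main() {
	details, err := loadDetails(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", details)
}
```

Passing the lookup function in, rather than calling os.LookupEnv directly, keeps the parsing logic testable without touching the process environment.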
Steps to Test Experiment
- Run the pod-delete-k8s-job.yml with the desired values in the ENV and the appropriate chaosServiceAccount, using a custom dev image (say, ksatchit/litmus-go) that packages the business logic, instead of litmuschaos/litmus-go.
- (Optional) Once the experiment has been validated using the above step, it can also be tested against the standard chaos workflow using the experiment.yaml. This involves:
- Launching the Chaos-Operator
- Creating the ChaosExperiment CR on the cluster (use the same custom dev image used in the above step)
- Creating the ChaosEngine to execute the above ChaosExperiment
- Verifying the experiment status via ChaosResult
Refer to the Litmus Docs for more details on this procedure.
Steps to Include the Chaos Charts/Experiments into the ChartHub
- Send a PR to the litmus-go/litmus-ansible repo with the modified experiment files, e.g., pod-memory-hog
- Send a PR to the chaos-charts repo with the modified experiment CR, experiment chartserviceversion, chaos chart (category-level) chartserviceversion & package (if applicable) YAMLs, e.g., kubelet service kill
Conclusion
The Litmus SDK helps developers & SREs create custom chaos experiments on demand. It generates the templates, the pre/post chaos checks, and a chaoslib that can be modified according to the use case.
Are you an SRE or a Kubernetes enthusiast? Does Chaos Engineering excite you? Join our community in the #litmus channel on the Kubernetes Slack.
Contribute to LitmusChaos and share your feedback on GitHub.
If you like LitmusChaos, become one of the many stargazers here