Providing chaos hooks to applications through Litmus Operator
Litmus operator allows developers and DevOps architects introduce chaos into the applications and Kubernetes infrastructure in a declarative intent format, in other words — the Kubernetes way
Litmus is gaining traction in the community as a preferred means of injecting chaos in Kubernetes-based CI/CD pipelines (see reference use-cases for NuoDB, Prometheus & Cassandra), and as one of the contributors to this project, that is heartening! One of the key benefits that Litmus brings to the table, in simple terms, is the fact that a chaos test/experiment can be run as a Kubernetes job with a custom resource as a test result. As you can discern, this is a model that promises easy integration with CI systems to implement chaos-themed e2e pipelines.
Why do we need a chaos operator and a workflow?
While what we have is Kubernetes native, the community felt that the toolset should be improved further to encourage its use in the actual places where chaos is thriving today: deployment environments (may they be Dev/Staging/Pre-Prod/Production). Though this doesn’t mean Litmus in its current form cannot be used against such environments (visit the openebs workload dashboards to run some live chaos on active prod-grade apps!), there are some compelling differences, or rather, needs that have to be met by the chaos frameworks to operate efficiently here. Some of the core requirements identified were:
- Ability to schedule a chaos experiment (or a batch run of several experiments).
- Ability to monitor & visualize chaos results mapped to an application over a period of time, thereby ascertaining its resiliency.
- Ability to run continuous-chaos as a background service based on filters such as annotations. This also implies the need for a resilient chaos execution engine that can tolerate failures & guarantee test-run resiliency.
- Standardized specs for chaos experiments with an option to download categorized experiment bundles.
In short, chaos needs to be orchestrated !!
The lifecycle of a chaos experiment
We define three steps in the workflow …