How to Start with Chaos Engineering Experiments?
The acceptance criteria you should be using for chaos engineering experiments
Published Apr 18, 2022
Chaos engineering experiments can be highly exploratory at the beginning.
You often have little to no idea what to look for, other than possible unavailability of the system.
Let me give you a hand with some acceptance criteria to get you started:
- Downtime: Are you permitting downtime during chaos engineering experiments? Perhaps you should, but not for all services. Identify the critical and non-critical services and establish the allowed downtime for each. As a hint, you may want to treat the services directly tied to revenue generation as critical.
- Service degradation: How long is a degraded service acceptable? In other words, what mean time to repair (MTTR) back to default performance will you tolerate? Separate service degradation into two categories: latency and error rate. Account for spikes when the experiment starts, but stipulate a maximum allowed MTTR for your services.
- Data loss: What is your recovery point objective (RPO)? Do you consider data loss acceptable during chaos engineering experiments? Pay particular attention to requests that…
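
One way to make the criteria above concrete is to write them down per service and check each experiment run against them. The following is only a minimal sketch in Python: the class names, field names, and example thresholds (`AcceptanceCriteria`, `ExperimentResult`, `max_degraded_s`, the `checkout` service, and so on) are hypothetical and chosen for illustration, not taken from any chaos engineering tool.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AcceptanceCriteria:
    """Limits agreed on before the experiment (all names are hypothetical)."""
    service: str
    critical: bool          # e.g. directly tied to revenue generation
    max_downtime_s: float   # allowed downtime during the experiment
    max_degraded_s: float   # allowed time to recover to default performance
    max_data_loss_s: float  # recovery point objective (RPO), in seconds


@dataclass(frozen=True)
class ExperimentResult:
    """What was actually observed during one experiment run."""
    service: str
    downtime_s: float
    degraded_s: float
    data_loss_s: float


def evaluate(criteria: AcceptanceCriteria, result: ExperimentResult) -> list[str]:
    """Return a list of violated criteria; an empty list means the run passed."""
    violations = []
    if result.downtime_s > criteria.max_downtime_s:
        violations.append(
            f"downtime {result.downtime_s}s exceeds {criteria.max_downtime_s}s"
        )
    if result.degraded_s > criteria.max_degraded_s:
        violations.append(
            f"degraded for {result.degraded_s}s, allowed {criteria.max_degraded_s}s"
        )
    if result.data_loss_s > criteria.max_data_loss_s:
        violations.append(
            f"data loss window {result.data_loss_s}s exceeds RPO "
            f"{criteria.max_data_loss_s}s"
        )
    return violations


# Assumed example: a critical service with no allowed downtime or data loss,
# and at most two minutes of degraded performance.
checkout = AcceptanceCriteria(
    service="checkout",
    critical=True,
    max_downtime_s=0,
    max_degraded_s=120,
    max_data_loss_s=0,
)

observed = ExperimentResult(
    service="checkout", downtime_s=0, degraded_s=180, data_loss_s=0
)

for problem in evaluate(checkout, observed):
    print(f"{checkout.service}: {problem}")
```

Keeping the thresholds in one place per service makes it easier to set looser limits for non-critical services and stricter ones for critical services, and to review them before each experiment. Latency and error rate could be tracked as two separate degradation limits if you want to distinguish them.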