Monkey Business: When Chaos Engineering Meets Spring Framework for Resilience!

May 23, 2023

Welcome, fellow adventurers of chaos and resilience! Today, we dive into a thrilling tale of an unexpected encounter between two powerful tools: Chaos Engineering's mischievous Chaos Monkey and the robust Spring Framework. Brace yourselves for a wild ride filled with practical insights into improving system resiliency. Let's embark on this journey as we explore the curious fusion of chaos and structure, where unpredictability meets reliability in the name of rock-solid systems.

“Unleashing Mayhem with a Purpose: An Introduction to Chaos Engineering”

Chaos Engineering is a discipline that aims to uncover vulnerabilities and enhance the resilience of complex systems through controlled experimentation. By deliberately introducing failures and chaos into a system, Chaos Engineering helps organizations identify weaknesses and strengthen their infrastructure, applications, and processes.

In the realm of Chaos Engineering, understanding the concept of blast radius is crucial. The blast radius refers to the extent and impact of disruption caused by chaos experiments on a system. It encompasses the scope of potential failures and the resulting consequences throughout the system’s components.

Determining the blast radius involves analyzing dependencies, architectural considerations, and the potential for collateral damage. Understanding and managing the blast radius is a delicate balancing act: an experiment must be large enough to reveal real weaknesses, yet small enough that a surprise does not take down the whole system. By carefully limiting the blast radius, Chaos Engineering becomes a valuable tool for resilience testing, enabling organizations to build more robust systems without compromising stability.
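One way to make the blast radius concrete is to treat the system as a dependency graph and ask which services can be reached from the component you plan to break. The sketch below is only an illustration under assumptions: the service names and the `dependents` map are hypothetical, and a real system would derive this data from service discovery or tracing.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BlastRadius {

    // Returns every service that directly or transitively depends on `failed`.
    // `dependents` maps a service to the services that call it (hypothetical data).
    public static Set<String> affectedBy(String failed, Map<String, List<String>> dependents) {
        Set<String> affected = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(failed);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String caller : dependents.getOrDefault(current, List.of())) {
                // add() returns false for services already visited, which also guards against cycles
                if (affected.add(caller)) {
                    queue.add(caller);
                }
            }
        }
        return affected;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dependents = Map.of(
                "database", List.of("orders", "inventory"),
                "orders", List.of("checkout"),
                "inventory", List.of("checkout"),
                "checkout", List.of("web"));
        System.out.println("Affected by a database outage: " + affectedBy("database", dependents));
    }
}
```

If the resulting set is larger than you are comfortable with, that is a hint to start the experiment on a component further downstream.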

Chaos Engineering follows a set of main principles:

Define Steady State

Chaos Engineers establish a baseline of normal functioning, known as the steady state, for the system under observation. This involves understanding how the system behaves when running smoothly.
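A steady state is easier to verify when it is written down as numbers. The following is a minimal sketch, assuming the steady state is expressed as a maximum error rate and a maximum 95th-percentile latency. The `SteadyState` class, its thresholds, and the sample data are hypothetical and only serve to illustrate the idea.

```java
import java.util.Arrays;

public class SteadyState {
    private final double maxErrorRate;
    private final long maxP95Millis;

    public SteadyState(double maxErrorRate, long maxP95Millis) {
        this.maxErrorRate = maxErrorRate;
        this.maxP95Millis = maxP95Millis;
    }

    // 95th percentile using the nearest-rank method; expects at least one sample.
    public static long p95(long[] latenciesMillis) {
        long[] sorted = latenciesMillis.clone();
        Arrays.sort(sorted);
        int rank = (95 * sorted.length + 99) / 100; // integer ceiling of 0.95 * n
        return sorted[Math.max(rank - 1, 0)];
    }

    // True when the observed window stays within the agreed baseline.
    public boolean holds(long[] latenciesMillis, int errors) {
        if (latenciesMillis.length == 0) {
            return false; // no traffic observed, so nothing can be concluded
        }
        double errorRate = (double) errors / latenciesMillis.length;
        return errorRate <= maxErrorRate && p95(latenciesMillis) <= maxP95Millis;
    }

    public static void main(String[] args) {
        SteadyState baseline = new SteadyState(0.05, 200); // assumed thresholds
        long[] observed = {40, 55, 60, 80, 120, 150};      // made-up latencies in ms
        System.out.println("Steady state holds: " + baseline.holds(observed, 0));
    }
}
```

The same check can then be run before, during, and after an experiment to see whether the system drifted away from its baseline.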

Hypothesize and Inject Failure

Engineers then formulate hypotheses about potential weaknesses or failure scenarios. They methodically inject controlled disruptions, such as network failures, server crashes, or database outages, into the system to test its resiliency.
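As an illustration of this hypothesize-then-inject loop, the sketch below wraps a call in a fault injector and tests one hypothesis: "a retry of up to three attempts hides two transient failures". Everything in it is hypothetical. `failFirst` and `callWithRetry` are stand-ins written for this example, not part of any library.

```java
import java.util.function.Supplier;

public class FailureExperiment {

    // Fault injector: wraps a call so that its first `failures` invocations throw.
    public static <T> Supplier<T> failFirst(int failures, Supplier<T> target) {
        int[] remaining = {failures};
        return () -> {
            if (remaining[0] > 0) {
                remaining[0]--;
                throw new IllegalStateException("injected failure");
            }
            return target.get();
        };
    }

    // The resilience mechanism under test: retry up to maxAttempts times.
    public static <T> T callWithRetry(Supplier<T> call, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        Supplier<String> flaky = FailureExperiment.<String>failFirst(2, () -> "order placed");
        try {
            System.out.println("Hypothesis confirmed: " + callWithRetry(flaky, 3));
        } catch (RuntimeException e) {
            System.out.println("Hypothesis rejected: " + e.getMessage());
        }
    }
}
```

Raising the number of injected failures to three makes the same experiment reject the hypothesis, which is exactly the kind of finding a chaos experiment is meant to surface.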

Monitor and Learn

Throughout the chaos experiments, various monitors and observability tools are utilized to capture and analyze system behavior. Engineers closely observe how the system reacts to failures, uncovering insights that can guide improvements.

Automate Experiments

To scale Chaos Engineering practices and ensure continuous resilience testing, automating chaos experiments becomes crucial. By creating reliable and repeatable chaos scenarios, engineers can perform regular tests and validate system responses.

Embrace a “Failure as Learning” Mindset

Chaos Engineering treats failures as opportunities for learning and growth. Instead of fearing failure, organizations adopt a mindset that views failures as valuable insights, allowing them to proactively address weaknesses and strengthen their systems.

Origins Explained!

The Chaos Monkey framework, originally developed by Netflix, emerged as a pioneering force in the realm of Chaos Engineering. Designed to intentionally inject failures into distributed systems, Chaos Monkey disrupted the conventional approach to software resilience.

“Unleashing Chaos, Learning from Titans: Exploring the Chaotic Secrets of Tech Giants”

“AWS GameDay”

During a GameDay, teams simulate real-world failure scenarios, orchestrating controlled chaos to validate system resiliency. This hands-on approach enables AWS to proactively identify weaknesses, optimize system behavior, and continuously improve its cloud services.

“Azure Chaos Studio”

Azure Chaos Studio is a service that enables controlled fault injection into distributed systems. This allows engineers to assess the system’s response to failure and fine-tune their services accordingly, ultimately bolstering reliability and customer satisfaction.

Netflix, a pioneer in Chaos Engineering, leverages its homegrown tool, Chaos Monkey, to create controlled disruptions within its complex streaming platform. By randomly terminating production instances, Chaos Monkey verifies Netflix’s ability to gracefully handle failures and automatically restore service without significant user impact. This iterative process has greatly contributed to Netflix’s reputation for delivering uninterrupted streaming experiences.

Chaos Monkey

Experience the power of Chaos Engineering and system resilience through a proof-of-concept (POC) that combines Spring Boot and Chaos Monkey. By integrating Chaos Monkey into a Spring Boot application, controlled failures and disruptions are injected, allowing engineers to identify weaknesses, validate resilience, and optimize error handling.

Watcher and Assaults: Nuts & Bolts

A Watcher decides where chaos can strike. In Chaos Monkey for Spring Boot, a watcher is a Spring AOP component that intercepts public method calls on beans of a given type, such as @Controller, @RestController, @Service, @Repository, or @Component. Each watcher can be switched on or off, which lets engineers limit experiments to a chosen layer of the application.

Assaults (Latency Assault, Exception Assault, Kill App Assault) are the attacks themselves and are responsible for injecting chaos into the system. They introduce controlled failures, such as adding latency to a call, throwing an exception, or shutting the application down. These deliberate disruptions test the application’s resilience, enabling engineers to identify weaknesses and validate its ability to recover gracefully.
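To give a feel for how a latency assault behaves, here is a simplified sketch. It is not the library's implementation. It assumes an attack "level" that means "attack every Nth call" and a latency range in milliseconds, and the `LatencyAssaultSketch` class and its method names are hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

public class LatencyAssaultSketch {
    private final int level;            // assumed meaning: attack every `level`-th call
    private final int rangeStartMillis; // smallest injected delay
    private final int rangeEndMillis;   // largest injected delay
    private final AtomicLong calls = new AtomicLong();

    public LatencyAssaultSketch(int level, int rangeStartMillis, int rangeEndMillis) {
        if (level < 1 || rangeStartMillis < 0 || rangeEndMillis < rangeStartMillis) {
            throw new IllegalArgumentException("invalid assault settings");
        }
        this.level = level;
        this.rangeStartMillis = rangeStartMillis;
        this.rangeEndMillis = rangeEndMillis;
    }

    // Delay to apply to the current call; calls that are not attacked get 0.
    public int nextDelayMillis() {
        long n = calls.incrementAndGet();
        if (n % level != 0) {
            return 0;
        }
        // the upper bound of nextInt is exclusive, hence the + 1
        return ThreadLocalRandom.current().nextInt(rangeStartMillis, rangeEndMillis + 1);
    }

    // Plays the role of a watcher: wraps the real call and applies the assault first.
    public <T> T around(Supplier<T> call) throws InterruptedException {
        int delay = nextDelayMillis();
        if (delay > 0) {
            Thread.sleep(delay);
        }
        return call.get();
    }

    public static void main(String[] args) throws InterruptedException {
        LatencyAssaultSketch assault = new LatencyAssaultSketch(3, 50, 100);
        for (int i = 1; i <= 6; i++) {
            long start = System.nanoTime();
            String result = assault.around(() -> "hello");
            long tookMillis = (System.nanoTime() - start) / 1_000_000;
            System.out.println("call " + i + " returned " + result + " after ~" + tookMillis + " ms");
        }
    }
}
```

With a level of 3, only the third and sixth calls are delayed, which keeps the experiment's impact on regular traffic small.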

Here are some key considerations to keep in mind when writing code.

Use profile chaos-monkey

spring:
  profiles:
    active: chaos-monkey

pom.xml


<!-- Actuator exposes the management endpoints used below -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

<!-- Chaos Monkey for Spring Boot -->
<dependency>
    <groupId>de.codecentric</groupId>
    <artifactId>chaos-monkey-spring-boot</artifactId>
    <version>3.0.0</version>
</dependency>

application.yml

management:
  endpoint:
    chaosmonkey:
      enabled: true
    chaosmonkeyjmx:
      enabled: true

  endpoints:
    web:
      exposure:
        include: 
          - health
          - chaosmonkey

You will see “ChaosMonkey” in the start-up output if the profile loaded successfully.

List of important endpoints

http://localhost:8080/actuator/chaosmonkey
http://localhost:8080/actuator/chaosmonkey/status

http://localhost:8080/actuator/chaosmonkey/watchers #GET & POST
http://localhost:8080/actuator/chaosmonkey/assaults #GET & POST

http://localhost:8080/actuator/chaosmonkey/assaults/runtime/attack  #POST
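The POST endpoints can be called from any HTTP client. Below is a minimal sketch using Java's built-in `java.net.http` client to build a request that turns on a latency assault. The JSON property names (`level`, `latencyActive`, `latencyRangeStart`, `latencyRangeEnd`) follow the library's assault configuration as I understand it, so treat them as an assumption and check them against the documentation for your version. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AssaultRequest {

    // Builds the JSON body; property names are assumed, verify them for your version.
    public static String latencyAssaultJson(int level, int startMillis, int endMillis) {
        return "{\"level\": " + level
                + ", \"latencyActive\": true"
                + ", \"latencyRangeStart\": " + startMillis
                + ", \"latencyRangeEnd\": " + endMillis + "}";
    }

    // Builds (but does not send) a POST to the assaults endpoint listed above.
    public static HttpRequest updateAssaults(String baseUrl, String json) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/actuator/chaosmonkey/assaults"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) throws Exception {
        String json = latencyAssaultJson(5, 1000, 3000);
        HttpRequest request = updateAssaults("http://localhost:8080", json);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(json);

        // Pass any argument to actually send the request to a running application.
        if (args.length > 0) {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("Status: " + response.statusCode());
        }
    }
}
```

A GET on the same URL afterwards is a quick way to confirm that the new assault settings were accepted.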

That’s All!

You can get more info from here,

Summary

We have seen how Chaos Engineering with Chaos Monkey in Spring Boot changes the way we pursue system resilience. By deliberately injecting chaos and failures, organizations can proactively uncover vulnerabilities, optimize error handling, and fortify their Spring Boot applications. Embracing chaos becomes a catalyst for resilience, enabling organizations to stay one step ahead, deliver exceptional user experiences, and navigate the ever-changing landscape of modern applications with confidence. Harness the power of Chaos Engineering and Chaos Monkey in Spring Boot, and embark on a journey towards more resilient and reliable systems.

If you found my blog posts enjoyable and gained new insights, I kindly ask you to think about sharing them and joining me here for future updates. The source code can be found on my GitHub, and you can reach out to me on LinkedIn with any questions or suggestions.

That’s all for now, Happy Learning!