Resilience Patterns in Spring Boot with Resilience4j: Part 1 — Circuit Breaker

Uğur Dirim · Published in cloudnesil · Jul 2, 2021

A system is resilient if it continues to carry out its mission in the face of excessive stresses that can cause disruptions¹. Lately, the concept of resilience has become more important as microservice architecture has grown popular, since the distributed and complicated nature of microservices makes them prone to cascading failures. Several resilience patterns have been developed to prevent such failures. Common patterns include circuit breaker, bulkhead, rate limiter, retry, time limiter, and cache.

Resilience4j is one of the libraries that implement the common resilience patterns. It’s a lightweight, easy-to-use library inspired by Netflix Hystrix, but designed for Java 8 and functional programming². Resilience4j became more popular after Netflix announced that Hystrix would no longer be in active development and recommended Resilience4j as a replacement³. Resilience4j is also included in Spring Cloud.

In this article, we’ll first explain the circuit breaker pattern as implemented in the Resilience4j library and then demonstrate it using a sample Java Spring Boot application. This article is the first in a series about resilience patterns in Resilience4j.

What is a circuit breaker?

It is important to understand how a circuit breaker works before we delve into code examples. Whether we are dealing with a microservice or a monolithic architecture, it is pretty common for a software service to call a remote service. However, when the remote service is down and thousands of clients simultaneously keep trying to use it, eventually all the resources get consumed. The circuit breaker pattern helps us in these kinds of scenarios by preventing us from repeatedly calling a service or a function that will likely fail, and thus avoids wasting CPU cycles.

Circuit Breaker State Machine (https://files.readme.io/39cdd54-state_machine.jpg)

A circuit breaker works as a state machine with three states (CLOSED, HALF_OPEN, OPEN), as you can see in the diagram above. Normally the circuit breaker operates in the CLOSED state. When a service tries to communicate with another service and the other system fails to respond, the circuit breaker keeps track of these failures. When the number of failures rises above a certain threshold, it goes into the OPEN state, and no further requests are sent to the downstream service. Meanwhile, the circuit breaker may also help the system degrade gracefully through a fallback mechanism, returning some cached or default response to the caller. After a specified wait duration, the circuit breaker goes into the HALF_OPEN state and tries to reconnect to the downstream service. Again, if the failure rate is above a certain threshold, it goes back into the OPEN state; otherwise, it goes into the CLOSED state and returns to its normal mode of operation.
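
Before we move to Spring Boot, it may help to see this state machine in code. The following is a minimal sketch using Resilience4j’s plain core API; the circuit breaker name, threshold values, and the remote call are illustrative assumptions and are not part of the demo project described below.

import java.time.Duration;

import io.github.resilience4j.circuitbreaker.CallNotPermittedException;
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;

public class CircuitBreakerSketch {

    public static void main(String[] args) {
        // Open the circuit when 50% of the last 10 recorded calls fail; stay OPEN for 30 seconds.
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
                .slidingWindowSize(10)
                .minimumNumberOfCalls(10)
                .failureRateThreshold(50)
                .waitDurationInOpenState(Duration.ofSeconds(30))
                .build();
        CircuitBreaker circuitBreaker = CircuitBreakerRegistry.of(config).circuitBreaker("backendService");

        for (int i = 0; i < 15; i++) {
            try {
                // Every decorated call is recorded as a success or a failure in the sliding window.
                circuitBreaker.executeSupplier(CircuitBreakerSketch::callRemoteService);
            } catch (CallNotPermittedException e) {
                // Thrown once the breaker is OPEN: the remote call is skipped entirely.
                System.out.println("call " + i + ": circuit is OPEN, returning fallback");
            } catch (Exception e) {
                // The remote call itself failed; the breaker records the failure.
                System.out.println("call " + i + ": remote call failed");
            }
        }
    }

    // Hypothetical remote call that always fails, used only for illustration.
    private static String callRemoteService() {
        throw new RuntimeException("remote service is down");
    }
}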

Example Spring Boot Implementation

Let’s see what we can achieve with a Resilience4j Circuit Breaker in a sample Spring Boot application. For this purpose, we’ve created a demo project, which is available on GitHub (https://github.com/nixarbug/gulliver-resilience). This project demonstrates a hypothetical food delivery system called Gulliver and consists of a gateway to control the flow of information, a restaurant microservice to hold and serve restaurant information, and other microservices to take orders, assign couriers, notify customers, etc. For the sake of simplicity, we’ll use only the Gateway and the Restaurant Microservice to demonstrate the Resilience4j Circuit Breaker functionality.

Gulliver Food Delivery System

In our scenario, an Actor sends a request to the Gateway to get a list of restaurants. The Gateway then forwards those requests to the Restaurant Microservice using route definitions. The Restaurant Microservice has a controller with only one method, which returns a static list of restaurants, as can be seen below.

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
@RequestMapping("restaurants")
public class RestaurantController {

    // Returns a static list of restaurants as a reactive stream.
    @GetMapping
    public Flux<Restaurant> getRestaurants() {
        return Flux.just(
                new Restaurant(1L, "Fiske"),
                new Restaurant(2L, "Alibaba Kebab"),
                new Restaurant(3L, "Apple Pizzeria"),
                new Restaurant(4L, "Simit Palace"),
                new Restaurant(5L, "Niagara Soup Place")
        );
    }
}
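
The Restaurant class itself is not shown in the snippet above; a minimal version, assuming it carries only the id and name used by the controller, could look like this:

public class Restaurant {

    private final Long id;
    private final String name;

    public Restaurant(Long id, String name) {
        this.id = id;
        this.name = name;
    }

    public Long getId() {
        return id;
    }

    public String getName() {
        return name;
    }
}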

The circuit breaker functionality resides in the Gateway. To enable the circuit breaker and to monitor it using Prometheus and Grafana, we’ve added the following two dependencies to the build.gradle file.

implementation 'org.springframework.cloud:spring-cloud-starter-circuitbreaker-reactor-resilience4j'
implementation 'io.github.resilience4j:resilience4j-micrometer'
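
Note that these two starters by themselves don’t expose a metrics endpoint; for the Prometheus and Grafana setup mentioned above, the Gateway typically also needs Spring Boot Actuator and the Prometheus Micrometer registry on the classpath. If they are not already pulled in elsewhere in the project, the following additions to build.gradle are a reasonable assumption:

implementation 'org.springframework.boot:spring-boot-starter-actuator'
implementation 'io.micrometer:micrometer-registry-prometheus'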

Our circuit breaker is configured using a configuration class, which sets the sliding window size to 20 and the failure rate threshold to 50. This means that if 50% of the last 20 requests fail, the circuit breaker goes into the OPEN state. The wait duration, which controls how long the circuit breaker should stay OPEN before it switches to HALF_OPEN, is set to 60 seconds. The permitted number of calls in the HALF_OPEN state is set to 5, which means that after 5 requests the circuit breaker goes into the OPEN or CLOSED state depending on the failure rate.

import java.time.Duration;

import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import org.springframework.cloud.circuitbreaker.resilience4j.ReactiveResilience4JCircuitBreakerFactory;
import org.springframework.cloud.circuitbreaker.resilience4j.Resilience4JConfigBuilder;
import org.springframework.cloud.client.circuitbreaker.Customizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class CircuitBreakerConfiguration {

    // Default settings applied to every circuit breaker created by the factory.
    @Bean
    public Customizer<ReactiveResilience4JCircuitBreakerFactory> defaultCustomizer() {
        return factory -> factory.configureDefault(id -> new Resilience4JConfigBuilder(id)
                .circuitBreakerConfig(CircuitBreakerConfig.custom()
                        .slidingWindowSize(20)
                        .permittedNumberOfCallsInHalfOpenState(5)
                        .failureRateThreshold(50)
                        .waitDurationInOpenState(Duration.ofSeconds(60))
                        .build())
                .build()
        );
    }
}
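
The customizer above changes the defaults for every circuit breaker the factory creates. If only the restaurant route should use these settings, the same builder can be applied to a single instance by name. The bean below is a sketch, assuming the factory’s configure(consumer, ids) variant, and is not part of the demo project:

// An additional bean in the same CircuitBreakerConfiguration class (same imports as above).
@Bean
public Customizer<ReactiveResilience4JCircuitBreakerFactory> restaurantCustomizer() {
    return factory -> factory.configure(builder -> builder
            .circuitBreakerConfig(CircuitBreakerConfig.custom()
                    .slidingWindowSize(20)
                    .permittedNumberOfCallsInHalfOpenState(5)
                    .failureRateThreshold(50)
                    .waitDurationInOpenState(Duration.ofSeconds(60))
                    .build()),
            "restaurantCircuitBreaker");
}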

This configuration applies to the following restaurant-route definition in the application.yaml file, which contains the path to the Restaurant Microservice and a fallback URI.

spring:
  cloud:
    gateway:
      routes:
        - id: restaurant-route
          uri: http://localhost:9001/restaurants
          predicates:
            - Path=/restaurants/**
          filters:
            - name: CircuitBreaker
              args:
                name: restaurantCircuitBreaker
                fallbackUri: forward:/restaurant-fallback
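
The same route can also be declared programmatically with Spring Cloud Gateway’s Java DSL instead of YAML. The configuration class below is a hypothetical equivalent, not something taken from the demo project:

import org.springframework.cloud.gateway.route.RouteLocator;
import org.springframework.cloud.gateway.route.builder.RouteLocatorBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RestaurantRouteConfiguration {

    @Bean
    public RouteLocator restaurantRoute(RouteLocatorBuilder builder) {
        return builder.routes()
                .route("restaurant-route", r -> r.path("/restaurants/**")
                        .filters(f -> f.circuitBreaker(config -> {
                            // Same circuit breaker name and fallback URI as in application.yaml.
                            config.setName("restaurantCircuitBreaker");
                            config.setFallbackUri("forward:/restaurant-fallback");
                        }))
                        .uri("http://localhost:9001/restaurants"))
                .build();
    }
}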

This fallback URI could point to a controller in the Gateway, but for simplicity, we’ve added another configuration class that returns an empty Mono as a response when the Restaurant Microservice is down. We could also implement a cache mechanism using Redis and return the latest cached restaurant list as a fallback.

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.reactive.function.server.*;
import reactor.core.publisher.Mono;

@Configuration
public class FallbackConfiguration {

    // Handles requests forwarded to /restaurant-fallback when the circuit breaker is OPEN.
    @Bean
    public RouterFunction<ServerResponse> routerFunction() {
        return RouterFunctions.route(RequestPredicates.GET("/restaurant-fallback"), this::handleGetFallback);
    }

    // Returns an empty body as the fallback response.
    public Mono<ServerResponse> handleGetFallback(ServerRequest serverRequest) {
        return ServerResponse.ok().body(Mono.empty(), String.class);
    }
}
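
To illustrate the cached-fallback idea mentioned above, here is a rough sketch of an alternative handler (imports as in FallbackConfiguration, plus reactor.core.publisher.Flux). RestaurantCache is a hypothetical component, for example a thin wrapper around Redis, and neither it nor this configuration exists in the demo project:

// Hypothetical alternative that serves the last successfully fetched restaurant list.
@Configuration
public class CachedFallbackConfiguration {

    private final RestaurantCache restaurantCache; // assumed component, e.g. backed by Redis

    public CachedFallbackConfiguration(RestaurantCache restaurantCache) {
        this.restaurantCache = restaurantCache;
    }

    @Bean
    public RouterFunction<ServerResponse> cachedFallbackRoute() {
        return RouterFunctions.route(RequestPredicates.GET("/restaurant-fallback"),
                request -> ServerResponse.ok()
                        .body(Flux.fromIterable(restaurantCache.getLatest()), Restaurant.class));
    }
}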

To demonstrate the circuit breaker, we’ll use the Grafana Resilience4j dashboard, which shows us the number of circuit breakers in each state, current failure rates, etc. (the actuator settings assumed for exposing these metrics are sketched after the steps below):

  • First, we run both microservices, make some requests to the Gateway path http://localhost:9000/restaurants, and observe the system in the CLOSED state. In the CLOSED state, we get the actual list of restaurants from the Restaurant Microservice.
  • Then, we shut down the Restaurant Microservice and generate lots of requests using the Apache Benchmark tool (100 requests in batches of 5 concurrent ones, using the command “ab -n 100 -c 5 http://localhost:9000/restaurants”) to make the system go into the OPEN state. In this state, we get a default empty fallback result since the Restaurant Microservice is down. This makes our system more resilient and saves valuable system resources.
  • Then, after 60 seconds, the circuit breaker transitions into the HALF_OPEN state.
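
For the Grafana dashboard to have data, Prometheus needs to scrape the Gateway’s metrics endpoint. Assuming the Actuator and Prometheus registry dependencies mentioned earlier, exposing that endpoint in the Gateway’s application.yaml would look roughly like this (the exact setup is an assumption and is not shown in the article):

management:
  endpoints:
    web:
      exposure:
        include: health, prometheus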

In this demo project, we’ve demonstrated how we can achieve a more resilient system using the Resilience4j Circuit Breaker pattern. In the following articles, we’ll delve into the bulkhead, rate limiter, retry, time limiter, and cache patterns. Stay tuned.

[1]: System Resilience: What Exactly is it? https://insights.sei.cmu.edu/blog/system-resilience-what-exactly-is-it/

[2]: Resilience4j Getting Started Guide https://resilience4j.readme.io/docs/getting-started

[3]: Hystrix: Latency and Fault Tolerance for Distributed Systems https://github.com/Netflix/Hystrix
