Circuit Breaker Pattern

Govinda Raj · Published in Zero Equals False
Sep 9, 2020 · 5 min read

It is a microservice architectural pattern. Remember, every pattern is discovered to solve a particular type of problem. So what type of problem does this one solve?

Well, if you are familiar with microservice architecture, you definitely know there is a lot of intercommunication between microservices. Let's take an example:

There are three microservices:

  1. Contact Management Service (CMS)
  2. Survey Management Service (SMS)
  3. Sender Service (SS)

There is one aggregator service that is responsible for creating contacts using CMS, fetching survey information from SMS, and sending surveys to the created contacts using SS. There are many requests to perform the above-mentioned actions. Don't worry, you will understand the problem, bear with me.

Now, let's understand the problem here. When the aggregator service tries to call CMS and, for some reason, CMS cannot respond, the result is a cascading API call failure. So the problem statement is:

How to prevent a network or service failure from cascading to other services?

Before going to HOW, let’s understand WHY we need to solve this issue???

When one service synchronously invokes another, there is always the possibility that the other service is unavailable or is exhibiting such high latency that it is essentially unusable. Precious resources such as threads might be consumed in the caller while waiting for the other service to respond. This might lead to resource exhaustion, which would make the calling service unable to handle other requests. The failure of one service can potentially cascade to other services throughout the application. So in the given example, there is a high chance that all the threads of the aggregator service end up waiting for a response from CMS; with no threads left to call SMS or SS, the aggregator service becomes a blocker for the other services. Now imagine there are thousands of microservices and millions of interactions between them. A single service such as CMS can exhaust every other service whose calls cascade through it.

Well, that was a good explanation. Now I know that I don't want any resource to be exhausted. So let's focus on how we can solve it. BTW, it's a little confusing that we call it a pattern when it actually looks like a circuit breaker problem. Anyway, how do we solve this problem?

There are many ways to solve this problem, but in all solutions we have to stop this cascading service failure. One way I can think of is: if somehow I can get to know the previous call's state, whether it failed or passed, then we can decide what to do with the current call.

Okay, but how to do that???

One way I can suggest is: let's check the response state of the last call, and store the response of every call in a cache. If the last call failed, then let's not make the call this time and instead return a predefined message. Ohhh… sounds good, and it seems it solves my problem too.
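Just to make the idea concrete, here is a minimal sketch of this naive approach in Java. Everything here (the LastCallGuard class, the predefined message) is hypothetical and only for illustration:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Naive idea: remember only whether the last call to a service failed.
public class LastCallGuard {

    private final Map<String, Boolean> lastCallFailed = new ConcurrentHashMap<>();

    // Wraps a remote call and skips it if the previous call to the same service failed.
    public String call(String serviceName, Supplier<String> remoteCall) {
        if (Boolean.TRUE.equals(lastCallFailed.get(serviceName))) {
            // The last call failed, so don't call the service; return a predefined message.
            return serviceName + " is unavailable, please try again later";
        }
        try {
            String response = remoteCall.get();
            lastCallFailed.put(serviceName, false);   // remember the success
            return response;
        } catch (RuntimeException e) {
            lastCallFailed.put(serviceName, true);    // remember the failure
            return serviceName + " is unavailable, please try again later";
        }
    }
}
```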

Well, can anybody think of any issue in the above solution???

I can think of many problems with the above solution:

  1. What will be the expiry time of the cached response?
  2. What if the failure was genuine, but we are still not calling the service for new requests?
  3. It will be a little unfair to decide based on a single failed call, because failures can be intermittent.
  4. Do I really need to store/update the response of each call every time? What if I only store it for failure cases?

So, how do we solve all the above issues? Let's do some customization:

We will create an interceptor, which will store the success and failure percentage for each outgoing API call, and we will keep one threshold, let's say 30%. If the threshold is reached, we stop calling the service (let's say CMS) and wait for some time so that CMS can recover; once the wait is over, we start calling the service again.
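As a rough illustration of the tracking part, here is a small failure-rate tracker in Java. The names are hypothetical, and a real implementation would count over a sliding window of recent calls rather than over all calls ever made:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Tracks success/failure counts for calls to one downstream service
// and reports whether the failure percentage has crossed a threshold.
public class FailureRateTracker {

    private final AtomicInteger successCount = new AtomicInteger();
    private final AtomicInteger failureCount = new AtomicInteger();
    private final double thresholdPercentage;   // e.g. 30.0

    public FailureRateTracker(double thresholdPercentage) {
        this.thresholdPercentage = thresholdPercentage;
    }

    public void recordSuccess() { successCount.incrementAndGet(); }

    public void recordFailure() { failureCount.incrementAndGet(); }

    // Failure percentage over everything recorded so far.
    public double failurePercentage() {
        int total = successCount.get() + failureCount.get();
        return total == 0 ? 0.0 : (failureCount.get() * 100.0) / total;
    }

    public boolean thresholdReached() {
        return failurePercentage() >= thresholdPercentage;
    }

    public void reset() {
        successCount.set(0);
        failureCount.set(0);
    }
}
```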

This solves many of our problems, but we still cannot be 100% sure that CMS is ready to serve again, because we don't know how long the interceptor should wait. So, instead of waiting for the entire time, the interceptor can call CMS once to check whether it is ready to serve or not. Let's understand the different states of the interceptor using a diagram:

If the interceptor is in the OPEN state, it will not call the CMS service (the problem creator) and will wait for a predefined timeout period. After the timeout has expired, it goes to the HALF-OPEN state, where it sends one request to CMS to check whether it is serviceable or not. If it's not serviceable, the interceptor goes back to the OPEN state and waits for the timeout period again. If CMS is serviceable, the interceptor goes to the CLOSED state, where it sends all requests to CMS and starts tracking each call's success/failure percentage. If the failure percentage reaches the threshold, it goes to the OPEN state again, and the cycle continues like that.

The above diagram is a circuit, and remember: if the circuit is open, it will not allow any calls to the defaulter service, and if the circuit is closed, it will allow all calls to the defaulter service.
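Putting the pieces together, here is a toy, hand-rolled sketch of such a circuit breaker in Java. It reuses the FailureRateTracker from the earlier sketch; all names are made up, and a real library handles concurrency, sliding windows, and metrics far more carefully:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// A toy circuit breaker with the three states described above.
public class SimpleCircuitBreaker {

    private enum State { CLOSED, OPEN, HALF_OPEN }

    private final FailureRateTracker tracker;   // from the earlier sketch
    private final Duration openTimeout;         // how long to stay OPEN before probing
    private State state = State.CLOSED;
    private Instant openedAt;

    public SimpleCircuitBreaker(double thresholdPercentage, Duration openTimeout) {
        this.tracker = new FailureRateTracker(thresholdPercentage);
        this.openTimeout = openTimeout;
    }

    public synchronized String call(Supplier<String> remoteCall, String fallback) {
        if (state == State.OPEN) {
            if (Duration.between(openedAt, Instant.now()).compareTo(openTimeout) < 0) {
                return fallback;                // still OPEN: fail fast, don't call CMS
            }
            state = State.HALF_OPEN;            // timeout expired: allow one probe request
        }
        try {
            String response = remoteCall.get();
            onSuccess();
            return response;
        } catch (RuntimeException e) {
            onFailure();
            return fallback;
        }
    }

    private void onSuccess() {
        if (state == State.HALF_OPEN) {
            state = State.CLOSED;               // probe succeeded: close the circuit
            tracker.reset();
        } else {
            tracker.recordSuccess();
        }
    }

    private void onFailure() {
        if (state == State.HALF_OPEN) {
            trip();                             // probe failed: open the circuit again
            return;
        }
        tracker.recordFailure();
        if (tracker.thresholdReached()) {
            trip();                             // failure percentage crossed the threshold
        }
    }

    private void trip() {
        state = State.OPEN;
        openedAt = Instant.now();
        tracker.reset();
    }
}
```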

Netflix's Hystrix project provides a Java library that does exactly this, but it is currently in maintenance mode. You can instead use https://github.com/resilience4j/resilience4j, a fault tolerance library designed for Java 8 and functional programming, licensed under Apache 2.0.
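To give a feel for it, here is roughly what a resilience4j circuit breaker looks like around the CMS call. The configuration values and the callCms() method are placeholders; please check the library's documentation for the exact, current API:

```java
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;

import java.time.Duration;
import java.util.function.Supplier;

public class CmsCaller {

    public String createContact() {
        // Open the circuit when 30% of recent calls fail; stay open for 10 seconds
        // before moving to half-open and probing CMS again.
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
                .failureRateThreshold(30)
                .waitDurationInOpenState(Duration.ofSeconds(10))
                .build();

        CircuitBreakerRegistry registry = CircuitBreakerRegistry.of(config);
        CircuitBreaker circuitBreaker = registry.circuitBreaker("cms");

        // Decorate the remote call with the circuit breaker.
        Supplier<String> decorated =
                CircuitBreaker.decorateSupplier(circuitBreaker, this::callCms);

        try {
            return decorated.get();
        } catch (Exception e) {
            // When the circuit is open, the call is rejected immediately
            // (resilience4j signals this with an exception), so we fall back.
            return "CMS is unavailable right now, please try again later";
        }
    }

    private String callCms() {
        // Placeholder for the real HTTP call to the Contact Management Service.
        return "contact created";
    }
}
```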

That's all, my friends! Let me know if you found this interesting, and I will write another blog about how to use resilience4j.

Thanks.

