CIRCUIT BREAKER DESIGN PATTERN IN MICROSERVICES

Hansini Rupasinghe
Published in Nerd For Tech
6 min read · Jun 10, 2021

✷ Circuit Breaker is more than just a design pattern. It is closer to a sustainability pattern, because it keeps a system running while one of its dependencies is failing.

✷ The Circuit Breaker pattern helps you keep your service alive and in good health by failing fast.

Why is this important ❗️❓

In a traditional system, we have no idea how other services might break, so each service is responsible for keeping itself alive. In other words, the developers of a service are responsible for keeping it alive.

This does not apply only to microservices, but it is especially critical there, because we have many services that behave in different ways and are maintained by different teams.

An Example of a Circuit Breaker in Real Life

If your house is powered by electricity, it definitely has a circuit breaker.

You get power from the main grid, but it comes through a circuit breaker. If the grid behaves abnormally, or if a lightning strike puts a surge on the grid, the circuit breaker trips and cuts the power. That way, the internal wiring of your home is protected.

If we take our previous example,

● When you have multiple services, there is a high possibility that those services call multiple backends. You can use a pattern such as Aggregator to call those services.

● When considering availability, services are usually promised 99.999% uptime.

Some simple math shows what this means. A year has 365 × 24 × 60 = 525,600 minutes, and 0.001% of that is 5.256 minutes.

Therefore, one service can be down for only 5.256 minutes per year.

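This is fine in monolithic applications. But in a microservice architecture you have multiple services. Let us say you have 100 services. If a request depends on all of them and they fail independently, their availabilities multiply: 0.99999^100 is about 0.999, which allows roughly 8.76 hours of downtime per year.

That may not be acceptable, and that is why we need to pay attention to protecting services.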
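The availability arithmetic in this section can be checked with a short script. This is only a sketch: the function names are made up for illustration, and it assumes that the 100 services fail independently and that a request needs all of them, so their availabilities multiply.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes(availability: float) -> float:
    """Minutes per year a system with the given availability may be down."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def combined_availability(availability: float, services: int) -> float:
    """Availability of a request that needs every one of `services` to be up.

    Assumption: failures are independent, so the probabilities multiply.
    """
    return availability ** services


single = 0.99999  # "five nines" for one service
print(f"1 service:    {downtime_minutes(single):.3f} min/year")

all_up = combined_availability(single, 100)
print(f"100 services: {all_up:.3%} available, "
      f"{downtime_minutes(all_up) / 60:.2f} hours/year")
```

With 100 services the allowed downtime grows from a few minutes to several hours per year, which is why protecting each call matters.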

States of the Circuit Breaker

Ref: https://images.app.goo.gl/RmCoFoHV5nauQgfh7

The circuit breaker has three states:

  1. Closed: When everything is fine, the circuit breaker stays closed and passes all calls through to the remote service. When the number of failures exceeds the configured threshold, the breaker trips and goes into the Open state.
  2. Open: The breaker returns an error for every call without executing the function.
  3. Half-Open: After a timeout period, the circuit switches to the half-open state to test whether the underlying problem is still there. If a single call fails in this state, the breaker trips again. If the call succeeds, the circuit breaker resets to the normal, closed state.
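The three states can be sketched as a small state machine. This is a minimal, single-threaded illustration and not a production implementation: the class name, the `failure_threshold` and `reset_timeout` parameters, and the choice to count consecutive failures are all assumptions made for this example.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half-open"


class CircuitOpenError(Exception):
    """Raised when a call is rejected without reaching the remote service."""


class CircuitBreaker:
    def __init__(self, failure_threshold=2, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures tolerated before tripping
        self.reset_timeout = reset_timeout          # seconds to stay open
        self.clock = clock                          # injectable so tests can fake time
        self.state = State.CLOSED
        self.failures = 0                           # consecutive failures (assumption)
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open; failing fast")
            self.state = State.HALF_OPEN  # timeout elapsed: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # A single failure while half-open, or more failures than the
        # threshold while closed, trips the breaker.
        if self.state is State.HALF_OPEN or self.failures > self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()

    def _on_success(self):
        self.failures = 0
        self.state = State.CLOSED


def flaky():
    raise ConnectionError("service unavailable")  # stands in for a remote call


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=30.0)
for _ in range(4):
    try:
        breaker.call(flaky)
    except CircuitOpenError:
        print("rejected without calling the service")
    except ConnectionError:
        print("call reached the service and failed")
```

In this run the first three calls reach the failing function and the fourth is rejected immediately, because the third failure exceeded the threshold of 2 and opened the circuit.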

What can break a service

Scenario 1

Let us assume you have five different services and a web server that calls them, implemented using the Aggregator pattern (either the chain or the parallel variant). When a request arrives, the server allocates one thread to call the service. Now suppose the service is a bit slow, so the thread waits or times out. One waiting thread is fine, but if this is a high-demand service that keeps receiving more and more requests, the threads in the pool end up waiting one after another.

Let us say you have 100 threads, and 98 of them are now occupied by this service. The other 2 threads may be consumed by other services. So all threads are occupied or blocked at the moment.

In such a case, the remaining requests that come to your service are blocked or queued. Let’s say all 100 threads are busy and 50 requests are waiting in a queue. Eventually the failed service recovers, but the web server is still trying to process all the requests waiting in the queue. As a result, your web server or proxy never recovers: while it works through the queue, new requests keep coming. This kind of scenario will kill your service.

Scenario 2

Let us say there is a scenario where service A calls service B, service B calls service C, and service C calls service D.

Meanwhile, you also have other services W, X, Y and Z. If service D fails to respond on time, service C will wait. Therefore, service B will wait too. That means service A will also be waiting. This can cause cascading failures.

✹ Whichever way it happens, your service goes offline, and that is unacceptable.

Now we need a way to keep these services up and running.

Let us take the same scenario: four backend services and a proxy (or a pattern) that calls these services.

Let us define thresholds for this.

The Circuit Breaker pattern works like this:

🔸 If 75% of the requests are reaching the upper response-time threshold, the breaker sees that the service is failing slowly. If the number of requests that exceed 200 ms (the maximum threshold you set for the service) goes beyond the allowed count, the breaker concludes that the service is not responding anymore. The next request that comes to access service A fails immediately. The breaker cuts the connection between your proxy and service A, so the proxy no longer calls service A and no longer waits for it.
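One way to read the threshold rule above is as a slow-call rate over a window of recent requests. The sketch below takes that reading: the 200 ms limit and the 75% ratio come from the text, while the class name, the window size, and the rule of waiting for a full window are assumptions for illustration.

```python
from collections import deque


class SlowCallDetector:
    """Tracks recent response times and reports when too many are slow."""

    def __init__(self, max_response_ms=200.0, slow_ratio=0.75, window=20):
        self.max_response_ms = max_response_ms  # upper response-time threshold
        self.slow_ratio = slow_ratio            # share of slow calls that trips
        self.samples = deque(maxlen=window)     # True for each slow call

    def record(self, response_ms: float) -> None:
        self.samples.append(response_ms > self.max_response_ms)

    def should_trip(self) -> bool:
        # Wait for a full window so one slow call cannot trip the breaker.
        if len(self.samples) < self.samples.maxlen:
            return False
        return sum(self.samples) / len(self.samples) >= self.slow_ratio


detector = SlowCallDetector(window=4)
for ms in (120, 250, 300, 410):  # 3 of the 4 calls exceed 200 ms
    detector.record(ms)
print(detector.should_trip())  # True: 75% of the window is slow
```

A breaker would call `record` after every request and stop forwarding calls once `should_trip` returns true.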

Figure: when the service has a good response time

Why do you need to put something in between, instead of going to the service directly and seeing whether it is failing?

✹ Let’s say you have a timeout of 30 s. If every request tries to hit service A without considering that it is failing, all the requests that come from consumers will wait 30 s. At the end of that 30 s timeout, they will fail.

But during those 30 s, the other requests that come to consume service A will also try to reach it, and they will wait in the queue too.
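A back-of-the-envelope calculation shows how quickly a 30 s timeout ties up threads. The sketch uses Little's law (average busy threads = arrival rate × time each request holds a thread). The pool size of 100 comes from Scenario 1; the arrival rate of 10 requests per second and the 0.2 s and 1 ms durations are hypothetical numbers chosen for illustration.

```python
def threads_in_use(requests_per_second: float, seconds_held: float) -> float:
    """Average number of busy threads (Little's law: rate x time held)."""
    return requests_per_second * seconds_held


POOL_SIZE = 100  # threads in the pool, as in Scenario 1
RATE = 10.0      # hypothetical arrival rate, requests per second

cases = [
    ("healthy, 0.2 s per call", 0.2),
    ("failing, 30 s timeout", 30.0),
    ("breaker open, 1 ms rejection", 0.001),
]
for label, held in cases:
    needed = threads_in_use(RATE, held)
    status = "pool exhausted" if needed > POOL_SIZE else "ok"
    print(f"{label}: {needed:g} threads needed ({status})")
```

Under these assumptions the failing service would need 300 threads from a pool of 100, while rejecting calls immediately needs almost none.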

What Does the Circuit Breaker Pattern Do?

✹ If a service is failing more than the given threshold, it will not try to hit that particular service at all. It fails fast and informs the consumer that the service is not available.

How will it connect back?

✹ While the circuit is broken, it sends a ping request (or default request) to that particular service from time to time. When the response time comes back within the normal threshold, it turns the circuit on again. The next request that comes to consume the service goes directly to the service.
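The recovery step can be sketched as a probe that a scheduler runs from time to time. Everything named here is hypothetical: `ProbingBreaker`, the `ping` callable that stands in for the lightweight request, and the injectable `timer` used so the example can run without a real service.

```python
import time


class ProbingBreaker:
    def __init__(self, ping, max_response_ms=200.0, timer=time.perf_counter):
        self.ping = ping                        # hypothetical lightweight request
        self.max_response_ms = max_response_ms  # normal response-time threshold
        self.timer = timer                      # injectable so tests can fake time
        self.is_open = True                     # this sketch starts tripped

    def probe(self) -> bool:
        """Send one ping; turn the circuit back on if it answers fast enough."""
        started = self.timer()
        try:
            self.ping()
        except Exception:
            return False  # still failing: keep rejecting traffic
        elapsed_ms = (self.timer() - started) * 1000.0
        if elapsed_ms <= self.max_response_ms:
            self.is_open = False  # service is healthy again
        return not self.is_open


# Fake timer: the first ping takes 500 ms, the second takes 50 ms.
ticks = iter([0.0, 0.5, 1.0, 1.05])
probing = ProbingBreaker(ping=lambda: None, timer=lambda: next(ticks))
print(probing.probe())  # False: too slow, circuit stays broken
print(probing.probe())  # True: back under 200 ms, traffic flows again
```

In a real system the probe would run on a timer or background task rather than being called by hand.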

✹ There is no queue anymore, because all the requests that came to consume the service during the failure were sent back with an error message. When the service is back up, the circuit closes again and accepts new traffic.

✹ Although certain consumer requests fail, the whole system will not fail if we use this method. If we instead let every consumer request through without failing any of them, the whole system fails and a huge queue builds up behind the service. Even when the service comes back up, the queued requests flood it, and it eventually fails trying to process them. So with this method, certain requests fail for some time, and the remaining requests are served as soon as the failed service comes back. That is the principle behind this design pattern.
