The “Domino Effect” in a micro-service architecture

Chaitany Bhardwaj
Dec 24, 2021


Abstract:

Unlike the “Dominos” effect, which we have all fallen prey to at some point, “The Domino Effect” in a microservices architecture is not pleasant at all, and not edible either.

Broadly, the effect states that a change to one behaviour sets off a chain reaction that shifts related behaviours as well.

As the definition suggests, in a microservices architecture this is an anti-pattern in which services are not decoupled enough to stop an outage in one service from affecting the others. When services are tightly coupled, a failure in one service propagates to the services that depend on it, and they fail in turn. Depending on the scale of the operation and how tightly the services are coupled, this can bring down the whole architecture within a few hours or even minutes.

Here, we will discuss how such a situation can occur and what measures we can take to keep it from harming our system, so that we can enjoy our “dominos” in peace!

Replicating the scenario:

Let’s take a scenario in which three services in a microservice architecture, A, B and C, interact synchronously: A calls B, and B calls C, each waiting for the other’s response before it can answer its own caller.

Before failure is introduced
The “Domino” Effect

If Service C fails or becomes unavailable for any reason, it sets off a domino effect that causes services B and A to fail in turn. In this way, Services A and B can never be more resilient or reliable than Service C. Such a design is a serious concern because the system has no resiliency or availability of its own: in the scenario above, Service C is effectively a SPOF (single point of failure) whose outage stops the entire system from working. SPOFs are undesirable in any system that aims for high availability or reliability, be it a business practice, a software application, or another industrial system.
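The claim that A and B can never be more reliable than C can be put in numbers. Assuming every request to A needs A, B and C to all respond, and that the three services fail independently (both are simplifying assumptions), the availability seen by A's callers is the product of the individual availabilities. Since each factor is at most 1, the product can never exceed the weakest link:

```latex
a_{\text{chain}} = a_A \cdot a_B \cdot a_C \;\le\; \min(a_A, a_B, a_C) \;\le\; a_C

% Example: each service is available 99.9% of the time
a_A = a_B = a_C = 0.999 \quad\Rightarrow\quad a_{\text{chain}} = 0.999^3 \approx 0.997
```

So although each service is down only 0.1% of the time (about 9 hours a year), callers of A see roughly 0.3% downtime (about 26 hours a year).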

Fending off techniques:

Introducing a circuit breaker

A circuit breaker in software acts much like the MCBs (miniature circuit breakers) installed in our homes, which are nothing but devices that switch off the circuit automatically when they detect an abnormality such as an overload or a short circuit. We have all heard this line at some point, “go check if the MCB is down”, and most of the time it was a genuine issue :P. From now on, let's call the circuit breaker “CB” throughout this article; it matches my initials too.

The idea behind a CB is pretty straightforward. Our CB object wraps the calls to a service and counts their failures; once the number of failures crosses a configured threshold, it trips, and further calls fail immediately instead of being sent to the failing service. This way, the unaffected services stop making protected calls to the culprit system and stop waiting on it. Usually, you’d also want some kind of monitoring alert when the circuit trips.
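A minimal Python sketch of such a CB object follows. The class name `CircuitBreaker`, the `failure_threshold` parameter and the `on_trip` alert hook are illustrative assumptions, not any specific library's API. It counts consecutive failures, trips once the count reaches the threshold, and from then on rejects calls immediately instead of forwarding them to the failing service.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker is open and a call is rejected without being attempted."""


class CircuitBreaker:
    """Hypothetical minimal CB: trips after `failure_threshold` consecutive failures."""

    def __init__(self, failure_threshold: int = 3) -> None:
        self.failure_threshold = failure_threshold
        self.failure_count = 0
        self.is_open = False

    def call(self, func: Callable[[], T]) -> T:
        if self.is_open:
            # Fail fast: don't send any more traffic to the failing service.
            raise CircuitOpenError("circuit is open; call rejected")
        try:
            result = func()
        except Exception:
            self.failure_count += 1
            if self.failure_count >= self.failure_threshold:
                self.is_open = True
                self.on_trip()
            raise  # the caller still sees the original error
        self.failure_count = 0  # a success resets the consecutive-failure count
        return result

    def on_trip(self) -> None:
        # Hook for the monitoring alert; this sketch only prints.
        print(f"ALERT: circuit tripped after {self.failure_count} failures")


# Usage: a hypothetical Service C that is down, guarded by the breaker.
attempts = 0


def failing_service_c() -> str:
    global attempts
    attempts += 1
    raise ConnectionError("Service C is down")


breaker = CircuitBreaker(failure_threshold=3)
for _ in range(5):
    try:
        breaker.call(failing_service_c)
    except CircuitOpenError:
        print("rejected immediately, Service C not contacted")
    except ConnectionError:
        print("call to Service C failed")

print(attempts)  # 3
```

Of the five calls, only the first three reach Service C; the last two are rejected at once with `CircuitOpenError`. Note that nothing ever closes this breaker again, which is exactly the gap the next paragraph addresses.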

Obviously, this alone won't do the job, because you want to resume the calls once the affected service or system is up and running again: a kind of SELF-RESETTING mechanism. For this, you could set up a polling mechanism that tries the underlying call at regular intervals and, once a call succeeds, resets our CB so that normal traffic flows again.

Asynchronous Calls

Services in a microservice architecture should be minimally coupled with each other. One way to achieve this is to have your services communicate through asynchronous calls. This reduces a service’s dependency on the other services involved, because the caller does not wait for a response; it sends the request and moves on. Consider introducing a message queue to buffer data transfer between services.
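Below is a minimal Python sketch of that decoupling, using an in-process `asyncio.Queue` as a stand-in for a real message queue (an assumption for illustration; in production this would typically be a dedicated message broker). The function names are hypothetical. Service A enqueues an order and returns at once, so it keeps working while Service B is down; when B comes back, it drains the buffered backlog.

```python
import asyncio


async def service_a_place_order(queue: asyncio.Queue, order_id: int) -> str:
    # Enqueue and return immediately; A never waits for B to process the order.
    await queue.put({"order_id": order_id})
    return f"order {order_id} accepted"


async def service_b_worker(queue: asyncio.Queue, processed: list) -> None:
    # B consumes at its own pace; while B is down, messages simply wait in the queue.
    while True:
        message = await queue.get()
        processed.append(message["order_id"])
        queue.task_done()


async def main() -> tuple:
    queue: asyncio.Queue = asyncio.Queue()
    processed: list = []

    # Phase 1: B is "down" (no worker running), yet A keeps accepting orders.
    replies = [await service_a_place_order(queue, i) for i in range(3)]
    print(f"buffered while B is down: {queue.qsize()}")

    # Phase 2: B comes back and drains the backlog.
    worker = asyncio.create_task(service_b_worker(queue, processed))
    await queue.join()  # wait until every buffered message has been processed
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass  # expected: we stopped the worker on purpose
    return replies, processed


replies, processed = asyncio.run(main())
print(replies)
print(processed)
```

The trade-off is that A no longer gets B's result in its reply; it only learns that the order was accepted, so any outcome B produces has to flow back some other way, such as another queue or a status lookup.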
