Resiliency patterns in microservice architecture

Published in The Jabberjays, Mar 2, 2021

As we shift from monolithic applications to a microservices architecture, it becomes very important to handle the failure scenarios that every application eventually runs into. Here we go through some of the resiliency patterns that should be implemented to give users a better experience and to maximize the success rate of an application. The following are the most commonly used resiliency patterns:

Bulkhead pattern

Isolate services, or the resources they use, so that capacity can be reserved for high-priority consumers. Partition the service instances or resources according to the load each consumer generates. Then, if the service fails because of one client's load, only that client is affected and the other consumers keep working.

Example: Suppose two clients, A and B, depend on a booking service, and serving client A is high priority. If client B's requests overload the booking service, client A's requests would otherwise have to wait behind them. To keep client A's wait time low, we can give it a separate thread pool. Then even if the booking service degrades or fails because of client B, client A is still served.
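The thread-pool isolation described above can be sketched in Python with one bounded `ThreadPoolExecutor` per client. This is a minimal sketch under stated assumptions: the `Bulkhead` class, the `book_ticket` function standing in for the booking service, and the pool sizes are hypothetical and for illustration only.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class Bulkhead:
    """Runs each client's work on its own bounded thread pool, so one client's
    slow or failing requests cannot use up the threads reserved for another."""

    def __init__(self, pool_sizes: Dict[str, int]):
        self._pools = {
            client: ThreadPoolExecutor(max_workers=size,
                                       thread_name_prefix=f"bulkhead-{client}")
            for client, size in pool_sizes.items()
        }

    def submit(self, client: str, fn: Callable, *args) -> Future:
        # Requests from a client only ever run on that client's pool.
        return self._pools[client].submit(fn, *args)

    def shutdown(self) -> None:
        for pool in self._pools.values():
            pool.shutdown(wait=True)


def book_ticket(client: str, delay: float) -> str:
    """Hypothetical stand-in for a call to the booking service."""
    time.sleep(delay)
    return f"booked for {client}"


if __name__ == "__main__":
    bulkhead = Bulkhead({"A": 2, "B": 2})
    # Client B floods its own pool with slow requests...
    backlog = [bulkhead.submit("B", book_ticket, "B", 0.3) for _ in range(6)]
    # ...while client A's request is picked up right away by A's pool.
    start = time.monotonic()
    print(bulkhead.submit("A", book_ticket, "A", 0.01).result())
    print(f"client A waited {time.monotonic() - start:.2f}s")
    bulkhead.shutdown()
```

Because client B's backlog can only occupy B's two workers, client A's request does not queue behind it. The same idea applies to connection pools or separate service instances; the thread pool is just the simplest partition to show.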

Fallback pattern

When a service is unavailable or failing, the caller can switch to an alternative code path or approach and still do the required work, possibly in a degraded form.

Example: If the server cannot retrieve data from the database, it can use its caching layer to return the last cached data, which gives the user a better experience than an error.
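A minimal Python sketch of this fallback, assuming the database call raises `ConnectionError` when it is unreachable and that a per-process dictionary is an acceptable stand-in for the caching layer. The `CachedRepository` class and the `db_lookup` callable are hypothetical names used only for illustration.

```python
from typing import Callable, Dict


class CachedRepository:
    """Reads from the primary store and falls back to the last cached value
    when the primary store is unreachable."""

    def __init__(self, db_lookup: Callable[[str], str]):
        self._db_lookup = db_lookup        # hypothetical database accessor
        self._cache: Dict[str, str] = {}   # last successfully read values

    def get(self, key: str) -> str:
        try:
            value = self._db_lookup(key)
        except ConnectionError:
            # Fallback path: serve stale data rather than an error, if we have it.
            if key in self._cache:
                return self._cache[key]
            raise  # no fallback available, so surface the original failure
        self._cache[key] = value  # refresh the cache on every successful read
        return value
```

The trade-off is freshness: the fallback may return data that is out of date, so it suits reads where a slightly stale answer is better than none. In a real service the cache would usually be a shared caching layer rather than an in-process dictionary.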

Circuit breaker pattern

A circuit breaker stops sending requests to a service once it has failed repeatedly. When the number of failures crosses a threshold, the circuit "opens": calls to that service are no longer attempted, and the services that depend on it are told immediately that it has failed, instead of waiting for a result that will eventually be a failure anyway. After a timeout, a limited number of trial requests are let through; if they succeed, the circuit closes again and normal traffic resumes.

Example: The reservation service needs the payment service to successfully reserve a ticket. Suppose the payment service is failing, and every request to it hangs for 30 seconds before failing, so each user waits 30 seconds only to see an error. Since we already know the call will fail, we can break the circuit for all requests and let only a few trial requests through. If those still fail, we return a failure immediately instead of trying again, which saves the user's time, and ask the user to retry after a specific amount of time.
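The closed, open and half-open behaviour described above can be sketched as a small state machine. This is a single-threaded sketch, not production code: the `CircuitBreaker` and `CircuitOpenError` names, the default threshold of 3 failures and the 30-second reset timeout are assumptions, and the injectable `clock` parameter exists only to make the timing easy to test.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised immediately while the circuit is open; the service is not called."""


class CircuitBreaker:
    """Single-threaded sketch of a closed / open / half-open circuit breaker."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open
        self._clock = clock
        self._failures = 0
        self._opened_at = 0.0
        self.state = "closed"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            if self._clock() - self._opened_at < self.reset_timeout:
                # Fail fast instead of making the user wait for a known failure.
                raise CircuitOpenError("service unavailable, retry later")
            # Timeout elapsed: let a trial request through to probe the service.
            self.state = "half_open"
        try:
            result = fn()
        except Exception:
            self._record_failure()
            raise
        # A success closes the circuit and clears the failure count.
        self._failures = 0
        self.state = "closed"
        return result

    def _record_failure(self) -> None:
        self._failures += 1
        # A failed trial request, or too many failures, (re)opens the circuit.
        if self.state == "half_open" or self._failures >= self.failure_threshold:
            self.state = "open"
            self._opened_at = self._clock()
```

In the reservation example, the reservation service would wrap each call to the payment service in `breaker.call(...)` and turn a `CircuitOpenError` into a "please try again later" response. A version shared across threads would need a lock and a cap on how many trial requests may run in the half-open state.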

Compensating transaction pattern

A transaction is often made up of several steps performed by different services. If a step in the middle fails, or the whole operation has to be undone later, we need to reverse the steps that were already performed wherever they have affected the system.

Example: Consider three services: booking, reservation and payment. The payment has been completed and the booking is done, but then the user changes their mind and cancels the booking. To cancel it, we need to perform a series of steps that undo the booking: refund the amount to the user, and make the seat available again so that other users can book it.
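One way to sketch this in Python is to pair every step with the action that undoes it and to run the undo actions in reverse order. This is an illustrative sketch, assuming each compensation is itself safe to run once; the `Step` and `CompensatingTransaction` classes and the log messages are hypothetical, and the lambdas stand in for real calls to the reservation, payment and booking services.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]        # does the work
    compensation: Callable[[], None]  # undoes the work


@dataclass
class CompensatingTransaction:
    steps: List[Step]
    completed: List[Step] = field(default_factory=list)

    def run(self) -> None:
        for step in self.steps:
            try:
                step.action()
            except Exception:
                # A step failed part-way: undo the steps that already ran.
                self.compensate()
                raise
            self.completed.append(step)

    def compensate(self) -> None:
        # Undo completed steps newest-first; also used when the user cancels.
        while self.completed:
            self.completed.pop().compensation()


if __name__ == "__main__":
    log: List[str] = []
    booking = CompensatingTransaction([
        Step("reservation", lambda: log.append("seat reserved"),
             lambda: log.append("seat released")),
        Step("payment", lambda: log.append("payment taken"),
             lambda: log.append("payment refunded")),
        Step("booking", lambda: log.append("booking confirmed"),
             lambda: log.append("booking cancelled")),
    ])
    booking.run()
    booking.compensate()  # the user changes their mind and cancels
    print(log)
```

Running the demo prints `['seat reserved', 'payment taken', 'booking confirmed', 'booking cancelled', 'payment refunded', 'seat released']`. If the payment step had raised instead, only "seat released" would run, because a step that never completed has nothing to undo.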

Retry pattern

The retry pattern calls the same operation again after a specific interval, on the assumption that a temporary fault may have cleared by then. The interval can be of two types: fixed, or growing exponentially between attempts. The maximum number of retries also needs to be specified, as we don't want to keep calling a service that has been permanently shut down.

Example: When we try to make a payment and the payment service provider is temporarily unavailable because of an internal server error, the payment can be retried to check whether the service has become available again.
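A minimal Python sketch of a retry helper that supports both fixed and exponential intervals and gives up after a maximum number of attempts. The `retry` function, its parameter names and defaults, and the assumption that `ConnectionError` and `TimeoutError` mark transient failures are all illustrative choices, not a specific library's API; `sleep` is injectable so the delays can be checked without waiting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(fn: Callable[[], T], max_attempts: int = 4, base_delay: float = 0.5,
          exponential: bool = True,
          retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying transient failures up to max_attempts times in total."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: the service may be down for good
            # Fixed interval waits base_delay every time; exponential doubles it
            # after each failure (base_delay, 2*base_delay, 4*base_delay, ...).
            delay = base_delay * 2 ** (attempt - 1) if exponential else base_delay
            sleep(delay)
    raise AssertionError("unreachable")
```

With the defaults, a payment call that fails twice and then succeeds waits 0.5 s and then 1.0 s before the third attempt. Only exceptions listed in `retry_on` are retried, so a non-transient error such as a declined card is not retried at all.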

Queue-based load levelling pattern

Sometimes a service is unavailable because of an internal server error or high load, but the task is critical and must still be performed, and the user should not have to submit it again. In that case we can use a queue to store the requests and perform the tasks once the service is ready to accept new requests. The queue also acts as a buffer, letting the service work through requests at its own pace. This pattern can be combined with other patterns to maximize availability.

Example: Suppose the payment service is not responding, so the circuit breaker opens. Instead of failing the requests, we can store them in a queue and process them once the service is available again. If the service is still unavailable after a certain amount of time, we mark the queued requests as failed and report that back to the user.
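The queueing and expiry behaviour in this example can be sketched in Python as follows. This is a sketch under stated assumptions: an in-memory `deque` stands in for what would normally be a durable message queue, the payment call is assumed to raise `ConnectionError` while the service is down, and the `PaymentQueue` class, its `drain` method and the `max_age` limit are hypothetical names. A worker would call `drain()` periodically.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict


@dataclass
class PendingRequest:
    request_id: str
    payload: dict
    enqueued_at: float


class PaymentQueue:
    """Buffers requests while the payment service is unavailable and processes
    them later; requests older than max_age seconds are marked as failed."""

    def __init__(self, process: Callable[[dict], None], max_age: float,
                 clock: Callable[[], float] = time.monotonic):
        self._process = process   # hypothetical call to the payment service
        self._max_age = max_age
        self._clock = clock
        self._pending: Deque[PendingRequest] = deque()
        self.status: Dict[str, str] = {}

    def submit(self, request_id: str, payload: dict) -> None:
        # Accept the request immediately instead of failing it.
        self._pending.append(PendingRequest(request_id, payload, self._clock()))
        self.status[request_id] = "queued"

    def drain(self) -> None:
        """Process queued requests in FIFO order, one at a time."""
        while self._pending:
            req = self._pending[0]
            if self._clock() - req.enqueued_at > self._max_age:
                # Waited too long: give up and report the failure to the user.
                self._pending.popleft()
                self.status[req.request_id] = "failed"
                continue
            try:
                self._process(req.payload)
            except ConnectionError:
                return  # service still down; keep the rest for the next drain
            self._pending.popleft()
            self.status[req.request_id] = "done"
```

Because requests are processed in arrival order, once the oldest pending request is still within `max_age`, every request behind it is too, so `drain` can stop at the first failure. Processing one request at a time is what levels the load: a burst of submissions becomes a steady stream of calls to the payment service.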

These are some of the resiliency patterns that can be used. Thanks for reading! If you liked this article, please clap, and any suggestions are welcome in the comments section.

References:

https://docs.microsoft.com/en-us/azure/architecture/framework/resiliency/reliability-patterns