Resiliency and Fault Tolerance with Polly
This article is part of my Microservices and Cloud-Native Applications series. You can find the other parts of the series below.
- Saga Orchestration using MassTransit in .NET
- API Gateway with Ocelot
- Authorization and Authentications with IdentityServer
- Eventual Consistency with Integration Events using RabbitMq
- Distributed Logging with ElasticSearch, Kibana, and SeriLog
- Resiliency and Fault Tolerance with Polly
- Health Check with WatchDogs in a Microservices Architecture
- Distributed Tracing with Jaeger and OpenTelemetry in a Microservices Architecture
- Metrics to Monitor Microservices with OpenTelemetry and Prometheus
If you want to take a look at the GitHub code, you can access it here: https://github.com/ebubekirdinc/SuuCat
Resiliency and fault tolerance are important concepts in a microservices architecture, where services are distributed across several nodes and communicate with one another over a network.
This means that failures can happen anywhere in the system and can affect the reliability and availability of the system as a whole.
A system’s resilience is its capacity to withstand failures and recover from them. Fault tolerance, on the other hand, is the ability of a system to keep functioning even when errors occur. Circuit breakers, retries, and timeouts are a few of the strategies that can be used to achieve it.
In SuuCat, resiliency and fault tolerance are implemented using Polly, a .NET library that provides several policies for this purpose and is well suited to this kind of work. Polly is best known for retrying HTTP requests when the desired response is not received, for example after a timeout.
However, it is bad practice in a microservices architecture for services to be tightly coupled to each other via HTTP requests, except in extreme cases such as a final price check of the items in the cart during checkout. Therefore, here we will instead consider an error scenario that may occur during database creation while the application is starting up.
Now let’s see how we use it in our project. The following code calls the MigrateDatabaseAndSeed() method to create the database and seed it with data.
The MigrateDatabaseAndSeed() method uses Polly to apply a retry policy to the database migration and seeding operation. The policy is built starting with the Handle() method of the Policy class, which specifies the kind of exception to handle. In this instance, the policy is configured to handle any exception thrown while seeding the database.
The policy is then configured with the WaitAndRetry() method to retry the action up to five times. The sleepDurationProvider option specifies how long to wait between retries. Here, Math.Pow() raises 2 to the power of the retry attempt number to get the wait in seconds. As a result, the first retry waits 2 seconds, the second 4 seconds, the third 8 seconds, and so on, up to 32 seconds before the fifth and final retry.
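The policy described above can be sketched roughly as follows. This is a minimal illustration rather than the exact SuuCat code: the DatabaseStartup class, the RunWithRetry method name, and the console logging are assumptions made for the example, and the migrateAndSeed delegate stands in for the real migration and seeding work. It assumes the Polly NuGet package is referenced.

```csharp
using System;
using Polly;

public static class DatabaseStartup
{
    // Hypothetical helper: runs the given migrate-and-seed action under a retry policy.
    public static void RunWithRetry(Action migrateAndSeed)
    {
        var retryPolicy = Policy
            // Handle any exception thrown by the wrapped action.
            .Handle<Exception>()
            .WaitAndRetry(
                // Retry up to five times after the initial attempt.
                retryCount: 5,
                // Exponential backoff: 2, 4, 8, 16, 32 seconds.
                sleepDurationProvider: attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)),
                // Called before each wait; here it only logs (logging format is illustrative).
                onRetry: (exception, delay, attempt, context) =>
                    Console.WriteLine($"Retry {attempt} in {delay} due to: {exception.Message}"));

        // If every retry fails, the last exception is rethrown to the caller.
        retryPolicy.Execute(migrateAndSeed);
    }
}
```

With these settings the waits add up to at most 2 + 4 + 8 + 16 + 32 = 62 seconds before the policy gives up, which leaves a reasonable window for a database container that is still starting.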
As you can see in the Docker log above, the database seeding operation is retried 3 times before it succeeds. Each failed attempt logs an error like this:
Retrying MigrateDatabaseAndSeed 00:00:02 of RetryPolicy -4face1c8 at null, due to: Npgsql.NpgsqlException(0x80004005): Failed to connect….
The first retry happens after 2 seconds, the second after 4 seconds, and the third after 8 seconds. You can test this by stopping the PostgreSQL AssessmentDB container in Docker and then starting it again while the Assessment API is starting up.
Here you can see that the database tables are created successfully after 3 tries. With this, we have seen how Polly's retry policy keeps the application resilient when an error occurs at startup. Polly also offers other strategies, such as Circuit Breaker, Fallback, Hedging, Timeout, and Rate Limiter.
More info can be found in the Polly docs and the SuuCat GitHub repository.
References: