Improve resilience in .NET applications

This article explains how to improve the resilience of .NET applications using the Polly library.

Emanuele Bucarelli
8 min read, Apr 23, 2020

Modern applications increasingly adopt cloud-focused microservice and distributed architectures, where many interactions take place through services.

Even if you write your services perfectly, with flawless code, errors will still occur. Why? What did I do wrong?

The answer can be “nothing”. Some errors are outside our control and may last only a limited time. If there is a network issue or high resource usage, for instance, an API might return a 500 (or another error).

In the past, to improve the user experience, this kind of problem was handled by implementing loops or some mix of try-catch and if-else. Nowadays you can write elegant and efficient methods with Polly.

What is Polly

Polly is a .NET resilience and transient-fault-handling library that allows developers to express policies such as Retry, Circuit Breaker, Timeout, Bulkhead Isolation, and Fallback in a fluent and thread-safe manner.

Polly is available as a NuGet package, and installation is straightforward:

install-package Polly

The basic idea of Polly is to define a policy using the Policy API: you declare which exceptions to handle and which strategy best suits your application.

Reactive Resilience

Reactive policies act after an error has occurred: you decide in advance which faults to handle and which strategy the application applies when they appear.

Retry

As its name suggests, the Retry policy applies when a call to your service fails and you want to repeat the request until it succeeds or a limit is reached. In these cases, the error is expected to be temporary, so a repeated execution should succeed on a following attempt.

You can apply this strategy for:

  • Retry: if the fault is unexpected, you can retry the request immediately.
  • Retry after delay: if the fault is caused by a busy service, the retry should wait a suitable time before executing.

For example, a policy can execute a service call and, if an exception is thrown, retry it three times with delays of 500, 1500 and 2000 milliseconds.

This policy is not useful when:

  • the issue that produces the error will last for a long time.
  • your service fails frequently. This is a sign that the service or resource being accessed should be scaled up, and the retry disguises the real problem.

Retry is not a solution to the underlying problem. It mitigates sporadic or accidental issues, but if you have a systematic issue, such as bugs or inadequate computational power for usage peaks, you need to fix it at its source.
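The retry described here could be sketched as follows. This is a minimal illustration, assuming the Polly 7-style `Policy` API; `CallService` is a hypothetical stand-in that fails twice and then succeeds, so the policy has something to retry.

```csharp
using System;
using Polly;

// Delays taken from the text: 500, 1500 and 2000 milliseconds.
var delays = new[]
{
    TimeSpan.FromMilliseconds(500),
    TimeSpan.FromMilliseconds(1500),
    TimeSpan.FromMilliseconds(2000)
};

var retryCount = 0;

// Handle any exception and retry up to three times, waiting between attempts.
var retryPolicy = Policy
    .Handle<Exception>()
    .WaitAndRetry(delays, (exception, delay) =>
    {
        retryCount++;
        Console.WriteLine($"Retry {retryCount} in {delay.TotalMilliseconds} ms: {exception.Message}");
    });

var attempts = 0;

// Hypothetical service call (not from the article): fails twice, then succeeds.
string CallService()
{
    attempts++;
    if (attempts < 3) throw new InvalidOperationException("transient failure");
    return "OK";
}

var result = retryPolicy.Execute(() => CallService());
Console.WriteLine(result);
```

If the third retry also failed, the policy would rethrow the last exception to the caller.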

Circuit Breaker

For some faults, continued retries are unlikely to succeed and the failure rate is expected to stay high. In these cases, it is better for the application to fail quickly and handle the fault.

A circuit breaker works with these states:

  • Closed: the application works normally and every request is sent to the service. The circuit stays in this state until the number of faults exceeds the configured threshold, at which point it moves to Open.
  • Open: the service is never invoked and the application fails immediately. This state lasts for a defined duration before the circuit passes to Half-Open (or it can be reset manually to Closed).
  • Half-Open: a limited number of trial invocations of the service is permitted. If a trial invocation succeeds, the state passes to Closed; if it fails, the state passes back to Open.

Polly offers two implementations of the circuit breaker: the Basic Circuit Breaker, which breaks when a defined number of consecutive faults occurs, and the Advanced Circuit Breaker, which breaks when a threshold proportion of faults occurs within a time period during which a high enough volume of requests has been made.
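Both variants can be sketched as below, assuming the Polly 7-style `Policy` API. The thresholds and durations are illustrative values, not recommendations, and the failing delegate is a hypothetical service that is always down.

```csharp
using System;
using Polly;
using Polly.CircuitBreaker;

// Basic breaker: open after 2 consecutive faults and stay open for 30 seconds.
var breaker = Policy
    .Handle<Exception>()
    .CircuitBreaker(exceptionsAllowedBeforeBreaking: 2, durationOfBreak: TimeSpan.FromSeconds(30));

// Advanced breaker: open when at least 50% of calls fail within a 10-second
// window, provided at least 8 calls were made in that window.
var advancedBreaker = Policy
    .Handle<Exception>()
    .AdvancedCircuitBreaker(
        failureThreshold: 0.5,
        samplingDuration: TimeSpan.FromSeconds(10),
        minimumThroughput: 8,
        durationOfBreak: TimeSpan.FromSeconds(30));

var serviceCalls = 0;
var rejectedCalls = 0;

for (var i = 0; i < 4; i++)
{
    try
    {
        breaker.Execute(() =>
        {
            serviceCalls++; // hypothetical service that always fails
            throw new InvalidOperationException("service unavailable");
        });
    }
    catch (BrokenCircuitException)
    {
        rejectedCalls++; // circuit is open: the service was not invoked
    }
    catch (InvalidOperationException)
    {
        // fault from the service itself while the circuit was still closed
    }
}

var state = breaker.CircuitState;
Console.WriteLine($"state={state}, service calls={serviceCalls}, rejected={rejectedCalls}");
```

After the second consecutive fault the circuit opens, so the remaining calls fail immediately with `BrokenCircuitException` without reaching the service.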

This policy is not useful when:

  • it is used to manage business logic exceptions. For example, if you have an API that often fails because of inconsistent data, the solution is to correct the data rather than add complexity to the application.
  • it is used to manage exceptions from local resources. For example, handling an exception raised by a resource stored in memory adds unnecessary overhead to the application.

A circuit breaker can be useful, but you need to handle the failure through the whole stack, up to the UI, to get a benefit. Avoiding calls to a broken service is only a benefit if the rest of the application continues to work (except for the features strictly related to that service).

Fallbacks

Sometimes a request is going to fail no matter how many times you retry. The Fallback policy lets you return a default value or perform an action such as logging, sending a notification or restarting a service.
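A fallback that returns a default value and logs the failure might look like this sketch, assuming the Polly 7-style `Policy<TResult>` API. `CallService` is a hypothetical call that always fails.

```csharp
using System;
using Polly;

var fallbackUsed = false;

// Return a default value when the call throws, and log why.
var fallbackPolicy = Policy<string>
    .Handle<Exception>()
    .Fallback("default value", outcome =>
    {
        fallbackUsed = true;
        Console.WriteLine($"Falling back: {outcome.Exception.Message}");
    });

// Hypothetical service call (not from the article) that never succeeds.
string CallService() => throw new TimeoutException("service did not answer");

var result = fallbackPolicy.Execute(() => CallService());
Console.WriteLine(result);
```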

Proactive Resilience

The policies we have seen so far act as a consequence of a result (success or exception). In other cases, we want to define a strategy before calling the action, to preserve resources. The following strategies help prevent errors and reduce resource consumption.

Timeout

The Timeout policy lets you decide how long a request may take. If the request takes longer than specified, the policy terminates it and cleans up resources by means of a cancellation token.
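As a rough sketch, assuming the Polly 7-style async API, a timeout can be applied to a delegate that observes the cancellation token. The five-second `Task.Delay` is a stand-in for a slow request.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Polly;
using Polly.Timeout;

// Cancel any execution that takes longer than 200 milliseconds.
var timeoutPolicy = Policy.TimeoutAsync(TimeSpan.FromMilliseconds(200));

var timedOut = false;

try
{
    // The delegate receives a token that Polly cancels when the timeout elapses.
    await timeoutPolicy.ExecuteAsync(
        async ct => await Task.Delay(TimeSpan.FromSeconds(5), ct),
        CancellationToken.None);
}
catch (TimeoutRejectedException)
{
    timedOut = true;
}

Console.WriteLine($"timed out: {timedOut}");
```

This relies on the delegate cooperating with the token; work that ignores cancellation would need a different timeout strategy.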

Caching

To reduce network overhead or repeated access to data that changes very slowly, applications can implement a caching policy. Polly supports cache policies backed by in-memory or third-party providers.

Bulkhead Isolation

A Bulkhead policy is a strategy that limits the consumption of resources. A bulkhead is a section of the application that can be sealed off from the others, so that the failure of a single section does not cause the whole application to fail.
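An in-memory cache policy could be sketched like this, assuming the `Polly.Caching.Memory` and `Microsoft.Extensions.Caching.Memory` packages are installed. `LoadCountryName` and the cache key are hypothetical.

```csharp
using System;
using Microsoft.Extensions.Caching.Memory;
using Polly;
using Polly.Caching.Memory;

var memoryCache = new MemoryCache(new MemoryCacheOptions());
var cacheProvider = new MemoryCacheProvider(memoryCache);

// Cache results for five minutes; the key comes from the execution Context.
var cachePolicy = Policy.Cache(cacheProvider, TimeSpan.FromMinutes(5));

var loads = 0;

// Hypothetical slow lookup (not from the article).
string LoadCountryName()
{
    loads++;
    return "Italy";
}

var first = cachePolicy.Execute(context => LoadCountryName(), new Context("country-name"));
var second = cachePolicy.Execute(context => LoadCountryName(), new Context("country-name"));

Console.WriteLine($"{first}, {second}, loads={loads}");
```

The second execution is served from the cache, so the lookup runs only once.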

In our context, building a bulkhead policy means adding a parallelism limit around one stream of calls, limiting the resource consumption caused by the stream of calls and preventing the consumption of all the resources in the host.

You specify how many concurrent requests of a given type can execute and how many can be queued for execution. If the execution and queue slots are in use, no more requests of that type can be processed.
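The execution and queue slots can be illustrated with a small sketch, assuming the Polly 7-style async API. The limits (2 parallel, 1 queued) are arbitrary, and the pending `TaskCompletionSource` simulates calls that are still in flight.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Polly;
using Polly.Bulkhead;

// At most 2 executions in parallel, and 1 more waiting in the queue.
var bulkhead = Policy.BulkheadAsync(maxParallelization: 2, maxQueuingActions: 1);

// Keeps every call "in flight" until we release it.
var release = new TaskCompletionSource<bool>();

var tasks = new List<Task>();
for (var i = 0; i < 4; i++)
{
    tasks.Add(bulkhead.ExecuteAsync(async () => { await release.Task; }));
}

// Let the running and queued calls finish.
release.SetResult(true);

var completed = 0;
var rejected = 0;

foreach (var task in tasks)
{
    try
    {
        await task;
        completed++;
    }
    catch (BulkheadRejectedException)
    {
        rejected++; // no execution or queue slot was free
    }
}

Console.WriteLine($"completed={completed}, rejected={rejected}");
```

With four calls started at once, two run, one waits in the queue, and the fourth is rejected.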

Policy Wraps

In real-world cases, it is often necessary to apply more than one policy to the same action. Polly allows several policies to be wrapped and chained together. For example, a single policy can wrap Retry, Circuit Breaker and Timeout.
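A wrap of these three policies might be sketched as follows, assuming the Polly 7-style `Policy` API. The thresholds are illustrative, and the delegate is a hypothetical call that fails twice before succeeding.

```csharp
using System;
using Polly;

var timeout = Policy.Timeout(TimeSpan.FromSeconds(2));

var breaker = Policy
    .Handle<Exception>()
    .CircuitBreaker(5, TimeSpan.FromSeconds(30));

var retry = Policy
    .Handle<Exception>()
    .WaitAndRetry(3, attempt => TimeSpan.FromMilliseconds(100 * attempt));

// Policies are listed from outermost to innermost:
// retry wraps the breaker, which wraps the timeout, which wraps the call.
var resilient = Policy.Wrap(retry, breaker, timeout);

var attempts = 0;

var result = resilient.Execute(() =>
{
    attempts++; // hypothetical call: fails twice, then succeeds
    if (attempts < 3) throw new InvalidOperationException("transient failure");
    return "OK";
});

Console.WriteLine($"{result} after {attempts} attempts");
```

The order matters: with retry outermost, each retry attempt passes through the circuit breaker and gets its own timeout.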

Use Policy with DI

We have seen how to define policies; they can also be registered on IServiceCollection and resolved through dependency injection. Suppose we have an interface registered with DI to access the database. Usually, the consuming code simply calls it directly, without any fault handling.

In this way, a single exception on the connection causes an error, and a response with status code 500 is returned to the user.
If we want to use Polly to make the request resilient, we can add a Retry policy.
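One way this could look is sketched below, assuming `Microsoft.Extensions.DependencyInjection` and the Polly 7-style API. `ICustomerRepository`, `FlakyCustomerRepository` and `CustomerService` are hypothetical names invented for the illustration; the repository simulates a connection that fails on its first two calls.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using Polly;

var services = new ServiceCollection();

// Register the retry policy once, so every consumer shares the same instance.
services.AddSingleton<ISyncPolicy>(
    Policy.Handle<Exception>()
          .WaitAndRetry(3, attempt => TimeSpan.FromMilliseconds(200 * attempt)));
services.AddSingleton<ICustomerRepository, FlakyCustomerRepository>();
services.AddTransient<CustomerService>();

using var provider = services.BuildServiceProvider();
var service = provider.GetRequiredService<CustomerService>();

var name = service.GetName(42);
Console.WriteLine(name);

// Hypothetical data-access abstraction (not from the article).
public interface ICustomerRepository
{
    string GetName(int id);
}

// Simulates a database connection that fails on the first two calls.
public class FlakyCustomerRepository : ICustomerRepository
{
    private int _calls;

    public string GetName(int id)
    {
        _calls++;
        if (_calls < 3) throw new InvalidOperationException("connection lost");
        return $"Customer {id}";
    }
}

// The consumer receives both the repository and the policy from DI.
public class CustomerService
{
    private readonly ICustomerRepository _repository;
    private readonly ISyncPolicy _policy;

    public CustomerService(ICustomerRepository repository, ISyncPolicy policy)
    {
        _repository = repository;
        _policy = policy;
    }

    // The database call is executed through the injected retry policy.
    public string GetName(int id) => _policy.Execute(() => _repository.GetName(id));
}
```

Without the policy, the first failed call would surface as an error; with it, the caller only sees the successful result.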

Polly over HTTP

In modern web applications, it is common to have an API layer that exposes business logic and data access. Starting from ASP.NET Core 2.1, HttpClient registration is supported through IHttpClientFactory, which allows you to:

  • Centralize the configuration of HTTP clients in the Startup class
  • Define multiple clients and use the right one based on the context
  • Register and manage the life cycle of the handlers associated with each client

From the ConfigureServices method of the Startup class, we can use the AddHttpClient extension method to configure the default settings of each HttpClient instance, such as the base URL and the request headers, and to assign a name to the instance. After installing the NuGet package Microsoft.Extensions.Http.Polly, it is possible to register the resilience capabilities offered by Polly with the client.

It is possible to register multiple policies and HTTP handlers; the flow of execution is similar to the one described for Policy Wraps.
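A registration along these lines could be sketched as follows, assuming the `Microsoft.Extensions.Http.Polly` package. For brevity it uses a standalone `ServiceCollection` instead of a `Startup` class; the client name, base URL and policy values are hypothetical.

```csharp
using System;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;
using Polly;

var services = new ServiceCollection();

services.AddHttpClient("catalog", c =>
    {
        // Default settings shared by every instance of this named client.
        c.BaseAddress = new Uri("https://example.com/api/");
        c.DefaultRequestHeaders.Add("Accept", "application/json");
    })
    // Retry transient HTTP errors (5xx, 408, network failures) three times.
    .AddTransientHttpErrorPolicy(p =>
        p.WaitAndRetryAsync(3, attempt => TimeSpan.FromMilliseconds(300 * attempt)))
    // Stop calling the service for 30 seconds after 5 consecutive failures.
    .AddTransientHttpErrorPolicy(p =>
        p.CircuitBreakerAsync(5, TimeSpan.FromSeconds(30)));

using var provider = services.BuildServiceProvider();
var factory = provider.GetRequiredService<IHttpClientFactory>();

var httpClient = factory.CreateClient("catalog");
Console.WriteLine(httpClient.BaseAddress);
```

Handlers run in registration order, so here the retry sits outside the circuit breaker.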

Example: Make your .NET client built with NSwag resilient

In this example, we configure a console application that calls a web API, integrating Polly into a client generated with NSwag.

Build the Client

We assume that the APIs are already implemented; for details about registering Swagger and generating the client with NSwag, refer to the Microsoft documentation.

Define Application Policy

First of all, we want to define the policies available in the application. For this demo, we have added an extension method on IHttpClientBuilder to register each policy with the HttpClient; a similar extension on IServiceCollection, for example, registers the Wait and Retry policy.
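Such an extension method might look like the sketch below, assuming the `Microsoft.Extensions.Http.Polly` package (which brings in `Polly.Extensions.Http`). The method name `AddWaitAndRetryPolicy`, its parameter and the backoff formula are hypothetical, not the demo's actual code.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using Polly;
using Polly.Extensions.Http;

var services = new ServiceCollection();

// Usage: attach the policy while registering a named client.
var builder = services
    .AddHttpClient("WaitAndRetry")
    .AddWaitAndRetryPolicy(retryCount: 3);

Console.WriteLine(builder.Name);

public static class HttpClientPolicyExtensions
{
    // Hypothetical extension: registers a Wait and Retry policy on the client.
    public static IHttpClientBuilder AddWaitAndRetryPolicy(
        this IHttpClientBuilder builder, int retryCount = 3)
    {
        var policy = HttpPolicyExtensions
            .HandleTransientHttpError() // 5xx, 408 and HttpRequestException
            .WaitAndRetryAsync(
                retryCount,
                attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)));

        return builder.AddPolicyHandler(policy);
    }
}
```

Returning the builder keeps the call chainable, so several policy extensions can be applied to the same client.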

For the demo we have implemented these policies:

  • Wait and Retry
  • Circuit Breaker
  • Retry Forever
  • Timeout

Any other policy can be easily added in this context.

Register HttpClient

In ConfigureServices, we can register one named client for each policy.
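The registration of one named client per policy could be sketched like this, assuming the `Microsoft.Extensions.Http.Polly` package. The client names, base address and policy values are hypothetical.

```csharp
using System;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;
using Polly;

var baseAddress = new Uri("https://example.com/");
var services = new ServiceCollection();

services.AddHttpClient("WaitAndRetry", c => c.BaseAddress = baseAddress)
    .AddTransientHttpErrorPolicy(p =>
        p.WaitAndRetryAsync(3, attempt => TimeSpan.FromMilliseconds(500 * attempt)));

services.AddHttpClient("CircuitBreaker", c => c.BaseAddress = baseAddress)
    .AddTransientHttpErrorPolicy(p =>
        p.CircuitBreakerAsync(3, TimeSpan.FromSeconds(30)));

services.AddHttpClient("RetryForever", c => c.BaseAddress = baseAddress)
    .AddTransientHttpErrorPolicy(p => p.RetryForeverAsync());

services.AddHttpClient("Timeout", c => c.BaseAddress = baseAddress)
    .AddPolicyHandler(Policy.TimeoutAsync<HttpResponseMessage>(TimeSpan.FromSeconds(5)));

using var provider = services.BuildServiceProvider();
var factory = provider.GetRequiredService<IHttpClientFactory>();

// Each name resolves to a client with its own policy pipeline.
var names = new[] { "WaitAndRetry", "CircuitBreaker", "RetryForever", "Timeout" };
var configured = 0;
foreach (var name in names)
{
    var httpClient = factory.CreateClient(name);
    if (httpClient.BaseAddress == baseAddress) configured++;
}

Console.WriteLine($"{configured} clients configured");
```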

Call API with policy

The client class generated by NSwag needs an instance of HttpClient in its constructor, which can be obtained through DI. Using named HttpClient instances, we can then call any API with a different policy depending on the context.
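The sketch below shows the idea under stated assumptions: `ValuesClient` is a hypothetical stand-in for the NSwag-generated class, and `FlakyHandler` is a test double that replaces the network, returning two 500 responses before a 200, so the example runs without a real API.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Polly;

var stub = new FlakyHandler();
var services = new ServiceCollection();

services.AddHttpClient("WaitAndRetry", c => c.BaseAddress = new Uri("https://example.com/"))
    .AddTransientHttpErrorPolicy(p =>
        p.WaitAndRetryAsync(3, attempt => TimeSpan.FromMilliseconds(100 * attempt)))
    .ConfigurePrimaryHttpMessageHandler(() => stub); // no real network traffic

using var provider = services.BuildServiceProvider();
var factory = provider.GetRequiredService<IHttpClientFactory>();

// Pass the named HttpClient to the generated client's constructor.
var client = new ValuesClient(factory.CreateClient("WaitAndRetry"));
var body = await client.GetValueAsync();

Console.WriteLine($"{body} after {stub.Calls} calls");

// Hypothetical stand-in for the NSwag-generated client class.
public class ValuesClient
{
    private readonly HttpClient _httpClient;

    public ValuesClient(HttpClient httpClient) => _httpClient = httpClient;

    public Task<string> GetValueAsync() => _httpClient.GetStringAsync("api/values");
}

// Test double: answers 500 twice, then 200 with a small body.
public class FlakyHandler : HttpMessageHandler
{
    public int Calls { get; private set; }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        Calls++;
        var response = Calls < 3
            ? new HttpResponseMessage(HttpStatusCode.InternalServerError)
            : new HttpResponseMessage(HttpStatusCode.OK) { Content = new StringContent("value") };
        return Task.FromResult(response);
    }
}
```

The calling code contains no retry logic of its own; choosing a different named client is enough to change the policy applied to the same API call.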

Conclusions

We have seen how simple it is to integrate Polly into .NET applications and how useful it can be for making an application resilient. The fluent configuration, combined with DI registration, allows resilience features to be integrated into both new and existing applications.
In the coming years, this library is likely to become increasingly popular and may become a standard in many applications.
However, when you integrate this functionality, it is important to keep in mind the advantages and disadvantages of the pattern you are implementing. You should never use this library to hide structural problems.

Finally, we have seen that these features are fully declarative and managed globally, so there is no additional effort compared to the plain case, and they can be integrated into existing applications.
