I’ll Do It Again: Using Retries to Ensure System Resilience
Implementing the optimal retry strategy to ensure the resilience of microservices-based systems.
System resilience
Modern web apps run in complex environments. They rely on many external dependencies, like databases and APIs. They rely solely on network communication. The Fallacies of Distributed Programming note that we can’t assume a perfect, secure network. In web environments, intermittent network failures and potential malicious actions are inherent challenges.
If an external dependency is unavailable, the client might retry the request, assuming the issue is temporary. This pattern is known as the retry pattern. The retry pattern allows automatic retries of failed requests. It has two parameters: the max retries and the delay between them. We will delve deeper into the details of this pattern in the next section.
Retry pattern
The retry pattern involves repeating an operation if a previous attempt fails. This can be implemented in a straightforward manner as follows: