Exponential Back-Off With Scala Futures

A common sight in Scala code is to see Future[T] used as the result of network-based operations (such as a web-request). However, when dealing with a network, failures can be common. It’s likely you’ll want to retry this operation, but should only be done so if coupled with exponential back-offs. This post will show you how we can easily compose futures to achieve this.

Mock API

For our examples to actually work (in a REPL anyways), let’s make up a mock API to use.

Note: The code-blocks are structured to be pasted into a REPL. Each example depends on functions and imports from a previous example, so keep the REPL running for the entire article. You will need the Akka library, but can run sbt console using this project file.

Basic Retries (no back-off)

Before we can get into exponential back-off, we first need a basic retry in place. Let’s wrap our networkRequest with a function that retries infinitely.

By using recoverWith we can define a “fall-back” future that will be used when the original future fails. However if we make this fall-back future just a call to the same function, we generate an infinite loop on the failure case. If you run this in your REPL, you may see something like:

scala> networkRequestWithRetries()
retrying
retrying
retrying
retrying
res24: scala.concurrent.Future[String] = Future(<not completed>)

This tells us that the future eventually succeeded after 4 retries, so success!

Adding Back-Off

Retries are awesome but, with no wait between tries this can waste local resources (think busy wait) and potentially compound issues for an upstream service (for example, a service is unable to respond because of too many open connections).

But how do you make a Future wait?

Good question! Since Future is composable, we can simply combine any Future value with another Future that completes after some specified period of time. Easy peasy.

However, if you immediately thought of:

Stop it! This is bad, never do this.

Okay, fine… But how do I make a Future wait to complete?

The proper way to “sleep” a Future is with a scheduler that holds the Promise and only completes it when the specified time-period has passed. This only requires one thread to managing sleeping Futures instead of blocking multiple threads with Thread.sleep(). But that’s a lot of work, and Akka already has this feature built-in, so I’m going to take a shortcut.

In this example x is a Future that does not complete for at least two seconds. But more importantly, the Future on line 8 does not start executing until the first “sleeping” Future completes. With this, we can modify our networkRequestWithRetries function to use a constant back-off.

We now have a function that retries (infinitely I might add) and includes a 2-second wait between retries. Using composition makes this rather easy, but can be hard to wrap your head around. I find it easier to visualize as a state machine.

Since a Future can only result in success or failure, each state has only two possible transitions. A0 through An are attempts to call our networking API which will either succeed (res) or fail (sleep). The failure-transition to sleep is represented as recoverWith in our retry code. And the following transition from sleep to A(n+1) is represented by flatMap. But really, each An step is the same and we could simplify our state-machine to be:

Going Exponential

If we already have retries with a constant back-off, why do we need exponential back-off?

A constant back-off is a good start, but it can be hard to pick a good constant. You want retries to be fast (especially in the case where failures are intermittent) but you also don’t want to waste resources. Using a backoff that grows exponentially allows you to meet both goals.

We can see with a simple diagram how the time grows with each retry:

Continuing this pattern, we would get sleep times of 8, 16, 32, etc. Where you start your timing and how fast it grows is all up to you. So let’s see what it looks like to add exponential back-off to our code:

The main difference in this code is that we’ve added some parameters to define the scaling factor (x in n^x), the initial value (n in n^x), and a “current” value which is the previously calculated wait time. We use the fact that we can capture these values in a partial function (via recoverWith) to create a recursion of sorts.

Running in the REPL, we can see that it works:

scala> networkRequestWithRetries()(as)
retrying after 1 ms
retrying after 2 ms
retrying after 3 ms
retrying after 5 ms
retrying after 8 ms
retrying after 12 ms
retrying after 18 ms
retrying after 27 ms
retrying after 41 ms
retrying after 62 ms
retrying after 93 ms
retrying after 140 ms

Final Thoughts

Using composition with Future, it is easy to add additional behaviors and build complex state-machines that provide more flexible and powerful APIs and abstractions. Exponential back-offs is just one example.

Watch out for future articles covering other Future-based patterns.

Remote Developer. Writer. Kombucha Brewer. Owner of Corgis. https://medium.com/@john.m.murray786