Be a good client: jitter

Photo by Markus Spiske on Unsplash

Periodically triggered jobs are very common in modern applications, especially in the client-server model: uploading notes, sending analytics events, or backing up data are good examples of timed operations that a developer might decide to trigger on a schedule, for example once per hour, or at 3 a.m. when the user isn’t actively interacting with the device.

Now, handling the computational load can get tricky, since the number of clients is at least an order of magnitude larger than the number of servers.

It turns out that, as your application gets more successful, more clients will send their requests to your backend at the same moment. That means your servers will see spikes in network traffic whenever these timed operations trigger, in a similar fashion to what’s known as the Thundering Herd problem.

For your users, the higher load will mean higher response latencies, and potentially errors due to network timeouts. In a catastrophic scenario, a big enough number of clients, or badly scheduled call retries, could even lead to a self-inflicted distributed denial of service (DDoS): what would have been a small outage or a quick interruption of service snowballs as clients enqueue more and more retries that themselves also fail.

Operationally, this is a waste of resources and translates into a higher cost to maintain your infrastructure: engineers have to over-provision the services to handle the burst of requests at the clock tick, even though the infrastructure sits idle most of the time.

Ideally, we want to distribute the load evenly across time, while still being able to serve the same number of requests.

We can do so by purposefully introducing jitter.

In practical terms, this means purposefully advancing or delaying the next call relative to the expected clock tick. For a network request that usually takes on the order of hundreds of milliseconds, we can introduce a small random delay before each call to even out the load on the servers.

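import random
import time

def sync():
    # Instead of performing the call immediately, wait a random
    # 0.5 to 1.5 seconds to distribute the network load.
    jitter_seconds = random.uniform(0.5, 1.5)
    time.sleep(jitter_seconds)
    # Placeholder for the application's actual request.
    perform_network_request()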
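The same idea applies to the schedule itself, not only to the moment of the call. As a minimal sketch, assuming a job that repeats every `period_seconds`, the wait before the next run can be advanced or delayed by a random offset. The name `next_delay` and the `jitter_fraction` parameter are hypothetical and only serve the illustration:

```python
import random


def next_delay(period_seconds, jitter_fraction=0.1, rng=random):
    """Return the wait before the next run: the nominal period, shifted by
    up to +/- jitter_fraction of that period (a hypothetical tuning knob)."""
    max_offset = period_seconds * jitter_fraction
    offset = rng.uniform(-max_offset, max_offset)
    return period_seconds + offset


# Example: an hourly job that may fire up to 6 minutes early or late.
delays = [next_delay(3600, jitter_fraction=0.1) for _ in range(5)]
print([round(d) for d in delays])
```

Because each wait is measured from the previous run, the offsets accumulate over time and clients drift further apart. If runs need to stay close to the nominal tick, the offset could instead be applied to the scheduled tick time.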

Some libraries feature configurable jittering (see Netflix Hystrix, Resilience4J, Twitter’s Finagle, Google’s HttpClient, Android’s WorkManager), while others give full control (and responsibility) to the application developer (.NET Polly, OkHttp, NSURLConnection).

The key here is understanding your specific use cases and finding the sweet spot between the traffic you put on your server infrastructure and how quickly data becomes consistent between client and server.
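A toy simulation can make this trade-off more tangible. The sketch below assumes every client fires at the same clock tick and then delays its request by a uniformly random offset within a jitter window. The client count, the window sizes, and the function name are made up for illustration and are not measurements from a real system:

```python
import random
from collections import Counter


def peak_requests_per_second(num_clients, jitter_window_seconds, rng):
    """Count the requests landing in each one-second bucket after the tick
    and return the count of the busiest bucket."""
    buckets = Counter()
    for _ in range(num_clients):
        # Each client delays its request by a random offset in [0, window).
        offset = rng.random() * jitter_window_seconds
        buckets[int(offset)] += 1
    return max(buckets.values())


rng = random.Random(0)  # fixed seed so runs are repeatable
for window in (0, 10, 60, 600):
    peak = peak_requests_per_second(100_000, window, rng)
    print(f"window={window:>3}s  peak={peak} req/s  max extra delay={window}s")
```

A wider window lowers the peak the servers have to absorb, at the price of data reaching the server up to one window later.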

Jitter is especially important when combined with other resiliency techniques, such as batching, call retries, and exponential backoff. We’ll look further at these in a future blog post.
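As a small preview of how jitter and exponential backoff can fit together, here is a rough sketch that picks each retry delay at random between zero and an exponentially growing ceiling, a variant sometimes called "full jitter". The function name, the default values, and the cap are assumptions for the example, not recommendations:

```python
import random


def backoff_delays(base_seconds=1.0, cap_seconds=60.0, attempts=5, rng=random):
    """Yield one wait per retry: a random value between 0 and a ceiling
    that doubles with each attempt, capped at cap_seconds."""
    for attempt in range(attempts):
        ceiling = min(cap_seconds, base_seconds * 2 ** attempt)
        yield rng.uniform(0, ceiling)


# Example: print the waits a client would use before each of 5 retries.
for attempt, delay in enumerate(backoff_delays(), start=1):
    print(f"retry {attempt}: wait {delay:.2f}s")
```

Randomizing the whole delay, rather than adding a small offset to a fixed one, keeps clients that failed together from retrying together.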
