5 Ways of Handling Backpressure in Distributed Systems

Drew Jaja
5 min read · Nov 14, 2023


Backpressure is resistance to flow within a system: the resistance that input meets on its way to output.

Backpressure is inherent in all systems. There is a resistance of water flow within a plumbing system depending on the shape and thickness of the pipes. In an electrical system, current is the flow of electrons within a conductor. Current is restricted by the resistance within the conductor shown by Ohm’s Law.

I (current) = V (voltage) / R (resistance)

Within a distributed system, the flow of data is restricted by the rate of requests or data each dependent service can process.

Issues can arise when an input to a system component is larger than it can handle. In a plumbing system, the pipes can leak or break if the water pressure is too high. In an electrical system, the wires can overheat or catch on fire if there is too much current. In a distributed system, services can degrade or fail if there are too many requests.

Handling backpressure is an important step to improving performance and fault tolerance in a distributed system.

Let’s say we have a simple system with 2 services. Service A calls Service B. Service A can support 200 req/s whereas Service B can only support 50 req/s.

Service A (can handle 200 req/s) calling Service B (can handle 50 req/s)

A system's throughput is restricted by its slowest service, which is called the bottleneck. So this system can optimally process a maximum of 50 req/s.

If Service A sent 200 req/s to Service B, Service B would run at a deficit of 150 req/s. Its response times would quickly degrade, and the service could eventually fail and stop processing requests altogether.
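To make the deficit concrete, here is a minimal sketch (the rates match the example above; the function name is illustrative) of how quickly unprocessed requests pile up under sustained overload:

```python
INCOMING_RATE = 200  # req/s sent by Service A
CAPACITY = 50        # req/s Service B can process

def backlog_after(seconds: int) -> int:
    """Requests left unprocessed after `seconds` of sustained overload."""
    deficit_per_second = INCOMING_RATE - CAPACITY
    return deficit_per_second * seconds

print(backlog_after(60))  # one minute of overload -> 9000 pending requests
```

After just one minute, 9,000 requests are waiting; without some form of backpressure handling, that backlog only grows.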

Handling Backpressure

There are 5 approaches to handling backpressure in a distributed system.

  1. Slow down the producer
  2. Rate limit or circuit break
  3. Buffer requests
  4. Scale out services
  5. Optimize services

Each approach has its trade-offs, which need to be evaluated against the system's requirements.

Don’t try to find the best design in software architecture; instead, strive for the least worst combination of trade-offs — Neal Ford, Software Architecture: The Hard Parts

Slow down the producer

Reduce the rate of requests from Service A to 50 req/s

Slow down the producer to ensure that it only sends requests at a rate that downstream services can handle. This will ensure that downstream services are not overloaded and perform optimally.

An example of slowing down the producer can be seen in TCP Flow Control. TCP Flow Control ensures that the sender transmits data at a rate that the receiver can handle.

Slowing down the producer may not be possible for all systems. If Service A were an API endpoint receiving requests from a website, it would be hard to tell users to slow down without hurting usability.
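When you do control the producer, pacing can be as simple as spacing requests out so they never exceed the downstream rate. A minimal sketch, assuming Service B's 50 req/s limit is known ahead of time (`send_paced` and `send_fn` are illustrative names):

```python
import time

MAX_DOWNSTREAM_RATE = 50  # req/s Service B can handle (assumed known)

def send_paced(requests, send_fn):
    """Send requests no faster than the downstream service can absorb."""
    interval = 1.0 / MAX_DOWNSTREAM_RATE
    for req in requests:
        start = time.monotonic()
        send_fn(req)
        # Sleep off whatever time remains in this request's slot.
        elapsed = time.monotonic() - start
        if elapsed < interval:
            time.sleep(interval - elapsed)
```

TCP flow control achieves the same effect dynamically: the receiver advertises a window, and the sender adapts instead of relying on a fixed rate.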

Rate limit or circuit break

Add a rate limiter or circuit breaker between the services

In many situations, it's not possible to slow down the producer. To prevent Service B from being overloaded, use a rate limiter or circuit breaker.

In an electrical system, a circuit breaker is tripped when too much current is flowing through a cable. This stops current flowing through the system, which prevents cables from overheating or catching on fire. In a distributed system, a circuit breaker stops or slows the flow of traffic from one service to another based on a configurable condition, such as the number of failures, concurrent requests or pending requests.

With a rate limiter, requests beyond a certain threshold, in this case 50 req/s, would be denied with an HTTP 429 (Too Many Requests) response status code.
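One common way to implement this is a token bucket: tokens refill at the allowed rate, each request spends one, and requests that find the bucket empty are rejected. A minimal, illustrative sketch (not production code; the class and handler names are assumptions):

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens for the time elapsed since the last check.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

limiter = TokenBucket(rate=50, capacity=50)  # matches Service B's 50 req/s

def handle(request):
    if not limiter.allow():
        return 429  # Too Many Requests
    return 200
```

The `capacity` parameter controls how large a burst is tolerated before requests start being denied; setting it equal to the rate allows roughly one second's worth of burst.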

Buffer the requests

Buffer requests with a queue

Buffering requests with a queue allows Service B to handle requests when it has capacity to process them. This prevents Service B from being overloaded, as it controls the rate at which it pulls requests to handle.

The queue itself will have its own backpressure and can only support so many requests per second, so you will need to take this into account when enqueuing messages.

Queuing Theory states

As utilization approaches 100%, processing times approach infinity.

If the rate of messages enqueued is always higher than the rate of messages dequeued, messages will accumulate in the queue and it will eventually run out of memory. Using a queue assumes that Service B will eventually have time to catch up with the requests from Service A.
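The queuing-theory claim above can be illustrated with the classic M/M/1 result: mean time in the system is 1 / (service rate − arrival rate), which blows up as utilization approaches 100%. A small sketch, assuming Service B behaves roughly like a single M/M/1 server at 50 req/s:

```python
def mean_time_in_system(arrival_rate: float, service_rate: float) -> float:
    """M/M/1 mean time a request spends queued plus in service: 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        return float("inf")  # the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)

SERVICE_RATE = 50.0  # req/s Service B can process
for utilization in (0.5, 0.9, 0.99):
    w = mean_time_in_system(utilization * SERVICE_RATE, SERVICE_RATE)
    print(f"{utilization:.0%} utilization -> {w * 1000:.0f} ms per request")
```

At 50% utilization a request spends about 40 ms in the system; at 99% it spends about 2 seconds, and past 100% the backlog never clears.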

Scale out services

Horizontally scale services to handle more requests

Scaling out works well with queues as you can add more consumers to the queue to process messages at the rate they are enqueued.

Adding 4 instances of Service B allows 200 req/s to be handled.

Service B can be auto-scaled based on queue depth, the number of messages in the queue yet to be processed, or CPU utilization.
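A simple autoscaling rule follows directly from the numbers in this example: divide the observed arrival rate by what one instance can handle and round up. A hedged sketch (the function name and per-instance figure are assumptions for this example):

```python
import math

def instances_needed(arrival_rate: float, per_instance_capacity: float = 50) -> int:
    """Instances required to keep up with the arrival rate."""
    return max(1, math.ceil(arrival_rate / per_instance_capacity))

print(instances_needed(200))  # 200 req/s at 50 req/s each -> 4 instances
```

Real autoscalers (e.g. scaling on queue depth or CPU) add smoothing and headroom on top of this, so they scale out before the queue starts growing rather than exactly at capacity.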

Optimize services

Optimize services to handle a higher rate of requests

Another way of handling backpressure is to optimize your individual services to handle a higher rate of requests, either by optimizing the existing code base or by rewriting it in a lower-level language.

Summary

Backpressure is inherent in all systems. Handling backpressure correctly ensures that your system is performant and fault tolerant. There are different ways of handling backpressure in distributed systems. The approach you take will depend on your system’s constraints and requirements.
