AWS Application Load Balancer algorithms

Should I switch to use the Least Outstanding Requests algorithm?

Published in DAZN Engineering, Apr 1, 2020

Until November 2019, AWS Application Load Balancers (ALBs) only supported the round-robin algorithm. Now, we can use the Least Outstanding Requests (LOR) algorithm, but is it worth switching to use it?

The short answer is yes. But why?

What is a balancing algorithm?

All load balancers support one or more balancing algorithms, which define which target handles each request. They’re often ignored, but choosing the right one can have a big impact on your application’s performance. The most common algorithms are round-robin and least connections, which AWS calls Least Outstanding Requests. Let’s take a look at how each of these works.

Round-Robin

The round-robin algorithm is the simplest. It cycles through your targets in order, so each target should receive an equal share of requests. Due to its simplicity, all load balancers support the round-robin algorithm.

Below is a diagram showing how the round-robin algorithm directs requests. It shows the service scaling up from 5 tasks to 6, after which the new task starts receiving an equal share of the traffic — 1/6th.

Figure: Requests are sent to each target sequentially, in a loop. (Diagram showing which task receives each request when using the round-robin algorithm.)
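To make the cycling behaviour concrete, here’s a minimal sketch of a round-robin picker in JavaScript. It isn’t how the ALB is implemented internally; the `createRoundRobin` helper and the `task-N` names are made up for illustration.

```javascript
// Minimal round-robin picker: cycles through targets in order.
// createRoundRobin and the task names are illustrative, not an AWS API.
function createRoundRobin(targets) {
  const pool = [...targets]; // copy so the caller's array is not mutated
  let next = 0; // index of the target that receives the next request
  return {
    pick() {
      const target = pool[next];
      next = (next + 1) % pool.length;
      return target;
    },
    add(target) {
      pool.push(target); // a new target simply joins the end of the cycle
    },
  };
}

const rr = createRoundRobin(['task-1', 'task-2', 'task-3', 'task-4', 'task-5']);
const firstLap = Array.from({ length: 5 }, () => rr.pick());

rr.add('task-6'); // scale up from 5 tasks to 6
const secondLap = Array.from({ length: 6 }, () => rr.pick());

console.log(firstLap.join(', '));
console.log(secondLap.join(', '));
```

After the scale-up, the new task takes its turn once per lap like every other task, regardless of how busy any of them are.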

Round-robin pros + cons

✅ Simple and easy to understand
✅ Each target receives an even split of traffic
❌ Targets need to be the same size and have similar performance
❌ Requests should all be similar in load + latency

The last point, that requests should be similar in load + latency, is the most important problem. APIs usually allow many request methods, each of which will result in a different amount of load. For example, a POST request will usually take much more processing than a GET request.

Looking at the diagram above, Task 1 could still be processing the previous request by the time it receives another one. Task 2 might be running on slower hardware. When using cloud resources, it’s impossible to guarantee the same performance for each instance. Even worse, if we’re using a spot fleet, we’ll have different instance sizes running, each with varying performance and capacity.

Least Outstanding Requests

The Least Outstanding Requests (LOR) algorithm aims to solve these issues. Instead of cycling through your targets, it’ll select the target with the lowest number of requests waiting for responses. NGINX refers to this algorithm as least_conn.

This diagram shows how the LOR algorithm directs requests. We start with some outstanding requests and, for simplicity, imagine that no requests are finishing. In reality, the number of outstanding requests (OR) on each task would rise and fall as requests complete.

Figure: Requests are sent to the target with the least outstanding requests at that time. (Diagram showing which task receives each request when using the Least Outstanding Requests algorithm.)

When Task 6 joins the target group, it receives all new requests until the cluster is balanced.
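The same behaviour can be sketched in a few lines of JavaScript. This is only an illustration of the idea, not the ALB’s actual implementation: the starting counts are hypothetical, no requests complete during the example, and breaking ties by picking the first target is an assumption made to keep the output predictable.

```javascript
// Minimal Least Outstanding Requests picker.
// `outstanding` maps a target name to its number of in-flight requests.
function pickLeastOutstanding(outstanding) {
  let best = null;
  for (const [target, count] of outstanding) {
    // Strict "<" means ties go to the earliest target (an assumption).
    if (best === null || count < outstanding.get(best)) best = target;
  }
  return best;
}

// Route one request: pick a target and record the new in-flight request.
function dispatch(outstanding) {
  const target = pickLeastOutstanding(outstanding);
  outstanding.set(target, outstanding.get(target) + 1);
  return target;
}

// Hypothetical starting point: five tasks with 3 outstanding requests each.
const outstanding = new Map([
  ['task-1', 3], ['task-2', 3], ['task-3', 3], ['task-4', 3], ['task-5', 3],
]);

outstanding.set('task-6', 0); // the new task joins with nothing in flight

const routed = Array.from({ length: 4 }, () => dispatch(outstanding));
console.log(routed.join(', ')); // task-6, task-6, task-6, task-1
```

The new task absorbs every request until its count matches the others, and only then does traffic spread out again.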

Re-balancing the cluster after scaling is essential for WebSocket-based services, like Pubby, our high-scale WebSockets solution. For more information about scaling WebSockets services using ALBs, read the Load testing section of that article.

For HTTP-based services, the LOR algorithm can still have a huge impact on performance. Let’s run a load test on each of these algorithms to see what the difference is.

Load testing

To see the performance difference when using each of these algorithms, we’re going to run a load test on a simple Node.js application using Artillery.

Our test service will have 2 routes:

GET /test: immediately returns success
POST /test: runs heavy computation for a random 0–200ms before returning success

Here’s our Artillery configuration:

Artillery configuration to run a 120s load test, hitting the service with 10 runs per second, each making 3 GET requests and a single POST request

For reference, here’s our application code:

Hacky JS implementation to respond to GET requests immediately and block for 0–200ms for POST requests
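The listing itself isn’t reproduced here, so as a stand-in, here’s a minimal sketch of a service that behaves as described, using only Node’s built-in `http` module. The `handle` and `blockFor` helpers, the `PORT` environment variable and the response bodies are assumptions for illustration, not the code we actually tested.

```javascript
const http = require('http');

// Busy-wait to simulate CPU-bound work. This blocks the event loop on
// purpose, so other requests to the same task queue up behind it.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* spin */ }
}

// Decide the status code and how long to block for a given request.
// Kept separate from the server so it can be exercised without a socket.
function handle(method, url, random = Math.random) {
  if (url !== '/test') return { status: 404, blockMs: 0 };
  if (method === 'GET') return { status: 200, blockMs: 0 };
  if (method === 'POST') return { status: 200, blockMs: Math.round(random() * 200) };
  return { status: 405, blockMs: 0 };
}

const server = http.createServer((req, res) => {
  const { status, blockMs } = handle(req.method, req.url);
  if (blockMs > 0) blockFor(blockMs);
  res.writeHead(status, { 'Content-Type': 'text/plain' });
  res.end(status === 200 ? 'success' : 'error');
});

// Only start listening when a port is supplied, e.g. PORT=3000 node app.js
if (process.env.PORT) {
  server.listen(Number(process.env.PORT));
}
```

Because the POST handler blocks the event loop, a task that is busy with a POST can’t answer even a trivial GET until it finishes, which is exactly the situation where the choice of balancing algorithm matters.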

The results are heavily in favour of the LOR algorithm: the P99 latency under round-robin was roughly five times higher than under LOR (a 514% difference).

So, with a simple configuration change, we can reduce our P99 latency by almost two seconds. The results are so extreme that we re-ran the load test a few times, but every run returned similar results.

I need this, now!

In Terraform, the change is just a single line in the aws_lb_target_group. Simply set load_balancing_algorithm_type = "least_outstanding_requests".

Hang on, are there any downsides?

The main risk when using the LOR algorithm is causing a flood of requests when a new target is added to the target group. In the LOR diagram above, Task 6 received 5 requests in quick succession, which might put it under high load. This is a particular risk for WebSocket services: when we’re receiving thousands of connections per second, we don’t want every new connection to suddenly be sent to a single task. HTTP services shouldn’t be as badly impacted, but it’s still worth considering.

ALBs support slow start, which instructs the ALB to slowly ramp up the number of requests a new target will serve. Annoyingly, you can’t use both LOR and slow start together. The only way to avoid this issue is to ensure our autoscaling policy will launch at least a few tasks during each scale-up action.

The other potential issue is that a failing target will often respond more quickly than a healthy one. For example, your service might immediately respond with a 500 error if it’s not connected to the database. Because its requests finish so fast, the failing target rarely has many outstanding requests, so it will receive a higher proportion of traffic, causing a much larger incident. So, it’s important to ensure that health checks are quick to react to a failing target. Even at the most aggressive settings (a 5-second interval and an unhealthy threshold of 2), this would still result in at least 10 seconds of failed requests, so it might also be worth introducing some artificial error latency. Hopefully, AWS will improve their health checks so we can divert traffic away from failing instances much faster.
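One way to add artificial error latency is to hold back error responses until a minimum time has passed, so a failing task keeps its requests outstanding for about as long as a healthy one would. Here’s a minimal sketch; the `extraDelayMs` and `sendWithErrorFloor` helpers and the 500ms floor are hypothetical and would need tuning against your own service’s typical latency.

```javascript
// How long to hold a response before sending it, so that errors are never
// faster than a chosen floor. The 500 ms default is an illustrative value.
function extraDelayMs(statusCode, elapsedMs, errorFloorMs = 500) {
  if (statusCode < 500) return 0; // non-5xx responses are never delayed
  return Math.max(0, errorFloorMs - elapsedMs);
}

// Hypothetical helper for a Node http handler: send a plain-text response
// once the computed delay has passed. `startedAt` is a Date.now() timestamp
// taken when the request arrived.
function sendWithErrorFloor(res, statusCode, body, startedAt) {
  const delay = extraDelayMs(statusCode, Date.now() - startedAt);
  setTimeout(() => {
    res.writeHead(statusCode, { 'Content-Type': 'text/plain' });
    res.end(body);
  }, delay);
}

console.log(extraDelayMs(200, 5));   // 0
console.log(extraDelayMs(500, 5));   // 495
console.log(extraDelayMs(503, 800)); // 0
```

Using a timer rather than a busy-wait means the delay costs almost nothing on the failing task, while still stopping it from looking like the least busy target.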

In summary, the Least Outstanding Requests algorithm will usually result in a reduction in latency. It helps with balancing resource utilisation across your cluster, allowing the use of spot fleets.

Most services should switch to use LOR as soon as possible.