It’s all about engineering a load balancer: Types, Configuration & Algorithms

Siddharth Gangwar
7 min read · Apr 13, 2022

A load balancer is a reverse proxy. It presents a virtual IP address (VIP) representing the application to the client. The client connects to the VIP, and the load balancer uses its algorithms to decide which application instance on which server receives the connection.

Gist and flow of this article:

  • Types of load balancers (NLB & ALB)
  • Configurations of load balancers (active/active & active/passive)
  • Algorithms for load balancers (round robin, least connections, etc.)

Types of load balancer

There are two main types of load balancers:

  1. Network Load Balancer: Layer 4 (TLS/TCP/UDP traffic), Static IPs.
  2. Application Load Balancer: Layer 7 (HTTP/HTTPS traffic), Flexible.

Network Load Balancer

A Network Load Balancer (NLB) distributes traffic across several servers using the TCP/IP networking protocols. By combining two or more computers that are running applications into a single virtual cluster, NLB provides reliability and performance for web servers and other mission-critical servers.

The servers in an NLB cluster are called hosts, and each host runs a separate copy of the server applications. NLB distributes incoming client requests across the hosts in the cluster while preserving the source IP addresses and headers. You can configure the share of the load each host handles, and you can add hosts to the cluster dynamically to absorb increased load.
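To make the layer-4 idea concrete, here is a minimal sketch of a TCP pass-through in Python (the backend addresses are hypothetical placeholders): every new client connection is spliced, byte for byte, to one of the backends, and the balancer never parses the application protocol.

```python
import itertools
import socket
import threading

# Hypothetical backend hosts in the cluster.
BACKENDS = [("10.0.0.1", 8080), ("10.0.0.2", 8080)]
backend_cycle = itertools.cycle(BACKENDS)

def pipe(src, dst):
    # Copy raw bytes one way until the peer closes; a layer-4
    # balancer never inspects HTTP, it only forwards traffic.
    try:
        while data := src.recv(4096):
            dst.sendall(data)
    except OSError:
        pass
    finally:
        dst.close()

def handle(client):
    backend = socket.create_connection(next(backend_cycle))
    threading.Thread(target=pipe, args=(client, backend), daemon=True).start()
    threading.Thread(target=pipe, args=(backend, client), daemon=True).start()

listener = socket.socket()
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("0.0.0.0", 8000))
listener.listen()
while True:
    conn, _ = listener.accept()
    handle(conn)
```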

Application Load Balancer

An Application Load Balancer (ALB) works only at layer 7 (HTTP). It supports a wide range of routing rules for incoming requests based on the hostname, path, query-string parameters, HTTP method, HTTP headers, source IP, or port number.

A very useful feature of ALB is that it can be configured to return a fixed response or a redirection; in short, it enables content-based routing. Just as importantly, ALB supports HTTP/2 and WebSockets.
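As a sketch of what such rules look like, here is a toy layer-7 router in Python; the pool names, paths, and redirect target are made up for illustration:

```python
# Each rule maps a path prefix to an action: forward to a backend pool,
# redirect, or return a fixed response.
RULES = [
    ("/api/", ("forward", "api-pool")),
    ("/old/", ("redirect", "https://example.com/new/")),
    ("/ping", ("fixed", (200, "OK"))),
]

def route(path):
    for prefix, action in RULES:
        if path.startswith(prefix):
            return action
    return ("fixed", (404, "no rule matched"))

print(route("/api/users"))  # ('forward', 'api-pool')
print(route("/old/post"))   # ('redirect', 'https://example.com/new/')
```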

Configurations adopted by load balancers

There are two configurations that can be set up:

  1. Active/Active configuration
  2. Active/Passive configuration

Active/Active configuration

An active-active cluster is typically made up of at least two nodes, both actively running the same kind of service simultaneously. The main purpose of an active-active cluster is to achieve load balancing. Load balancing distributes workloads across all nodes to prevent any single node from getting overloaded. Because there are more nodes available to serve, there will also be a marked improvement in throughput and response times.

Active/Passive configuration

An active-passive cluster also consists of at least two nodes. In the case of two nodes, for example, if the first node is already active, the second node must be passive or on standby.

The passive (failover) server serves as a backup that’s ready to take over as soon as the active (primary) server gets disconnected or is unable to serve; this failover is what keeps the service available when a node fails.

When clients connect to a two-node cluster in an active-passive configuration, they only connect to one server; in other words, all clients connect to the same server. As in the active-active configuration, the two servers must have identical settings, i.e., they are fully redundant.
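Here is a minimal sketch of the failover logic, assuming a plain TCP connect as the health probe (real setups usually probe an HTTP health endpoint and move a VIP between nodes):

```python
import socket
import time

def health_check(node, timeout=1.0):
    # Probe the node with a plain TCP connect; a refused or
    # timed-out connect counts as "down".
    try:
        socket.create_connection(node, timeout=timeout).close()
        return True
    except OSError:
        return False

def monitor(active, passive, interval=5):
    while True:
        if not health_check(active):
            # Promote the standby; clients still hit the same VIP.
            active, passive = passive, active
        time.sleep(interval)

# monitor(("10.0.0.1", 8080), ("10.0.0.2", 8080))  # hypothetical nodes
```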

Algorithms used by load balancers

Round robin

How about balancing the load by cycling through all the servers, sending each new request (or each fixed time slice) to the next server in order? With this we achieve an even spread of requests among the servers, and boom: we are ready with our load balancer.
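A minimal round-robin picker, with hypothetical server names:

```python
import itertools

servers = ["server-1", "server-2", "server-3"]
rotation = itertools.cycle(servers)

def pick_server():
    # Each call hands back the next server in a fixed circular order.
    return next(rotation)

for _ in range(6):
    print(pick_server())  # server-1, server-2, server-3, server-1, ...
```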

Will this work in all cases? Let’s think…

What if the servers have different configurations and hardware allocations? In that case the servers’ performance differs; round robin still works, but some servers will be underutilized and others overutilized.

For example, suppose we have 5 servers: one is a Raspberry Pi (easily overloaded), one is a Mac Studio (underutilized), and the rest are basic Ubuntu machines with 16 GB of RAM.

Can we distribute the requests load as per system configuration?

Weighted Round Robin

Let’s assign weights to the round-robin scheduling algorithm. Consider the previous example:

  • Mac Studio (×1): weight 3
  • Ubuntu server with 16 GB of RAM (×3): weight 2 each
  • Raspberry Pi (×1): weight 1

We can add these weights to the existing round-robin schedule, so the load is distributed among the servers in proportion to their performance.
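A naive sketch of weighted round robin for the example above: each server simply appears in the rotation as many times as its weight (production schedulers, such as nginx’s smooth weighted round robin, interleave the slots more evenly):

```python
import itertools

# Weights from the example above.
weights = {
    "mac-studio": 3,
    "ubuntu-1": 2, "ubuntu-2": 2, "ubuntu-3": 2,
    "raspberry-pi": 1,
}
# Expand each server into as many rotation slots as its weight.
schedule = [name for name, w in weights.items() for _ in range(w)]
rotation = itertools.cycle(schedule)

def pick_server():
    return next(rotation)
```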

Wow, we did it… but let’s ponder: is there any case where this strategy can fail?

Well, the issue here is handling requests with different response times. What happens when different requests take very different amounts of work? Let’s again consider the previous example.

Suppose a CPU-intensive task is constantly assigned to the Raspberry Pi, while the powerful Mac Studio only performs simple work such as fetching some data from the DB.

In this case, our algorithm has no idea where to send a request to get the best response time. Let’s solve this with another approach.

Weighted least connections

The weighted least connections algorithm maintains a weighted list of application servers along with each one’s number of active connections. The balancer forwards a new connection to the server with the best combination of the two: the fewest active connections in proportion to its weight (preference).
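A sketch, assuming the balancer tracks a (hypothetical) count of active connections per server: pick the server that minimizes active connections divided by weight.

```python
weights = {"mac-studio": 3, "ubuntu-1": 2, "raspberry-pi": 1}
active = {"mac-studio": 0, "ubuntu-1": 0, "raspberry-pi": 0}

def pick_server():
    # Fewest active connections relative to capacity wins.
    return min(active, key=lambda s: active[s] / weights[s])

server = pick_server()
active[server] += 1   # on connection open
# ... serve the request ...
active[server] -= 1   # on connection close
```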

This approach seems to work well, but since we are engineering a load balancer, we need to check every possibility.

Bingo… even this approach has issues. If you were able to spot them, you are a knowledgeable developer; if not, keep digging.

What if some requests return a response instantly but keep working on a task in the background? There are many reasons for background tasks, such as handing work to a message broker (Kafka, RabbitMQ), or services like AWS where a request runs in the background and you are notified via email once it completes.

Therefore we cannot judge how long a request will actually keep a server busy, so the connection count misleads us.

Sometimes the best option is to leave the decision to randomness instead of least connections.
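A sketch of weighted random selection, reusing the capacity weights from before; `random.choices` does the weighted draw, and no per-connection bookkeeping is needed:

```python
import random

servers = ["mac-studio", "ubuntu-1", "raspberry-pi"]
weights = [3, 2, 1]  # capacity weights from the earlier example

def pick_server():
    # Load evens out in expectation over many requests,
    # with no per-server state to track.
    return random.choices(servers, weights=weights, k=1)[0]
```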

Well, there is one more problem with all the approaches listed above. If a user is continuously working with our servers, they should keep talking to the same server, because a different server may hold a different copy of their data, and switching servers increases data inconsistency. This is why some sites warn that newly published content may take a while to show up on the page: the change still has to be replicated to all the servers.

For every problem out there in engineering, there exists a solution. In this case, we can generate a hash from the client’s IP, a browser fingerprint, or any other stable identifier, as the requirements dictate. By doing this, we ensure that a client’s requests are always served by the same server: each time a client sends a request, we compute the hash and look up the server associated with it.
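A sketch of source-IP hashing against a fixed server list (the IP shown is just an example):

```python
import hashlib

servers = ["server-1", "server-2", "server-3"]

def pick_server(client_ip):
    # The same client always hashes to the same index,
    # so repeat requests land on the same machine.
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest, "big") % len(servers)]

print(pick_server("203.0.113.7"))  # same server on every call
```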

What if you need to add a new server?

What about the hashes you have already bound to servers?

Do we need to rehash everything?

How will old requests know which server now owns their hash?

To rescue us from this problem, David Karger and his colleagues at MIT came up with the concept of consistent hashing.

To read more about consistent hashing, follow this link; Toptal explains everything about consistent hashing beautifully.
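For a flavor of the idea, here is a toy hash ring in Python (real implementations add many virtual nodes per server to spread load more evenly): servers sit at points on a circle of hash values, each key is served by the first server clockwise from its hash, and adding a server only remaps the keys between it and its predecessor.

```python
import bisect
import hashlib

def h(value):
    # Map a string onto a 64-bit position on the ring.
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

class Ring:
    def __init__(self, servers):
        self.points = sorted((h(s), s) for s in servers)

    def add(self, server):
        bisect.insort(self.points, (h(server), server))

    def pick(self, client_ip):
        # First server clockwise from the client's hash, wrapping around.
        keys = [p for p, _ in self.points]
        i = bisect.bisect(keys, h(client_ip)) % len(self.points)
        return self.points[i][1]

ring = Ring(["server-1", "server-2", "server-3"])
print(ring.pick("203.0.113.7"))
ring.add("server-4")  # only keys between server-4 and its predecessor move
print(ring.pick("203.0.113.7"))
```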

Do clap if you found this article helpful. Please don’t forget to follow if you want to get more such articles on engineering.
