Techniques for Load Balancing

Nipun Nadeeshana Liyanage
5 min read · Aug 23, 2022


Source: phoenixnap.com

Let’s say you are managing the booking web service for a hotel chain that has hotels in many countries across several continents. Your server can handle 750 TPS (transactions per second), which is sufficient for your normal day-to-day load. But during the summer holiday season in the West, traffic to the booking server peaks at around 2,250 TPS, three times the capacity of the server you normally use for the booking service.

To keep up with the demand during this period of high traffic, you can scale in two ways:

  1. Scale Up
  2. Scale Out / Horizontal Scaling

Scale Up means adding more resources to an existing system to raise its performance to the level you need. In the example above, this would mean adding more memory, swapping the CPU for a better-performing one, and increasing the bandwidth of the server you already have. With this technique, however, you always hit a ceiling. For example, the server board might only support a certain maximum amount of memory, which might not be enough for you, or it might already hold the best CPU it can accommodate without a board swap. These kinds of limitations make Scale Up an unsustainable technique for scaling.

The next method is Scale Out, also known as Horizontal Scaling. This technique involves adding more of the same kind of server, such as more booking servers, to serve the demand. In the case above, adding 3 more servers to the booking system to meet the demand is scaling out. But with scaling out, you need to balance the incoming traffic across all the servers so that no single server gets overloaded. You could simply pick a random server for each request and send the traffic there. But with this random technique, you cannot guarantee that a server will not get overloaded, because it is, well, RANDOM!! To avoid this scenario and distribute traffic across all servers sustainably, you can use several different load balancing techniques.
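To make that concrete, here is a minimal Python sketch of the naive random approach (the server names are made-up placeholders): it works, but nothing stops one server from receiving far more than its fair share of requests.

```python
import random

# Hypothetical pool of booking servers.
servers = ["booking-1", "booking-2", "booking-3", "booking-4"]

def pick_server_randomly() -> str:
    # Each request goes to a server chosen uniformly at random,
    # so nothing prevents one server from getting a burst of requests.
    return random.choice(servers)

if __name__ == "__main__":
    for request_id in range(10):
        print(request_id, "->", pick_server_randomly())
```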

To do load balancing, all the traffic should first hit a load balancer. When configuring the load balancer or reverse proxy, make sure it has enough capacity to handle the maximum traffic you are likely to receive, because it can become a single point of failure for the whole system. It doesn’t matter how many servers you have configured to absorb a surge in traffic: if the load balancer fails, your system is unusable for as long as it is down.

There are two types of load balancers you can use: hardware and software load balancers. Hardware load balancers can be very reliable, but they are harder to quickly scale up or reconfigure and are more expensive. Software load balancers, on the other hand, can provide more filtering options and better scaling, but they can be sensitive to OS versions or to changes in virtual appliance dependencies.

Now, for the load balancing itself, you can use many different techniques apart from the random allocation that nobody would recommend.

Round-Robin is a popular load balancing method. It sends each incoming request to the next server in the rotation. Let’s say you get 100 requests in quick succession and you have 4 servers: the first request goes to the first server, the next request to the second server, and so on in rotation. If one server in the pool is temporarily down, the load balancer reroutes the requests bound for it to the next available server and skips it in the rotation until it is up and running again.
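Here is a minimal Python sketch of that behaviour, assuming a hypothetical health flag per server to model a server being temporarily down:

```python
# Hypothetical pool of booking servers and their health status.
servers = ["booking-1", "booking-2", "booking-3", "booking-4"]
healthy = {name: True for name in servers}

_next_index = 0

def pick_server_round_robin() -> str:
    global _next_index
    # Walk the rotation from the next position, skipping servers
    # that are currently marked unhealthy.
    for _ in range(len(servers)):
        candidate = servers[_next_index % len(servers)]
        _next_index += 1
        if healthy[candidate]:
            return candidate
    raise RuntimeError("no healthy servers available")

if __name__ == "__main__":
    healthy["booking-3"] = False  # simulate a server being temporarily down
    for request_id in range(8):
        print(request_id, "->", pick_server_round_robin())
```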

This is a cool way of load balancing, right? But what if one of the 4 servers has a higher capacity than the others in the pool, or, in a broader scenario, the servers have different capacities altogether? This is when Weighted Round-Robin is useful. It assigns each server a weight according to its capacity and distributes the traffic accordingly. For example, suppose that out of your 4 servers, the 2nd server has double the capacity of the others, and you get 10 requests. Since the second server has double the capacity, it is assigned a weight of 2 compared to the weight of 1 assigned to the others. The 10 requests would therefore be routed to servers 1, 2, 2, 3, 4, 1, 2, 2, 3, 4 by the load balancer.
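A simple way to implement this is to expand the weights into a repeating schedule, as in the Python sketch below (the server names and weights are made-up examples matching the scenario above):

```python
# Server 2 has double the capacity, so it gets weight 2; the rest get 1.
weights = {"booking-1": 1, "booking-2": 2, "booking-3": 1, "booking-4": 1}

# Expand the weights into a repeating schedule; heavier servers appear
# more often and therefore receive a proportionally larger share of traffic.
schedule = [name for name, weight in weights.items() for _ in range(weight)]

_next_index = 0

def pick_server_weighted() -> str:
    global _next_index
    server = schedule[_next_index % len(schedule)]
    _next_index += 1
    return server

if __name__ == "__main__":
    # Ten requests follow the rotation 1, 2, 2, 3, 4, 1, 2, 2, 3, 4.
    print([pick_server_weighted() for _ in range(10)])
```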

Another load balancing technique you can use is Geo-Based Load Balancing. This means distributing traffic to servers located in different geographic regions based on the location of the client. Going back to the original booking system example, you might get requests from Asian countries such as India, China, and Sri Lanka, European countries such as the UK and Germany, and North American countries such as the USA and Canada, while your servers sit in the Asia, Europe, and NA regions. You can then send traffic originating from Asia to the Asian servers, traffic from Europe to the European servers, and so on. In other words, you route each request to the closest available server.
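The sketch below illustrates the idea in Python. The country-to-region table and the server pools are made-up examples; a real load balancer would typically resolve the client’s region from its IP address using a GeoIP database.

```python
# Hypothetical mapping of client countries to the nearest server region.
region_of_country = {
    "IN": "asia", "CN": "asia", "LK": "asia",
    "GB": "europe", "DE": "europe",
    "US": "na", "CA": "na",
}

# Hypothetical server pools per region.
servers_by_region = {
    "asia": ["asia-booking-1"],
    "europe": ["eu-booking-1"],
    "na": ["na-booking-1"],
}

def pick_region(client_country: str, default: str = "na") -> str:
    # Route to the region closest to the client; fall back to a default
    # region for countries not in the table.
    return region_of_country.get(client_country, default)

if __name__ == "__main__":
    for country in ["LK", "DE", "CA", "BR"]:
        region = pick_region(country)
        print(country, "->", servers_by_region[region][0])
```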

With load balancing there is no need to restrict yourself to just one of the above techniques. You can combine them in whatever way is most suitable for your application. In the booking application, you could scale to multiple servers within the same geographic region. In that case, a main load balancer can balance the load across regions using Geo-Based Load Balancing and send the traffic to a regional load balancer, which then uses Round-Robin or Weighted Round-Robin to balance that load across the servers located in that region.
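Here is a rough Python sketch of that two-level setup, with a geo-based step choosing the region and a per-region weighted rotation choosing the server; all names, countries, and weights are hypothetical.

```python
from itertools import cycle

# Hypothetical geo table: client country -> nearest region.
region_of_country = {"LK": "asia", "DE": "europe", "US": "na"}

# Per-region rotations; a server with weight 2 simply appears twice.
regional_rotations = {
    "asia": cycle(["asia-1", "asia-2", "asia-2"]),  # asia-2 has weight 2
    "europe": cycle(["eu-1", "eu-2"]),              # equal weights
    "na": cycle(["na-1", "na-2", "na-3"]),          # equal weights
}

def route(client_country: str) -> str:
    region = region_of_country.get(client_country, "na")  # geo-based step
    return next(regional_rotations[region])               # round-robin step

if __name__ == "__main__":
    for country in ["LK", "LK", "LK", "DE", "US", "US"]:
        print(country, "->", route(country))
```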

Using these load balancing techniques, your servers can run within their capacity all the time and overloading can be avoided, which in turn keeps your operations running smoothly and spares the clients of your service any downtime or hassle.

You can find a simplified yet detailed video on load balancing by Mr. Krishantha Dinesh at the link below, which I used as a reference for this article.

Technical Design Concepts Every Software Engineer MUST know | Load Balancers — YouTube
