Scale Your Load Balancer Like Google

Himanish Munjal
Published in CodeX · 11 min read · Oct 18, 2022

What is a load balancer?

When you first launch a website, you have almost nothing to worry about. You boot up an EC2 instance or any other server, get its public IP address, map that address to your domain, and that’s it. You are done.
But what happens when the traffic starts increasing and you are getting thousands of requests every second? Your server will be overloaded with the incoming requests and will eventually crash.

So what do you do then? This is where a load balancer comes to the rescue.

(Image: load balancing diagram from https://www.nginx.com/resources/glossary/load-balancing/)

What we have done here is add more application servers, so that the increased traffic can be served without any problem, and connect them to a load balancer. You then take the IP address of the server where the load balancer is hosted and map it to your website’s domain. Requests now come to the load balancer, which evenly distributes them across the application servers.
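
To make “evenly distributes” concrete, here is a minimal sketch of the simplest strategy, round robin, with invented backend addresses (real load balancers offer this and other strategies such as least-connections):

```python
from itertools import cycle

# Hypothetical pool of application servers behind the load balancer.
backends = cycle(["10.0.0.1", "10.0.0.2", "10.0.0.3"])

def route_request() -> str:
    """Hand the next incoming request to the next backend in the rotation."""
    return next(backends)

for i in range(6):
    print(f"request {i} -> {route_request()}")
# request 0 -> 10.0.0.1, request 1 -> 10.0.0.2, ... and wraps around
```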

Neat. Now you can add as many application servers as you require and keep serving the ever-increasing traffic to your website.
But wait a minute: your load balancer is also a server, so it must have some upper limit on the number of requests it can process, right? Exactly.

Now think about websites like google.com or amazon.com, which serve tens of millions of requests per day. No matter how good your hardware is, it cannot keep up with this ever-increasing number.

So what do you do? It’s very simple: you just add more load balancers and put another layer of load balancers in front of them, the same as you did when scaling the application servers. That’s it, thanks for reading the article.

Well, not exactly. The problem with this approach is that the front layer will again hit an upper limit on the traffic it can serve.

This is when you need to start looking beyond a single load balancer, and that’s what we’ll discuss in this article.

How to do load balancing at scale

There are a few different approaches we will discuss here. They can be useful depending on your use-case and the traffic you’re serving.

1. DNS Round Robin

Round-robin DNS is a load balancing technique where the balancing is done by a DNS server rather than using a dedicated piece of load-balancing hardware.
Round-robin DNS can be used to balance the load across multiple load balancers where the DNS server hands out the IP address corresponding to a different LB each time, operating on a rotation.

Running `dig` against google.com or amazon.com, for example, returns multiple A records: DNS round robin at work for their frontend load balancing.

In that output, 209.85.203.113, 209.85.203.102, and so on could be the public IPs of different load balancers.
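
A toy model of what the authoritative DNS server is doing, with the rotation made explicit (the IPs are illustrative):

```python
# Toy round-robin authoritative DNS server: it returns the full A-record
# set, rotated so that a different load balancer's IP comes first on each
# successive query. Clients typically use the first address in the answer.

LB_IPS = ["209.85.203.113", "209.85.203.102", "209.85.203.100"]
query_count = 0

def resolve(domain: str) -> list[str]:
    global query_count
    offset = query_count % len(LB_IPS)
    query_count += 1
    return LB_IPS[offset:] + LB_IPS[:offset]

for _ in range(3):
    print(resolve("example.com"))  # a different IP leads each answer
```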

Problems with DNS round robin

  1. The round-robin method doesn’t always provide evenly distributed load balancing because of both DNS caching and client-side caching. If a user makes a DNS query to a particularly high-traffic recursive resolver, that resolver will cache the website’s IP, potentially sending a large amount of traffic to that one IP.
  2. Another drawback is that round-robin cannot be depended upon for site reliability; if one of the servers goes down, the DNS server will still keep that server’s IP in the round-robin rotation. So, if there are six servers and one is taken offline, one in six users will be denied service. In addition, round-robin DNS does not account for server load, transaction time, geographical distance, and other factors that traditional load balancing can be configured for.

2. Geolocation-based load balancing

Let’s suppose that your website is now famous in different parts of the world and is serving users in Europe, America, and Asia.
Your load balancer and your application servers are in India. This creates two major problems:
1. Round-trip time: as the servers sit in India, a request from a user in America has to travel halfway around the world and back.
2. Locale-specific information: as a product owner, you would want to show a different website to a user sitting in China than to one in the USA.

There is one simple solution to both of these problems: geolocation-based DNS, or GeoDNS. With this, DNS resolves your website to different IP addresses based on the user’s location. You can direct users sitting in China to the server closest to them to minimise round-trip time, and also let those servers expose content specific to the geography.

But then again, how does this help with scaling your load balancers?
With GeoDNS, you can now run separate load balancers in different regions and scale each stack independently.

GeoDNS-based load balancing

In the example above, for the same URL example.com, a user sitting in France will always get the IP address 192.0.0.2, while a user sitting in the US gets 192.0.0.1.
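
A minimal sketch of the lookup a GeoDNS server performs, with an invented region-to-IP mapping based on the example above:

```python
# GeoDNS in miniature: the answer depends on where the query originates.
# The region codes and fallback choice are invented for illustration; the
# IPs match the example above.

REGIONAL_LBS = {
    "FR": "192.0.0.2",  # load balancer fleet serving Europe
    "US": "192.0.0.1",  # load balancer fleet serving North America
}

def geo_resolve(domain: str, client_region: str) -> str:
    """Resolve to the load balancer fleet closest to the client."""
    return REGIONAL_LBS.get(client_region, REGIONAL_LBS["US"])  # US fallback

print(geo_resolve("example.com", "FR"))  # 192.0.0.2
print(geo_resolve("example.com", "US"))  # 192.0.0.1
```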

Amazon (at least internally) uses this kind of load balancing extensively. Almost all the microservices I’ve worked on use it.

Problems with geolocation-based load balancing

  1. You are unlikely to ever hit this problem, but with this approach you are still bound by the traffic your load balancer can serve within one region. The problem we started with remains; you now just face it at a regional level rather than a global one.
  2. Not all DNS providers offer GeoDNS. Amazon Route 53 gives you this capability, but Cloudflare doesn’t.

3. Load balancing with ECMP + BGP routing

Before we try to understand how to scale our load balancing fleet, let us understand the ECMP and BGP protocols.

Equal-Cost Multi-Path (ECMP)
Typically, we think of an IP address as referencing a single physical machine, and routers as moving a packet to the next closest router to that machine. In the simplest case, where there’s always a single best next hop, routers pick that hop and forward all packets there until the destination is reached.

In reality, most networks are far more complicated. There is often more than a single path available between two machines, for example where multiple ISPs are available or even when two routers are joined together with more than one physical cable to increase capacity and provide redundancy. This is where Equal-Cost Multi-Path (ECMP) routing comes into play — rather than routers picking a single best next hop, where they have multiple hops with the same cost, they instead hash traffic so that connections are balanced across all available paths of equal cost.

ECMP is implemented by hashing each packet to determine a relatively consistent selection of one of the available paths. The hash function used here varies by device, based on the source and destination IP address as well as the source and destination port for TCP traffic. This means that multiple packets for the same ongoing TCP connection will typically traverse the same path, meaning that packets will arrive in the same order even when paths have different latencies. Notably, in this case, the paths can change without any disruption to connections because they will always end up at the same destination server, and at that point, the path they took is mostly irrelevant.
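
A rough sketch of that hashing decision, with made-up path names and MD5 standing in for a router’s proprietary hash function:

```python
import hashlib

# ECMP next-hop selection: hash the flow identifiers (source/destination
# IP and port) and use the digest to pick one of the equal-cost paths.
# Because the hash is deterministic, every packet of one TCP connection
# takes the same path and arrives in order.

PATHS = ["path-A", "path-B", "path-C"]

def pick_path(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> str:
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = int(hashlib.md5(key).hexdigest(), 16)
    return PATHS[digest % len(PATHS)]

print(pick_path("198.51.100.7", 51544, "1.2.3.4", 443))
print(pick_path("198.51.100.7", 51544, "1.2.3.4", 443))  # same path again
```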

Border Gateway Protocol
Frankly speaking, BGP is a vast topic and one of the building blocks of the internet as we know it today. Wikipedia defines it as the standardised exterior gateway protocol used to exchange routing and reachability information among autonomous systems on the internet.
For the purpose of this article, what we need to understand is that you can use BGP to advertise routes to your neighbours. In particular, you can announce to the internet that you are 1 hop away from an IP address.

The theory is all well and good, but how exactly do we use it for our purpose?

I’ll take an elementary example first to illustrate the advantages these two protocols give us.

Let’s assume we have reserved the IP address 1.2.3.4. We can boot up n load balancer hosts, each of which announces to the internet via BGP that it is 1 hop away from 1.2.3.4. This way, we have essentially scaled our load balancer stack. At any time, you can add more servers to the fleet, announce that each new server is 1 hop away from your reserved IP, and that’s it.

But this setup still has one problem. You have n load balancing servers, but how do you know that traffic is distributed equally, or at least uniformly, among them? Consider the scenario where all your traffic is still being routed to just one server. This is where ECMP shines.
As mentioned above, an ECMP-enabled router will distribute traffic evenly among next hops that have the same cost to the destination. In our example, all n load balancing servers are 1 hop away from 1.2.3.4, so the router will spread requests across every server.
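
Here is a small simulation of this setup, assuming hypothetical LB names and MD5 in place of the router’s hash function:

```python
import hashlib
from collections import Counter

# n load balancers all announce via BGP that they are 1 hop away from
# 1.2.3.4, so to an ECMP-enabled router they are equal-cost next hops
# and flows are hashed across them.

LB_FLEET = ["lb-1", "lb-2", "lb-3", "lb-4"]  # all claim to own 1.2.3.4

def next_hop(src_ip: str, src_port: int) -> str:
    key = f"{src_ip}:{src_port}->1.2.3.4:443".encode()
    return LB_FLEET[int(hashlib.md5(key).hexdigest(), 16) % len(LB_FLEET)]

# 10,000 distinct client flows end up spread roughly evenly over the fleet.
flows = Counter(next_hop(f"10.0.{i % 256}.{i // 256}", 40000 + i)
                for i in range(10000))
print(flows)  # each LB sees roughly 2,500 flows
```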

3.1. L7 ECMP + BGP load balancing

Now let’s see what the example taken above looks like from an architectural point of view.

L7 ECMP + BGP Load balancing

Now, based on the above configuration, you can scale your load balancer fleet horizontally, essentially without limit. Whenever your load balancers are overloaded, you just add another LB and you’re done.
But this solution has one major problem: the stateless nature of routers.

Problems with this approach

As mentioned above, routers do not maintain connection state. To select the next hop with ECMP, they hash on the source and destination IP addresses (and ports). A problem arises when the topology between the router and the load balancers changes.

When adding or removing a load-balancer, the number of available routes for a destination changes. The hashing algorithm used by routers is not consistent, and flows are reshuffled among the available load-balancers, breaking existing connections.

Moreover, each router may choose its own routes and may use a different hashing implementation. When a router becomes unavailable, a second one may route the same flows differently.

If you need to handle long-lived connections like file downloads, video streaming, or WebSocket connections, this solution will definitely not work.
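
To see the scale of the problem, here is a small sketch, assuming a simple MD5-based hash-mod-n scheme (real routers use their own hash functions):

```python
import hashlib

# With plain hash-mod-n selection, growing the fleet from 4 to 5 load
# balancers remaps the majority of flows, breaking their established
# TCP connections.

def choose(flow: str, n: int) -> int:
    return int(hashlib.md5(flow.encode()).hexdigest(), 16) % n

flows = [f"client-{i}" for i in range(10000)]
moved = sum(choose(f, 4) != choose(f, 5) for f in flows)
print(f"{moved / len(flows):.0%} of flows changed load balancer")  # ~80%
```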

These problems can be minimised if all the routers use consistent hashing. Consistent hashing ensures that the reshuffling of connections is minimised when a new load balancer is added, and because all routers use the same hashing algorithm, packets will be directed to the same LB even if a router is added or removed.
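
And here is a sketch of the same experiment with a minimal consistent-hash ring (the virtual-node count and names are arbitrary):

```python
import bisect
import hashlib

# Consistent-hash ring: each LB is placed at many points on a ring of hash
# values (virtual nodes smooth the distribution), and a flow maps to the
# first LB clockwise from its own hash. Adding a fifth LB only remaps the
# flows that land in its arcs, roughly 1/5 of them instead of ~80%.

def h(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, replicas=100):
        self.points = sorted((h(f"{n}#{r}"), n)
                             for n in nodes for r in range(replicas))
        self.keys = [p for p, _ in self.points]

    def lookup(self, flow: str) -> str:
        i = bisect.bisect(self.keys, h(flow)) % len(self.points)
        return self.points[i][1]

before = Ring(["lb-1", "lb-2", "lb-3", "lb-4"])
after = Ring(["lb-1", "lb-2", "lb-3", "lb-4", "lb-5"])
flows = [f"client-{i}" for i in range(10000)]
moved = sum(before.lookup(f) != after.lookup(f) for f in flows)
print(f"{moved / len(flows):.0%} of flows changed load balancer")  # ~20%
```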

3.2. L4-L7 split with ECMP + BGP

Even after all these approaches and efforts, we still haven’t built the ideal solution.
Keep in mind that the problem with the above approach was the routers being stateless, or even dumb: any change in the topology would reshuffle connections.
We have already discussed how consistent hashing for ECMP can mitigate this, but ensuring all routers are set up identically isn’t possible unless you front your whole network with your own custom routers.

So, to solve the above problem, if we introduce a layer that can manage the connections between the routers and our L7 load balancer fleet, we should be good.

This is where the L4-L7 split comes into the picture.

With another layer of servers managing connections, we gain full control over the connections established with our L7 LBs, control we never had with the routers.

Let’s see how the problems mentioned above can be solved using L4 load balancers.

Addition/Removal of an L4 LB or router
These L4 LBs are fully under our control, and we can choose which hashing algorithm they use. Regardless of whether a router or an L4 server is removed, all the remaining servers use the same hashing algorithm and will direct packets for a given source and destination IP pair to the same L7 LB.

Addition/Removal of an L7 Load Balancer
Even with consistent hashing enabled, adding or removing a server in the last tier changes the output of the hash function for some flows, because any hashing takes the total number of servers into account.
So, how do we solve this problem? This is where the L4 tier shines.
Since you have full control over these servers, you can ensure that already-established connections keep routing to the same LB by maintaining a connection table. You can use IPVS to provide both load balancing and connection management on your L4 load balancers.
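
Below is a minimal sketch of that idea, assuming an in-memory dict as the connection table and MD5 as the placement hash; in practice, software such as IPVS plays this role far more efficiently in the kernel:

```python
import hashlib

# L4 connection table: a new flow is placed by hashing over the current
# L7 fleet, but once seen, its chosen L7 LB is pinned in the table. When
# the fleet changes and the hash output shifts, established connections
# still go to the LB they started on.

L7_FLEET = ["l7-1", "l7-2", "l7-3"]
connection_table: dict[tuple, str] = {}

def forward(src_ip: str, src_port: int) -> str:
    flow = (src_ip, src_port)
    if flow not in connection_table:  # new connection: hash into the fleet
        digest = int(hashlib.md5(f"{src_ip}:{src_port}".encode()).hexdigest(), 16)
        connection_table[flow] = L7_FLEET[digest % len(L7_FLEET)]
    return connection_table[flow]  # established flows stick to their LB

first = forward("198.51.100.7", 51544)
L7_FLEET.append("l7-4")  # the L7 fleet grows mid-connection...
assert forward("198.51.100.7", 51544) == first  # ...the flow is unaffected
```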

Google has built Maglev and GitHub has built GLB, both of which use this same principle in their load balancing solutions.

Problems with the above approach

  1. Complexity: unless it’s absolutely required, why would you go for this solution? With so many layers and such a complex setup, you would be better off using the LB provided by AWS or Google Cloud.
  2. Simultaneous change in both the L4 and L7 tiers: suppose a new server is added at both layers at the same time. We now have an L4 server with no connection table, while the hash function returns different outputs because the number of L7 hosts has increased. In this case, connections arriving at the new L4 LB may be forwarded to a different L7 LB than the one that was handling them before.

So this is it. In this article, we have discussed four ways of scaling a load balancer, which, let’s be honest, most of us will never need, me included. But hopefully you now have a few more answers than before.

Moreover, I would like to understand these technologies more deeply and would love to work on their implementation. If anyone is interested, please reach out :)

Please follow if you liked this article and would like me to write more. You can also reach out to me on LinkedIn.

References

http://wtarreau.blogspot.com/2006/11/making-applications-scalable-with-load.html
https://www.bizety.com/2017/01/17/facebook-billion-user-load-balancing/
https://github.blog/2018-08-08-glb-director-open-source-load-balancer/
https://vincent.bernat.ch/en/blog/2018-multi-tier-loadbalancer
https://serverfault.com/questions/268597/what-is-a-typical-method-to-scale-out-a-software-load-balancer
https://blog.cloudflare.com/high-availability-load-balancers-with-maglev/
