Vertical vs. Horizontal Scaling
In a monolithic architecture, clients typically make requests to a single server. As the number of requests grows, that single server no longer has sufficient capacity to serve them all.
Vertical scaling, where more CPU/RAM is added to the server, could be an option. This works only for so long before hardware limitations are encountered.
Horizontal scaling, in which more servers are added, is usually the more scalable alternative.
Redirecting Requests With a Load Balancer
When we scale horizontally, the requests are directed to the load balancer instead of the servers directly.
The load balancer’s job is exactly what its name describes: its purpose is to balance the load on each server by distributing the requests as uniformly as possible.
Hash function and modulo (%)
Each incoming request carries a unique identifier (e.g. an IP address), and these identifiers are assumed to be uniformly random.
Using a hash function, we are able to obtain an output value, after which we apply the modulo function to get the number that corresponds to the server that the load balancer should be directing the request to.
- hash(ipAddress) → output
- output % (number of servers) → server ID
It is important to use a good hash function so that the output values are spread out across the range of possible values, preserving the uniform randomness. The modulo operation then guarantees that the server ID falls in the range 0 to (number of servers − 1).
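The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production load balancer; the use of MD5 is an assumption, chosen only because it is a stable, well-spread hash (Python's built-in `hash()` is randomized per process).

```python
import hashlib

def server_for(ip_address: str, num_servers: int) -> int:
    """Map a request identifier to a server ID in [0, num_servers - 1]."""
    # A stable hash spreads identifiers uniformly across a large range.
    digest = hashlib.md5(ip_address.encode()).hexdigest()
    output = int(digest, 16)      # hash(ipAddress) -> output
    return output % num_servers   # output % (number of servers) -> server ID

print(server_for("203.0.113.7", 5))
```

The same identifier always hashes to the same server, which is what lets the load balancer route a given client consistently.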
Visualizing the mapping
Let’s take a step back to visualize how we could possibly use an array as a data structure to map each request to the server.
In the naive example below, the index of the array maps directly to the server ID, but that might not necessarily be the case in production. Utilizing a data structure like an array gives us the flexibility to map each slot to whichever server we like.
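As a sketch of that idea, the array below uses the slot index as the server ID, but any entry could be reassigned. The slot count of 8 is an arbitrary assumption for illustration.

```python
NUM_SLOTS = 8  # assumed number of hash slots for illustration

# Each index is a hash slot; the value is the ID of the server that owns it.
# Here slots map to servers in order, but the array lets us remap freely,
# e.g. slot_to_server[5] = 2 would hand slot 5 over to server 2.
slot_to_server = [0, 1, 2, 3, 4, 5, 6, 7]

def route(hash_output: int) -> int:
    return slot_to_server[hash_output % NUM_SLOTS]

print(route(13))  # 13 % 8 = 5 -> server 5
```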
What Happens When We Add an Additional Server?
So far, we have assumed a fixed number of servers. However, since we opted for horizontal scaling, we should be able to add or remove servers as we wish.
Unfortunately, simply using a hash function and modulo will impact how other requests are being handled and redirected. Let’s go through the following example to understand the adverse impacts.
Assume that we have five servers, and hashing the user’s IP address gives a value of 88. Taking (88 % 5), we get 3, so the load balancer redirects the request to server 3.
However, if we added an additional server, we would get (88 % 6) = 4, which redirects the request to server 4 instead.
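The arithmetic above, plus a quick count over a range of hypothetical keys, shows how widespread the damage is: it is not just this one request that moves.

```python
hash_value = 88

# With five servers, the request goes to server 3...
print(hash_value % 5)  # 3
# ...but after adding a sixth server it is rerouted to server 4.
print(hash_value % 6)  # 4

# Worse, the vast majority of keys get remapped when the count changes:
keys = range(1000)
moved = sum(1 for k in keys if k % 5 != k % 6)
print(f"{moved / len(keys):.0%} of keys change servers")  # 83%
```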
Cost of change
This redirection may seem trivial, but there are costs involved when servers are not stateless. Although HTTP is a stateless protocol, some servers may store user-related data in their caches as an optimization.
For instance, a server may choose to store a session log to remember the user to reduce the frequency of authentication.
Therefore, if a user with a particular IP address is routed to a different server going forward, the cache on the previous server needs to be invalidated. Since this change similarly affects all other incoming requests, the caches on all the servers need to be invalidated.
The cost of change here is exorbitant, especially when dealing with tens of thousands of servers at once. So, how can we reduce the impact on other servers while adding or removing servers?
The solution is to use consistent hashing. Let’s first try to visualize the concept in three steps.
Step 1: Map request to location on the ring
Now, instead of a regular array, let’s imagine a circular array. Similar to an array, each request would now map to a location on the hash ring.
Step 2: Map server to location on the ring
Since each server has an ID, we can apply the same hash and modulo function that was applied to the IP address to the server IDs. Let’s assume that the chosen hash function is optimal and we do not have collisions between the IP address and server ID.
Step 3: Move clockwise
Now that we have requests and servers mapped out on a ring, the final step is simple.
For each request, we simply find the nearest server to its right, in a clockwise fashion. For example, the incoming request that is mapped to index 7 is served by the server that is mapped to index 9.
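The clockwise lookup can be sketched with a sorted list of server positions and a binary search. The positions 9 and 99 come from the article's examples; the others, and the ring size of 100, are assumptions for illustration.

```python
import bisect

RING_SIZE = 100  # assumed ring size for illustration

# Sorted positions of the servers on the ring (server IDs hashed onto it).
server_positions = [9, 35, 62, 99]

def clockwise_server(request_position: int) -> int:
    """Return the position of the first server at or after the request,
    wrapping past the largest position back to the first server."""
    i = bisect.bisect_left(server_positions, request_position)
    return server_positions[i % len(server_positions)]

print(clockwise_server(7))   # -> 9, as in the example above
print(clockwise_server(63))  # -> 99
```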
How Does Consistent Hashing Minimize Impact on Other Servers?
Since requests are served by the immediate right-most server, at most one other server will be impacted by a change in the number of servers.
In the above example, a new server is added and it maps to index 95. The request that is mapped to index 88 is now served by the new server mapped to index 95, instead of the previous one that was mapped to index 99.
In that case, only the server mapped to index 99 needs to have its cache invalidated. Similarly, if a server is removed, its clockwise neighbor takes over its load, and the other servers are not impacted.
By routing each request to its nearest clockwise server, consistent hashing avoids imposing an expensive, cluster-wide cost of change and confines the impact of adding or removing a server to a single neighbor.
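Continuing the sketch, inserting the new server at index 95 from the example shows that only the keys between the previous server and the new one move; the positions other than 95, 99, and the key 88 are assumptions for illustration.

```python
import bisect

server_positions = [9, 35, 62, 99]

def clockwise_server(positions: list[int], request_position: int) -> int:
    i = bisect.bisect_left(positions, request_position)
    return positions[i % len(positions)]

# Before: the request at index 88 is served by the server at index 99.
print(clockwise_server(server_positions, 88))  # -> 99

# Add a server at index 95: only keys in (62, 95] move to it.
bisect.insort(server_positions, 95)
print(clockwise_server(server_positions, 88))  # -> 95
print(clockwise_server(server_positions, 40))  # -> 62, unaffected
```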
Requests are not uniformly random
In an ideal world, the requests are uniformly random and each server has a uniform load.
However, that is rarely the case in reality. For instance, there may be a higher number of requests coming from a particular region, which means that a server would have a higher load compared to the others.
To spread the load, we would want to place more servers between the indices where these requests land and the index of their closest server. Given a fixed number of physical servers, are we able to do that?
Use multiple hash functions
The short answer is yes. Recall that each hash function is different and returns a different output.
If we took the server ID and hashed it with three different hash functions, we would end up with three different outputs. If we mapped these three different values on the hash ring, they would be in different locations.
The idea of using multiple hash functions on the server ID creates virtual locations, or as we call them, virtual nodes, on the hash ring. As such, we have a more distributed position of servers on the ring, and this could help reduce the load on each server.
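A minimal sketch of virtual nodes follows. Rather than three genuinely different hash functions, it simulates them by salting one hash with a replica index, which is a common shortcut; the ring size, replica count, and server names are all assumptions for illustration.

```python
import bisect
import hashlib

RING_SIZE = 360
VNODES_PER_SERVER = 3  # i.e. three "hash functions" per server

def positions_for(server_id: str) -> list[int]:
    # Salting one hash with a replica index stands in for using
    # several distinct hash functions on the same server ID.
    return [
        int(hashlib.md5(f"{server_id}#{i}".encode()).hexdigest(), 16) % RING_SIZE
        for i in range(VNODES_PER_SERVER)
    ]

ring = []  # sorted list of (position, server_id) pairs
for server in ["server-0", "server-1", "server-2"]:
    for pos in positions_for(server):
        bisect.insort(ring, (pos, server))

def route(request_position: int) -> str:
    i = bisect.bisect_left(ring, (request_position, ""))
    return ring[i % len(ring)][1]

print(ring)  # each server now occupies three positions on the ring
```

Because each physical server appears at several scattered positions, a hot region of the ring is more likely to be shared among different servers instead of falling entirely on one.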
I’m really new to system design myself but, lately, I’ve taken an interest in understanding these high-level architectures. Through writing and explaining, I get a deeper understanding of the topic, and I hope that it helped you in understanding the concepts too.
If you made it all the way here, thank you for reading! I have also shared some of the resources that I used below.