Consistent Hashing | Distributing the server load

Purnendu Kar
Geek Culture
Published in
3 min readApr 3, 2021

Consistent hashing is a special kind of hashing that uses a hash function that changes minimally as the range of hash functions changes.

We will start with hashing, then we will see distributed hashing and what are the problems it faces and how consistent hashing fixes those problems.

What is Hashing?

Hashing is the process of converting a given key value into another value by using a hash function. A hash function is used to generate the new value according to the mathematical algorithm. The result of the hash function is known as a hash value or hash.

Hashing

A good hash function is one that uses a one-way hashing algorithm or in other words, the hash cannot be converted back to the original key.

Distributed Hashing

This is a strategy to distribute the task in between given resources. Suppose you have M servers and N blocks in a circular queue. M servers are placed in a circular queue in M different points.

To place the server in M different point server id will be passed in the hash function to get the hash value. Perform the mod operation in the hash value and place the server at the block(hash%N=server_block).

Whenever a task comes its key value is passed in hash function and we get the hash value, now we perform the mod operation on this hash value (hash%N=i), after the modular operation, we get i value. So the task will be added in the i-th block. Now this task will be executed by the nearest server in the clockwise direction.

Distributed Hashing

But there's one problem with this approach. Suppose one of the servers goes down due to some reason, then what will happen?

If one of the servers goes down then as the method suggest if go for the next nearest server in a clockwise direction. This leads to putting more load in that server, which means the load is not distributed evenly.

To overcome this problem consistent hashing comes into play.

How Consistent Hashing solves the problem?

The strategy suggests adding virtual servers between them in the distributed hashing. Adding a virtual server doesn’t mean adding more server. The job of these virtual servers is to redirect the task to other servers.

Just like distributed hashing server id will be passed in hash function and the virtual server will be passed in different blocks. The difference here is that here we have multiple hash functions.

Suppose you have 2 hash functions, M servers will be placed in 2*M different points. Which reduce the risk of overloading a particular if one of the servers goes down.

Consistent Hashing

Rest are the same as mentioned in distributed hashing, the task is placed in a circular queue and the task will be executed by the nearest virtual server in the clockwise direction.

Note: Consistent Hashing is used in many places like Database Sharding (Scaling the database) or Load Balancer (Distributing the load in multiple Server)

--

--