Consistent Hashing: An alternative for distributed hashing problems.

Raksha M P
4 min readDec 19, 2018

--

Distributed Hashing: In certain scenario’s, it might be necessary to split a hash table into several parts, maintained by different servers/computers, this solves the problem of memory limitations at a single computer/server and also enables in creating an arbitrary large hash table. This is called Distributed Hashing. Each key is mapped to a particular server like,

Hash = HashFunc(key)
Server/index = Hash mod(N)

where N is the number of servers.

The Distributed Hashing is a simple feasible solution until a server goes down/changes. In that case, the keys are needed to be redistributed to account for the change. But as we know, the key hosted in different servers are calculated based on N(no of servers). Hence as the N changes i.e if a server goes down or a new server is added, all the keys have to be rehashed corresponding to the new N. This leads to a major degradation in the performance and may possibly crash the existing running servers.

Consistent Hashing was introduced as an alternative/solution for the above problem.

Consistent Hashing: Consistent Hashing is a special kind of hashing wherein when a hash table has to be resized, only k/n keys need to be remapped on average, where k is the number of keys and n is the number of servers.
It is a distribution scheme that doesn’t depend on the number of server/size. Hence adding or removing the servers, the number of keys to be rehashed will be minimal.
Basically, Consistent Hashing is a distributed hashing scheme that operates independently of the number of servers or objects in a distributed hash table by assigning them a position on an abstract circle, or hash ring.
Let X be a hash table,

The hash table above is used as an output range for the hash function f. If we connect both ends of the table we end up with a ring.

The hash ring.

If A, B, C are the three servers. Using the same hash function f, we can map each server as a node(Virtual Node)to a point on the ring.

The nodes mapped on to the ring.

The interval between two nodes in the ring forms a partition called Continuum. And this space is used to map the keys to the servers. if we use the same function f over the key we will end up with a projection of that key in the ring. With this notion, we can define the server responsible for our key as the first node in the clockwise direction after the projection.

Hash ring with keys mapped.

If a key is mapped to a point after the last node and before the maximum value of function f, Since both ends are connected we have no problem and can traverse clockwise and find the responsible node.
Calculation of hashes to the degree is done as,
(hash)/(10^n)*360

Points to be noted in Consistent Hashing:

  • When we say only k/N keys need to be remapped, it means that when a node is removed from the hash ring, only the keys associated with that nodes are rehashed and remapped rather than remapping all the keys.
  • When you want to add a new node, we map the keys between the partition of the new node and the previous node in the hash ring to point to the new node and the keys will no longer be associated with their old node.
The Key1 is remapped to the new Node F(D) and its association with its previous node f(A) is removed.
  • Consistent Hashing enables Partition.When you have a consistent hash, everything looks like a partition and this partitioning makes the scaling for consistent hashing more predictable.
  • Consistent Hashing makes replicating data across several nodes easy.If we have more replicas’s of your data more likely is your data to survive server/machine crashes. Thus Replicating your data allows for a better availability and better fault tolerance.
  • Consistent Hashing is used to reduce the impact of partial system failures in large web applications wherein they deal with robust data and without having to incur system failures.
  • In a Hash Ring,there is a one-to-one mapping between a physical node(server) and node in the ring.The random placement of nodes on the ring may lead to non-uniform distribution of data between nodes.
  • To avoid Overloading a single node when another node leaves the ring and to distribute keys more evenly ,the system creates Virtual Nodes,which establishes a M-to-N mapping between physical nodes and virtual nodes in the ring. With Virtual Nodes,each physical node(server) becomes responsible for multiple partitions in the ring.
  • Complexity of a Consistent Hashing Algorithm is defined by O(N/k) where N is number of nodes and k is the number of keys.N/k basically refers to the number of items/keys per node.
    For any given key k,the access is linear and we can represent that as k=pN. Therefore the Complexity of Consistent Hashing becomes O(N/pN)=O(1) and is a constant which implies that the time for lookup for a key is independent of the number of servers and number of keys stored. This makes the Consistent Hashing more efficient.

--

--