Consistent hashing from first principles - Part 2

Amir Ziai
12 min read · Sep 26, 2022


(I’ve tried to minimize code clutter, so the code presented in the post is not self contained. If you want to run the code as you read, I suggest referencing this standalone notebook alongside the post.)

Part 1 recap

In part 1 we motivated consistent hashing, implemented a naive version, and quickly ran into a number of issues. In this post we will address some of those issues.

But before jumping in, let’s do a quick recap of part 1:

  • We’re interested in finding a way to allocate an infinite set of items (e.g. URLs) to a finite set of resources (e.g. servers)
  • Since we’re talking about an infinite set of items, it’s impossible to pre-populate this mapping.
  • One approach is to use a hash function that takes in a (unique) string representation of each input item and maps it to an integer. We can then take the modulo of that integer to figure out which resource we’re mapping the item to (assuming that our n resources are numbered 0 to n - 1). A minimal sketch of this approach follows this list.
  • This approach works great if you’re never going to change n. If you do, you’ll quickly realize that most items will be allocated to another resource as you add or remove resources.
  • If you just need a way to allocate items and there’s no cost to this churn, the modulo approach works perfectly. In some use cases such as caching, this churn can be very costly (e.g. increased latency). That’s why we explored ways to allocate more consistently.
  • We then introduced consistent hashing. The central idea is to (a) place resources and items (i.e. keys) on a conceptual ring, and (b) allocate keys to the closest resource.
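
Here’s that minimal hash-and-modulo sketch (the hash function choice here is arbitrary; part 1 builds this into a proper class):

import hashlib

def modulo_lookup(key: str, n: int) -> int:
    # Hash the key to a big integer, then take the modulo to pick one of the
    # n resources (numbered 0 to n - 1).
    hash_value = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return hash_value % n

modulo_lookup('https://example.com/some/page', n=5)  # some index in 0..4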

Terminology

I will use the following terms in this post:

  • Key: an item that we want to allocate to a resource (e.g. URL).
  • Keyspace: the set of all possible keys (i.e. all possible strings). Obviously we can’t enumerate this set.
  • Ringspace: this is a term that I’ve made up, but “hash output space” is a mouthful. That’s exactly what “ringspace” is in our context though: all possible hash output values, which is neatly encapsulated by the conceptual ring.

Questions we encountered in part 1

  1. How close did we get to our goal of minimizing the number of reallocations? I.e. is consistent hashing optimal?
  2. We alluded to the fact that we can’t really control how much of the hash output space is covered by each server. What if we had servers with higher capacity and wanted to assign more of the space to them?
  3. We don’t really have an easy way to target specific servers. What if we know a server is overloaded and need to reallocate some of its keys?
  4. You might have noticed that the complexity of key_lookup is O(n), where n is the number of servers. Can we do better? We’ll leave this one to part 3 (coming soon).

We’ll tackle 1 through 3 in this post.

1. Is consistent hashing optimal?

We can answer this question empirically. As we saw in part 1, consistent hashing works by randomly assigning (through the magic of hashing) servers and keys (e.g. URLs) to a ring. Since we’re dealing with randomness, we should expect to see some variance.

As we go from n to n + 1 servers, we expect 1 / (n + 1) keys to be allocated to the new server while other keys continue to have a “consistent” allocation (i.e. they continue to map to the same server as before). Therefore, the ideal consistent allocation is n / (n + 1).

To make this more concrete, let’s say we start with n = 1. Obviously, all keys map to the same server. If we add a second server, we want 1 / 2 of the keys to migrate to the new one. And, if we add a third server, we then want 1 / 3 of the keys to be handled by the third server, and so on.

Let’s consider three scenarios at the point of adding a second server:

(a) shows that server 1 is responsible for the entire ringspace before server 2 is added. (b) shows that if server 2 happens to be placed immediately after 1, most of the ringspace is allocated to it. Similarly, (d) shows that if server 2 is placed immediately before 1, then most of the keys are still in the purview of server 1. (c) is the only desirable scenario, where the ringspace is split between the two servers.

To explore these scenarios a bit more systematically, let’s introduce an experiment.

We’ll generate 10k random keys as a tiny proxy for the keyspace. Then, for a few values of n ranging between 1 and 100 we do the following:

  1. Place n servers randomly on the ring (we’ll do this by picking random server names)
  2. Allocate the 10k keys
  3. Add another server (represented by another random string)
  4. Compute the fraction of keys (i.e. the set of 10k) that map to the same server that they previously mapped to.

We repeat the experiment 100 times for each n to account for the randomness in placing servers on the ring.

Here are a couple of helper functions:
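
Roughly, they look something like this (the ConsistentHashing API from part 1, i.e. a constructor taking a set of servers, add_server, and key_lookup, and the run_exp signature are assumptions; the notebook has the real versions):

import random
import string

def random_strings(n: int, length: int = 16) -> list[str]:
    # Random strings standing in for keys (or server names).
    return [
        ''.join(random.choices(string.ascii_lowercase, k=length))
        for _ in range(n)
    ]

def run_exp(n: int, ch_factory, keys: list[str]) -> float:
    # Place n randomly-named servers on the ring, allocate the keys, add one
    # more server, and return the fraction of keys that kept their server.
    ch = ch_factory(set(random_strings(n)))
    before = {key: ch.key_lookup(key=key) for key in keys}
    ch.add_server(random_strings(1)[0])
    return sum(before[key] == ch.key_lookup(key=key) for key in keys) / len(keys)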

We can use these to run the experiment:
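
In rough form (assuming the ConsistentHashing class from part 1 and the helpers above; the specific n values and the plotting call are just illustrative):

import pandas as pd

keys = random_strings(10_000)   # a tiny proxy for the keyspace

results = []
for n in (1, 2, 5, 10, 20, 50, 100):   # a handful of values between 1 and 100
    for _ in range(100):               # repeat to account for randomness
        fraction = run_exp(
            n=n,
            ch_factory=lambda servers: ConsistentHashing(servers=servers),
            keys=keys,
        )
        results.append({'n': n, 'fraction_same': fraction})

# Boxplot per n (the optimal and modulo reference lines are overlaid separately).
pd.DataFrame(results).boxplot(column='fraction_same', by='n')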

Here’s the output:

The black line represents the optimal values while the red one represents the modulo approach.

What do you observe?

  1. We have a wide range of outcomes for small values of n (i.e. high variance). This is not great: we may get unlucky and end up with a distribution that’s far from uniform.
  2. The empirical expectation (I know, I know, box plots capture the median, and not the mean, but let’s overlook that detail 👋) is close to the theoretical one (i.e. the black line). This means that consistent hashing is optimal in expectation.
  3. Consistent hashing can be worse than modulo (i.e. the red line) for small n.
  4. Consistent hashing converges to the optimal with diminishing variance as n increases. This observation turns out to be the key to solving a lot of the problems we’ve discovered so far.

We just need more servers?

Seems like something magical happens as we add more servers. So, if you have hundreds of servers you can probably just use consistent hashing out of the box and call it a day.

What if you only have a few servers? Seems like we’ll be at the mercy of the probability gods.

Can we do better than hoping that we get lucky? 🙏🎲

Tokens

What if we create lots of virtual servers that represent our physical ones? We seem to have a good thing going with the ring idea, let’s just add a layer of abstraction.

Concretely, if we have n servers we can represent each server i ∈ {0, 1, …, n - 1} by kᵢ “tokens”. What consistent hashing sees is these tokens. What we care about are the physical servers. So if we carry out all consistent hashing operations in token land, and remember which server each token belongs to, we have everything that we need. We just need to make sure that kᵢ are sufficiently large.

Each server i is represented by a set of tokens kᵢ. We place these tokens and keys on the ring the usual way. We need to remember which physical server each token represents. Once a key is allocated to a token, we can just return the corresponding physical server.

Naive token implementation

To demonstrate how simple this extension is, let’s reuse ConsistentHashing to build it out:
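A sketch of what this wrapper could look like, assuming ConsistentHashing from part 1 exposes add_server, remove_server, and key_lookup:

class ConsistentHashingWithNaiveTokens:
    # Every physical server is represented by k virtual servers (tokens)
    # named '{server_id}-{token_idx}' on the underlying ring.
    def __init__(self, servers: set[str], k: int):
        self._k = k
        self._ch = ConsistentHashing(servers={
            self._token_id(server_id, token_idx)
            for server_id in servers
            for token_idx in range(k)
        })

    @staticmethod
    def _token_id(server_id: str, token_idx: int) -> str:
        return f'{server_id}-{token_idx}'

    def add_server(self, server_id: str) -> None:
        # Bookkeeping: add all k tokens for this server.
        for token_idx in range(self._k):
            self._ch.add_server(self._token_id(server_id, token_idx))

    def remove_server(self, server_id: str) -> None:
        # Bookkeeping: remove all k tokens for this server.
        for token_idx in range(self._k):
            self._ch.remove_server(self._token_id(server_id, token_idx))

    def key_lookup(self, key: str) -> str:
        # Throw out the token index and return just the server ID part.
        token = self._ch.key_lookup(key=key)
        return token.rsplit('-', 1)[0]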

This version assumes that we use a constant number of tokens k for all servers. All we’re doing is:

  • Appending the token index to the server ID with this format: {server_id}-{token_idx}.
  • Instead of using the servers directly to initialize ConsistentHashing, we’re using these newly minted virtual servers / tokens.
  • In key_lookup, we can throw out the token index and just return the server ID part.
  • The rest of the internals works exactly the same as before. There’s a bit of additional bookkeeping to ensure that we add and remove all tokens.

Does it work?

Let’s do a quick experiment for different values of n (number of servers) and k (number of tokens per server) using the helper function run_exp that we defined earlier:
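
Reusing run_exp with the token-based wrapper (the specific n and k values below are illustrative guesses):

results = []
for n in (2, 5, 10):                      # a few small values of n
    for k in (1, 10, 100, 1_000):         # number of tokens per server
        for _ in range(100):
            fraction = run_exp(
                n=n,
                ch_factory=lambda servers, k=k: ConsistentHashingWithNaiveTokens(
                    servers=servers, k=k,
                ),
                keys=keys,
            )
            results.append({'n': n, 'k': k, 'fraction_same': fraction})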

Here’s the outcome:

Fraction of allocations to the same server as we go from n to n + 1 servers with different numbers of tokens k. The black line represents the optimal value for each n.

Variance dissipates very quickly as we increase k. Another win for solving a problem just by “introducing an extra level of indirection” 🏆.

How does it work?

As we increase k, each server owns more tiny slices of the ring and it’s increasingly less likely to get very far away from the optimal allocation.

To drive this point home, let’s say we start with 3 servers, each with k = 5 tokens. Subfigure (a) shows what this allocation could look like:

Subfigure (b) shows the addition of a 4th server, also with 5 tokens.

We observe that the 4th server is taking over small slices from the other three servers. Now, imagine scaling k to 1,000. I won’t attempt to slice a pie into 3,000 pieces in Google Docs (nor in real life), but I would imagine that the colors would blend together very quickly and we get close to a uniform distribution of slices across servers.

Not convinced yet?

Consider this experiment:

  • Drop n * k random tokens in the [0, 1] range by generating that many random floats. It doesn’t matter that we’ve rescaled the range from the ringspace to [0, 1]; everything is exactly the same as before, except for a normalizing factor.
  • Sum up the slices that belong to each server. We’ll treat the [0, 1] range as a conceptual ring and use the same convention as before (a token owns the slice starting from its own value and up to, but not including, the value of the first token we hit if we traverse the ring clockwise). This is the overall fraction of the entire [0, 1] range that a server owns. The sum of these values should be 1 (make sure this makes sense to you).
  • Without loss of generality, consider the fraction that the first server owns. We’ll call this server1, which is guaranteed to be in [0, 1].

If we pick n = 5, we expect server1 = 1 / 5 = 0.2. Let’s see how close we are as a function of k:
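
A self-contained version of this experiment (here token j is assigned to server j % n, which is just one way of giving each server k of the n * k random tokens):

import random

def server1_fraction(n: int, k: int) -> float:
    # Drop n * k random tokens in [0, 1); token j belongs to server j % n.
    tokens = sorted((random.random(), j % n) for j in range(n * k))
    total = 0.0
    for i, (position, server) in enumerate(tokens):
        # A token owns the slice from its value up to the next token clockwise
        # (wrapping around the ring), so the slice sizes sum to 1.
        next_position = tokens[(i + 1) % len(tokens)][0]
        if server == 0:   # "server 1" is the server at index 0
            total += (next_position - position) % 1.0
    return total

# 100 experiments per value of k; each fraction should hover around 1 / 5 = 0.2.
server1 = {k: [server1_fraction(n=5, k=k) for _ in range(100)]
           for k in (1, 10, 100, 1_000)}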

This figure depicts the outcome over 100 experiments for each value of k:

Fraction of the [0, 1] range owned by server 1 as a function of k. The magenta line marks the expected value of 0.2.

This shows that each server’s total share of the ring tends towards the uniform value (1 / n) as k increases.

What if we wanted to control the relative allocation to different servers?

2. Allocation by capacity

Let’s say we have two servers A and B, server A has twice the capacity of server B, and therefore we want A to be responsible for 2 / 3 of the ringspace. Can we accomplish this by assigning twice as many tokens to A as we do for B?

We want server A, which has twice the capacity of server B, to also take on twice the ringspace that B is responsible for.

Let’s just jump into an implementation of this idea. First, we’ll introduce a new class to encapsulate the idea of a token:
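A minimal sketch of such a class (the exact fields and the helper method are assumptions; the notebook’s version may differ):

from dataclasses import dataclass

@dataclass(frozen=True)
class ServerToken:
    # One of the (possibly many) ring positions of a physical server.
    server_id: str
    token_idx: int

    def hash_key(self) -> str:
        # The string that gets hashed onto the ring: server ID plus token index.
        return f'{self.server_id}-{self.token_idx}'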

This class is very similar to what we did in the naive implementation earlier (and obviously the Server class from part 1). The difference is that keys are now a combination of the server ID and a token index.

We need to specify the number of tokens per server. We can pass this information as a dictionary that maps server IDs to the number of desired tokens for initialization:
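For example (the server_tokens parameter name is a guess based on the _server_tokens attribute described below):

ch = ConsistentHashingWithTokens(server_tokens={'a': 2, 'b': 1})  # A gets twice B's tokens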

Here’s the implementation:
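The hashing and ring arithmetic below are stand-ins (the notebook reuses the machinery from part 1), but the overall shape is something like this:

import hashlib

RING_SIZE = 2 ** 128   # md5 output space; a stand-in for part 1's ring size

def ring_position(value: str) -> int:
    # Place a string on the ring (a stand-in for the hash used in part 1).
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ConsistentHashingWithTokens:
    def __init__(self, server_tokens: dict[str, int]):
        # Number of tokens per server, e.g. {'a': 2, 'b': 1}
        self._server_tokens = dict(server_tokens)
        self._allocate_servers()

    @classmethod
    def equal_tokens(cls, servers: set[str], tokens: int) -> 'ConsistentHashingWithTokens':
        # Factory: every server gets the same number of tokens.
        return cls(server_tokens={server_id: tokens for server_id in servers})

    def _allocate_servers(self) -> None:
        # Keys are (server ID, token index) tuples; values are ServerToken objects.
        self._tokens = {
            (server_id, token_idx): ServerToken(server_id, token_idx)
            for server_id, token_count in self._server_tokens.items()
            for token_idx in range(token_count)
        }
        # Ring position of every token.
        self._positions = {
            ring_key: ring_position(token.hash_key())
            for ring_key, token in self._tokens.items()
        }

    def _find_closest(self, key_position: int) -> ServerToken:
        # A token owns the slice from its position up to the next token
        # clockwise, so a key belongs to the nearest token at or before it
        # (wrapping around the ring).
        best_key = min(
            self._positions,
            key=lambda ring_key: (key_position - self._positions[ring_key]) % RING_SIZE,
        )
        return self._tokens[best_key]

    def key_lookup(self, key: str) -> str:
        # Pretend the key is a token (arbitrary token index 0) to place it on
        # the ring, then return the physical server of the closest token.
        key_position = ring_position(ServerToken(key, 0).hash_key())
        return self._find_closest(key_position).server_id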

Note the following:

  • _server_tokens keeps the number of tokens per server
  • _allocate_servers is almost identical to the implementation in part 1. The only differences are that (a) keys are now a tuple of server ID and token index (instead of just server IDs), and (b) we use objects from the newly defined ServerToken class as values.
  • key_lookup is also very similar. We’re just creating a ServerToken to map the input key onto the ring. The key doesn’t really need a token index, but we can pick an arbitrary constant value (say 0). Basically, we’re pretending that the key is just another server to be able to compare its hash value to the servers (we used the same trick in part 1). If this is not obvious to you, think about it this way: the only information we need for allocating keys to servers is the placement of the key on the ring. We have defined a way to do that for (server ID, token index), and we’re just reusing it.
  • Once we compute the distances between hash values, the rest is identical to the previous implementation. I refactored the rest into a separate function I’m calling _find_closest to stress this point.
  • Note that we’re throwing away the exact token that the key maps to in key_lookup. The way we’ve defined the interface, this information is only internal to the class and is not interesting to the consumer of this method. I.e. the consumer just needs to know the server that the input key is allocated to.

OK, let’s see if all of this can help us control the relative allocation of the ringspace. As you’ve probably guessed, I’m going to demonstrate this with an experiment. Here’s the setup, which builds on the two-server example we started this section with (a sketch of the experiment follows the list):

  • We have servers A and B, where A has twice the capacity of B
  • We’ll assign a_tokens (an even integer that is ≥ 2) to A and a_tokens / 2 to B using ConsistentHashingWithTokens
  • We’ll compute the fraction of our favorite random 10k strings that map to A
  • For each value of a_tokens, we’ll repeat this 30 times to capture the variance. We’ll inject variance by representing A and B with different random strings (equivalently, you could accomplish this by varying the input seed).
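
A sketch of this experiment, reusing random_strings and the 10k keys from the earlier sketches:

results = []
for a_tokens in (2, 10, 50, 100):
    for _ in range(30):
        # Fresh random strings for A and B to inject variance into the placement.
        server_a, server_b = random_strings(2)
        ch = ConsistentHashingWithTokens(server_tokens={
            server_a: a_tokens,
            server_b: a_tokens // 2,
        })
        fraction_a = sum(ch.key_lookup(key=key) == server_a for key in keys) / len(keys)
        results.append({'a_tokens': a_tokens, 'fraction_a': fraction_a})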

The following figure shows the results for a_tokens values (2, 10, 50, 100):

The pattern repeats! Using more tokens reduces the variance and we’re converging to the desired 2 / 3 allocation for server A 💪.

Tokens pay dividends 💰

The token abstraction seems to solve two problems so far:

  1. Increasing the number of tokens stabilizes the ringspace distribution. I.e. it’s very unlikely to get a distribution that’s far from uniform if we use thousands of tokens per server.
  2. We can control the distribution if servers have different capacities. In general if we have n servers with k₁, k₂, …, kₙ tokens, we expect each server i to take on kᵢ / K of the ringspace, where K = k₁ + k₂ + ... + kₙ. E.g. if three servers have 100, 400, and 500 tokens, we’ll expect 10%, 40%, and 50% of the ringspace to be allocated to them respectively.

Here’s a demo of that example:
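Roughly as follows; the alloc helper shown here is a guess at the one used in the rest of the post (it reports the fraction of the 10k random keys that land on each server):

import pandas as pd

ch = ConsistentHashingWithTokens(server_tokens={'a': 100, 'b': 400, 'c': 500})

def alloc(ch) -> pd.Series:
    # Fraction of the 10k random keys that map to each server.
    return pd.Series([ch.key_lookup(key=key) for key in keys]).value_counts(normalize=True)

alloc(ch=ch)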

The fraction of URLs allocated to each server is:

c    0.5251
b    0.3801
a    0.0948

Pretty close 🎯!

3. Dynamic reallocation

Tokens seem to be a panacea! Turns out that we can even use them to dynamically reallocate the ringspace if servers start to get overwhelmed, without causing a sub-optimal reallocation of the rest of the ringspace (provided that k is sufficiently large).

To do this, we need a mechanism to update the number of tokens per server. Let’s call this method update_server:

Essentially, this method allows us to update the token_count. Here’s the implementation (see the full implementation in this notebook):
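A sketch of the idea (the version in the notebook can be smarter than rebuilding the whole ring):

def update_server(self, server_id: str, token_count: int) -> None:
    # Change how many tokens this server owns, then re-place its tokens.
    self._server_tokens[server_id] = token_count
    self._allocate_servers()

# In the post this lives inside ConsistentHashingWithTokens; attach it to the sketch class.
ConsistentHashingWithTokens.update_server = update_server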

Let’s see this new functionality in action:

ch = ConsistentHashingWithTokens.equal_tokens(
    servers={'a', 'b', 'c'},
    tokens=100,
)

The equal_tokens factory class method creates a ConsistentHashingWithTokens object where all servers get the same number of provided tokens.

As expected, we see a close-to-uniform distribution across the three servers:

alloc(ch=ch)

b    0.3421
a    0.3321
c    0.3258

If server a is overwhelmed, we can reduce the load:

ch.update_server(server_id='a', token_count=50)

Here’s the new distribution:

alloc(ch=ch)

b    0.4206
c    0.3512
a    0.2282

We did reduce the load on server a on the fly 🎉!

No free lunch?

There’s a lot to love ❤️ about tokens:

  1. Increasing the number of tokens stabilizes the ringspace distribution among servers.
  2. We can get a uniform distribution of the ringspace by picking the same number of tokens across all servers.
  3. We can engineer any desired initial distribution of the keys by picking a number of tokens proportional to the capacity of each server.
  4. We can dynamically reallocate the ringspace by changing the number of tokens.

What’s not to love? 💔

As we saw in part 1, the implementation we introduced is O(n), and we just made it O(nk) by using tokens, where k is the max number of tokens across all servers. This can be prohibitively expensive.

We’ll see if we can do better in part 3 (coming soon).

Code can be found in this notebook.
