Redis resharding.

Antonaleksandrov
5 min readNov 26, 2019

--

I am a giant fan of Redis. Redis is a multifunction database that is mainly used for caching and queuing(as far as I know). It exposes a simple interface to work with for the most common programming languages. The CLI made for cluster administration is near perfect. What is missing then?

Well, here I should take a step back and explain a bit more. I was doing my internship in a company, which strongly relied on caching. Their first solution was to use MongoDB, as it’s quite fast and scales well. However, as the volume of data has grown, they needed to migrate to a faster and more reliable solution. Here, I come into play. I have been given an assignment to research different solutions and find the most suitable one. While researching, I have compared all the possible solutions I have found on the Net. In a week time, I have decided to try out Redis as a caching DB.

Why Redis?

As I have mentioned before, Redis has support for the most common languages. Using a simple interface, you can add a data with a key and retrieve it by the key. Most importantly, it stores data in RAM, which makes in lighting fast. The cherry on top was that I can create a Redis Cluster in one command if I had all instances up and running.

Redis Cluster

I think there were enough articles about Redis Cluster. Yet, I am going to give a brief intro to it anyway. Redis Cluster is at least 2 Redis instances running, connected via a TCP bus and a binary protocol, called the Redis Cluster Bus. It involves master-slave(replica) connection. Usually, you would run 6 Redis instances as follows.

Here we can see 3 masters and 3 slaves(replicas) connected via Redis gossip protocol(TCP bus and a binary protocol). The best thing, you can write to any master and according to the sharding concept, data will be distributed evenly across the cluster.

Now, what is sharding? Redis ensures that data will spread among all nodes by applying a simple formula to each key you want to save. It will calculate CRC16 of the given key modulo 16384 and the resulted value corresponds to a shard number. As every master node is responsible for a set of shards, the given key and value will be saved in the corresponding Redis instance.

The most common distribution as follows:

  • Master 1 contains hash slots from 0 to 5500.
  • Master 2 contains hash slots from 5501 to 11000.
  • Master 3 contains hash slots from 11001 to 16383.

Example:

We want to save value “foo” with key “bar”. Suppose, Redis has calculated the shard number as 1500(i did not do the actual calculation, it’s a fake number). Let’s say you already have connected to Master 2 and executing you write command on it. Using gossip protocol, Master 2 will send this data to a shard number it belongs — Master 1.

Putting it in simple words, Redis will always know a way to save your data correctly.

What is missing?

You can scale up your cluster easily. By adding new masters and slaves, you can expand your cluster. However, what happens to shards? Are they redistributed as you are adding new nodes?

Turns out, the answer is no. As the number of shards always fixed, they remain in their current positions.

Let’s say you want to add a new master and new slave to your cluster. You have successfully established a connection with the existing cluster and everything seems alright. As you start writing to the cluster, you will notice that the newest nodes are simply not used. Redis Cluster will not redistribute shards for you, you have to do it.

Unfortunately, Redis CLI does not provide one-command to do it. You can manually tell your cluster how many slots have to be moved.

In the case, you have a cluster with 6 nodes and you are a perfectionist just like me. You will do the following:

  1. Check the number of shards on each master.
  2. Calculate how many you want to send from each master. Because you are a perfectionist and want this nice equal distribution.
  3. Run redis-cli reshard host:port — cluster-from <arg> — cluster-to <arg> — cluster-slots <arg> 3 times(since you have 3 masters with distributed shards)
  4. Check your distribution

Well, it might not be an issue if you actually have 6 nodes, but what about 100 nodes or more?

You want to press one button and see “Done” in the end.

What is the alternative?

You can run your Redis Cluster on AWS, they provide automatic resharding upon adding a new node.

OR

You can try using my solution for this issue. During my internship(more than year ago), I have developed my CLI wrapper for Redis CLI. Some commands might not be interesting for most users.

The way it works as follows:

  1. You provide it a source node and give a command to reshard. It will be to used to perform resharding. Technically, you could provide any cluster node, but I would recommend a master, that is not directly used for writing.
  2. The tool will look for masters that do not have shards yet. You can connect multiple “empty” masters and replicas to it, the tool will find them all.
  3. It will calculate how many shards should be taken from each master in order to have an equal distribution of shards.
  4. Then it will perform resharding from each master and do checks after each iteration.
  5. Voilà, you have resharded cluster.

Down below you can find a visual representation of the way it works:

You can find my tool with more extensive documentation in my GitHub.

--

--