Horizontally scaling writes with Redis Clusters

Sanil Khurana · Geek Culture · Feb 20, 2022

A quick recap

Okay, so in a past post, I talked about how to scale Redis horizontally for higher read performance and better availability. If you haven’t read it yet, don’t worry, I’ll give you a quick recap (though I’d really like it if you read it :)). If you have read it, feel free to skip to the next section.

So the idea is to add more Redis nodes (each runs much like a standalone Redis instance) and enable replication, so that these new slave/replica nodes replicate all the data held by the master nodes.

Redis automatically ensures that data written to the master (primary) nodes is replicated, or copied, to the replica nodes. This has a few advantages. For one, any single Redis node has limits on the amount of data and traffic it can handle (due to limited hardware, for example), so by running multiple nodes we can handle more traffic and data in total. The second advantage is that if one of the Redis nodes crashes, we already have other replica nodes running that can share the load, so our users don’t get an error message.
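As a tiny illustration, attaching a replica to a master is a one-liner (the ports here are placeholders, not values from the original post):

```sh
# Tell the Redis instance on port 6380 to replicate the master on port 6379
redis-cli -p 6380 REPLICAOF 127.0.0.1 6379
```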

There is a lot more detail and nuance to this, so if this seems like something you’d want to learn more about, read the full blog post here.

Why do we need it?

As with everything we discuss, there is a flaw in our current architecture. We have been constantly talking about improving read performance, but what about write performance? In my previous post, I justified this by saying that most apps are more read-heavy than write-heavy, which is… true. But it is true for most apps, not for all apps.

For example, think of a real-time messaging app like WhatsApp or Slack. Users send messages to other users and read older messages (let’s ignore channels and groups for now). You’d expect the read and write traffic to be roughly the same: most messages are read once (unless someone scrolls to the top to find a particular message, which doesn’t happen that often) and are written once as well. It doesn’t make sense to have an architecture that can handle more read traffic but nothing to handle more write traffic.

Adding more replicas here might not be useful since we also need to scale up our write traffic, not just our read traffic. So, we need to think of some other way to increase our write throughput.

A Master-Master architecture

The intuitive thought here would be to allow the replicas to write to the database as well. To clear up the terminology, let’s call these master nodes too (since they are now writeable). Instead of the older master-replica architecture, where we had a single node that could write and read (the master) and multiple nodes that could only read (the replicas), our new architecture has multiple masters, that is, multiple nodes that can both read and write.

And on first thought, that does make sense, but spend some more time thinking about it and you’ll soon realize how quickly it breaks.

For example, assume a service wants to store a new message in our database and randomly chooses a master node to write it to. Let’s say it decides to write this new message to the node master1. A few hours later, the user decides to edit their message. If our service wants to edit the message on the node it was written to, our system needs to remember which node the message was saved on. So, being the smart developers we are, we decide to simply store the mapping between each message ID and its node (for example, message ID 1423 goes to master1, message ID 5431 goes to master3). That should work, right?

Not quite. What happens if we want to add a new node, maybe to handle more traffic? The new master node would start up completely empty, which is hardly efficient. Worse still, what happens if we want to remove a node that stores some messages and is mapped to them? Do we update all our mappings?

And how would we manage this new mapping data? Since we are adding a new mapping per message, it could end up becoming fairly large, so do we create a new Redis node just for the mapping data? Do we then need a replica of that node as well?

Okay, this got really complex, really quickly. So, let’s take a step back and try to really understand the problem.

The nice thing about our earlier master-replica architecture was that there was a single, obvious source of truth for any piece of data. I really want to emphasize the word obvious here. We always knew where to update or delete any key, since there was only a single node the key was ever written to!

The problem with this new master-master architecture is that once you have multiple writeable databases, you need to do some work to figure out where to write each piece of data and where any particular piece of data already lives. To fix this, we need a way to quickly find the node where any particular message exists, and finding that node should not become harder as we scale and the number of messages stored in our system grows.

So, given any message, our system should be able to quickly tell where it exists or should exist. And this way of finding the node should remain performant once we have a lot of messages in the system (this is exactly where our earlier mapping approach failed: it got really complex once we already had a lot of messages).

Let me propose a solution. What if we were to hash the message? We take some unique identifier of the message (like an ID), hash it, and take the result modulo the number of nodes we have. So, if we have 10 nodes and, for simplicity, use the ID itself as the hash, the message with ID 6009 goes to 6009 % 10, the 9th node, the message with ID 3414 goes to the 4th node, and so on.

This tells us where to store any new message and gives every message a specific Redis node. Whenever a service wants to find out where a message with a certain ID is stored, it can pass the ID into the hash function and take the result modulo the total number of nodes. This way we don’t need to store where a particular message ID lives, since we can compute it really quickly on the fly!
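Here is a minimal sketch of that idea in Python (the hash function and node count are arbitrary choices for illustration; this is not what Redis itself does):

```python
import hashlib

NUM_NODES = 10

def node_for(message_id: str) -> int:
    """Return the index of the node that should store this message."""
    digest = hashlib.sha1(message_id.encode()).hexdigest()
    return int(digest, 16) % NUM_NODES

# The mapping is deterministic: the same ID always lands on the same node,
# so we never need to store a message-ID-to-node lookup table.
print(node_for("6009"))
print(node_for("3414"))
```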

A more detailed architecture

Okay, before we jump into the implementation, let’s figure out a bit more detail about what we want to do.

Explaining the diagram a bit: we have multiple master Redis nodes, and each of them handles some part of our entire dataset. So some of the messages are stored in master0, some in master1, and some in master2. We also have a method (more on this later) of figuring out where to store every message. We also set up some replicas so that we get high availability and higher read performance per master.

This combination of master and replica nodes together is called a cluster. So, what we are essentially implementing is a single cluster of Redis nodes. Some of these nodes in our cluster are master nodes and others are replica nodes.

Implementation!

Finally, let’s jump into the implementation. As usual, for the impatient ones, all the source code is available here on Github.

I used docker-compose to set up this architecture on my local machine. This is what my directory structure looks like:

Let’s walk through how it works.

I start by writing a Dockerfile that describes what every single Redis node should look like.

The way this works is that I take an argument from the docker build stage that tells the image which port to run Redis on (this will be useful later). I then install some OS libraries and download the default Redis configuration file. This configuration file is important because, in my own custom file, I can include this default configuration and just override the properties I need (otherwise I’d have to write out all the default settings as well).
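The file itself isn’t reproduced here, but based on that description it would be roughly along these lines (the base image, package names, and download URL are my assumptions, not necessarily what the repo uses):

```dockerfile
FROM redis:6.2

# Which port this node should listen on, passed in from docker-compose at build time
ARG REDIS_PORT

# gettext-base provides envsubst; wget is used to fetch the default configuration
RUN apt-get update && apt-get install -y gettext-base wget

WORKDIR /usr/local/etc/redis

# Download the default Redis configuration so our custom file can include it
RUN wget -O redis-default.conf https://raw.githubusercontent.com/redis/redis/6.2/redis.conf

# Copy in our templated configuration and substitute the port into it
COPY redis.conf .
ENV REDIS_PORT $REDIS_PORT
RUN envsubst < redis.conf > updated_redis.conf
RUN mv updated_redis.conf redis.conf

CMD ["redis-server", "redis.conf"]
```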

Then come these particular lines:

```dockerfile
ENV REDIS_PORT $REDIS_PORT
RUN envsubst < redis.conf > updated_redis.conf
RUN mv updated_redis.conf redis.conf
```

What this essentially does is substitute the REDIS_PORT environment variable (which we received in the build step) into the configuration file, using the Linux tool envsubst. Check this out if you want to learn more.

This is what my redis.conf looks like

This configuration just imports the default configuration we downloaded earlier, tweaks the protected-mode settings (disabling protected mode shouldn’t be done in production, but it’s fine for our local debugging), and enables cluster mode. It also sets a cluster-config-file. This file is not created or read by us; Redis manages it internally.
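Based on that description, the file would look roughly like this (a sketch; the exact paths and any extra settings are assumptions):

```conf
# Pull in all the defaults we downloaded in the Dockerfile, then override a few things
include /usr/local/etc/redis/redis-default.conf

# Listen on the port substituted in at build time by envsubst
port $REDIS_PORT

# Disabling protected mode is fine for local debugging, never for production
protected-mode no

# Enable cluster mode; Redis creates and manages the cluster-config-file itself
cluster-enabled yes
cluster-config-file nodes-$REDIS_PORT.conf
```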

Some of you may notice that we haven’t done anything yet to tell Redis to actually form the cluster, or how many replicas we want per master, or any details about the cluster apart from the fact that we want one (since we enabled cluster mode). Don’t worry, we will get to that soon.

For now, let’s take a quick look at our docker-compose.yml

This seems really long (and we haven’t discussed the cluster-setup service yet), but don’t worry, it’s actually fairly straightforward.

We are just setting up six Redis nodes, giving each of them a build-time variable REDIS_PORT, which is what our configuration uses internally. We are also using the host networking mode, because it is much easier to run Redis clusters with host networking, even though there are ways to run them with the bridge networking mode.
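Condensed, the file looks something like this (only one master/replica pair is shown in full; the service names are my own, and the remaining four nodes follow the same pattern on ports 6179/6180 and 6279/6280):

```yaml
version: "3"
services:
  redis-master-0:
    build:
      context: .
      args:
        REDIS_PORT: 6079
    network_mode: host

  redis-replica-0:
    build:
      context: .
      args:
        REDIS_PORT: 6080
    network_mode: host

  # ...redis-master-1, redis-replica-1, redis-master-2, redis-replica-2...

  cluster-setup:
    build:
      context: ./cluster-setup
    network_mode: host
    depends_on:
      - redis-master-0
      - redis-replica-0
```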

So, the above docker-compose would set up something like this

Notice we haven’t set up any replication or the cluster yet. Let’s do that now!

To set up the cluster, I created another Docker image that starts up after all the Redis nodes are up. Once running, this container sets up the cluster and then exits. This is the cluster-setup service in our docker-compose.yml.

This is what its Dockerfile looks like

It first waits 10 seconds for the other Redis nodes to become available. After that, it sets up the master nodes, specifying that no replicas should be created for them yet. It then individually adds a replica node to each master node. The three masters run on ports 6079, 6179, and 6279, and each master has one replica, running on ports 6080, 6180, and 6280 respectively. Check out the ports in the diagram above to see which node is replicating which.
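The Dockerfile itself isn’t reproduced here, but the commands it ends up running are along these lines (a sketch; the actual script in the repo may differ slightly):

```sh
#!/bin/sh
# Give the six Redis nodes a moment to start accepting connections
sleep 10

# Create the cluster from the three masters only, with no replicas yet
redis-cli --cluster create 127.0.0.1:6079 127.0.0.1:6179 127.0.0.1:6279 \
  --cluster-replicas 0 --cluster-yes

# Attach one replica to each master individually (pinning a replica to a
# specific master would additionally need --cluster-master-id <node-id>)
redis-cli --cluster add-node 127.0.0.1:6080 127.0.0.1:6079 --cluster-slave
redis-cli --cluster add-node 127.0.0.1:6180 127.0.0.1:6179 --cluster-slave
redis-cli --cluster add-node 127.0.0.1:6280 127.0.0.1:6279 --cluster-slave
```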

This container would finish up our architecture by setting up our cluster and replication within it.

And, that’s it! Let’s run it and experiment with the system a bit.

Run it!

Since we are using docker-compose, all you need to do is run docker-compose up and it should automatically set up the cluster for you. Once it is up and running, we can use redis-cli (remember to use cluster mode via the -c option) to connect to any of these nodes and GET and SET some keys. We can also run the CLUSTER NODES command to get details about our master and replica nodes and confirm that everything is working fine.

This is what it looks like
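A representative session looks roughly like this (the redirections, slot placeholders, and which node ends up owning the key are illustrative; your output will differ):

```sh
$ docker-compose up -d

# Connect in cluster mode to any node, e.g. the one on port 6079
$ redis-cli -c -p 6079
127.0.0.1:6079> SET message:6009 "hello there"
-> Redirected to slot [...] located at 127.0.0.1:6279
OK
127.0.0.1:6279> GET message:6009
"hello there"
127.0.0.1:6279> CLUSTER NODES
# ...one line per node, listing its ID, address, master/replica role,
#    its master's ID (for replicas), and the hash slot ranges it owns...
```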

As you can see, we can connect and write to any of these nodes, but internally Redis writes each key to exactly one master node. Depending on the key, Redis automatically decides which master node should hold it, and it knows which node to go to when it wants to fetch a certain key. We can also see that the data we are storing is getting replicated.

Running CLUSTER NODES also gives us a nice overview of all the master and replica nodes in our cluster. Everything seems to be running smoothly!

I Lied

Okay, I have a confession to make. I lied when I said that we can use hashing as a way to distribute, or shard, data across different nodes. Well… I didn’t exactly lie, but I misled you a bit to keep things simpler. Now that you have a grasp of what we talked about, let’s discuss the hashing step in a bit more depth.

We could use the simple hashing method I described earlier, but it comes with a lot of pitfalls. Discussing them would take another post by itself, so check this out if you want to learn what the pitfalls of our simple approach are and how we can improve on it. Redis uses something similar, called hash slots. I really wanted to cover them in this post, but it has gotten long already. If you are really interested, I think this video explains them much better than I could have. And don’t worry, the rest of the stuff stays the same; it’s just that the hashing aspect is much more nuanced than what I described.
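If you want a quick peek anyway, redis-cli can tell you which of the 16,384 hash slots any key maps to; the master that owns that slot is where the key lives:

```sh
# Ask the cluster which hash slot this key maps to (the returned integer
# depends on the key; each master owns a range of the 16384 slots)
redis-cli -c -p 6079 CLUSTER KEYSLOT message:6009
```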

Conclusion

It was really fun exploring how to set up clustering on Redis, but this post felt a bit conceptual. I did set up a simple cluster, but I may write more on this topic with a much more practical tone. I also spent some time learning about how companies like Box use Redis, which could make for interesting articles in the future. Another interesting approach could be to find a simple enough use case, build a web app, and perform some basic load testing on it. That would be a lot more helpful in understanding the real advantages Redis provides. But all that’s for another day!
