Redis Concepts and their correlation with AWS ElastiCache - Part 1

Sonam Shenoy
10 min read · Aug 21, 2023


Introduction

Redis is a well-known data store that owes its popularity to its ability to store and serve all data from memory, as opposed to a conventional database that stores data on disk and serves it from there. The consequent faster reads and writes mean that Redis is used today not only as a caching layer in front of a database, but also as a primary database.

When we speak of Redis as a “cache”, we are not using the term in its hardware sense, which refers to the memory embedded directly on the CPU chip: the L1, L2 and L3 caches. Those multi-level caches sit in front of main memory, whereas Redis leverages the entire main memory of an instance to act as a separate data-store server in its own right.

In this blog, I’ll be covering several interesting technical aspects of Redis and how AWS ElastiCache provides a managed platform for Redis.

Instead of jumping between Redis concepts at random, let’s split the ground this blog series will cover into 5 components that pertain to Redis. They can be given the acronym DPBDD (doesn’t sound good; I tried to come up with a better one, but failed):

a. Deployment Architectures

b. Data Persistence

c. Data Backup

d. Directives

e. Data Structures

I’ll be discussing “Section a. Deployment Architectures” thoroughly in this blog, “Section b. Data Persistence” in Part 2 and cover the remaining sections in Part 3.

There is another interesting characteristic of Redis, beyond the 5 components above, that has not been included here. And that is: Redis is single-threaded. Yes! A single-threaded engine, and yet wonderful performance. How? Generally, when a single-threaded system still delivers great performance, it is thanks to a very popular paradigm in the tech world: the reactor pattern, a.k.a. the event-driven architecture. I will try to publish another blog on this interesting design pattern soon.
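As a rough illustration of the reactor pattern (a minimal sketch using Python’s standard `selectors` module; Redis’s actual event loop is a C implementation built on mechanisms like epoll, so this only mirrors the idea):

```python
import selectors
import socket

# A minimal reactor: one thread multiplexes many connections by
# registering callbacks that fire when a socket becomes readable.
sel = selectors.DefaultSelector()
client, server_side = socket.socketpair()
received = []

def on_readable(conn):
    # Callback dispatched by the event loop when data is ready.
    received.append(conn.recv(1024))
    sel.unregister(conn)

sel.register(server_side, selectors.EVENT_READ, on_readable)
client.sendall(b"PING")

# One iteration of the event loop: wait for ready sockets, dispatch.
for key, _ in sel.select(timeout=1):
    key.data(key.fileobj)

print(received)  # [b'PING']
```

The single thread never blocks on any one connection; it only reacts to whichever socket is ready, which is how one thread can serve thousands of clients.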

The way we will cover both Redis and AWS ElastiCache for Redis in these blogs is to first explain the features and concepts pertaining to Redis alone, and then follow with what ElastiCache has to offer for that very aspect.

This will also help clear up misunderstandings, if any, about ElastiCache, while explaining what exactly it is.

So, let’s get going!

Part A: Deployment Architectures

Let’s begin with the very first aspect of Redis. Though normally I wouldn’t cover this concept at the very beginning, this section will help introduce readers to ElastiCache directly. This way, for each of the following sections that involve core Redis concepts, we can map what we learn onto ElastiCache.

Redis supports several deployment architectures that ensure its high availability. Each architecture has its own pros and cons. These architectures have been very well elaborated in this blog, and I suggest going through it thoroughly. I’ll be summarizing them here:

There are primarily 4 types of Redis deployments:

1. Single Redis instance — the simplest one

This is the case where you have a single poor node running the Redis engine and serving all the reads and writes. All data is stored in the memory of this single instance, so if that one instance goes down, all your data goes down along with it. Performance-wise, it is limited by the instance’s memory in the amount of data it can hold, and by the instance’s resources in the number of requests it can serve. Naturally, there is not much more to say about this one.

2. Redis High Availability a.k.a. Redis HA

We now add replicas to the above poor single node for high availability, and that node is now known as the primary node. These replicas, or secondary nodes, are kept in sync with the primary node. Writes go only to the primary, while all the nodes can serve reads. If the primary instance fails, we can fail over to one of the secondary nodes, which then takes up the role of the primary.

3. Redis Sentinel

Although replicas were added in the above deployment to avoid a single point of failure, that is of little use if we have no way to detect when one of those instances goes down. Thus, we introduce a new system: the sentinel. Sentinel processes constantly monitor whether all the instances are healthy; if one is not, they initiate recovery. This is straightforward when a reader replica goes down. But if the writer/primary instance goes down, one of the read replicas is promoted to primary once the sentinel processes reach a quorum (which, in very simple terms, means a majority of the configured sentinels agree on the action to be taken). As you might have realised, these sentinel processes do not deal with anything specific to caching; they just serve as helpers in this HA deployment configuration.
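The quorum idea can be sketched in a few lines. This is a toy model only: real Sentinel separates the quorum needed to mark a primary as objectively down from the majority needed to authorize a failover, and the function name below is purely illustrative.

```python
# Toy model of sentinel quorum voting (illustrative only).
# Each sentinel reports whether it believes the primary is down;
# action proceeds only when "down" votes reach the configured quorum.
def failover_authorized(down_votes: int, quorum: int) -> bool:
    return down_votes >= quorum

quorum = 3  # typically a majority of the configured sentinels

# Each sentinel's independent verdict on the primary's health:
votes = [True, True, False, True, False]

print(failover_authorized(sum(votes), quorum))  # True
```

Requiring agreement from several independent observers avoids a single sentinel with a flaky network link triggering an unnecessary failover.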

4. Redis Cluster

In the above 3 deployments, we had just one writer instance, which means we could only scale vertically if we wanted to serve a higher write TPS or store more keys. This is unsuitable in the longer term; horizontal scaling is generally recommended over vertical. After all, how much memory can be built into a single instance?! And this is exactly what a Redis cluster solves. We can now write to, and split data amongst, multiple instances. And how is this done? Which instance should a specific key be written to? If we write a key to a specific instance, we need to keep track of which instance it went to, so that we hit that very instance when we want to read or update that key; we do not want to query every instance to check whether it holds the key.

The above points are taken care of through a very interesting and well-known Redis concept: sharding. This is where a ‘distributed’ cache comes into the picture. Certain ranges of keys go to certain writer instances only. Whenever we issue a command to write a key, the key is first hashed, and the hash then goes through an algorithm that decides which writer instance the key is to be inserted into. That same algorithm is used while accessing or updating the key. The algorithm, and more interesting details on how keys are re-split when more writer nodes are added (via a concept called hash slots), have again been explained in this blog. It is an interesting read you must go through; I will avoid repeating it here.
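To make the hashing concrete, here is a sketch of how Redis Cluster maps a key to one of its 16384 hash slots: the key is run through CRC16 (the XMODEM variant) and the result is taken modulo 16384. The bit-by-bit CRC16 below is written for clarity, not speed, and the hash-tag handling follows the behaviour described in the Redis Cluster specification.

```python
def crc16_xmodem(data: bytes) -> int:
    # CRC16 with polynomial 0x1021 and initial value 0,
    # the variant Redis Cluster uses for slot assignment.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # Hash tags: if the key contains a non-empty {...} section,
    # only that section is hashed, so related keys share a slot.
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            key = key[start + 1 : end]
    return crc16_xmodem(key.encode()) % 16384

# Keys sharing a hash tag land on the same slot (and hence the same shard):
print(key_slot("{user:1}:profile") == key_slot("{user:1}:orders"))  # True
```

Because every client computes the same deterministic function, any client can route a key to the right shard without consulting a central directory.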

Now that we have multiple writer instances, where each instance has a different set of keys, how do we take care of high availability, so that the failure of a single writer instance doesn’t result in the loss of the entire set of keys it was holding? By attaching read replicas to each of those writer instances. Thus, the read replicas corresponding to each of those writer instances will hold the same set of keys that the writer node does, and nothing more.

So, what is a shard? All the nodes that contain the same data. Thus, each writer instance and the set of read replicas corresponding to it form a shard.

How are the reader replicas kept in sync with the writers? Each write command is first executed on the writer instance, immediately after which an acknowledgement is sent to the client. The command is then asynchronously sent to the reader replicas for replication. This naturally means a command can be lost if the writer goes down before the command could be passed on to the reader replicas.

While the above is how replication generally takes place, in cases where the reader and writer disagree on the replication offset (the replica has fallen too far behind to catch up incrementally), or when a reader replica is set up for the very first time, replication happens via the writer sending a compressed snapshot of its entire memory (the .rdb file) to the reader, since the data to sync is huge. What about the write commands coming in at that time? The snapshot is produced by a forked child process, and the incoming writes are buffered and streamed to the replica once the snapshot transfer completes. More on this in Section C: Data Backup.

Finally, how is the failure or crashing of a node detected in a Redis cluster? Via ‘gossiping’. XD

What is AWS ElastiCache for Redis?

As seen above, Redis supports various architectures, and these can be self-managed. If we decide to go with the single-node architecture, we can spin up a cloud instance, say an Amazon EC2 or a Google GCE instance, and deploy the Redis engine on it. Or, if we decide to go with the cluster deployment, we can take care of all the complexities involved ourselves: deploy multiple EC2 nodes, split them into shards, allot further instances to each shard as replicas, configure how keys are distributed amongst the shards, set up an endpoint that applies the routing algorithm correctly and distributes traffic amongst the reader and writer instances, set up sentinels, and so forth.

A developer who just wants to use the features Redis provides naturally wouldn’t want to deal with these complexities; setting up and managing such a distributed infrastructure is no piece of cake. Thanks to AWS, we again have a solution. This is where ‘ElastiCache for Redis’ comes into the picture. ElastiCache not only helps set up this distributed, highly available Redis architecture, but also provides many extra features.

And thus, having touched upon one of the Redis concepts, “Deployment Architectures”, let’s see which of these Redis architectures ElastiCache supports.

What does ElastiCache provide with respect to Part A: Deployment Architectures?

AWS ElastiCache offers 2 of the Redis deployments we discussed above. AWS calls such a deployment a replication group:

1. Cluster-mode disabled

This corresponds to the ‘Redis Sentinel’ architecture. There is just one writer instance, to which all keys are written. We can choose to have replicas for this writer instance, or do without them.

2. Cluster-mode enabled

This corresponds to the ‘Redis Cluster’ architecture we spoke about above. The cache is split into ‘shards’; we just tell ElastiCache the number of shards, and the number of replicas per shard, that we wish to have. In AWS terms, a ‘shard’ is one writer instance plus the replicas corresponding to that specific writer instance. Thus, all instances in one shard hold the same data. When the primary (writer) node fails, a reader replica in that shard is promoted to primary.

Of course, there is a limit on the number of shards, and of replicas per shard, that we can have in an AWS Redis cluster. This may vary from time to time, and it’s best to refer to the AWS documentation for the latest numbers. All the shards need to be in the same region, though the nodes in a shard can span availability zones within that region.

ElastiCache then gives us just one ‘configuration endpoint’ which takes care of all the algorithmic work behind the scenes: which shard to insert or retrieve a key from, where to direct reads and writes, etc. Thus, developers needn’t worry about the hashing and routing behind the scenes!
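Conceptually, part of what that routing hides can be sketched as a mapping from hash slots to shards. This is a simplification under an assumed even split of the 16384 slots; ElastiCache computes and maintains the actual slot ranges itself, and the function name here is illustrative.

```python
SLOTS = 16384  # total hash slots in a Redis cluster

def shard_for_slot(slot: int, num_shards: int) -> int:
    # Evenly partition the slot space across shards
    # (illustrative; the real slot ranges are managed by the service).
    slots_per_shard = -(-SLOTS // num_shards)  # ceiling division
    return slot // slots_per_shard

# With 3 shards: slots 0-5461 -> shard 0, 5462-10923 -> shard 1,
# and the remainder -> shard 2.
print(shard_for_slot(0, 3), shard_for_slot(8000, 3), shard_for_slot(16383, 3))
```

A client (or the configuration endpoint acting on its behalf) first computes the key’s slot, then looks up which shard currently owns that slot, and sends the command there.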

Note: whether AWS uses sentinels, gossiping and heartbeats, or a custom implementation of its own to ensure HA in the two architectures above is not publicly documented.

Thus, while creating an AWS ElastiCache for Redis deployment, we select one of these configurations. Depending on the type chosen, we can then select other settings, such as the number of replicas and shards (the latter only in the cluster-mode-enabled case). Of course, more settings are offered while creating the deployment, such as data backup, HA, etc., and we’ll be touching upon these as we move through the series and cover the related topics.

More about AWS ElastiCache for Redis

Before moving ahead, let’s talk a bit more about ElastiCache for Redis.

As mentioned above, AWS ElastiCache for Redis is a managed deployment that abstracts away the complications involved in managing and looking after a Redis cluster. But ElastiCache provides not just this; it offers many other integrations too.

Some amazing features that ElastiCache provides are:

  • Infra architecture-wise: cluster and standalone support (which we already discussed above)
  • Redis-engine parameter groups: to use Redis engine directives
  • Configuration endpoint: an endpoint that takes care of distributing requests correctly amongst the reader/writer instances
  • Global Datastore and availability zones: for high availability and disaster recovery
  • Backups: auto and manual backup options for the instances; the ability to spin up a cluster from an .rdb snapshot
  • Auto-failover
  • Auto and manual scaling
  • CloudFormation and AWS CDK integration out-of-the-box: IaC (more on CloudFormation and CDK in another blog)
  • And much more…

I’ll be going through a few of them in this blog series.

We can select from a wide range of node types that AWS provides for ElastiCache, each varying in the number of vCPUs, memory and other factors. However, does a higher number of vCPUs increase the performance of Redis? Of course, right? More CPUs means more concurrency. But remember the point we mentioned in the beginning: Redis is single-threaded. Thus, increasing the number of vCPUs has little effect on performance; it does, however, break the bank. Redis is not designed to benefit from multiple CPU cores. To increase performance, scale out by launching several instances.

Remember that ElastiCache is just an instance for a Redis cache engine to run on; like a box where Redis runs. It just hosts Redis. Redis on its own supports the various architectures we saw: HA, Sentinel, Cluster, etc. ElastiCache simply manages the infra for these architectures, offloading the set-up complexities (such as scaling, replication, etc.) from us. The only difference is that, instead of connecting to a self-hosted Redis server with its URL and port number, we use the URL and port that ElastiCache provides (default Redis port: 6379), where the Redis engine is now running. Other than this, nothing changes; it’s just as if you’re working with Redis directly. Whether Redis is hosted on our own premises, on a private cloud, or on a well-known public cloud such as AWS, we still use the same Redis client libraries, such as Lettuce and Redisson, to talk to it. Only where we host it differs. More on this in another section: Client-side Libraries, in Part 3.

Where to go next

We have thoroughly seen the different Deployment Architectures Redis provides. Head on to Part 2 to read more on another interesting concept — Data Persistence in Redis.
