Redis Concepts and their correlation with AWS ElastiCache - Part 3

Sonam Shenoy
11 min read · Oct 1, 2023


This blog is a continuation of Part 2 of the blog series on Redis concepts and their correlation with AWS ElastiCache.

Part C: Data Backup

Say you have a single-node Redis instance without any replica, and it goes down for some reason. Your database will go for a toss, with millions of requests hitting it at an unprecedented rate. Or consider migrating our cluster to another deployment, say from our on-premises infra to the cloud. (Remember that this discussion is not related to AWS or ElastiCache; it applies to Redis itself.) When we bring up the new cluster, it wouldn't be ideal to start with a cold cache, as that would lead to innumerable cache misses. (A cold cache is one that starts with no data, i.e. an empty memory.) Requests that suffer a cache miss are redirected to the database, and the cache is then gradually populated with data. This naturally means that in the initial phase, the database will be bombarded with requests. What if the cache already had the data to begin with? That is possible, through backup.

Redis provides 2 mechanisms to deal with this — AOF and RDB.

1. AOF or the Append-Only File

Every command issued to the cache, from the beginning, is logged into a file known as the Append-Only File. This log file resides on the disk of the Redis instance. Thus, when we spin up a new cluster, all these commands can be replayed on the cache, from start to end, in the same order, and this way the cache gets repopulated with the backed-up data.

In reality, not every operation is written to the log file immediately. Instead, the commands are appended to a buffer and flushed to the log file on disk, with fsync() invoked at a configurable frequency (once per second by default, controlled by the appendfsync directive).
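A minimal sketch of the relevant redis.conf directives (note that appendonly is off by default; appendfilename and appendfsync are shown with their usual defaults):

```
appendonly yes                     # enable AOF persistence
appendfilename "appendonly.aof"    # name of the AOF file on disk
appendfsync everysec               # fsync once per second; alternatives: always, no
```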

2. RDB or the Redis Database Backup File

A totally different way of backing up data is to take a snapshot of the current state of the cache memory at a specific point in time and dump it to a file, called the RDB file. Here, we are backing up the data itself, instead of the commands that produced it. We can restore all data from this file into the cluster to get the exact data back into the cache, in case it went down.

However, how does RDB work? As we specified earlier, the Redis cache runs on a single thread. If we used this one thread to take a snapshot of the memory, how would incoming read and write requests be served? Do we just reject those calls? Of course not.

When taking an RDB snapshot of the cache using the BGSAVE command, Redis forks its process. While the child process persists the data to disk, the original process continues to serve reads and writes.
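Triggering and monitoring such a snapshot from redis-cli looks roughly like this (output abridged; LASTSAVE returns the Unix timestamp of the last successful save, so the value shown is illustrative):

```
127.0.0.1:6379> BGSAVE
Background saving started
127.0.0.1:6379> INFO persistence
# Persistence
rdb_bgsave_in_progress:1
rdb_last_bgsave_status:ok
...
127.0.0.1:6379> LASTSAVE
(integer) 1696150000
```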

We know that during a fork, the child process gets a logical copy of the entire memory of the parent process (the OS copies pages lazily, but under a heavy write load nearly every page can end up duplicated). Thus, new writes during the fork necessitate additional memory for the newer data. This means the instance must have enough memory for twice the space occupied by the data (enough for the parent process where the main data is present, plus enough for the child process to hold exactly the same amount of data) and some more memory for the new writes to succeed. For example, if the dataset occupies 4 GB, the node should have roughly 8 GB, plus headroom for writes arriving during the backup.

Firstly, forking may take a long time. Moreover, once the child process has been spawned, if snapshotting takes too long, writes could fill up the additional memory. If the memory set aside for the writes that hit the cache while the backup is running is not large enough, processing slows down. The user thus has to maintain a balance between the actual data memory and this buffer. [Reference]

It is thus always better to take a backup from a reader replica.

As can be guessed, it is faster to load from RDB than from AOF, since RDB has all the data ready, whereas AOF has to replay all the commands from the start. And as is evident, AOF is an ongoing process, while an RDB snapshot is taken at a specific point in time.

What does ElastiCache provide with respect to Backup?

  • ElastiCache has an interesting out-of-the-box concept of Multi-AZ deployment, wherein the primary and replica nodes are placed in different AZs and the primary instance is asynchronously replicated to the replicas over an inexpensive, low-latency network. This greatly reduces the risk of losing data.
  • The AOF backup method is available only if the Multi-AZ configuration is not chosen. AOF is not reliable when the node itself goes down, since the file is persisted to the disk of that same node; it is useful only if the node restarts. Also, AOF files are huge, since they log every write that takes place. On both counts, Multi-AZ is more useful than AOF: if the primary node fails, ElastiCache fails over to the replica with the least replication lag, which holds a copy of the entire data and is then promoted to primary. This is no doubt faster than rebuilding a primary from an AOF, since the replica is already in sync with the primary.
  • ElastiCache provides RDB backup through manual or automatic backups, a.k.a. snapshots. These snapshots, or RDB files, are stored in an S3 bucket, and we can seed a new cluster (or restore the same cluster) from such a snapshot.
  • There are 2 variants of the RDB backup method: a forked one and a forkless one. Which one is selected depends upon the available memory: if sufficient memory is available, the forked method is used; otherwise, the forkless one.
  • We have already talked about the forked method above. The total memory available on a node for the cache to use, for all its operations (holding data plus all other operational functionality), is represented by the maxmemory directive (we'll talk about directives in the next section). This corresponds to the memory size of the instance in AWS ElastiCache. Now remember the RDB snapshotting process we talked about earlier? If we allot all the memory available on the node to holding data, there will not be enough memory for the child process to copy the data into, and the backup will fail. To avoid this, ElastiCache provides 2 directives: reserved-memory and reserved-memory-percent. These set what portion or fraction of the node's total memory (maxmemory) is reserved for other operational purposes, such as buffering writes during a backup. We can set these values as per our requirement; for the forked method, we'll need to reserve 50% to cover the worst-case scenario (see the CLI sketch after this list). [Reference1, Reference2]
  • Till Redis 2.8.22, ElastiCache provided only the forked approach to take a backup, using BGSAVE. From 2.8.22 onwards, it uses a forkless approach if sufficient memory isn't available.
  • The forkless process uses the copy-on-write strategy, a popular memory-optimization technique. Briefly summarizing: in this approach, a copy is made of only those pages in memory to which writes take place. These copies are kept separately, so that the newly incoming data doesn't interfere with the backup process. Once the backup completes, these pages replace the old ones. In the worst case, if writes happen to every page in memory during the backup, we'd need enough space for a copy of each page, which means we'll need to reserve 50% of maxmemory (reserved-memory-percent=50).
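To make this concrete, here is a hedged sketch of reserving backup memory and taking a manual snapshot with the AWS CLI. The parameter-group name, family and cluster ID (my-redis-params, redis7, my-redis-cluster) are hypothetical placeholders; the commands and parameter names themselves are standard.

```
# Create a custom parameter group (name and family are placeholders)
aws elasticache create-cache-parameter-group \
    --cache-parameter-group-name my-redis-params \
    --cache-parameter-group-family redis7 \
    --description "Redis params with memory reserved for backups"

# Reserve 50% of maxmemory for backup and replication overhead
aws elasticache modify-cache-parameter-group \
    --cache-parameter-group-name my-redis-params \
    --parameter-name-values "ParameterName=reserved-memory-percent,ParameterValue=50"

# Take a manual backup (snapshot) of a cluster
aws elasticache create-snapshot \
    --cache-cluster-id my-redis-cluster \
    --snapshot-name my-manual-backup
```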

Part D: Directives

Directives are a set of properties that allow us to configure different parameters and aspects of Redis. We have already spoken about a few of them (maxmemory, maxmemory-policy, lfu-log-factor, etc.) and the roles they play. There are hundreds more, all specified in the official Redis configuration documentation. A few more examples: maxclients configures the maximum number of clients that can be connected to a Redis instance at any point in time; appendfilename configures the name of the file the AOF backup method writes commands to. Redis has default values for all these directives.

These directives are set in the redis.conf file or through the ‘CONFIG SET’ command.

For example, to set the maxmemory directive we discussed under Eviction, you run ‘CONFIG SET maxmemory 300mb’, or simply write ‘maxmemory 300mb’ in the redis.conf file.

Note, however, that ‘CONFIG SET’ alters the configuration of the running server only; the change is not written anywhere and is lost if the node restarts. To make it permanent, run the ‘CONFIG REWRITE’ command after ‘CONFIG SET’. This writes the change to the redis.conf file as well, so that it still takes effect in case the node restarts.
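Put together, a minimal redis-cli session might look like this (note that CONFIG GET reports the value in bytes, so 300mb comes back as 314572800):

```
127.0.0.1:6379> CONFIG SET maxmemory 300mb
OK
127.0.0.1:6379> CONFIG GET maxmemory
1) "maxmemory"
2) "314572800"
127.0.0.1:6379> CONFIG REWRITE
OK
```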

How does ElastiCache support using these directives?

Through an AWS concept called ‘Parameter Groups’. They allow us to apply these Redis-specific configurations to the Redis engines running on the ElastiCache instances. If we do not configure one ourselves, ElastiCache uses a default parameter group that already has default values set for those fields, such as maxmemory, which can't be altered in AWS and is equal to the memory size of the ElastiCache instance.

Or remember our discussion under Eviction? How is the default volatile-lru eviction policy set? Through this default parameter group.

And thus, to override these configurations, we can use our own parameter group. This is similar to altering the redis.conf file. And just as redis.conf can be changed at runtime, we can make changes to the parameter group while the ElastiCache cluster is up, and they will take effect; a hedged CLI sketch follows below.
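For instance, overriding the default eviction policy in a custom parameter group and attaching that group to a cluster might look roughly like this (the group and cluster names are again placeholders):

```
# Override the eviction policy in a custom parameter group
aws elasticache modify-cache-parameter-group \
    --cache-parameter-group-name my-redis-params \
    --parameter-name-values "ParameterName=maxmemory-policy,ParameterValue=allkeys-lru"

# Attach the parameter group to an existing cluster
aws elasticache modify-cache-cluster \
    --cache-cluster-id my-redis-cluster \
    --cache-parameter-group-name my-redis-params \
    --apply-immediately
```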

Part E: Redis Data structures

You'll see that in many places, Redis is described not just as a "data store", but as a "data structure store". And why so? Because of the way Redis is designed to support various data structures.

We know that Redis is a NoSQL database. Like any other NoSQL DB, there is a key, corresponding to which we have a value. And as an advantage of a NoSQL database, all records needn't have the same set of fields or structure. This means one record can have a list as its value, another a hash (itself a set of key-value pairs), or even a bitmap!

We can even perform operations on them in memory; set intersection, for instance, as the sketch below shows.

The workings of each data-structure operation, and its time complexity, are very well explained in the official Redis documentation.
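A quick redis-cli sketch of a few of these structures, including an in-memory set intersection (all keys and values are made up):

```
127.0.0.1:6379> LPUSH recent:posts "post:42" "post:43"
(integer) 2
127.0.0.1:6379> HSET user:1 name "Alice" city "Pune"
(integer) 2
127.0.0.1:6379> SADD likes:post:42 "user:1" "user:2"
(integer) 2
127.0.0.1:6379> SADD likes:post:43 "user:2" "user:3"
(integer) 2
127.0.0.1:6379> SINTER likes:post:42 likes:post:43
1) "user:2"
```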

This ends our DPBDD (if you remember the coolest acronym I came up with). Hope that was a good summary of what Redis is a store of (it actually has a lot more to offer, as mentioned below). A few more topics follow which you might find interesting.

Client-side libraries (Java)

How to interact with Redis?

We can use redis-cli to interact with a Redis server.

Or, if we choose to do so from our application, there are hundreds of libraries, specific to each programming language, that allow us to interact with the Redis engine and perform all the operations we discussed in this blog. For instance, in Java there are three popular Redis client libraries: Jedis, Lettuce and Redisson.

With these libraries, we can do all the operations on Redis that we discussed above: storing, accessing, updating and deleting key-values whose values can take different data structures, with or without a TTL.

Moreover, these libraries handle, behind the scenes, the complex functionality of writing to and reading from a distributed Redis cluster or a standalone Redis instance, as the case may be, without the developer having to go into the intricacies of how it works. They also handle the serialization and nesting of complex data types out of the box. A small Jedis sketch follows.
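A minimal, hedged Jedis sketch of the basic operations (it assumes a Redis server on localhost:6379 and the Jedis client on the classpath; the key names are made up):

```
import redis.clients.jedis.Jedis;

public class RedisQuickstart {
    public static void main(String[] args) {
        // Jedis implements Closeable, so try-with-resources releases the connection
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.set("greeting", "hello");         // a simple string value
            jedis.expire("greeting", 60);           // attach a TTL of 60 seconds
            System.out.println(jedis.get("greeting"));

            jedis.lpush("queue", "task1", "task2"); // a list value
            jedis.hset("user:1", "name", "Alice");  // a hash value
        }
    }
}
```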

Redis can be more than just a data-store

Till now, we have only discussed Redis as a data store that returns data when requests are made.

However, Redis can also be used as a message broker, with a pub-sub model. Did you know that? Just as Kafka is a broker, a server that publishers publish messages to and interested subscribers consume messages from, Redis can be used for the exact same purpose, with the same concepts of shards and replicas that we spoke of above for fault tolerance.
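A minimal sketch with two redis-cli sessions (the channel name is arbitrary):

```
# Session 1: subscribe to a channel
127.0.0.1:6379> SUBSCRIBE news
1) "subscribe"
2) "news"
3) (integer) 1

# Session 2: publish a message; the reply is the number of subscribers reached
127.0.0.1:6379> PUBLISH news "hello subscribers"
(integer) 1
```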

As a fun fact, Redis is open source and written in ANSI C.

AWS tools that help set-up Redis Infra As Code

Infra-as-code! AWS has a tool that allows us to define the Redis infra we wish to deploy as code, and using this, we can have our deployment set up in one go: AWS CloudFormation. I might discuss this in another blog. In a gist, you define the deployment configuration you finally want in a template, written in YAML, and then actually deploy this template using a stack.
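A bare-bones sketch of such a template (the node type and counts are illustrative; a real template would also reference a subnet group and security groups):

```
Resources:
  MyRedisCluster:
    Type: AWS::ElastiCache::ReplicationGroup
    Properties:
      ReplicationGroupDescription: Sample Redis cluster
      Engine: redis
      CacheNodeType: cache.t3.micro
      NumNodeGroups: 2              # shards
      ReplicasPerNodeGroup: 1       # replicas per shard
      AutomaticFailoverEnabled: true
```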

AWS provides yet another option: if you'd rather not write YAML but use a programming language you're familiar with, there's the AWS CDK, which lets us code and configure the AWS infra we wish to set up in our favourite language. At the time of writing this blog, the AWS CDK supports TypeScript, Python, Java, JavaScript, Go and C#. Again, more on this probably in another blog.

More on Elasticache

· Data tiering: A feature ElastiCache provides where data is stored partially in memory and partially on disk, more specifically on SSD. Less frequently accessed data can be kept on the SSD; this way we get more storage at a lower cost. However, it should be used only for applications that can tolerate some additional latency when accessing data. LRU items are pushed to the SSD, and by ‘items’ we mean just the values: all keys remain in memory, and it's only the values that are tiered between memory and SSD.

· Subnet group set-up: While setting up an ElastiCache Redis cluster, we naturally need to allocate a set of IPs to the cluster, as we're going to have a large number of nodes (shards and their replicas). Thus we need to assign a subnet group (a group of subnets, and hence of IP addresses) from the VPC that has a sufficient number of IP addresses available.

· Accessing ElastiCache from EC2: In the ElastiCache cluster's security group, allow inbound access from the EC2 instance's security group (see the sketch after this list).

· Global datastore: We had mentioned that a cluster's shards should be present in a single region, though they can span multiple AZs. What if we want to make things more reliable by having a copy of the cluster in another region, so that we can fail over to it in case of a regional failure? ElastiCache has a solution for that too: a global datastore. If configured, the cluster in the secondary region, a.k.a. the passive cluster, is kept in sync with the one in the primary region, a.k.a. the active cluster, through asynchronous replication. Writer instances are present only in the active cluster. At the time of disaster recovery, we can promote the passive cluster to primary. However, auto-failover across regions is not possible (at least at the time of writing this blog).
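A hedged sketch of the security-group rule mentioned above (the group IDs are placeholders; 6379 is Redis's default port):

```
# In the cache's security group, allow inbound Redis traffic
# from the EC2 instance's security group
aws ec2 authorize-security-group-ingress \
    --group-id sg-0cache0000000000 \
    --protocol tcp \
    --port 6379 \
    --source-group sg-0ec2000000000000
```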

Ending note

While Redis provides the benefit of storing data in memory, and thus faster means of data manipulation, it comes at the cost of, well, cost. RAM is of course more expensive than disk storage, and so the pricing needs to be considered carefully.
