Redis Concepts and their correlation with AWS ElastiCache - Part 2

Sonam Shenoy
9 min read · Aug 21, 2023


This blog is a continuation of Part 1 of the blog series on Redis Concepts and their correlation with AWS ElastiCache.

Part B: Data Persistence

In Part 1, we saw in considerable detail how Redis can be deployed in various ways for high availability, and the options that ElastiCache provides for this.

Now, let's completely change gears to a concept internal to Redis, setting ElastiCache aside for the moment. And that concept is deletion, or expunction. How can data be deleted from the cache memory? Of course, like other manual operations on the cache, we can delete keys that we wish to… well… delete. However, data can also get deleted from the Redis cache without us manually doing so. The instance crashing is one obvious way, but that's not the topic under discussion. Redis has an internal mechanism that does this.

This can again be discussed under 2 sections:

a. Expiry & TTL

b. Eviction policies

Expiry and TTL

Let's introduce 2 reasons for needing a TTL before we dive deeper into what a TTL is.

When it comes to any cache, due to the limited size of the memory, it is necessary that we keep only the frequently accessed keys in memory and regularly remove unused ones, to make effective use of the cache. How do caches do this? By attaching a TTL, or Time To Live, to each key. This value determines how long that key-value will remain in memory. Once the TTL duration of a key has elapsed, i.e., once the time specified by the TTL has passed since the key was added to memory, the key is said to have 'expired' and the entire record is removed, or expunged, from memory.

Secondly, data might have been inserted into the cache from the database a long time back. This data might no longer be valid, due to changes made to the DB in the meantime. In this case, to ensure the integrity of the data, we need to make sure that the cache gets updated with the new data regularly. One approach would be to update the cache whenever we update the DB. But the more natural procedure is to set a TTL. With an appropriate TTL, these keys expire at regular intervals. Once a key expires, the next request for it results in a cache miss, the newer data is fetched from the DB and inserted into the cache, and future calls get this updated value from the cache.
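As a sketch of this read path with the redis-py client (the key name, the fetch_from_db helper and the 10-minute TTL are all hypothetical):

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

CACHE_TTL_SECONDS = 600  # illustrative: refresh from the DB at most every 10 minutes


def fetch_from_db(product_id: str) -> str:
    # Hypothetical stand-in for a real database query.
    return "42.00"


def get_product_price(product_id: str) -> str:
    key = f"price:{product_id}"
    value = r.get(key)
    if value is None:                      # cache miss: never cached, or the TTL expired
        value = fetch_from_db(product_id)  # go back to the source of truth
        r.set(key, value, ex=CACHE_TTL_SECONDS)
    return value
```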

Now, let’s see this concept in the light of Redis.

We can associate a TTL with keys while setting or updating them. Redis also gives us the freedom of not setting a TTL on keys at all. In this case we persist the key-values in the cache forever, at the risk of persisting stale data (unless, of course, we have a mechanism to update the cache whenever the DB is updated too). Not having a TTL means the Redis 'ttl' command for that key will return -1. Otherwise, the command returns the remaining time, in seconds, for which the key will live in the memory of the cache.
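For instance, with the redis-py client (a minimal sketch; the key names and the local connection are purely illustrative):

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Set a key with a 60-second TTL (the EX option of the SET command).
r.set("session:123", "some-value", ex=60)
print(r.ttl("session:123"))   # remaining seconds to live, e.g. 60

# Set a key with no TTL at all; it stays until deleted or evicted.
r.set("config:theme", "dark")
print(r.ttl("config:theme"))  # -1, since no TTL is associated with it
```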

How are keys removed from memory once the TTL has expired? They do not automatically disappear (unlike in fictional narratives, where a pixie vanishes into thin air at the stroke of midnight). The expired key-values remain in memory until they are removed by the system.

There are two ways in which Redis carries this out:

1. Passive expiry a.k.a. lazy expiration — in this approach, an expired key is removed when something tries to access it. When a request is made for a key that is present in the cache but is found to have expired, it is purged right then.

2. Active expiry a.k.a. random active expiration — instead of the above approach, where Redis waits until a request comes in to get an expired key removed, in this method an active effort is made by Redis itself to periodically clear the cache of expired data. At the time of writing this blog, the default behaviour of Redis is to sample 20 keys at random, 10 times per second, check whether they have expired, and remove the ones that have. If the expired keys made up 25% or more of the 20 that were sampled, the same process runs again, i.e., Redis chooses another 20 keys at random and performs the same check. This continues until the number of keys removed is less than 25% of the sample.

Redis does not perform just one of the above approaches; both of them run simultaneously in an ongoing effort to keep the cache clean.
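Here is a simplified Python sketch of the idea behind the active expiry cycle. The real implementation is part of the Redis server's C code and differs in detail; the dictionary below merely stands in for Redis's internal table of keys that have a TTL.

```python
import random

SAMPLE_SIZE = 20          # keys sampled per cycle (default at the time of writing)
REPEAT_THRESHOLD = 0.25   # repeat the cycle while >= 25% of the sample had expired


def active_expiry_cycle(keys_with_ttl: dict, now: float) -> None:
    """One run of the (simplified) active expiry loop.

    `keys_with_ttl` maps key -> absolute expiry timestamp.
    """
    while keys_with_ttl:
        sample = random.sample(
            list(keys_with_ttl), min(SAMPLE_SIZE, len(keys_with_ttl))
        )
        expired = [k for k in sample if keys_with_ttl[k] <= now]
        for k in expired:
            del keys_with_ttl[k]  # expunge the expired key
        # Stop once expired keys make up less than 25% of the sample.
        if len(expired) < REPEAT_THRESHOLD * len(sample):
            return
```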

However, does this run on both the primary and the reader replicas independently? Say the primary writer shard instance N1 has a key keyA which does not have the updated value from the database and has expired. Naturally this key is present, and has expired, on the reader replica N2 as well (where N2 is a reader replica of the writer N1). Say Redis on N1 randomly sampled a set containing keyA and thus removed it, since it had expired, while on N2 a totally different set was chosen at random for active expiry, leaving keyA on N2 with its stale value. This would lead to inconsistencies between the primary and the reader replicas, since they no longer contain the same data.

This would have further consequences. How? Consider the case where a client queries for keyA and their request is directed to N1. It results in a cache miss, so they get the fresh value. Meanwhile, another client who queries for the same keyA at the same time might have their request directed to N2, where that key still exists with its old data. This means two users get two different values for the same query, which, in the best case, makes for a poor user experience. And for features that critically depend on real-time values, this would wreak havoc.

To prevent such a situation from occurring, this Redis process of actively expiring keys runs only on the primary writer instance. And remember we had spoken of how the commands run on the writer are asynchronously passed on to the reader replicas in order to keep them in sync? The same is followed when keys expire. The expired keys are deleted only on the master (primary), and the corresponding 'DEL' commands are then issued to the readers. This way, the very keys that expired on the writer are removed on the readers too, which ensures consistency.

However, note that although the reader replicas do not take part in active expiration, the TTL values of all keys are present on the reader replicas as well. This way, if a reader is later elected as the writer due to a failover, it can take up the task of expiring keys without any prior work.

What does ElastiCache provide with respect to Expiry and TTL?

ElastiCache being just the infrastructure on which the Redis engine is hosted, Redis functions in the same way here, and we as users have the liberty to choose whether to associate TTLs with our keys and to decide the values of those TTLs. This behaviour is independent of the ElastiCache infrastructure we leverage.

Eviction

Now, suppose in the previous section we chose not to let the keys expire by not setting a TTL. Will keys remain in the cache forever in this case? Not necessarily. If the memory becomes full due to too many keys and there is not enough memory left for new inserts to be written, either those writes will fail, or existing keys will be kicked out, or 'evicted', from memory in order to make room for the new data.

How is this memory limit, beyond which keys are to be evicted, configured? Through the Redis 'maxmemory' directive. Directives are set in the redis.conf file (more on directives in "Section E: Directives"). If set to zero, there is no memory limit, which means that you can keep adding keys until the operating system runs out of RAM and kills the process. That's not something you want, for sure. Thus, set 'maxmemory' to a suitable value.
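As a minimal sketch with the redis-py client (the 100mb limit is purely illustrative, it applies to a self-managed Redis, and, as we will see, ElastiCache exposes such directives through parameter groups instead):

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Equivalent to a `maxmemory 100mb` line in redis.conf (illustrative value).
r.config_set("maxmemory", "100mb")
print(r.config_get("maxmemory"))
```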

Which records will be removed from memory when an eviction happens? The ones that were inserted into memory the earliest? But what if those keys are still accessed frequently, while the ones added later are hardly used? Evicting those frequently used keys would hurt latency, since those very keys would have to be fetched from the DB all over again. So that logic might not be right. Or it might be exactly what we want.

Thus, Redis gives us several interesting ways to decide what we want to evict, and this is set through another directive, 'maxmemory-policy'. It can take the following values:

  • allkeys-lru: removes the least recently used key-values
  • allkeys-lfu: removes the least frequently accessed key-values
  • volatile-lru: removes only those least recently used key-values which have a TTL associated
  • volatile-lfu: removes only those least frequently used key-values which have a TTL associated
  • allkeys-random: randomly removes key-values
  • volatile-random: randomly removes key-values that have a TTL associated
  • volatile-ttl: removes those key-values with the shortest TTLs remaining
  • noeviction: no eviction :) which means that, if the memory limit is reached, further writes are rejected

Naturally, the volatile policies behave like noeviction if no TTL is associated with any key.

We can change this configuration at runtime.
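For instance, with the redis-py client (a sketch; the chosen policy is just an illustration, and on ElastiCache this change is instead made through parameter groups):

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Switch the eviction policy at runtime on a self-managed Redis.
r.config_set("maxmemory-policy", "allkeys-lfu")
print(r.config_get("maxmemory-policy"))
```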

The names are self-evident; however, I would like to speak more about 2 of them, as they have quite interesting workings under the hood.

1. allkeys-lru

An approximated LRU algorithm is used to detect the least recently used keys. What is 'approximated' LRU? Instead of finding the single key in the entire data store that was used least recently, a random sample is chosen, and the least recently used key amongst that sample is evicted. Thus, the actual least recently used key-value in the cache might not be the one evicted. Why is this done? Because true LRU, i.e., finding the oldest accessed key across the entire data store, costs a lot of memory. The approximated algorithm, however, performs really well.

We can tune the size of the sample used in the approximated LRU algorithm with the 'maxmemory-samples' directive.
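For instance (a sketch; the sample size of 10 is just an illustration; larger samples approximate true LRU more closely, at a small CPU cost):

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

r.config_set("maxmemory-samples", 10)
```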

2. allkeys-lfu

In this policy, the least frequently used keys are removed first. Now, how does Redis track the number of times a key was accessed?

Using a Morris counter. A Morris counter is a probabilistic counter that keeps an approximate count. Why such a complicated counter? Why not a simple one that increments the count on every access? And why an approximated LFU?

Now, you know that we do not use a cache to store just 1 or 2 keys, but millions of them. Storing the exact number of times each of them was accessed is a huge consumption of resources. Thus, the smart Morris counter, which stores an approximate count! If, at the cost of a slight deviation in the count, we save a large amount of space, isn't that a huge gain?

Two things to note:

  • LFU not only increases the count with access frequency, but is much smarter: it also decreases the count with time. This is to ensure that keys that were frequently accessed in the past, but are no longer accessed in the present, do not continue to hold the same high count, and can be evicted.
  • The directives 'lfu-log-factor' and 'lfu-decay-time' can be used to change the configuration of this counter (a rough sketch of the idea follows this list).
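Here is a rough Python sketch of how such a probabilistic counter behaves. The constants mirror Redis defaults, but the real implementation lives in the Redis server's C code and differs in detail:

```python
import random

LFU_INIT_VAL = 5        # new keys start with a small count
LFU_LOG_FACTOR = 10     # corresponds to the lfu-log-factor directive
LFU_DECAY_TIME = 1      # corresponds to lfu-decay-time (in minutes)


def lfu_increment(counter: int) -> int:
    """Probabilistically bump an 8-bit access counter (0..255).

    The higher the counter already is, the lower the probability of a
    further increment, so millions of accesses still fit in one byte.
    """
    if counter >= 255:
        return counter
    base = max(counter - LFU_INIT_VAL, 0)
    probability = 1.0 / (base * LFU_LOG_FACTOR + 1)
    if random.random() < probability:
        counter += 1
    return counter


def lfu_decay(counter: int, idle_minutes: float) -> int:
    """Decay the counter for keys that have not been accessed recently."""
    periods = int(idle_minutes / LFU_DECAY_TIME) if LFU_DECAY_TIME else 0
    return max(counter - periods, 0)
```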

Again, our favourite section -

What does ElastiCache offer with respect to Eviction?

How do we configure Redis directives like the 'maxmemory-samples' we discussed above? This is done using an AWS concept called Parameter Groups, which will be discussed under "Section E: Directives".

However, note that, at the time of writing this blog, the eviction policy that ElastiCache uses by default is volatile-lru. And this can be changed through parameter groups.

Coming up next

Having thoroughly seen Deployment Architectures in Part 1 and Data Persistence in Part 2, let’s move on to the remaining concepts of Redis in Part 3.
