Elasticache for Redis — Not a Datastore

Miguel Mendez
Yik Yak Engineering
Jan 10, 2017

Redis is a popular in-memory data structure store that can be used as a datastore, cache and message broker. There are lots of posts about companies successfully using Redis as an integral part of their service. For example, this post gives a good overview of how Twitter has used it to great effect.

So, it stands to reason that application developers would want to use some of the same techniques and that cloud providers would attempt to satisfy that desire. Hence, Elasticache for Redis — AWS’ managed Redis instances.

In this post we wanted to share the gotchas that Yik Yak’s early-stage startup code ran into, which were a convolution of misuse of Redis as a datastore and subtleties in the Elasticache for Redis offering.

Redis as a datastore — it sounded logical, right?

An early version of Yik Yak used DynamoDB as its canonical store. Everything was stored in DynamoDB: users, their posts, etc. Not only did this system run into the gotchas outlined in our previous post DynamoDB — What you should know…, but things were made worse by a fundamental product requirement: return a feed of messages posted within a given radius of the user’s current location. That requirement was served by a geo algorithm whose queries could not be satisfied efficiently by DynamoDB given the data model that was chosen (in a future post we will cover the new, more efficient geostore built on the Google S2 library and Google Bigtable).

In our application, reads far outnumbered writes, and the DynamoDB queries were inefficient because of the geo model, so the implementers opted to use Redis not just as a cache but as a datastore in its own right. The world was divided into small squares, and each square was modeled as a Redis list of messages. The system could then satisfy the feed requirement by spiraling out from the user’s current location until it had found enough messages to produce a feed. The messages themselves were still stored in DynamoDB, but the list of messages belonging to a given square only ever lived in Redis.
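As a sketch of the idea, assuming a hypothetical key scheme of one Redis list of message IDs per grid square (the names and parameters here are illustrative, not our actual schema), the spiral-out lookup might look like:

```python
from itertools import count

def cell_key(x: int, y: int) -> str:
    # Hypothetical key scheme: one Redis list of message IDs per grid square.
    return f"geo:cell:{x}:{y}"

def spiral_cells(cx: int, cy: int):
    """Yield grid cells in expanding square rings around (cx, cy)."""
    yield (cx, cy)
    for r in count(1):                      # ring radius 1, 2, 3, ...
        for dx in range(-r, r + 1):         # top and bottom edges of the ring
            yield (cx + dx, cy - r)
            yield (cx + dx, cy + r)
        for dy in range(-r + 1, r):         # left and right edges (corners already done)
            yield (cx - r, cy + dy)
            yield (cx + r, cy + dy)

def build_feed(fetch_cell, cx, cy, want=100, max_cells=200):
    """Collect message IDs by spiraling outward until we have enough.

    fetch_cell(key) stands in for an LRANGE against Redis; max_cells
    bounds the search so sparse areas don't spiral forever.
    """
    feed = []
    for i, (x, y) in enumerate(spiral_cells(cx, cy)):
        if len(feed) >= want or i >= max_cells:
            break
        feed.extend(fetch_cell(cell_key(x, y)))
    return feed[:want]
```

Each ring at radius r contains exactly 8r cells, so the search cost grows with the area covered, which is why dense areas resolve quickly while sparse ones need the `max_cells` cutoff.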

It worked and it was reasonably quick, but there were consequences.

Redis as a datastore — Not durable by default

By default, whatever you store in Redis lives only in the server’s memory. If the server restarts or dies, there is no built-in way to recover that data, so you need to decide how you will handle that possibility. In the original Yik Yak Redis configuration the answer was to run without persistence: if the message-list data for a given square region was lost, no attempt was made to recover it. The DynamoDB scan required to rebuild a list was too expensive, and in practice the interesting lists repopulated quickly due to activity.

It should be pointed out that Redis does have persistence options, but you need to read the fine print and decide whether those options work for you.
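For reference, self-managed Redis offers two persistence mechanisms: RDB snapshots and the append-only file (AOF). In redis.conf they look roughly like this; the values below are the common defaults rather than a recommendation, and on Elasticache these knobs, where exposed at all, are set through a parameter group rather than a config file:

```
# RDB: snapshot to disk if at least N keys changed in M seconds
save 900 1
save 300 10

# AOF: log every write, fsync once per second
appendonly yes
appendfsync everysec
```

Note that both options trade some performance and disk I/O for durability, which is exactly the fine print you have to weigh.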

Redis as a datastore — Bounded by the server’s memory

Now what happens when the amount of data that you need to store in that Redis instance gets too big? You could partition the data across multiple Redis instances, but this is logic and code that you will need to develop, debug and maintain. It is definitely feasible, but it isn’t trivial (especially if you end up having to re-partition on the fly).
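A minimal sketch of what that client-side partitioning logic involves, with hypothetical shard hostnames: hash each key deterministically to pick a shard.

```python
import hashlib

# Hypothetical shard endpoints; real deployments would load these from config.
SHARDS = [
    "redis-a.example.internal",
    "redis-b.example.internal",
    "redis-c.example.internal",
]

def shard_for(key: str, shards=SHARDS) -> str:
    """Map a key to a shard via a stable hash (md5 here for distribution, not security)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]
```

Even this naive version hints at the hard part: simple modulo hashing reassigns almost every key when the shard count changes, so growing the fleet means either a bulk migration or a consistent-hashing scheme — the “partition on the fly” problem mentioned above.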

The next option is to move to a server with more memory. Sounds simple, right? The problem is that doing so makes you more vulnerable to AWS pricing realities and to sync latencies.

AWS Pricing

When you set up an Elasticache for Redis instance, you pick the instance type that AWS will use for the servers and how many read-only slaves you want. The devil in the details is that the AWS instance types with larger amounts of memory also have more vCPUs and cost much more per month. So you get more memory and more vCPUs… except Redis is single-threaded, and in the AWS-hosted Redis offering there is no way to salvage some benefit from the larger boxes by running multiple server instances on them. To put a finer point on it, in 2015 we were spending around $10,000 per month on Redis just to satisfy the memory requirements.

This makes large-memory Redis instances very expensive and keeps you from getting the most out of your money.

Sync Latencies

The next issue with large Redis instances that have multiple slave servers is that, even though the slaves are synchronized asynchronously, the amount of work and time required to keep them in sync grows with the amount of memory being replicated. That seems obvious enough, until you consider what happens when replication lags.

When replication lags too far behind, a slave can no longer catch up incrementally, and Redis forces a full resynchronization: the master saves its entire memory image to disk, and that image is then shipped to the slaves to use as their new state. We actually suffered production outages that were due, at least in part, to this behavior.
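In self-managed Redis, how far a slave can fall behind before a full resync is forced is governed by the replication backlog and the slave output-buffer limits. The redis.conf directives below are real, though the values are only illustrative; on Elasticache, the equivalents are exposed through the parameter group, where they are exposed at all:

```
# How much recent write history the master keeps so a reconnecting
# slave can catch up incrementally instead of doing a full resync
repl-backlog-size 512mb
repl-backlog-ttl 3600

# Disconnect a slave whose pending replication output exceeds these
# limits (hard 512mb, or 128mb sustained for 60s), forcing a full resync
client-output-buffer-limit slave 512mb 128mb 60
```

The trade-off is master memory: a bigger backlog and bigger buffers tolerate more lag, but the full-resync cliff is still there — it just moves.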

The graph below is not from the outage; it was captured while we were turning down Redis, but it demonstrates the behavior I’m talking about. The blue line is the master server saving its image to disk; the other lines are slaves picking the image up. We had so much data in memory that it took about 30 minutes for the master to save its state and another 30 minutes for each slave to load it. When this happens in a production setting you will experience severe service degradation, if not a hard outage.

Wrap Up

We have highlighted some of the gotchas you may encounter if you go down the path of using AWS-hosted Redis as a datastore: it is not durable by default, you assume responsibility for partitioning the data to manage memory pressure, bigger servers drive up cost, and sync latency issues are likely. In the large majority of situations, you are better off using Redis as a plain old cache. If, after contemplation, you still want to use Redis as a datastore, think about whether there is a way to redefine the problem so that you do not need to. In our case, we were able to use Google’s S2 library and Google Bigtable to eliminate the need for Redis entirely; that will be the subject of a future post in this series.
