DRedis — Our disk-based Redis implementation

Hillevi Eklöw
YipitData Engineering
Nov 6, 2019 · 5 min read

A Queueing Solution built with Redis

At YipitData, we built our own queueing solution on top of Redis (AWS ElastiCache), optimized for web scraping. The solution needs sorted sets and serial processing, and Redis was chosen for its rich set of data structures and its flexibility. In general, Redis has worked very well for us, but as the company grows, so do the small issues we’ve had with it.
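The sorted-set queue pattern the solution depends on can be sketched with a pure-Python model. The class and method names below (`SortedSetQueue`, `zadd`, `zpopmin`) mirror the Redis commands ZADD and ZPOPMIN; this is an illustration of the semantics, not YipitData’s actual code.

```python
class SortedSetQueue:
    """Toy model of a Redis sorted set used as a priority queue."""

    def __init__(self):
        self._scores = {}  # member -> score

    def zadd(self, member, score):
        # Like Redis ZADD: insert the member, or update its score.
        self._scores[member] = score

    def zpopmin(self):
        # Like Redis ZPOPMIN: remove and return the lowest-scored member.
        if not self._scores:
            return None
        member = min(self._scores, key=lambda m: (self._scores[m], m))
        return member, self._scores.pop(member)


queue = SortedSetQueue()
queue.zadd("scrape:/page/2", 2.0)
queue.zadd("scrape:/page/1", 1.0)
```

Workers pop the lowest-scored item next, which is what makes scores usable as priorities or scheduled timestamps.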

Challenges with using Redis for queueing

Redis is an in-memory database, so all data lives in memory at all times. Our web scraping systems often have millions of items across a couple of queues, which consumes multiple gigabytes of memory. At these queue sizes, we become particularly sensitive to:

  • AWS Hardware Problems
    If a node gets rebooted, we can lose all the data in the queue as it’s in-memory. This has happened.
  • Limited Storage Space
    An instance needs to be replaced manually when queue size increases drastically. Sometimes we cannot predict these requirements and adjust in time, which leads to Redis running out of memory.
  • Scaling Down Efficiently
    Downsizing a cluster is overly complicated and entirely manual. To downsize we must create a new cluster from a snapshot, which requires pausing the system and changing environment variables.

Ironically, almost all of our queueing system’s annoyances are products of Redis being an in-memory database (which is the main reason why Redis became so popular in the first place). It would be wonderful if we could have the Redis behavior through an identical API, but persist the data differently. Perhaps to a disk?

A Disk-Based Redis

There are several open-source projects with different implementations of a disk-based Redis, so the idea is not new. By writing to disk instead of memory, storage becomes virtually unlimited, and cheaper, as long as one can afford the performance hit (reading from and writing to disk will always be slower than memory). However, since we built our own queueing solution on top of Redis, we have very specific needs: for instance, we use Lua to create atomic Redis extensions and assume data is always consistent. Sorted-set support is a must, too.

When researching alternative solutions to Redis back in late 2018, none of the open-source implementations of a Disk Redis fit all our needs, so our Staff Engineer Hugo took the matter into his own hands.

DRedis Implementation

Migrating our current Redis-based queueing solution to another system involves a lot of risk, as we have hundreds of projects, internal and external, built on top of the current behavior. Changing the underlying logic could break those projects in unexpected ways, and testing every single one would be very difficult. Hence, the top priority for the solution was that the behavior be exactly the same, apart from reading/writing to disk instead of memory (and some expected performance loss). The build was named DRedis after “Disk-Redis”, although some of our engineers prefer the arguably much catchier name “Dr. Dredisk Jr”.

At YipitData, we almost exclusively develop in Python, so for maintainability we decided to build DRedis in Python as well. In places where speed is essential, we are considering Cython.

DRedis was originally developed with its own storage layer, but that implementation ended up being ~100x slower than Redis, mainly because of Python itself. As a solution, the storage layer was swapped for LevelDB, an open-source key-value store written in C++ (we use plyvel as the Python binding), and performance improved by a factor of 10: roughly 1/10th the speed of Redis instead of 1/100th.
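LevelDB keeps keys in lexicographic byte order, so richer structures like sorted sets can be modeled by packing them into that ordered keyspace. Below is a minimal sketch of the idea, assuming non-negative scores (a big-endian IEEE-754 double only sorts correctly as bytes for non-negative values); the actual DRedis encoding may differ, and a plain dict iterated in sorted order stands in for a LevelDB database here.

```python
import struct


def encode(zset_name, score, member):
    # "name \x00 score member" -> one byte key whose lexicographic order
    # matches (name, score, member) order, for non-negative scores.
    return zset_name.encode() + b"\x00" + struct.pack(">d", score) + member.encode()


# Stand-in for LevelDB: a dict whose keys we iterate in sorted (byte) order.
store = {}
store[encode("queue", 2.0, "b")] = b""
store[encode("queue", 1.0, "a")] = b""
store[encode("queue", 1.5, "c")] = b""

prefix = b"queue\x00"
# Range scan over the prefix yields members in ascending score order;
# the member starts after the prefix plus the 8-byte packed score.
members_in_score_order = [
    k[len(prefix) + 8:].decode() for k in sorted(store) if k.startswith(prefix)
]
```

With this layout, a ZRANGEBYSCORE becomes a simple prefix range scan, which is exactly the access pattern LevelDB is fast at.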

As our scraping systems rely on serial processing, we use Lua to ensure no more than one Redis command runs at a time, since Lua scripts are executed atomically by the server. In DRedis, the Lupa library (Python bindings for Lua) is used to run these scripts.

Today, the biggest difference between how we deploy DRedis and Redis is that DRedis runs as a single instance. Redis is generally deployed with replicas, so if an instance goes down, a replica can be promoted to primary (data is not lost, and availability is better). DRedis instead relies on a persistent disk: if the DRedis instance has issues, the disk is intact and another instance can spin up and use the very same volume. DRedis speaks the Redis protocol (RESP), based on Redis 3.2.6.
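Speaking RESP is what lets standard Redis clients talk to DRedis unchanged. A client sends each command as a RESP array of bulk strings; a minimal encoder looks like this (a sketch of the wire format, not DRedis’s internal parser):

```python
def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]  # array header: element count
    for arg in args:
        data = arg.encode() if isinstance(arg, str) else arg
        # bulk string: $<byte-length>\r\n<bytes>\r\n
        parts.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(parts)
```

For example, `ZADD queue 1 job` goes over the wire as a 4-element array, which is why any RESP-speaking server can stand in for Redis.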

DRedis to power our in-house queueing system

With DRedis, the issues we experienced with Redis are handled in the following ways:

  • AWS Hardware Problems
    By writing to a disk instead of memory, we are no longer at risk of our data getting lost if a node is rebooted; we can simply connect the disk to another instance and continue.
  • Limited Storage Space
    Disk storage is significantly cheaper than the equivalent size of memory, and it is virtually impossible to run out of storage space.
  • Scaling Down Efficiently
    EBS volumes can’t downsize on-the-fly, so this will also be a manual process similar to scaling down ElastiCache. However, EBS volumes are cheaper than ElastiCache, so there are still benefits.

Naturally, these improvements in our clone DRedis aren’t “free”. The tradeoffs are a significant speed loss and lower availability, but for our use case, the queueing system, we consider them worth it.

DRedis Application Architecture

How YipitData deploys DRedis. Image generated with Cloudcraft.

The previous diagram shows how we’re deploying DRedis: an Auto Scaling group with a desired capacity of 1, an Elastic Network Interface (ENI) to provide a static IP, an EBS volume to store all the data, and an IAM role with permissions to mount the EBS volume and attach the ENI to the EC2 instances created by the Auto Scaling group. (There is also a security group accepting inbound traffic on port 6377, which is not in the image.) The EC2 user-data script is responsible for attaching the ENI (1.2.3.4 in the diagram) and reconfiguring the network to route everything through it.
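A user-data script for this setup might look like the following configuration sketch. The ENI ID, volume ID, device name, and mount point are all placeholders, not the real resources:

```shell
#!/bin/bash
# Hypothetical user-data sketch; eni-EXAMPLE, vol-EXAMPLE, /dev/sdf and
# /data/dredis are placeholders, not YipitData's actual resource names.
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

# Attach the static-IP ENI and the persistent EBS volume to this instance
# (the instance's IAM role must allow both calls).
aws ec2 attach-network-interface \
  --network-interface-id eni-EXAMPLE \
  --instance-id "$INSTANCE_ID" \
  --device-index 1
aws ec2 attach-volume \
  --volume-id vol-EXAMPLE \
  --instance-id "$INSTANCE_ID" \
  --device /dev/sdf

# Mount the volume where DRedis keeps its data.
mkdir -p /data/dredis
mount /dev/sdf /data/dredis
```

Because the volume and ENI outlive any single instance, a replacement instance launched by the Auto Scaling group picks up the same IP and the same data.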

DRedis Cost Savings

Depending on AWS instance type and node count, DRedis could save more than 70%. For example, we can replace a two-node r3.large Redis cluster with one c5.large DRedis instance. The former costs us $244/month (2 × $122, for one 12GB primary node and one 12GB replica), while DRedis costs only $66/month ($61 for the EC2 instance plus $5 for 50GB of disk space).
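The savings claim checks out from the figures above:

```python
# Monthly costs as quoted in the post.
redis_monthly = 2 * 122   # two r3.large ElastiCache nodes (primary + replica)
dredis_monthly = 61 + 5   # one c5.large EC2 instance + 50GB of EBS

savings = 1 - dredis_monthly / redis_monthly  # fraction saved per month
```

That works out to roughly 73%, consistent with “more than 70%”.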

Additionally, DRedis will not lose data in the case of hardware failures, and there’s a lot more room for the data to grow (50GB of disk vs. 12GB of memory).

However, availability with a single DRedis instance is worse than with a Redis cluster of two or more nodes: if there’s a hardware failure, availability will suffer. We’re trading availability for lower server and maintenance costs.

Next Steps

Today we are successfully using DRedis in a few production projects, and improvements are always welcome, particularly on speed. DRedis is open source and available on GitHub under the MIT License.

Acknowledgments

Many thanks to Hugo Lopes Tavares for sharing his knowledge and helping me complete this article, and to Mingwei Gu and Andrew Gross for their thoughtful reviews.
