Redis Topologies

Viktor Taranenko
Sep 1, 2020 · 6 min read

Redis, an open-source in-memory data structure store, is a significant piece of modern architecture. It is less often used as a general-purpose data store, but in the caching landscape it is a clear winner.

Lyft recently shared updated numbers (youtube video) for their Redis workloads in 2020:

  • 171 clusters
  • ~3,600 instances
  • 65 million QPS at peak

It is very impressive how much the Redis ecosystem has evolved while the core technology remained simple. Big credit to Salvatore Sanfilippo, the creator of Redis.

A minimal core leaves large room for variation in setup scenarios. Within the scope of this article, we will review different topologies, each with its own high-availability and data-safety characteristics.

The article is introductory. If you are very experienced with Redis, you won’t find much new material here. But if you are new, it could help you navigate the variety of setup scenarios so that you can then dig deeper. This will also set the stage for future writing on the topic.

Replicated setup

Redis basic replicated setup

Redis replication guide

This is a very simple setup. There is a single master and one or more replicas. Replicas try to maintain an exact copy of the data stored on the master node.

Redis replication is asynchronous and non-blocking, which means the master keeps processing incoming commands, so the potential impact on performance is very small.
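On the replica side, pointing at the master is a single directive in redis.conf. A minimal sketch (the address below is hypothetical):

```
# redis.conf on a replica (Redis 5+; older versions use "slaveof")
replicaof 10.0.0.1 6379

# optional, but the default: reject writes that don't come from the master link
replica-read-only yes
```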

Benefits:

  • Increased data safety: if the master goes down, you still have your data on other nodes
  • Read scalability: you can distribute the read workload across your replicas. Keep in mind that there are no consistency guarantees in this setup (which can be perfectly fine for some workloads)
  • Higher performance / lower latency on the master: with this setup, it is possible to turn off disk persistence on the master. But do this with great care and read the official docs first (at the very least, disable auto-restarts)

While this setup is easy and straightforward, it comes with deficiencies:

  • Doesn’t provide high availability!
  • Clients must be topology-aware to gain read scalability; they need to know where to direct reads and writes

If you don’t have smart clients that know where to route the traffic, there is a long-standing setup with HAProxy:

It uses a health-check mechanism to ping the Redis nodes, so that it can identify the master and direct traffic to it.
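A sketch of such an HAProxy configuration might look like the following (the addresses are hypothetical). The tcp-check sequence queries INFO replication on each node and only keeps servers that report role:master in the pool:

```
# haproxy.cfg (sketch): route clients to whichever node is currently master
defaults
    mode tcp
    timeout connect 3s
    timeout client  6s
    timeout server  6s

frontend redis_front
    bind *:6379
    default_backend redis_master

backend redis_master
    option tcp-check
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    tcp-check send info\ replication\r\n
    tcp-check expect string role:master
    tcp-check send QUIT\r\n
    tcp-check expect string +OK
    server redis-a 10.0.0.1:6379 check inter 1s
    server redis-b 10.0.0.2:6379 check inter 1s
```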

Redis Sentinel

Redis Sentinel Guide

Redis Sentinel setup

Redis Sentinel builds on the replication base and provides high availability for Redis without human intervention.

Sentinel constantly monitors the master and replicas and checks whether they are working as expected. If something goes wrong, it can run an automatic failover and promote a different master.

In addition to that, Sentinel serves as a configuration provider (service discovery) for connecting clients. During a failover, clients get the updated address of the master node.

It is recommended that a setup include multiple Sentinel nodes, so that it retains fault tolerance if some of them go down.

Sentinel nodes cooperate in order to agree on the state of the cluster (e.g. what to do in case of failure detection).

Example of sentinel configuration (sentinel.conf):

sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 60000
sentinel failover-timeout mymaster 180000
sentinel parallel-syncs mymaster 1

This states that the Sentinel node monitors a master group called mymaster with a quorum of 2. The general form is:

sentinel monitor <master-group-name> <ip> <port> <quorum>

There is no need to specify replicas; they are identified through auto-discovery.

Benefits

  • High availability. Built on top of Redis replication, this setup allows for auto-failover
  • Service discovery. Sentinels serve as a configuration provider for clients, making them topology-aware

Deficiencies

  • Usually, this setup involves many nodes with different roles (e.g. 5 nodes: a master, a replica, and 3 sentinels). So it brings significant complexity, while all writes are still directed to a single master
  • It is the client drivers that take on the work of implementing the routing mechanism for the Sentinel setup, and not all drivers support it

Redis Cluster


This is where things start getting more interesting. Redis Cluster was introduced in 2015 with version 3.0. It is the first out-of-the-box Redis architecture that allows write scalability.

Highlights:

  • Horizontally scalable. Adding more nodes increases capacity
  • Data is sharded automatically
  • Fault-tolerant

To distribute data, Redis Cluster has the notion of hash slots. There are 16384 hash slots in total, and every node in the cluster is responsible for a subset of them.

So when a request is sent for a given key, the client computes the hash slot for that key and sends the request to the appropriate server. This means a Redis Cluster client implementation must be cluster-aware:

  • update its hash slot mapping periodically
  • handle redirects itself (when the cluster topology changes and its slot mapping is temporarily out of sync)
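The slot computation itself is simple: CRC16 of the key, modulo 16384. Keys can also contain a "hash tag" in braces, in which case only the tagged part is hashed; this lets related keys land on the same slot so multi-key operations on them still work. A minimal Python sketch (the CRC16 variant is the XMODEM one the Redis Cluster spec defines):

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the variant Redis Cluster specifies."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    """Map a key to one of the 16384 Redis Cluster hash slots."""
    # Hash tags: if the key contains {...} with a non-empty body,
    # only the bytes between the first such pair of braces are hashed.
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            key = key[start + 1 : end]
    return crc16(key.encode()) % 16384

print(hash_slot("foo"))  # 12182, as in the official cluster tutorial
# Both keys hash only the "user1000" tag, so they share a slot:
print(hash_slot("{user1000}.following") == hash_slot("{user1000}.followers"))  # True
```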

Benefits:

  • simpler operations (fewer moving parts)
  • high-availability out of the box
  • horizontal scalability

Deficiencies:

  • clients need to be more complex and track cluster state themselves
  • caveats around handling multi-key operations (since the data can live on different nodes)

There is a good talk (youtube video) about moving to Redis Cluster by Ryan Luecke from Box.

Redis [standalone] with Envoy

(or Twemproxy in older days)

Redis with Envoy Proxy

(Envoy Proxy doc)

This is the kind of setup we are running at Whisk (https://whisk.com). We like it very much for its operational simplicity and good fault-tolerance characteristics.

Note that this setup is not suitable if you are planning to use Redis as a persistent data store, but it is a perfect fit for a cache and can even be good enough for features like rate limiting.

I would recommend watching the video (youtube link) from Lyft, as it explains the concept very well.

The idea and setup are very simple:

  • you have a bunch of standalone Redis nodes, which don’t know anything about each other
  • you have a set of Envoy proxies in front of them, which know how to distribute traffic (associating each key with a Redis node)
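As an illustration, a minimal Envoy listener for such a setup could look roughly like this. This is a hedged sketch, not a drop-in config: the field names follow Envoy's v3 redis_proxy filter, the cluster names and endpoint hostnames are hypothetical, and the ring-hash load-balancing policy is what keeps keys stuck to nodes:

```
static_resources:
  listeners:
  - name: redis_listener
    address:
      socket_address: { address: 0.0.0.0, port_value: 6379 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.redis_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.redis_proxy.v3.RedisProxy
          stat_prefix: redis
          settings:
            op_timeout: 5s
          prefix_routes:
            catch_all_route:
              cluster: redis_nodes
  clusters:
  - name: redis_nodes
    type: STRICT_DNS
    lb_policy: RING_HASH   # consistent hashing across the Redis nodes
    load_assignment:
      cluster_name: redis_nodes
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: redis-0.redis, port_value: 6379 }
        - endpoint:
            address:
              socket_address: { address: redis-1.redis, port_value: 6379 }
```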

This gives you great operational simplicity. Your client can route traffic to any of the Envoy nodes and it will be handled. The setup easily tolerates the failure of any Redis or Envoy node. In the “Redis as a cache” scenario, the worst case is that you need to recalculate the value for a key.

With our Envoy proxies running in Kubernetes close to the app services, we have a very reliable and fast setup, and we have barely experienced any issues.

Envoy does hash-based partitioning: during initialization or topology changes, Envoy computes and associates an integer value with each Redis node. Then, when a request comes in, it hashes the request’s key to an integer and finds the node with the closest match.
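The idea can be sketched as a classic consistent-hash ring. This is a simplification: Envoy places many points per node on the ring and uses a different hash function (xxHash) internally; MD5 here is just an arbitrary stand-in:

```python
import bisect
import hashlib

class HashRing:
    """Simplified consistent-hash ring in the spirit of Envoy's ring hash.

    Each node is mapped to several points on an integer ring; a key is
    served by the node owning the nearest point clockwise from its hash.
    """

    def __init__(self, nodes, points_per_node=100):
        self.ring = sorted(
            (self._hash(f"{node}:{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )

    @staticmethod
    def _hash(value: str) -> int:
        # Stand-in hash; Envoy actually uses xxHash.
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        h = self._hash(key)
        # Find the first ring point at or after the key's hash, wrapping around.
        idx = bisect.bisect(self.ring, (h,)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["redis-0", "redis-1", "redis-2"])
print(ring.node_for("user:1000"))  # deterministic node choice for this key
```

The property that makes this attractive operationally: when a node is removed, only the keys that lived on it get remapped; everything else stays where it was.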

Image from Lyft presentation: https://www.slideshare.net/RedisLabs/redisconf18-2000-instances-and-beyond

Benefits:

  • Operational simplicity: very easy to set up and maintain, with fewer moving parts
  • Fault tolerance: you can lose or replace any node with minimal impact
  • Horizontal scalability: almost no overhead when scaling out and adding more servers
  • Simple clients: only load balancing, or a good reliable connection to Envoy, is necessary
  • Metrics exposed out of the box

Limitations

  • only fits certain use cases (e.g. cache) where potentially losing a small portion of the data is acceptable

Summary

We outlined some of the basic Redis topologies, each with different characteristics. Each has its benefits and shortcomings; which to go with depends on your scenario.

But if your scenario is a “best-effort cache”, I’d highly recommend considering the setup with Envoy. Otherwise, if you need safety guarantees for your data, Redis Cluster could be a good choice: it has matured and has successful case studies.

In future posts we may look deeper into the Envoy setup we run at Whisk, and possibly do performance comparisons of different JVM drivers.

Dev Genius

Coding, Tutorials, News, UX, UI and much more related to development

Viktor Taranenko

Written by

CTO at @WhiskTeam (https://whisk.com). Passionate about triathlon, cooking and distributed systems.
