Redis, an open-source in-memory data structure store, is a significant piece of modern architecture. It is less often used as a general-purpose data store, but in the caching landscape it is a clear winner.
Lyft recently shared updated numbers (youtube video) for their Redis workloads in 2020:
- 171 clusters
- ~3,600 instances
- 65 million QPS at peak
It is impressive how much the Redis ecosystem has evolved while the core technology has remained simple. Big credit to Salvatore Sanfilippo — the creator of Redis.
A minimal core leaves plenty of room for different setup scenarios, and in this article we will review several topologies, each with its own high-availability and data-safety characteristics.
The article is introductory. If you are very experienced with Redis, you won't find much new material here. But if you are new, it could help you navigate the variety of setup scenarios so you can then dig deeper. It will also set the stage for future writings on the topic.
Redis [master + replicas]
This is a very simple setup. There is a single master and one or more replicas. Replicas try to maintain an exact copy of the data stored on the master node.
Redis replication is asynchronous and non-blocking: the master keeps processing incoming commands, so the impact of replication on its performance is very small.
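Attaching a replica takes a single directive in the replica's redis.conf (the address below is a placeholder); the same can be done at runtime with the REPLICAOF command:

```
# redis.conf on the replica node (example address)
replicaof 192.168.1.10 6379

# only needed if the master requires authentication
masterauth <master-password>
```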
Benefits of this setup:
- Increased data safety: if the master goes down, you still have your data on other nodes.
- Read scalability: you can distribute the read workload across replicas. Keep in mind that there are no consistency guarantees in this setup (which can be perfectly fine for some workloads).
- Higher performance / lower latency on the master: with this setup it is possible to turn off disk persistence on the master. But do so with great care and read the official documentation first (at the very least, disable automatic restarts).
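For illustration, "turning off disk persistence" boils down to two directives in the master's redis.conf; this is a hedged sketch of the idea, not a recommendation — read the official persistence docs before doing this in production:

```
# redis.conf on the master: disable RDB snapshots and the AOF log
save ""
appendonly no
```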
While this setup is easy and straightforward, it comes with deficiencies:
- It does not provide high availability: failover to a replica is a manual operation.
- Clients must be topology-aware to gain read scalability; they need to know where to direct reads and writes.
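A minimal sketch of the routing a topology-aware client has to do — reads are spread over replicas, writes always go to the master. The addresses and the round-robin policy are illustrative, not part of any particular driver:

```python
import itertools

# Commands that never modify data and may safely be served by a replica
# (at the cost of possibly stale reads).
READ_COMMANDS = {"GET", "MGET", "EXISTS", "TTL", "LRANGE", "SMEMBERS"}

class TopologyAwareRouter:
    """Routes writes to the master and spreads reads over replicas."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def pick_node(self, command):
        # Reads go round-robin to replicas; everything else to the master.
        if self._replicas and command.upper() in READ_COMMANDS:
            return next(self._replicas)
        return self.master

# Placeholder addresses for illustration.
router = TopologyAwareRouter(
    master=("10.0.0.1", 6379),
    replicas=[("10.0.0.2", 6379), ("10.0.0.3", 6379)],
)
```

Note the trade-off baked into `READ_COMMANDS`: any read served by a replica may lag behind the master, which is exactly the "no consistency guarantees" caveat above.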
If you don't have smart clients that know where to route the traffic, there is a long-standing setup that uses HAProxy:
It uses a health-check mechanism to probe the Redis nodes, identify the current master, and direct traffic to it.
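A hedged haproxy.cfg fragment illustrating the classic check: HAProxy speaks just enough of the Redis protocol to ask each node for its role, and only routes to the node answering role:master (addresses and timeouts are placeholders):

```
# haproxy.cfg (fragment)
frontend redis_front
    mode tcp
    bind *:6379
    default_backend redis_back

backend redis_back
    mode tcp
    option tcp-check
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    tcp-check send info\ replication\r\n
    tcp-check expect string role:master
    tcp-check send QUIT\r\n
    tcp-check expect string +OK
    server redis1 10.0.0.1:6379 check inter 1s
    server redis2 10.0.0.2:6379 check inter 1s
```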
Redis Sentinel
Redis Sentinel builds on the replication base and provides high availability for Redis without human intervention.
Sentinel constantly monitors the master and replicas and checks whether they are working as expected. If something goes wrong, it can run an automatic failover and promote a replica to master.
In addition, Sentinel serves as a configuration provider (service discovery) for connecting clients: after a failover, clients get the updated address of the master node.
A recommended setup includes multiple Sentinel nodes, so the system stays fault-tolerant if some of them go down.
Sentinel nodes cooperate to agree on the state of the cluster (e.g. what to do when a failure is detected).
Example of sentinel configuration (sentinel.conf):
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 60000
sentinel failover-timeout mymaster 180000
sentinel parallel-syncs mymaster 1
This states that the Sentinel node monitors a master group called mymaster with a quorum of 2. The general form of the directive is:
sentinel monitor <master-group-name> <ip> <port> <quorum>
There is no need to specify replicas; they are identified through auto-discovery.
Benefits of this setup:
- High availability. Building on top of Redis replication, this setup allows automatic failover.
- Service discovery. Sentinels serve as a configuration provider for clients, making them topology-aware.
Drawbacks:
- Usually this setup involves many nodes with different roles (e.g. five nodes: a master, a replica, and three Sentinels). It brings significant complexity, while all writes are still directed to a single master.
- It is the client drivers that take on the load and implement the routing mechanism for the Sentinel setup, and not all drivers support it.
Redis Cluster
This is where things start getting more interesting. Redis Cluster was introduced in 2015 with version 3.0. It is the first out-of-the-box Redis architecture that allows write scalability:
- Horizontally scalable. Adding more nodes increases capacity
- Data is sharded automatically
To distribute data, Redis Cluster has the notion of hash slots. There are 16384 hash slots in total, and every node in the cluster is responsible for a subset of them.
So when a request is sent for a given key, the client computes the hash slot for that key and sends the request to the appropriate server. This means a Redis Cluster client implementation must be cluster-aware:
- update its hash-slot mapping periodically
- handle redirects itself (when the cluster topology changes and its slot mapping is temporarily out of sync)
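The slot computation itself is small enough to sketch: Redis Cluster uses CRC16 (the XMODEM variant) modulo 16384, with "hash tags" in braces forcing related keys into the same slot. This is an illustrative re-implementation, not code from any client library:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the variant Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16384 hash slots."""
    # Hash tags: if the key contains {...} with a non-empty body,
    # only that substring is hashed, e.g. "{user42}:profile".
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384
```

For example, {user42}:profile and {user42}:settings land in the same slot, which is what makes multi-key operations on them possible in a cluster.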
Benefits of this setup:
- Simpler operations (fewer moving parts)
- High availability out of the box
- Horizontal scalability
Drawbacks:
- Clients need to be more complex and sync cluster state themselves
- There are caveats around multi-key operations (since the data involved can live on different nodes)
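As a concrete example of the client-side complexity, cluster redirects arrive as errors like MOVED 3999 127.0.0.1:6381 (the slot moved permanently) or ASK 3999 127.0.0.1:6381 (a one-off redirect during resharding). A hedged sketch of parsing them:

```python
def parse_redirect(error: str):
    """Parse a Redis Cluster redirect error into (kind, slot, host, port).

    Returns None if the error is not a redirect at all.
    """
    parts = error.split()
    if len(parts) != 3 or parts[0] not in ("MOVED", "ASK"):
        return None
    host, _, port = parts[2].rpartition(":")
    return parts[0], int(parts[1]), host, int(port)

# A cluster-aware client retries the command against the returned node
# and, on MOVED, also refreshes its local slot table.
```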
There is a good talk (youtube video) by Ryan Luecke from Box about moving to Redis Cluster.
Redis [standalone] with Envoy
(or Twemproxy in older days)
(Envoy Proxy doc)
This is the kind of setup we run at Whisk (https://whisk.com). We like it very much for its operational simplicity and good fault-tolerance characteristics.
Note that this setup is not suitable if you plan to use Redis as a persistent data store, but it is a perfect fit for a cache and can even be good enough for features like rate limiting.
I would recommend watching the video (youtube link) from Lyft, as it explains the concept very well.
The idea and setup are very simple:
- you have a bunch of standalone Redis nodes that don't know anything about each other
- you have a set of Envoy proxies in front of them that know how to distribute the traffic (i.e. the association between a key and a Redis node)
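A heavily abbreviated sketch of the Envoy side, assuming the v3 Redis proxy filter; cluster names and addresses are placeholders, so check the Envoy Redis doc for a complete, current configuration:

```
static_resources:
  listeners:
  - address:
      socket_address: { address: 0.0.0.0, port_value: 6379 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.redis_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.redis_proxy.v3.RedisProxy
          stat_prefix: redis
          settings: { op_timeout: 5s }
          prefix_routes:
            catch_all_route: { cluster: redis_nodes }
  clusters:
  - name: redis_nodes
    type: STRICT_DNS
    lb_policy: RING_HASH        # consistent hashing across the Redis nodes
    load_assignment:
      cluster_name: redis_nodes
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: redis-0.internal, port_value: 6379 }
        - endpoint:
            address:
              socket_address: { address: redis-1.internal, port_value: 6379 }
```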
This gives you great operational simplicity. Your client can route traffic to any of the Envoy nodes and it will be handled. The setup easily tolerates the failure of any Redis or Envoy node. In the "Redis as a cache" scenario, the worst case is that you need to recompute the value for a key.
With our Envoy proxies running in Kubernetes close to the app services, we have a very reliable and fast setup, and we have barely experienced any issues.
Envoy does hash-based partitioning: during initialization or on topology changes, it computes and associates hash values with each Redis node. When a request comes in, Envoy hashes its key to an integer and finds the node with the closest match.
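The partitioning scheme can be sketched as a ketama-style hash ring. This is an illustrative toy, not Envoy's actual implementation (Envoy offers RING_HASH and MAGLEV load-balancing policies for this):

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring: each node gets many points on the ring,
    and a key is served by the first node point at or after the key's hash."""

    def __init__(self, nodes, points_per_node=100):
        self._ring = sorted(
            (self._hash(f"{node}-{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        # Any stable hash works for illustration; md5 here for determinism.
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

The payoff is minimal disruption: removing a node only remaps the keys that lived on it, since the other nodes' points on the ring do not move.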
Benefits of this setup:
- Operational simplicity: very easy to set up and maintain, with fewer moving parts
- Fault tolerance: you can lose or replace any node with minimal impact
- Horizontal scalability: almost no overhead when scaling out and adding more servers
- Simple clients: only load balancing or a good, reliable connection to Envoy is needed
- Metrics exposed out of the box
Drawbacks:
- It only fits certain use cases (e.g. caching) where potentially losing a small portion of the data is acceptable
We have outlined some basic Redis topologies, each with its own benefits and shortcomings. Which one to go with depends on your scenario.
If your scenario is a best-effort cache, I'd highly recommend considering the setup with Envoy. If you need safety guarantees for your data, Redis Cluster could be a good choice: it has matured and has successful case studies behind it.
In future writings we may look deeper into the Envoy setup we run at Whisk and consider performance comparisons of different JVM drivers.
Links:
- Redis Replication guide — https://redis.io/topics/replication
- Redis Sentinel guide — https://redis.io/topics/sentinel
- Redis Cluster by Ryan Luecke — https://www.youtube.com/watch?v=NymIgA7Wa78
- Redis Cluster tutorial — https://redis.io/topics/cluster-tutorial
- Redis + Envoy by Lyft — https://www.youtube.com/watch?v=U4WspAKekqM
- Envoy Proxy Redis doc — https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/other_protocols/redis