In this article, I will explain Redis in system design as I experienced a lot of issues in a legacy system that depends on Redis as a major part of the design and after reading and understanding Redis I understood the issues
What is Redis?
Redis is an open-source, in-memory data structure store, used as a database, cache, and message broker. Redis provides data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes, and streams. Redis has built-in replication, Lua scripting, LRU eviction, transactions, and different levels of on-disk persistence, and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster
So Redis can be used as a traditional monolithic and can be used as distributed system as a cluster of nodes with sharding.
Before talking about Redis I would explain some concepts and why we need these terms and techniques in our systems.
What is memory caching and what is the value-added by caching?
A Cache is like short-term memory. It is typically faster than the original data source. Accessing data from memory is faster than from a hard disk. Caching means saving frequently accessed data in-memory(short-term memory) so the value that is added by caching is retrieving data fastly and reduce calling the original data source it might be SQL DB because the complexity time of reading data will be o(1) as a direct access operation by key in memory like hashtables.
So Redis is a system that offers to us a caching system in both environments monolithic and distributed.
How Redis works?
All Redis data resides in the server’s main memory, in contrast to databases such as PostgreSQL, SQL Server, and others that store most data on disk. In comparison to traditional disk-based databases where most operations require a roundtrip to disk, in-memory data stores such as Redis don’t suffer the same penalty. They can therefore support an order of magnitude more operations and faster response times. The result is — blazing fast performance with average read or writes operations taking less than a millisecond and support for millions of operations per second.
So as Redis is faster than traditional DB can we consider it the first source of truth?
The answer is No we can not consider Redis as the first source of truth it always comes as second support to improve the performance of the system because from the CAP theorem perspective Redis is neither highly available nor consistent. to understand why let's explain how Redis sync the data from memory to disk as the disk can consider consistency.
As we explained before Redis data lives in memory, which makes it is very fast to write to and read from, but in case of server crashes you lose all that’s in the memory, for some applications, it’s ok to lose these data in case of a crash, but for other apps, it’s important to be able to reload Redis data after server restarts.
So Redis provides a different range of persistence options:
- RDB (Redis Database): The RDB persistence performs point-in-time snapshots of your dataset at specified intervals.
- AOF (Append Only File): The AOF persistence logs every write operation received by the server, that will be played again at server startup, reconstructing the original dataset. Commands are logged using the same format as the Redis protocol itself, in an append-only fashion. Redis is able to rewrite the log in the background when it gets too big.
- No persistence: If you wish, you can disable persistence completely, if you want your data to just exist as long as the server is running.
- RDB + AOF: It is possible to combine both AOF and RDB in the same instance. Notice that, in this case, when Redis restarts the AOF file will be used to reconstruct the original dataset since it is guaranteed to be the most complete.
So the most important part there is understanding the trade-offs between the RDB and AOF persistence as the No persistence is very clear that no level of consistency there even strong or bad consistency.
RDB advantages :
- RDB is a very compact single-file point-in-time representation of your Redis data. RDB files are perfect for backups. For instance, you may want to archive your RDB files every hour for the latest 24 hours and to save an RDB snapshot every day for 30 days. This allows you to easily restore different versions of the data set in case of disasters.
- RDB is very good for disaster recovery, being a single compact file that can be transferred to far data centers.
- RDB allows faster restarts with big datasets compared to AOF
- RDB is NOT good if you need to minimize the chance of data loss in case Redis stops working (for example after a power outage). You can configure different save points where an RDB is produced (for instance after at least five minutes and 100 writes against the data set, but you can have multiple save points). However you’ll usually create an RDB snapshot every five minutes or more, so in case of Redis stopping working without a correct shutdown for any reason, you should be prepared to lose the latest minutes of data.
- Using AOF Redis is much more durable: you can have different fsync policies: no fsync at all, fsync every second, fsync at every query. With the default policy of fsync every second write performances are still great (fsync is performed using a background thread and the main thread will try hard to perform writes when no fsync is in progress.) but you can only lose one second worth of writes.
- The AOF log is an append-only log, so there are no seeks, nor corruption problems if there is a power outage. Even if the log ends with a half-written command for some reason (disk full or other reasons) the Redis-check-of tool is able to fix it easily.
- Redis is able to automatically rewrite the AOF in the background when it gets too big. The rewrite is completely safe as while Redis continues appending to the old file, a completely new one is produced with the minimal set of operations needed to create the current data set, and once this second file is ready Redis switches the two and starts appending to the new one.
- AOF contains a log of all the operations one after the other in an easy-to-understand and parsed format. You can even easily export an AOF file. For instance, even if you’ve accidentally flushed everything using the FLUSHALL command, as long as no rewrite of the log was performed in the meantime, you can still save your data set just by stopping the server, removing the latest command, and restarting Redis again.
- AOF files are usually bigger than the equivalent RDB files for the same dataset.
- AOF can be slower than RDB depending on the exact fsync policy.
- Finally, AOF can improve the data consistency but does not guarantee so likely you can lose your data but less than RDB mode considering the RDB is faster.
what should I use?
It depends as usual in any system design but The general indication is that you should use both persistence methods if you want a degree of data safety comparable to what PostgreSQL can provide you. If you care a lot about your data, but still can live with a few minutes of data loss in case of disasters, you can simply use RDB alone.
After we explain the mechanism of data store in Redis let’s explain two important persistence models.
By default Redis saves snapshots of the dataset on disk, in a binary file called
dump.rdb. You can configure Redis to have it save the dataset every N seconds if there are at least M changes in the dataset, or you can manually call the SAVE or BGSAVE commands.
How it works:
- Redis forks. We now have a child and a parent process.
- The child starts to write the dataset to a temporary RDB file.
- When the child is done writing the new RDB file, it replaces the old one.
So Redis stores snapshots of your data to disk in a dump.rdb file in the following conditions:
- Every minute if 1000 keys were changed
- Every 5 minutes if 10 keys were changed
- Every 15 minutes if 1 key was changed
So if you’re doing heavy work and changing lots of keys, then a snapshot per minute will be generated for you, in case your changes are not that much then a snapshot every 5 minutes, if it’s really not that much then every 15 minutes a snapshot will be taken.
Snapshotting is not very durable. If your computer running Redis stops, your power line fails, or you accidentally
kill -9 your instance, the latest data written on Redis will get lost. While this may not be a big deal for some applications, there are use cases for full durability, and in these cases, Redis was not a viable option. The append-only file is an alternative, fully-durable strategy for Redis. It became available in version 1.1. You can turn on the AOF in your configuration file:
How durable is the append-only file?
As we explained in AOF section we have the following options for durability levels
fsyncevery time new commands are appended to the AOF. Very very slow, very safe. Note that the commands are appended to the AOF after a batch of commands from multiple clients or a pipeline are executed, so it means a single write and a single fsync (before sending the replies).
fsyncevery second. Fast enough (in 2.4 likely to be as fast as snapshotting), and you can lose 1 second of data if there is a disaster.
appendfsync no: Never
fsync, just put your data in the hands of the Operating System. The faster and less safe method. Normally Linux will flush data every 30 seconds with this configuration, but it's up to the kernel exact tuning.
How it works:
- Redis forks, so now we have a child and a parent process.
- The child starts writing the new AOF in a temporary file.
- The parent accumulates all the new changes in an in-memory buffer (but at the same time it writes the new changes in the old append-only file, so if the rewriting fails, we are safe).
- When the child is done rewriting the file, the parent gets a signal and appends the in-memory buffer at the end of the file generated by the child.
- Profit! Now Redis atomically renames the old file into the new one and starts appending new data into the new file.
So from this, we can understand that Redis can not guarantee consistency under any model as the writing to disk is always done async by the engine, and likely you can lose the data if the crash happened before data sync you can reduce this but you can not prevent.
What about availability?
It is clear enough that Redis in monolithic can not grantee any level of availability as the single instance means a single point of failure so let’s explain others models of Redis.
Redis Cluster provides a way to run a Redis installation where data is automatically sharded across multiple Redis nodes.
Redis Cluster also provides some degree of availability during partitions, that is in practical terms the ability to continue the operations when some nodes fail or are not able to communicate. However, the cluster stops operating in the event of larger failures (for example when the majority of masters are unavailable).
So in practical terms, what do you get with Redis Cluster?
- The ability to automatically split your dataset among multiple nodes.
- The ability to continue operations when a subset of the nodes are experiencing failures or are unable to communicate with the rest of the cluster.
Distributed Storage of Redis Cluster?
Redis Cluster does not use consistent hashing, but a different form of sharding where every key is conceptually part of what we call a hash slot. There are 16384 hash slots in Redis Cluster, and to compute what is the hash slot of a given key, we simply take the CRC16 of the key modulo 16384. Every node in a Redis Cluster is responsible for a subset of the hash slots, so for example you may have a cluster with 3 nodes, where:
- Node A contains hash slots from 0 to 5500.
- Node B contains hash slots from 5501 to 11000.
- Node C contains hash slots from 11001 to 16383.
This allows to add and remove nodes (scale) in the cluster easily and does not require any downtime.
Redis Cluster master-slave model (Redis Fail-over)?
In order to remain available when a subset of master nodes are failing or are not able to communicate with the majority of nodes, Redis Cluster uses a master-slave model where every hash slot has from 1 (the master itself) to N replicas (N-1 additional slaves nodes).In our example cluster with nodes A, B, C, if node B fails the cluster is not able to continue, since we no longer have a way to serve hash slots in the range 5501–11000. However, when the cluster is created (or at a later time) we add a slave node to every master so that the final cluster is composed of A, B, C that is master nodes, and A1, B1, C1 that are slave nodes. This way, the system is able to continue if node B fails.
Redis Cluster consistency guarantees?
Consistency is always very important but as we explained Redis can not guarantee consistency also Redis Cluster is not able to guarantee strong consistency. In practical terms, this means that under certain conditions it is possible that Redis Cluster will lose writes that were acknowledged by the system to the client.
The first reason why Redis Cluster can lose writes is that it uses asynchronous replication. This means that during writes the following happens:
- Your client writes to master B.
- Master B replies OK to your client.
- Master B propagates the write to its slaves B1, B2, and B3.
As you can see, B does not wait for an acknowledgment from B1, B2, B3 before replying to the client, since this would be a prohibitive latency penalty for Redis, so if your client writes something, B acknowledges the write, but crashes before being able to send the write to its slaves, one of the slaves (that did not receive the write) can be promoted to master, losing the write forever.
How can we improve the level of consistency in Redis?
Redis Enterprise Software (RS) comes with the ability to replicate data to another slave for high availability and persist in-memory data on disk permanently for durability. With the WAIT command, you can control the consistency and durability guarantees for the replicated and persisted database in RS.
With the WAIT command, applications can ask to wait for acknowledgments only after replication or persistence is confirmed on the slave. The flow of a write operation with the WAIT command is shown below:
- Application issues a write,
- The proxy communicates with the correct master “shard” in the system that contains the given key,
- Replication communicated the update to the slave shard.
- Slave persists the update to disk (assuming AOF every write setting is selected).
- The acknowledgment is sent back from the slave all the way to the proxy with steps 5 to 8.
But Note that WAIT does not make Redis a strongly consistent store: while synchronous replication is part of a replicated state machine, it is not the only thing needed. However in the context of Sentinel or Redis Cluster failover, WAIT improves the real world data safety. Specifically if a given write is transferred to one or more replicas, it is more likely (but not guaranteed) that if the master fails, we’ll be able to promote, during a failover, a replica that received the write: both Sentinel and Redis Cluster will do a best-effort attempt to promote the best replica among the set of available replicas. However, this is just a best-effort attempt so it is possible to still lose a write synchronously replicated to multiple replicas.
In terms of talking about scalability as well, I would explain the partitioning term
Partitioning is the process of splitting your data into multiple Redis instances so that every instance will only contain a subset of your keys. The first part of this document will introduce you to the concept of partitioning, the second part will show you the alternatives for Redis partitioning.
What are the benefits offered by partitioning?
Partitioning in Redis serves two main goals:
- It allows for much larger databases, using the sum of the memory of many computers. Without partitioning you are limited to the amount of memory a single computer can support.
- It allows scaling the computational power to multiple cores and multiple computers, and the network bandwidth to multiple computers and network adapters.
Different implementations of partitioning:
Partitioning can be the responsibility of different parts of a software stack.
- Client-side partitioning means that the clients directly select the right node where to write or read a given key. Many Redis clients implement client-side partitioning.
- Proxy-assisted partitioning means that our clients send requests to a proxy that is able to speak the Redis protocol, instead of sending requests directly to the right Redis instance. The proxy will make sure to forward our request to the right Redis instance according to the configured partitioning schema, and will send the replies back to the client. The Redis and Memcached proxy Twemproxy implements proxy-assisted partitioning.
- Query routing means that you can send your query to a random instance, and the instance will make sure to forward your query to the right node. Redis Cluster implements a hybrid form of query routing, with the help of the client (the request is not directly forwarded from a Redis instance to another, but the client gets redirected to the right node).
Disadvantages of partitioning:
- Operations involving multiple keys are usually not supported.
- The partitioning granularity is the key, so it is not possible to share a dataset with a single huge key like a very big sorted set.
- When partitioning is used, data handling is more complex, for instance, you have to handle multiple RDB / AOF files, and to make a backup of your data you need to aggregate the persistence files from multiple instances and hosts.
- Adding and removing capacity can be complex. For instance, Redis Cluster supports mostly transparent rebalancing of data with the ability to add and remove nodes at runtime, but other systems like client-side partitioning and proxies don’t support this feature. However, a technique called Pre-sharding helps in this regard.
Cluster set slot:
As we explained before Redis cluster distributes the data based on slots.
CLUSTER SETS LOT is responsible for changing the state of a hash slot in the receiving node in different ways. It can, depending on the subcommand used:
MIGRATINGsubcommand: Set a hash slot in migrating state.
IMPORTINGsubcommand: Set a hash slot in importing state.
STABLEsubcommand: Clear any importing / migrating state from the hash slot.
NODEsubcommand: Bind the hash slot to a different node.
This subcommand sets a slot to migrating state. In order to set a slot in this state, the node receiving the command must be the hash slot owner, otherwise, an error is returned. When a slot is set in migrating state, the node changes behavior in the following way:
- If a command is received about an existing key, the command is processed as usual.
- If a command is received about a key that does not exist,
ASKredirection is emitted by the node, asking the client to retry only that specific query into
destination-node. In this case, the client should not update its hash slot to node mapping.
- If the command contains multiple keys, in case none exist, the behavior is the same as point 2, if all exist, it is the same as point 1, however, if only a partial number of keys exist, the command emits an
TRYAGAINerror in order for the keys interested to finish being migrated to the target node, so that the multi keys command can be executed.
This subcommand is the reverse of
MIGRATING, and prepares the destination node to import keys from the specified source node. The command only works if the node is not already the owner of the specified hash slot.
When a slot is set in importing state, the node changes behavior in the following way:
- Commands about this hash slot are refused and a
MOVEDredirection is generated as usual, but in this case, the command follows an
ASKINGcommand, in this case, the command is executed.
In this way when a node in migrating state generates
ASK redirection, the client contacts the target node sends, and immediately after sends the command. This way commands about non-existing keys in the old node or keys already migrated to the target node are executed in the target node, so that:
- New keys are always created in the target node. During a hash slot migration, we’ll have to move only old keys, not new ones.
- Commands about keys already migrated are correctly processed in the context of the node which is the target of the migration, the new hash slot owner, in order to guarantee consistency.
ASKINGthe behavior is the same as usual. This guarantees that clients with a broken hash slots mapping will not write for error in the target node, creating a new version of a key that has yet to be migrated.
This subcommand just clears migrating/importing state from the slot
NODE the subcommand is the one with the most complex semantics. It associates the hash slot with the specified node, however, the command works only in specific situations and has different side effects depending on the slot state. The following is the set of pre-conditions and side effects of the command:
- If the current hash slot owner is the node receiving the command, but for the effect of the command the slot would be assigned to a different node, the command will return an error if there are still keys for that hash slot in the node receiving the command.
- If the slot is in migrating state, the state gets cleared when the slot is assigned to another node.
- If the slot was in importing state in the node receiving the command, and the command assigns the slot to this node (which happens in the target node at the end of the resharding of a hash slot from one node to another), the command has the following side effects: A) the importing state is cleared. B) If the node config epoch is not already the greatest of the cluster, it generates a new one and assigns the new config epoch to itself. This way its new hash slot ownership will win over any past configuration created by previous failovers or slot migrations.
Redis cluster and fault tolerance :
The ability of a system to continue functioning in the face of failures is known as fault tolerance. The failure can be one of the following scenarios
- Node failure
- Network failures
- Application-specific failures
I explained also more details about these behaviors in this article
Distributed system models
In distributed system world nothing 100 % reliable so we always consider faults so when designing a distributed system…
Fault tolerance in Redis depends on the ability of cluster failover. so not all Redis setup can guarantee fault tolerance
1- Redis cluster with Masters of nodes
Consider this setup of Redis Master and Sentinel. The larger box is one node. The smaller box can be considered one container. If the master process dies, the sentinel will detect that the process is dead. It will know that the master is down. However, it won’t be able to bring the service back up. so this setup is not fault-tolerant.
2- The setup consists of two instances one master and one slave. If the master process dies, then the sentinels can detect it and promote the slave. This setup can tolerate master process failure. However, if the node running master dies, then both M1 and S1 will die. In this case, the remaining sentinel will not be able to perform a failover, since it requires a majority of the total sentinels to agree on the new master. This setup will not tolerate master node failure.
3- The setup consists of three instances one master and two slaves instances. Each of the instances is run on different nodes. If the master process dies, or if the master node dies, the majority vote can be done, and one of the slaves can be promoted to be the new master. This setup is tolerant to master process failure and master node failure but this setup can not guarantee the fault tolerance in case of one master and one slave node down.
Finally, we have the following results:
1- Redis is not constant under any conditions and configuration.
2- Redis is not a traditional database and we can not consider it as the first source of truth if consistency is important
3- Redis can offer good consistency under some configurations but will less performance/availability also consistency will not be the same offered by SQL DB so if you want transactional consistency Redis will not be the correct
4- Redis offers automatic failover through clustering mode with failure detection
5- Not all Redis models can guarantee the fault tolerance
6- Redis clustering with sharding needs only if you want to grantee high availability considering the data loss and complexity and you do not need this if your data is not bigger than the memory of one node.
7- Always Read the disadvantages of technologies and techniques before using them.
8- System design comes with trade-offs so you should think deeply because removing some components from your system might be very costly
Note: The Redis Documentation is also available in raw (computer friendly) format in the redis-doc github repository…
Redis cluster tutorial - Redis
This document is a gentle introduction to Redis Cluster, that does not use difficult to understand concepts of…