What is Redis?
Overview
Redis (REmote DIctionary Server) is an in-memory data store written in C and first released in 2009. Read and write operations are very fast because the data is kept in memory. The data can also be saved to disk and loaded back into memory on restart. It is often called a data structure server because keys can hold strings, hashes, lists, sets, sorted sets, bitmaps, and HyperLogLogs. Redis is generally not a practical choice for very large data sets, because the whole data set has to fit in RAM and the hardware cost grows accordingly. Redis has built-in replication, Lua scripting, LRU eviction, transactions, and different levels of on-disk persistence. It is a popular choice for caching, session management, gaming, leaderboards, real-time analytics, geospatial applications, ride-hailing, chat and messaging, media streaming, and pub/sub applications.
There are two important things to consider when choosing between SQL and NoSQL storage: the nature of the data and its usage pattern. Transactional details, historical data, and server logs are examples of data that are a particularly good fit for non-relational storage. NoSQL databases are often faster, particularly for write operations, which makes them a good fit for write-heavy applications.
Redis Structure
A Redis deployment is built from masters, slaves (called replicas in current Redis documentation), and sentinels. These can be arranged in three main architectures:
- Standalone: In a standalone architecture, Redis is run as a single instance and can be used to store and retrieve data in memory. This structure consists of a single main instance for processing all the write and read operations of the clients. This is the simplest architecture and is suitable for smaller-scale applications that don’t require high levels of scalability or fault tolerance.
- Master-Slave: In a master-slave architecture, a single Redis master instance handles read and write requests, and one or more Redis slave instances replicate data from the master and serve read-only traffic. This architecture improves scalability and fault tolerance, because read traffic can be distributed across the slaves. Replication alone does not promote a slave when the master fails; automatic failover requires Redis Sentinel (or Redis Cluster).
- Cluster: In a Redis cluster architecture, multiple Redis nodes are grouped to form a distributed system. Data is automatically partitioned across the cluster, and each master node is responsible for a subset of the data and can have its own slaves. In Redis Cluster the nodes monitor each other and promote a slave automatically when a master fails, without separate sentinel processes. Sentinels belong to the related Redis Sentinel setup, which adds high availability to a master-slave deployment: the sentinels manage role changes between instances (master→slave and vice versa) and keep a “Sentinel State” that stores information about the Redis servers in the system, including their IP addresses, port numbers, and roles (master, slave, or sentinel). A deployment can have several sentinel instances, and they share information with each other so that they converge on the same view. If one sentinel fails, the others keep working, which is what makes high availability possible. The cluster architecture provides high levels of scalability and fault tolerance, as data can be distributed across many nodes, and failover can be performed automatically if a node fails.
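To make the partitioning in the cluster architecture more concrete, here is a minimal sketch of how a key can be mapped to one of the 16384 hash slots that Redis Cluster uses (CRC16 of the key modulo 16384, with an optional `{hash tag}`). The function names and the even split of slots across masters in `node_for_slot` are assumptions made for illustration; a real cluster assigns slot ranges to nodes explicitly.

```python
import binascii

SLOT_COUNT = 16384  # number of hash slots in Redis Cluster


def hash_tag(key: bytes) -> bytes:
    """Return the part of the key that gets hashed.

    If the key contains a non-empty "{...}" section, only the text inside
    the first such section is hashed, so related keys can share a slot.
    """
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    """Map a key to a slot: CRC16 (XMODEM variant) modulo 16384."""
    data = hash_tag(key.encode("utf-8"))
    # binascii.crc_hqx with an initial value of 0 is the CRC16/XMODEM checksum
    return binascii.crc_hqx(data, 0) % SLOT_COUNT


def node_for_slot(slot: int, node_count: int) -> int:
    """Hypothetical helper: split the slot range evenly across masters."""
    return slot * node_count // SLOT_COUNT


if __name__ == "__main__":
    for key in ("user:1", "{user1000}.following", "{user1000}.followers"):
        slot = key_slot(key)
        print(key, "-> slot", slot, "-> node", node_for_slot(slot, 3))
```

The two `{user1000}` keys land in the same slot, which is what allows multi-key operations on related keys in a cluster.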
Redis Data Structures
Redis is well known for its support of a variety of data structures. This is an advantage because each data structure suits different kinds of applications. For example, a game can use sorted sets to build a ranking table, and a traffic system can use a HyperLogLog (a data structure that estimates the cardinality, or count of distinct elements, of a data set) to count the vehicles passing several intersections by license plate without storing the plates themselves, which protects users’ private information. Next, I will introduce the data structures Redis supports:
- Strings: a sequence of characters (text) or binary data, up to 512 MB
- Lists: linked lists, that is, node-pointer-based linear collections of elements kept in the order in which they were added
- Sets: unordered collections of unique string elements
- Sorted Sets: sets in which every member is associated with a score, so an order can be maintained
- Hashes: mappings of fields to values; the values are stored as strings, which can also represent numbers
- Bitmaps: also known as bit arrays or bit vectors, array data structures that store bits
- HyperLogLogs: a probabilistic data structure used to count unique items with very little memory; the downside is a standard error of just under 1% (0.81%)
- Streams: an append-only data structure in which each stream entry is composed of one or more field-value pairs
- Geospatial: stores named points together with their latitude and longitude coordinates
- BloomFilter: runs an item through fast hash functions and sets the bits at the resulting positions in a bit field from 0 to 1. To check for existence in a Bloom filter, the same bits are tested.
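The Bloom filter idea from the last item can be sketched in a few lines of plain Python. This is only an illustration of the technique, not how Redis implements it: the class name, the bit-field size, the number of hash functions, and the use of SHA-256 as the hash are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits: int = 1024, hash_count: int = 3):
        self.size_bits = size_bits
        self.hash_count = hash_count
        self.bits = bytearray((size_bits + 7) // 8)  # all bits start at 0

    def _positions(self, item: str):
        # Derive hash_count bit positions by hashing the item with a seed prefix.
        for seed in range(self.hash_count):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        # Flip the selected bits from 0 to 1.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True only if every selected bit is set; a single 0 bit proves absence.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


if __name__ == "__main__":
    seen_plates = BloomFilter()
    seen_plates.add("ABC-123")
    print(seen_plates.might_contain("ABC-123"))
    print(seen_plates.might_contain("XYZ-999"))
```

An item that was added is always reported as possibly present. An item that was never added is usually reported as absent, but a false positive is possible once enough bits are set.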
Redis Persistence
Persistence refers to the process of writing data to durable storage, such as a solid-state disk (SSD). Redis provides a range of persistence options:
- RDB: performs point-in-time snapshots of your dataset at specified intervals
- AOF: logs every write operation received by the server. These operations can be replayed at server startup to reconstruct the original dataset.
- No Persistence: persistence is disabled completely. This is sometimes used for caching.
- RDB + AOF: both are combined in the same instance.
Redis cannot persist its data directly into another database. For example, HBase and Redis both store their files as binary data, but a Redis file is a snapshot of the whole database meant for recovery, whereas an HBase file is a compound file of key-value pairs. Therefore, persisting Redis data into another NoSQL database requires an API (application code) that processes and saves the data. Redis itself persists data in two formats:
RDB Format:
- A binary file format optimized for fast storage and retrieval of Redis data. With this configuration, Redis periodically saves a snapshot of the in-memory database to disk as an RDB file (extension → .rdb). The frequency of these snapshots is set with the “save” directive in the Redis configuration file. Because snapshots are taken at intervals, a crash can lose the writes made since the last snapshot. Saving a snapshot does not clear the memory: Redis normally forks a child process that writes the RDB file while the parent keeps the data set in memory and continues to serve clients.
AOF Format:
- A log file that records all write operations executed against the Redis database (extension → .aof). With this configuration, each write operation is appended to the AOF file as it is executed, which allows Redis to rebuild the entire database from the AOF file after a crash or other failure. Appending on every write increases disk I/O. When the AOF file grows too large, Redis rewrites it: it creates a new AOF file containing the minimal set of operations needed to rebuild the current database, and the old file is only replaced once the new one is complete. Subsequent writes are then appended to the new file, and the cycle repeats.
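The append, replay, and rewrite cycle described for the AOF can be illustrated with a small in-memory model. This is a sketch under simplifying assumptions: the log is a Python list rather than a file, only three commands (`SET`, `DEL`, `INCR`) are modeled, and the function names are made up for this example.

```python
from typing import Dict, List, Tuple

Command = Tuple[str, ...]


def apply(state: Dict[str, str], command: Command) -> None:
    """Apply one write command to an in-memory key-value store."""
    name = command[0]
    if name == "SET":
        state[command[1]] = command[2]
    elif name == "DEL":
        state.pop(command[1], None)
    elif name == "INCR":
        state[command[1]] = str(int(state.get(command[1], "0")) + 1)
    else:
        raise ValueError(f"unsupported command: {name}")


def replay(log: List[Command]) -> Dict[str, str]:
    """Rebuild the dataset by replaying every logged write in order."""
    state: Dict[str, str] = {}
    for command in log:
        apply(state, command)
    return state


def rewrite(log: List[Command]) -> List[Command]:
    """Compact the log to the minimal writes that rebuild the same dataset."""
    return [("SET", key, value) for key, value in replay(log).items()]


if __name__ == "__main__":
    log = [
        ("SET", "visits", "0"),
        ("INCR", "visits"),
        ("INCR", "visits"),
        ("SET", "tmp", "x"),
        ("DEL", "tmp"),
    ]
    print(replay(log))
    print(rewrite(log))
```

Five logged operations collapse to a single `SET` after the rewrite, yet replaying either log produces the same dataset.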
The AOF and RDB files are not used for looking up keys directly; they are critical components of Redis persistence and data recovery. Redis uses an in-memory hash table for fast key lookups and data access. It does not fetch individual keys from disk on demand: the AOF or RDB file is loaded at startup to restore the whole data set into memory.
Some of the benefits of using these two files are:
- Faster restart times: restoring from these files can be faster than rebuilding the dataset from scratch.
- Data backups: the files can be used to create regular backups of the Redis dataset.
- Replication: during a full synchronization, the master sends an RDB snapshot that slave instances load to rebuild the master's dataset.
- Memory optimization: keys that are not accessed frequently can be removed by applying the LRU policy. This eviction is configured separately (through the maxmemory settings) and does not depend on the AOF or RDB files.
While the primary purpose of the AOF and RDB files is for data persistence and recovery, they provide additional benefits that can improve the performance, reliability, and scalability of a Redis deployment.
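The LRU policy mentioned in the list above can be sketched as follows. This is an exact LRU over a fixed number of keys, written only to show the idea; Redis itself limits memory rather than key count and uses an approximated LRU, and the class and method names here are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Keep at most max_keys entries, evicting the least recently used one."""

    def __init__(self, max_keys: int):
        self.max_keys = max_keys
        self._data = OrderedDict()  # oldest access first, newest last

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # reading a key marks it as recently used
        return self._data[key]

    def set(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.max_keys:
            self._data.popitem(last=False)  # drop the least recently used key

    def keys(self):
        return list(self._data)


if __name__ == "__main__":
    cache = LRUCache(max_keys=2)
    cache.set("a", 1)
    cache.set("b", 2)
    cache.get("a")     # "a" is now more recent than "b"
    cache.set("c", 3)  # exceeds the limit, so "b" is evicted
    print(cache.keys())
```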
Another curious fact is that the Redis .rdb file and the HBase .hfile file are both binary file formats used for storing data in their respective data stores. However, the Redis .rdb file is a snapshot of the Redis database at a specific point in time: it contains all the data in a Redis instance, serialized in a binary format. The HBase .hfile, on the other hand, is the file format used for storing data in the HBase distributed database. It contains blocks of key-value pairs whose keys are sorted to enable efficient lookups. In addition, HBase distributes its files across HDFS.
Redis Indexing
Redis keeps all keys, whatever data structure they hold, in a hash table with an average lookup time of O(1). Because hash functions are deterministic, hashing the same key always produces the same value, so a key always maps to the same place in the table. This is what lets Redis read and write data so quickly. When a data structure is inserted, the user specifies its type (set, hash, string, etc.) through the command used. The value is first processed by the code that implements that data structure. Then the key is hashed by the dictionary implementation in dict.c and added to the hash table, which manages it from then on.
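The lookup path described above can be approximated with a small chained hash table. This is a sketch, not a port of dict.c: the class name is hypothetical, `zlib.crc32` stands in for Redis's real hash function, and details such as resizing and rehashing are left out.

```python
import zlib


class TinyDict:
    """Toy hash table mapping a key to a (type name, value) pair."""

    def __init__(self, bucket_count: int = 8):
        self.buckets = [[] for _ in range(bucket_count)]

    def bucket_index(self, key: str) -> int:
        # A deterministic hash: the same key always yields the same bucket.
        return zlib.crc32(key.encode("utf-8")) % len(self.buckets)

    def set(self, key: str, type_name: str, value) -> None:
        bucket = self.buckets[self.bucket_index(key)]
        for i, (existing_key, _, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, type_name, value)  # overwrite in place
                return
        bucket.append((key, type_name, value))

    def get(self, key: str):
        # Only one bucket is scanned, which keeps lookups O(1) on average.
        for existing_key, type_name, value in self.buckets[self.bucket_index(key)]:
            if existing_key == key:
                return type_name, value
        return None


if __name__ == "__main__":
    db = TinyDict()
    db.set("user:1", "hash", {"name": "Ann"})
    db.set("queue", "list", ["job1", "job2"])
    print(db.get("user:1"))
    print(db.get("missing"))
```

Storing the type name next to the value mirrors the idea that the table holds every key regardless of which data structure it refers to.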
Redis vs Other Systems
Memcached is a high-performance distributed memory cache service designed for simplicity. Redis offers a rich set of features that make it effective for a wide range of use cases.
- Sub-millisecond latency: Both Redis and Memcached support sub-millisecond response times. By storing data in memory they can read data more quickly than disk-based databases.
- Developer ease of use: Both Redis and Memcached are syntactically easy to use and require a minimal amount of code to integrate into your application.
- Data partitioning: Both Redis and Memcached allow you to distribute your data among multiple nodes. This allows you to scale out to handle more data when demand grows.
- Support for a broad set of programming languages: Both Redis and Memcached have many open-source clients available for developers. Supported languages include Java, Python, PHP, C, C++, C#, JavaScript, Node.js, Ruby, Go, and many others.
- Advanced data structures: In addition to strings, Redis supports lists, sets, sorted sets, hashes, bit arrays, and HyperLogLogs. Applications can use these more advanced data structures to support a variety of use cases. For example, you can use Redis Sorted Sets to easily implement a game leaderboard that keeps a list of players sorted by their rank.
- Multithreaded architecture: Since Memcached is multithreaded, it can make use of multiple processing cores. This means that you can handle more operations by scaling up compute capacity.
- Snapshots: With Redis you can keep your data on disk with a point-in-time snapshot, which can be used for archiving or recovery.
- Replication: Redis lets you create multiple replicas of a Redis primary. This allows you to scale database reads and to have highly available clusters.
- Transactions: Redis supports transactions that let you execute a group of commands as an isolated and atomic operation.
- Pub/Sub: Redis supports Pub/Sub messaging with pattern matching, which you can use for high-performance chat rooms, real-time comment streams, social media feeds, and server intercommunication.
- Lua scripting: Redis allows you to execute transactional Lua scripts. Scripts can help you boost performance and simplify your application.
- Geospatial support: Redis has purpose-built commands for working with real-time geospatial data at scale. You can perform operations like finding the distance between two elements (for example people or places) and finding all elements within a given distance of a point.
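The leaderboard use case mentioned under advanced data structures can be modeled without a server. The sketch below imitates what a sorted set offers (on a real Redis instance this would be done with commands such as ZADD, ZINCRBY, and ZREVRANGE); the class, its method names, and the tie-breaking by player name are assumptions for illustration, and sorting on every query is far less efficient than a real sorted set.

```python
from typing import Dict, List, Tuple


class Leaderboard:
    """Toy stand-in for a sorted set: members with scores, ranked on demand."""

    def __init__(self) -> None:
        self._scores: Dict[str, float] = {}

    def add(self, player: str, score: float) -> None:
        # Set (or overwrite) the score of a player.
        self._scores[player] = score

    def increment(self, player: str, delta: float) -> float:
        # Add delta to the player's score, starting from 0 for new players.
        self._scores[player] = self._scores.get(player, 0) + delta
        return self._scores[player]

    def top(self, count: int) -> List[Tuple[str, float]]:
        # Highest score first; ties are broken by player name in this sketch.
        ranked = sorted(self._scores.items(), key=lambda item: (-item[1], item[0]))
        return ranked[:count]


if __name__ == "__main__":
    board = Leaderboard()
    board.add("alice", 120)
    board.add("bob", 95)
    board.add("carol", 150)
    board.increment("bob", 40)
    print(board.top(2))
```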
Other Points of Comparison:
Type of data
- Memcached: supports only strings; key and value sizes are limited, with values generally capped at 1 MB
- Redis: the key must be a string, but the value supports rich data types such as String, List, and Set. The size limit is much larger than Memcached's: up to 512 MB for a key or a string value
- MongoDB: uses a JSON-like structure (BSON) and supports the data types that JSON can express, such as null, boolean, numeric, and string; a single document is limited to 16 MB
- HBase: uses column families; the basic data type is a byte array; each storage block has a configurable size limit, and at least 64 MB is supported
Performance
- Memcached: multi-threaded, stand-alone servers; distribution relies on the client for consistent hashing and similar controls
- Redis: supports master-slave synchronization and clustering; commands are executed on a single thread
- MongoDB: Memcached and Redis are in-memory, so MongoDB is at a disadvantage here, and it may be more appropriate to compare it with MySQL. It supports master-slave mode and horizontal scaling based on sharding
- HBase: the storage backend is HDFS
Typical use
- Memcached: accelerates data access as a cache layer for relational databases
- Redis: accelerates data access as the cache layer of a relational database and, in addition, can persist data to disk
- MongoDB: a schema-less database, used to replace MySQL in scenarios with fast-changing data and low transactional requirements, such as the storage of game user information and social information
- HBase: storage of massive data in scenarios requiring highly concurrent queries, such as logs