What Is Redis and How Does It Work Internally?
Hey there! In this blog, we will learn about Redis and how it works internally.
Caching
Let us start with talking about caching. In the world of computer science and software development, caching is like having a quick-access memory for frequently used information. Imagine you have a favourite snack that you always keep within arm’s reach instead of going to the store every time you crave it. Caching works in a similar way, allowing applications to keep a handy copy of frequently needed data in a special storage area known as a cache.
The main goal of caching is to speed things up. Instead of repeatedly going back to the original source for data, which might be slow or resource-intensive, the application can fetch it from the nearby cache. This quick retrieval helps save time and resources, making the whole process more efficient. One more thing about caching: we usually cache only data that does not change very frequently. If the data changed often, the cached copy would keep going stale and we would have to go back to the DB to refresh it, which defeats the purpose of the cache.
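To make the idea concrete, here is a minimal sketch of the "check the cache first, fall back to the source" pattern. It is only an illustration: the dictionary `FAKE_DB`, the function `slow_db_lookup`, and the key names are made up for this example, and a plain Python dict stands in for the cache.

```python
import time

# Hypothetical stand-in for a slow database; names and data are made up.
FAKE_DB = {"user:1": "Alice", "user:2": "Bob"}
db_calls = 0

def slow_db_lookup(key):
    """Pretend to query the original data source (the slow path)."""
    global db_calls
    db_calls += 1
    time.sleep(0.01)  # simulate the cost of going to the source
    return FAKE_DB.get(key)

cache = {}  # a plain dict plays the role of the cache

def get(key):
    """Return the value for key, serving it from the cache when possible."""
    if key in cache:
        return cache[key]          # fast path: no trip to the database
    value = slow_db_lookup(key)    # slow path: fetch from the source
    cache[key] = value             # keep a copy for next time
    return value

first = get("user:1")   # goes to the database
second = get("user:1")  # served from the cache
```

The second call never touches the database, which is the whole point. A real cache would also need a way to expire or invalidate entries when the underlying data changes.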
You may find it odd that I am discussing “why Redis” before “what is Redis”. Hold on, and you will see why.
Why Redis?
Now that we’ve grasped the importance of storing data efficiently for quick access, let’s tackle the question: Why choose Redis as the datastore for caching, and not any other options like MySQL or MongoDB?
The answer lies in how Redis does the job. Redis is designed specifically for speed. It’s an “in-memory” database, storing data right in your computer’s active memory (RAM). This makes retrieving data lightning-fast because it’s already in the workspace where your computer actively operates.
On the other hand, traditional databases like MySQL or MongoDB store data on secondary storage, like a hard drive. While effective for many tasks, fetching data from secondary storage takes more time compared to fetching it from active memory (RAM). It’s like choosing between grabbing a book from a shelf or having a cheat sheet always open on your desk.
Accessing data in memory is significantly faster than performing random disk I/O, contributing to the low latency and high throughput observed in Redis caching. An additional advantage stems from the ease of implementing data structures in memory compared to their on-disk counterparts.
I hope this answers the “why Redis” question, and shows why the question was important.
In the realm of computer memory, registers and CPU cache are the speedsters, faster than RAM. Found within the processor, they operate at lightning speed. However, due to their high cost and limited capacity, they’re like the Formula 1 racers of memory — incredibly fast but reserved for the most critical tasks. For everyday use and storing larger amounts of data, RAM and in-memory databases like Redis take the lead.
What is Redis?
Quoting the Redis docs:
Redis is an open source (BSD licensed), in-memory data structure store used as a database, cache, message broker, and streaming engine. Redis provides data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes, and streams. Redis has built-in replication, Lua scripting, LRU eviction, transactions, and different levels of on-disk persistence, and provides high availability via Redis Sentinel and automatic partitioning with Redis Cluster.
Now, for a deeper dive into Redis’s capabilities and usage, the Redis documentation is your go-to resource. There you can explore its data structures and learn how to harness the power of Redis. We’ll discuss data persistence in Redis in another blog.
An intriguing fact about Redis: it executes commands on a single thread. (Newer versions can use extra threads for network I/O, but command execution itself still happens on one thread.) Surprising, right? Stick around as we uncover the mysteries behind Redis’s remarkable performance, including how it thrives with just one thread.
Single-Threaded Redis
You might be wondering: why would Redis opt for a single-threaded design? Wouldn’t it be more efficient to utilise multiple threads and tap into all available CPU cores for parallel computation?
The answer lies in the nature of Redis’s workload. Redis is like a chef in a computer kitchen, handling one task at a time to ensure perfection. While many helpers might sound good, Redis prefers a single thread for precision, avoiding chaos and ensuring each task is flawlessly executed.
While many applications benefit from parallel processing, Redis primarily focuses on quick data access and minimal latency. A single-threaded approach simplifies the design, ensuring that commands are executed sequentially without the complexities of managing multiple threads and potential synchronisation issues.
In Redis, each command is executed atomically, guaranteeing consistency. This simplicity contributes to faster execution, as the single thread can fully utilise the CPU cache, minimising cache misses and optimising performance.
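A toy model can show why sequential execution gives atomicity for free. This is a sketch under simplifying assumptions, not Redis’s actual code: the `store` dict, the `pending` queue, and the `incr` function are hypothetical names, and the three “clients” are just loop iterations.

```python
from collections import deque

store = {}          # the in-memory key-value data
pending = deque()   # commands waiting to run, in arrival order

def incr(key):
    """Read-modify-write; safe because no other command can interleave."""
    store[key] = store.get(key, 0) + 1
    return store[key]

# Three hypothetical clients each queue 100 INCR commands on the same key.
for client_id in range(3):
    for _ in range(100):
        pending.append((client_id, "INCR", "counter"))

# The single "thread": take one command, run it to completion, repeat.
replies = []
while pending:
    client_id, name, key = pending.popleft()
    if name == "INCR":
        replies.append((client_id, incr(key)))
```

Because each command finishes before the next one starts, no increment is ever lost, and no locks are needed to achieve that.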
In a multi-threaded environment, threads share certain resources, including the CPU cache. When one thread updates data, it may cause the cache to be updated or invalidated. If another thread was using the same data or related data, it might experience what is known as a “cache miss” because the data it needs is no longer in the cache or has been modified.
This situation creates what we call cache contention or cache thrashing, where threads are competing for access to the same limited cache space. Instead of a smooth flow of data, you get interruptions and inefficiencies as threads contend for access, leading to potential performance bottlenecks.
Okay, we’ve grasped why Redis opts for a single-threaded approach. But here’s the intriguing question: How does a single thread manage many thousands of incoming requests and outgoing responses simultaneously? Won’t the thread get blocked while completing each request individually?
Managing Thousands in Harmony
Here’s where I/O multiplexing comes into the picture — it’s the secret sauce that helps Redis achieve apparent concurrency. Seems like a buzzword, doesn’t it? Let me break it down for you.
Here are the steps:
- Redis Server is Running at Some Address: Redis, like a diligent worker, is up and running, ready to handle requests and share its data.
- Redis Accepts Multiple TCP Connections Through Clients: As various clients, which could be applications or services, want to interact with Redis, they establish TCP connections. Each connection gives Redis a network socket for that client. A network socket is a virtual communication channel between the client and Redis. It’s like a two-way street where data can flow in both directions.
- Without I/O Multiplexing: To read data from a network socket, Redis initiates a read() system call. This read system call, functioning as an I/O operation, is blocking by default, meaning Redis’s single-threaded process will wait on the associated TCP connection until data becomes available to read. This data typically consists of requests and corresponding data sent by clients to Redis. This blocking behaviour implies that Redis, with its single-threaded nature, would be confined to processing only one TCP connection at a time. The Redis server’s thread would wait on a specific client’s connection until data is ready to be read, and thus there would be no point in accepting multiple connections.
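The blocking behaviour described in the last step can be seen in a small sketch. The assumptions here: `socket.socketpair()` stands in for a real client connection, and a timeout is set only so that the demo returns instead of hanging forever. Without the timeout, the first `recv()` would simply wait.

```python
import socket

# socketpair() returns two connected sockets in one process; they stand in
# for the server side and the client side of a single connection.
server_side, client_side = socket.socketpair()

# A plain recv() would wait indefinitely while the client is silent.
# The timeout exists only so this demo can report what happened.
server_side.settimeout(0.2)

try:
    server_side.recv(1024)
    blocked = False
except socket.timeout:
    blocked = True  # no data arrived, so the read was stuck waiting

# Once the client actually sends something, the same call returns at once.
client_side.sendall(b"PING\r\n")
data = server_side.recv(1024)

server_side.close()
client_side.close()
```

While the thread sits inside that first `recv()`, it cannot serve anyone else, which is exactly the problem a single-threaded server has to avoid.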
Does a solution to this problem come to mind? Could we issue a read only when we know the data is ready, and so avoid blocking? Could we receive some sort of alert saying, “Hey, your data is ready, please read and process it”? If that is what you were thinking, you are right: this is exactly what I/O multiplexing does.
I/O Multiplexing
- I/O multiplexing allows Redis to monitor multiple connections simultaneously without blocking its main thread. Instead of waiting for data on a single connection, Redis can keep an eye on multiple connections at once.
- Redis uses the select() or poll() system call to register interest in multiple sockets (connections) simultaneously. These calls allow it to specify a set of sockets it wants to monitor for specific events, such as readiness to read. (In practice, Redis’s event library prefers epoll on Linux and kqueue on BSD and macOS, and falls back to select(); the idea is the same.)
- The select() and poll() system calls fall under the umbrella of I/O monitoring system calls.
- Redis’s single thread invokes the select() or poll() system call and enters a state of waiting for events. During this time, Redis is not actively processing any specific connection; instead, it awaits notifications about events on the registered sockets and then processes the requests from the ready sockets one at a time.
Here you might ask: if Redis processes requests only one at a time, how is the efficiency achieved? Redis beautifully exploits the fact that network I/O takes much longer than its in-memory operations (which are atomic), and thus it can provide high throughput, low latency, and this apparent but performant concurrency.
- When an event occurs on any of the registered sockets (e.g., data becomes available for reading), the select() or poll() call returns. The return value indicates which sockets experienced events, allowing Redis to identify where data is ready to be processed.
- Redis then proceeds to handle the specific events on the sockets identified by the system call. For example, if data is ready to be read on a particular socket, Redis can initiate the read operation without waiting, addressing the blocking nature of traditional I/O.
- The event-driven approach is asynchronous; Redis doesn’t actively poll each socket but rather responds to events as they occur. This allows Redis to efficiently manage a large number of connections without wasting resources on constant polling.
- By waiting for events rather than blocking on individual sockets, Redis maximizes the utilisation of its single thread and system resources. This ensures that the server remains responsive to events across multiple connections without unnecessary delays.
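The steps above can be sketched with Python’s `select` module, which wraps the same select() system call. This is a toy model, not how Redis is implemented: the three connections are simulated with `socket.socketpair()`, and the request strings and the `handled` dict are made up for illustration.

```python
import select
import socket

# Three hypothetical client connections, modelled with socketpair().
pairs = [socket.socketpair() for _ in range(3)]
server_ends = [server for server, _client in pairs]
client_ends = [client for _server, client in pairs]

# Only clients 0 and 2 send a request; client 1 stays idle.
client_ends[0].sendall(b"GET a")
client_ends[2].sendall(b"GET c")

handled = {}   # connection index -> request bytes that were processed
expected = 2   # this demo knows that two requests are on the way

while len(handled) < expected:
    # Wait until at least one watched socket is readable (or 1 s passes).
    readable, _, _ = select.select(server_ends, [], [], 1.0)
    if not readable:
        break  # timed out; nothing more to do in this demo
    for sock in readable:
        request = sock.recv(1024)   # does not block: data is already there
        handled[server_ends.index(sock)] = request
        sock.sendall(b"OK")         # reply to that client

reply = client_ends[0].recv(1024)

for server, client in pairs:
    server.close()
    client.close()
```

The idle connection never holds up the other two: the single thread only touches sockets that select() has reported as ready.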
I/O Monitoring System Calls
I/O monitoring system calls are functions provided by the operating system that allow your program to keep an eye on multiple input/output sources, like file descriptors or network sockets, at the same time.
Think of I/O monitoring as setting up a watchman for your program. You tell the watchman what you’re interested in (readiness to read, write, etc.), and then your program takes a break, letting the watchman keep an eye on things. When an event happens, the watchman signals your program, saying, “Hey, something’s ready!” Your program then jumps back in to handle the specific event.
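The watchman analogy maps quite directly onto Python’s `selectors` module, which picks an efficient monitoring mechanism (such as epoll or kqueue) where the platform offers one. The sketch below is illustrative only: the labels `"a"` and `"b"`, the `on_readable` callback, and the socket pairs are assumptions made for the example.

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # the "watchman"
events_seen = []

def on_readable(sock, label):
    """Run when the watchman reports that a socket has data to read."""
    events_seen.append((label, sock.recv(1024)))

server_a, client_a = socket.socketpair()
server_b, client_b = socket.socketpair()

# Tell the watchman what we care about: read-readiness on both sockets.
sel.register(server_a, selectors.EVENT_READ, data="a")
sel.register(server_b, selectors.EVENT_READ, data="b")

client_b.sendall(b"hello")  # only connection b has something to say

# Take a break until something is ready (or the timeout expires).
for key, _mask in sel.select(timeout=1.0):
    on_readable(key.fileobj, key.data)

sel.unregister(server_a)
sel.unregister(server_b)
sel.close()
for s in (server_a, client_a, server_b, client_b):
    s.close()
```

Only the socket with pending data is reported, so the program handles connection b and never wastes time checking the quiet one.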