Redis 7 Internals Series
What this is all about
Recently I decided to have a look at the Redis source code. At the time, I didn’t expect it to keep me hooked for a good couple of months, so I wasn’t picky about choosing a specific version: I just cloned the current development branch. It’s stable in the sense that the code works and the tests pass, but it is a work in progress, which is probably why the branch is named, uhm, “unstable”. Still, everything covered here applies to Redis 7, except for some lower-level details. More specifically, the commit I refer to throughout this series is ahead of the Redis 7.0.x versions and is part of the recently released Redis 7.2.0. Oddly enough, there is no such thing as Redis 7.1.
What is covered, and what is not
My primary purpose is to give a higher-level overview of how Redis works, which I think is the most important part of diving into a new codebase. Being a business application developer by day, I had absolutely no idea how this kind of system works internally. This is the overview I personally wish I had when I started this endeavor.
Besides the higher-level overview, I describe some parts that I found particularly interesting. There are many more interesting things that I don’t dive into here, but with the overall picture in hand, exploring them on your own shouldn’t be much trouble.
Originally, this overview was in a half-visual, half-textual form in this Miro board. The current series is, at least to some extent, a complementary afterthought. Comparing the two, I think the Miro version gives a better view of the bigger picture, and by its very nature it’s more visual, while the accompanying posts make an effort to explain some parts in more depth.
Scenarios
I think that the best way to describe how a system works is through user stories. As a side note, it’s no wonder that I also think the best way to decompose any new feature is by writing and discussing user stories. Moreover, I prefer to discuss them through specific examples that can serve as BDD-style tests. That’s what I’ve tried to apply here. I consider several specific scenarios, some small, some large. Within these scenarios, where applicable, I’ve tried to describe lower-level functionality through a concrete example.
The first post delves into what happens when Redis starts up. This piece is quite substantial, and understandably so, given how much goes on during this phase. Redis establishes listening sockets, initializes background threads and the threaded I/O machinery, and finally throws a party with an event loop, essentially the heartbeat of Redis. Alongside executing client commands, the event loop manages various cyclic tasks: client eviction, key expiration, defragmentation, hash table maintenance such as resizing and rehashing, and data persistence.
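To make the idea of an event loop a bit more tangible, here is a minimal sketch in Python rather than Redis’s own C. It is not how Redis implements its loop: the function name `run_event_loop` and the `cron` callback are made up for illustration, and the periodic task runs on every iteration here only to keep the example short, whereas a real server runs such tasks on a timer.

```python
import selectors
import socket


def run_event_loop(selector, cron, iterations, timeout=0.01):
    """Run a fixed number of iterations: dispatch ready sockets, then run the periodic task."""
    for _ in range(iterations):
        # Wait (briefly) for sockets that have data to read.
        for key, _events in selector.select(timeout):
            handler = key.data      # the callback registered together with the socket
            handler(key.fileobj)
        # Simplification: housekeeping runs after every iteration.
        cron()


received = []    # payloads seen by the read handler
cron_runs = []   # one entry per periodic-task invocation

# A connected socket pair stands in for a client connection.
reader, writer = socket.socketpair()
reader.setblocking(False)

selector = selectors.DefaultSelector()
selector.register(reader, selectors.EVENT_READ,
                  lambda sock: received.append(sock.recv(1024)))

writer.sendall(b"PING")
run_event_loop(selector, lambda: cron_runs.append(1), iterations=3)

selector.unregister(reader)
reader.close()
writer.close()
selector.close()
```

The point of the sketch is the shape of the loop: one thread waits for readiness, calls the handler attached to each ready socket, and interleaves housekeeping with I/O.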
The second post outlines how Redis accepts client connections. There I examine the intricacies of listening sockets, their configuration, and how they generally function. If this whole networking thing fascinates you, you might also like to delve directly into the implementation of TCP connection establishment in the Linux TCP stack.
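As a rough illustration of what a listening socket involves, the sketch below goes through the usual create, configure, bind, listen, accept sequence using Python’s standard `socket` module. The helper name `open_listener` is hypothetical, and the backlog of 511 simply mirrors the default of Redis’s `tcp-backlog` setting; the operating system may cap the value it actually applies.

```python
import socket


def open_listener(host="127.0.0.1", port=0, backlog=511):
    """Create a TCP listening socket; port 0 lets the OS pick a free port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow quick restarts without waiting for old connections to time out.
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    # The backlog bounds the queue of connections waiting to be accepted.
    listener.listen(backlog)
    return listener


listener = open_listener()
address = listener.getsockname()

# A client connects; the kernel completes the TCP handshake and queues the connection.
client = socket.create_connection(address)
client_address = client.getsockname()

# accept() takes the established connection off the queue.
connection, peer = listener.accept()

client.sendall(b"hello")
data = connection.recv(1024)

for sock in (client, connection, listener):
    sock.close()
```

Note that the handshake is finished by the kernel before `accept()` is ever called; the application only picks up connections that are already established.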
The third post explores how Redis reads and parses a command, assembles a response, and writes it to a socket. I also detail how Redis handles replies differently depending on their size.
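For a feel of what parsing a command means, here is a small sketch of decoding a request in the Redis serialization protocol (RESP), where a command arrives as an array of bulk strings. The functions `parse_command` and `encode_bulk` are illustrative helpers of my own, not Redis functions, and the sketch ignores inline commands and other RESP types.

```python
def parse_command(buffer: bytes):
    """Parse one RESP array of bulk strings.

    Returns (args, bytes_consumed), or (None, 0) if the buffer holds
    only part of a command and more data must be read first.
    """
    if not buffer:
        return None, 0
    if not buffer.startswith(b"*"):
        raise ValueError("expected a RESP array")
    end = buffer.find(b"\r\n")
    if end == -1:
        return None, 0
    count = int(buffer[1:end])          # number of arguments
    pos = end + 2
    args = []
    for _ in range(count):
        end = buffer.find(b"\r\n", pos)
        if end == -1:
            return None, 0
        if buffer[pos:pos + 1] != b"$":
            raise ValueError("expected a bulk string")
        length = int(buffer[pos + 1:end])
        start = end + 2
        if len(buffer) < start + length + 2:
            return None, 0              # payload not fully received yet
        args.append(buffer[start:start + length])
        pos = start + length + 2        # skip payload and trailing CRLF
    return args, pos


def encode_bulk(payload: bytes) -> bytes:
    """Encode a reply as a RESP bulk string."""
    return b"$" + str(len(payload)).encode() + b"\r\n" + payload + b"\r\n"


request = b"*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n"
args, consumed = parse_command(request)
partial_args, partial_consumed = parse_command(request[:-2])
reply = encode_bulk(b"bar")
```

The incomplete-buffer case matters because a socket read can return only part of a command; the parser then has to wait for more bytes instead of failing.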
The fourth post takes a lot of cues from the previous one and zooms in on threaded I/O. It’s a deep dive into I/O threads: what they are, when they come in handy, and how they’re put into action. I also walk through five back-to-back event loop iterations to describe in detail how I/O threads jump in and out of the game.
Lastly, the fifth post details how Redis goes about carrying out a command. There, I’ll guide you through the primary data structures that Redis leans on for storing data. Then, I’ll dive into the necessary maintenance procedures that kick in with every command. This will include revisiting hash table resizing and rehashing, plus breaking down the nitty-gritty of key eviction. To wrap things up, I’ll sketch out the general steps that a command execution follows.
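To give a taste of what incremental rehashing is about, here is a toy hash table that grows by migrating one bucket per operation instead of moving everything at once. It only illustrates the general technique: the class `IncrementalDict` and all of its details (chaining with Python lists, doubling when the entry count reaches the table size) are assumptions made for this sketch, not a description of Redis’s dict implementation.

```python
class IncrementalDict:
    """Toy hash table that migrates one bucket per operation while resizing."""

    def __init__(self, size=4):
        # tables[0] is the current table; tables[1] exists only during a rehash.
        self.tables = [[[] for _ in range(size)], None]
        self.rehash_index = -1      # -1 means no rehash in progress
        self.count = 0

    def _rehash_step(self):
        """Move a single bucket from the old table to the new one."""
        if self.rehash_index == -1:
            return
        old, new = self.tables
        for key, value in old[self.rehash_index]:
            new[hash(key) % len(new)].append((key, value))
        old[self.rehash_index] = []
        self.rehash_index += 1
        if self.rehash_index == len(old):   # everything migrated: swap tables
            self.tables = [new, None]
            self.rehash_index = -1

    def _find(self, key):
        """Look in both tables, since a key may live in either during a rehash."""
        for table in self.tables:
            if table is None:
                continue
            bucket = table[hash(key) % len(table)]
            for i, (k, _v) in enumerate(bucket):
                if k == key:
                    return bucket, i
        return None, -1

    def set(self, key, value):
        self._rehash_step()
        bucket, i = self._find(key)
        if bucket is not None:
            bucket[i] = (key, value)
            return
        # Start growing once the table is as full as it is wide.
        if self.rehash_index == -1 and self.count >= len(self.tables[0]):
            self.tables[1] = [[] for _ in range(len(self.tables[0]) * 2)]
            self.rehash_index = 0
        # While rehashing, new keys go straight into the new table.
        target = self.tables[1] if self.rehash_index != -1 else self.tables[0]
        target[hash(key) % len(target)].append((key, value))
        self.count += 1

    def get(self, key, default=None):
        self._rehash_step()
        bucket, i = self._find(key)
        return bucket[i][1] if bucket is not None else default


table = IncrementalDict(size=2)
for n in range(20):
    table.set(f"key{n}", n)
values = [table.get(f"key{n}") for n in range(20)]
```

The design choice worth noticing is that the cost of growing the table is spread across many operations, so no single command has to pay for moving every entry.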