The Road to Redis — Chapter 1

From tables to hashes

Note: This series was originally titled “From SQL to Redis.” You can read about why it’s been changed to “Road to Redis” here.

In this series we’ll use a fictional e-commerce store that sells mugs and shirts imprinted with artwork. I’ve built a medium-sized sample set in JSON based on resplashed.com, a free image site. I used the Node.js GM module to composite the images. The metadata on these images, aside from the artist attribution, was generated using Faker.js.

Just a note about the exercises and examples in this series: we will be iteratively building up techniques, so between chapters I suggest you run FLUSHDB. Because of that, if you have any other data in your Redis instance, it would be wise to use a separate instance.

Understanding the source

“Tables” are a common metaphor seen in many different applications — like a spreadsheet, they store data in rows and columns. In most relational databases, you have a pre-defined data type for each (pre-defined) column, and the number of rows is limited only by your storage. It’s a nice, neat grid.

Indeed, if you’re like me, you visualize how the table is set up on disk. If you have two columns, the first with a data type that takes 100 bytes and the second taking 4 bytes, you think of each row occupying 104 bytes, then the next row taking another 104 bytes, and so on. The reality, of course, is different and varies by database product, but the general gist is that each row takes up a predictable amount of space.

The spanner in the works is that there are so-called “blob” data types that allow for large amounts of unstructured data. These are represented as a separate space on disk and the row only contains a reference to the blob, not the data itself.

So, as you start building out an application based on a relational database, you need to determine the basic layout of your data prior to starting. While changing the structure of your database is possible, it is not without peril.

Out of the gate, every column is treated equally. With SQL you can write queries that search on any column in a roughly equal way. You can, but it may not be a good idea as your data grows. You set the most-used columns as indexes, which order your data and make referencing rows by those columns more efficient. This leads to the next topic: queries.

SQL is the language you use to retrieve your data. If you want to get a row or rows, you specify that in a SELECT query. SELECT queries are a gigantic topic in and of themselves, but just consider them the primary way to get one row or many rows.

Enough description of RDBMSes and SQL; if you’re here you likely already know all of this. Let’s take a look at a table for the e-commerce example.

It’s a pretty straightforward table. We’re using a few different data types:

  • varchar is being used to store short amounts of text. The length is 100 characters because, well, that’s what I thought would be a reasonable size.
  • decimal is being used to store the price. In setting up the table, a choice had to be made between decimal and float as each type has subtly different uses and pros/cons. Since this is currency, decimal is generally considered to be the appropriate data type.
  • text is for the item description. In our example, the description of the item could be quite long, so the choice was made to use text over a very large varchar, as that increases the number of rows that can be stored per block.
  • datetime is being used to store the time the item was added. This data type enables sorting, filtering and formatting of the date in the query. While we could have stored the date/time as a pre-formatted string, you would lose quite a bit of functionality.
  • enum stores the two types of items the store has — while we could have used a varchar for this, or heck even an int, the enum made the most sense at the time of developing the structure.

A couple of other notes about the data structure. I’m using slug as the unique/primary index. This is how the store primarily identifies each item, but it could well have been a numeric ID or SKU number. manufacturer is a slug that relates to another table with information about the maker of each item (more on that in a future installment).

Even with our simple table, we’ve had to make many choices. Some of these choices involve knowing some pretty obscure workings of the database system (I’m looking at you, text vs varchar) as well as assumptions about our data (100-character limits on the varchar).

So, to create the table, you’d execute something like this:

CREATE TABLE `items` (
`artist` varchar(100) NOT NULL,
`price` decimal(10,2) NOT NULL,
`name` varchar(100) NOT NULL,
`description` text NOT NULL,
`slug` varchar(100) NOT NULL,
`added` datetime NOT NULL,
`manufacturer` varchar(100) NOT NULL,
`itemType` enum('shirt','mug') NOT NULL,
`productImg` varchar(100) NOT NULL
);
ALTER TABLE `items`
ADD UNIQUE KEY `slug` (`slug`);

This, actually, brings us to Redis. Let’s talk about how you create a similar structure with Redis:

Yep. It’s empty. In Redis you don’t pre-define anything. It feels very loosey-goosey at first, but you really don’t need it. Seriously! You do, however, still need to think about how your data will be laid out: how you will name your fields and what kinds of data structures to use. Redis just lacks any sort of predefined schema.

While Redis is not just a key-value store, it descends from the key-value model, and that heritage informs how you think about data in Redis. In this case, we aren’t going to collect all the “rows” into a single structure; rather, we will collect them under a common key pattern. In Redis, by convention, you use colons (“:”) to separate the parts of a key. The first part is the “root” of the keyspace — we’ll use “redishop.” The second part is roughly equivalent to our table name in SQL (“items”) and the third part is our slug. So, for the SQL row for an item with the slug “Handcrafted-Trees-Mug”, the equivalent Redis key would be:

redishop:items:Handcrafted-Trees-Mug
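Since the pattern is purely conventional, it’s easy to generate keys in application code. A minimal sketch in Node.js (the helper names makeKey and itemKey are my own invention, not part of any library):

```javascript
// Join key parts using the colon convention described above.
// These helper names are hypothetical, not from the sample code.
function makeKey(...parts) {
  return parts.join(':');
}

function itemKey(slug) {
  return makeKey('redishop', 'items', slug);
}

console.log(itemKey('Handcrafted-Trees-Mug'));
// → redishop:items:Handcrafted-Trees-Mug
```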

I should, at this point, let you know that this is all by convention — big apps can have long, complicated key structures and there is no practical limit to the size of your keys (although, I’d say that gigantic keys are far from a best practice). Even the colon as a separator is optional — you can make it anything you want.

Since the key and parts are by convention only, there is only an implied hierarchy to each part of a key. Visually, you could make the assumption that the keys represent some sort of tree — that assumption would be incorrect. Each key in Redis is a peer to any other key. A human looks at the following keys and can understand that redishop is the root, items is distinct from users and so on:

redishop:items:Gorgeous-Water-Mug
redishop:items:Handcrafted-Trees-Mug
redishop:users:kyle

However, all three are in a flat namespace.

In our example, each item (redishop:items:*) will be of type “hash.” The term hash is used in many contexts, but in this case we mean hash as in hash table. At their most basic, Redis hashes consist of three parts:

  • A Key,
  • Fields,
  • Values.

In SQL terms, you can think of a single row in a table with a single unique (key) column and many unindexed columns (fields) with values. A few caveats: each field must have a corresponding value (it’s impossible to have a truly empty field in Redis), and a hash with no fields ceases to exist.
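To make those caveats concrete, here is a rough in-memory analogy of hash behavior using plain JavaScript Maps (my own simplification, not how Redis actually stores hashes):

```javascript
// A crude stand-in for a Redis keyspace holding hashes.
const keyspace = new Map();

// HSET: a field always carries a value; setting creates the key if needed.
function hset(key, field, value) {
  if (!keyspace.has(key)) keyspace.set(key, new Map());
  keyspace.get(key).set(field, String(value)); // Redis stores values as strings
}

// HDEL: removing the last field removes the key entirely.
function hdel(key, field) {
  const hash = keyspace.get(key);
  if (!hash) return;
  hash.delete(field);
  if (hash.size === 0) keyspace.delete(key); // a hash with no fields ceases to exist
}

hset('redishop:items:Handcrafted-Trees-Mug', 'price', 10.99);
hdel('redishop:items:Handcrafted-Trees-Mug', 'price');
console.log(keyspace.has('redishop:items:Handcrafted-Trees-Mug')); // → false
```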

So — let’s take a row from the SQL table and see how you might express the same thing with a Redis hash. Take this SQL:

INSERT INTO `items` (`artist`, `price`, `name`, `description`, `slug`, `added`, `manufacturer`, `itemType`, `productImg`) VALUES
('Martin Wessely', '10.99', 'Handcrafted Trees Mug', 'enim corporis voluptatibus laudantium possimus alias dolorem voluptatem similique aut aliquam voluptatem voluptatem omnis id consequatur', 'Handcrafted-Trees-Mug', '2017-01-29 21:02:46', 'OHara-Group', 'mug', 'mug-400_002e1ecb8bd2.jpg');

Now, with Redis, you would execute a few commands to do the same thing:

> HSET redishop:items:Handcrafted-Trees-Mug artist "Martin Wessely"
> HSET redishop:items:Handcrafted-Trees-Mug price 10.99
> HSET redishop:items:Handcrafted-Trees-Mug name "Handcrafted Trees Mug"
> HSET redishop:items:Handcrafted-Trees-Mug description "enim corporis voluptatibus laudantium possimus alias dolorem voluptatem similique aut aliquam voluptatem voluptatem omnis id consequatur"
> HSET redishop:items:Handcrafted-Trees-Mug manufacturer OHara-Group
> HSET redishop:items:Handcrafted-Trees-Mug itemType mug
> HSET redishop:items:Handcrafted-Trees-Mug productImg mug-400_002e1ecb8bd2.jpg

HSET takes three arguments — key, field, and value. Another difference between SQL and Redis is that while SQL makes a distinction between adding a new row (INSERT) and updating a row (UPDATE), Redis doesn’t care. You are just “setting” something.

Notably absent from the above Redis commands are the slug and added columns from the SQL. The slug is present in the key itself, so there is no need to store it in multiple places. added is more interesting — we’re going to deal with that one in a later installment.

I used HSET above to illustrate the command, but Redis has a more powerful command available for setting multiple fields at one key. HMSET allows you to specify a single key and multiple fields/values. So, the above example gets a little more concise (if at the expense of readability):

> HMSET redishop:items:Handcrafted-Trees-Mug artist "Martin Wessely" price 10.99 name "Handcrafted Trees Mug" description "enim corporis voluptatibus laudantium possimus alias dolorem voluptatem similique aut aliquam voluptatem voluptatem omnis id consequatur" manufacturer OHara-Group itemType mug productImg mug-400_002e1ecb8bd2.jpg

Now we have this single hash at a key with several fields/values in it. We could go ahead and add tons more items, but there would be nothing to tie them together. If we want to do anything more complex than just grabbing a single value, we’re going to need to amp it up a bit with some more data structures.

The next data structure I want to leverage is the set. Sets are unordered collections of strings that cannot contain duplicates. This will allow us to provide an explicit connection between all the item hashes. So, we’ll have a key, and we’ll add each item’s slug to it every time we create a new item. We can add a value to a set with the SADD command, which takes two arguments: key and value. This means we’ll need to execute a couple of commands every time we add a new item:

> hmset redishop:items:Handcrafted-Trees-Mug artist "Martin Wessely" price 10.99 name "Handcrafted Trees Mug" description "enim corporis voluptatibus laudantium possimus alias dolorem voluptatem similique aut aliquam voluptatem voluptatem omnis id consequatur" manufacturer OHara-Group itemType mug productImg mug-400_002e1ecb8bd2.jpg
> sadd redishop:all-items Handcrafted-Trees-Mug

If you execute these two commands as-is, it’s possible that another Redis client could expect to find both the hash and the set present, causing undesirable results in your program. To ensure that no other client elbows in on our action, we can contain them in a MULTI/EXEC block. A MULTI/EXEC block queues up all the commands after MULTI and before EXEC. After the EXEC command is issued, the server executes everything in rapid sequence without interruption. This is how it looks now:

> multi
> hmset redishop:items:Handcrafted-Trees-Mug artist "Martin Wessely" price 10.99 name "Handcrafted Trees Mug" description "enim corporis voluptatibus laudantium possimus alias dolorem voluptatem similique aut aliquam voluptatem voluptatem omnis id consequatur" manufacturer OHara-Group itemType mug productImg mug-400_002e1ecb8bd2.jpg
> sadd redishop:all-items Handcrafted-Trees-Mug
> exec

This illustrates the concept of atomic operations. All single commands in Redis are atomic — they can’t be interrupted. The MULTI/EXEC block creates an atomic queue of commands. Better still, you can treat the entire queue as one command, enabling powerful sequences over multiple keys and data types.

Now that we have all the items tied to a single structure (redishop:all-items), we are able to do more useful things with the entire collection using the SORT command. The SORT command may be the most complicated of all the Redis commands, but it’s also very powerful. Let’s build up a SORT command.

> SORT redishop:all-items BY redishop:items:*->price

We pass our set of items as the first argument. In the BY pattern, the asterisk stands in for each member of the set being sorted. This returns a list of set members sorted by the price from each hash:

1) "Awesome-Architecture-Mug-1"
2) "Awesome-Car-Shirt"
3) "Awesome-City-Shirt-2"
4) "Awesome-Ocean-Shirt-2"
5) "Awesome-Snow-Mug-1"
...

Awesome! But what if we want to get that price? How would we do that? The SORT command can also GET items from a hash.

> SORT redishop:all-items BY redishop:items:*->price GET redishop:items:*->price

This would return:

1) "9.99"
2) "9.99"
3) "9.99"
4) "9.99"
5) "9.99"
...

Not terribly useful — let’s also GET the element itself:

> SORT redishop:all-items BY redishop:items:*->price GET # GET redishop:items:*->price

Which returns the element and then the price for each one:

1) "Awesome-Architecture-Mug-1"
2) "9.99"
3) "Awesome-Car-Shirt"
4) "9.99"
5) "Awesome-City-Shirt-2"
6) "9.99"
7) "Awesome-Ocean-Shirt-2"
8) "9.99"
9) "Awesome-Snow-Mug-1"
10) "9.99"
...

You might notice that Redis just alternates between the element and the price (this is called an “array reply”). Redis assumes you know the order of the elements; it doesn’t draw a dividing line between each item — you’ve got to do that yourself.
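Splitting that flat array reply back into pairs is straightforward in application code. A generic chunking helper, similar in spirit to lodash’s chunk:

```javascript
// Split a flat array into subarrays of `size` elements, e.g. an
// alternating [element, price, element, price, ...] reply into pairs.
function chunk(arr, size) {
  const out = [];
  for (let i = 0; i < arr.length; i += size) {
    out.push(arr.slice(i, i + size));
  }
  return out;
}

const reply = ['Awesome-Architecture-Mug-1', '9.99', 'Awesome-Car-Shirt', '9.99'];
console.log(chunk(reply, 2));
// → [ [ 'Awesome-Architecture-Mug-1', '9.99' ], [ 'Awesome-Car-Shirt', '9.99' ] ]
```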

This is fine if you only have a handful of elements, but what if you want to retrieve just a few at a time? Things start to look a lot like SQL:

> SORT redishop:all-items BY redishop:items:*->price GET # GET redishop:items:*->price LIMIT 0 5

This would get the first 5 elements. Then you could get the next five by doing this:

> SORT redishop:all-items BY redishop:items:*->price GET # GET redishop:items:*->price LIMIT 5 5
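Since LIMIT takes an offset and a count, the paging arithmetic on the application side is trivial. A tiny hypothetical helper:

```javascript
// Translate a zero-based page number into SORT's LIMIT arguments.
function limitArgs(page, perPage) {
  return { offset: page * perPage, count: perPage };
}

console.log(limitArgs(0, 5)); // → { offset: 0, count: 5 }, i.e. LIMIT 0 5
console.log(limitArgs(1, 5)); // → { offset: 5, count: 5 }, i.e. LIMIT 5 5
```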

You can also control the sort order in a very SQL-like way — using DESC or ASC:

> SORT redishop:all-items BY redishop:items:*->price GET # GET redishop:items:*->price LIMIT 5 5 DESC

One thing you won’t find in the Redis SORT command is an equivalent of “SELECT * FROM…”. Given how Redis returns the items, and that a hash has no imposed structure, it wouldn’t make much sense.

A note on complexity: SORT is not only syntactically complex, but also computationally complex. Paging using SORT alone means repeating the complex SORT operation over the entire set for each page. Later in this series we will explore other methods to achieve the same results in a more efficient manner.
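To make the SORT semantics concrete, here is a plain-JavaScript emulation of BY hash->field with GET # and LIMIT (my own illustration with made-up data; Redis does all of this server-side in a single call):

```javascript
// Sample data: hashes keyed by slug, mirroring redishop:items:<slug>.
const items = {
  'Awesome-Car-Shirt': { price: '9.99' },
  'Handcrafted-Trees-Mug': { price: '10.99' },
  'Awesome-Books-Shirt': { price: '15.99' },
};

// Emulates: SORT <set> BY items:*->price GET # GET items:*->price LIMIT offset count
function sortByPrice(members, offset, count) {
  const sorted = [...members].sort(
    (a, b) => parseFloat(items[a].price) - parseFloat(items[b].price)
  );
  const page = sorted.slice(offset, offset + count); // LIMIT offset count
  const reply = [];
  for (const m of page) {
    reply.push(m);              // GET #            → the element itself
    reply.push(items[m].price); // GET …->price     → the hash field
  }
  return reply; // flat "array reply", alternating element and price
}

console.log(sortByPrice(Object.keys(items), 0, 2));
// → [ 'Awesome-Car-Shirt', '9.99', 'Handcrafted-Trees-Mug', '10.99' ]
```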

Where is where?

So far, we’ve covered how to retrieve a list of items from our hashes and set. But what if you want to get only the items that have a specific value, or a range of values, in something other than the slug? How do you accomplish this with Redis?

First we need to rewind a bit and look at how we got our data into Redis. Up to now we’ve used HMSET and SADD in a MULTI/EXEC block. Let’s add a bit more to our MULTI/EXEC.

> multi
> hmset redishop:items:Handcrafted-Trees-Mug artist "Martin Wessely" price 10.99 name "Handcrafted Trees Mug" description "enim corporis voluptatibus laudantium possimus alias dolorem voluptatem similique aut aliquam voluptatem voluptatem omnis id consequatur" manufacturer OHara-Group itemType mug productImg mug-400_002e1ecb8bd2.jpg
> sadd redishop:all-items Handcrafted-Trees-Mug
> zadd redishop:priceIndex 10.99 Handcrafted-Trees-Mug
> exec

Here we’ve added the ZADD command. ZADD adds an element to a sorted set. Like a set, a sorted set only allows members without repeats, but it also lets you specify a score for each member. In this case, the score is the price of the item. So, let’s say you wanted to get the items between 15 and 17 dollars. With our newly minted sorted set (“zset”), we can run the following command:

> ZRANGEBYSCORE redishop:priceIndex 15 17 WITHSCORES

Which returns:

1) "Awesome-Books-Shirt"
2) "15.99"
3) "Awesome-Building-Shirt-1"
4) "15.99"
5) "Awesome-Fog-Shirt-4"
6) "15.99"
7) "Awesome-Lake-Mug-1"
8) "15.99"
9) "Awesome-Sun-Mug"
10) "15.99"
...
545) "Unbranded-NASA-Mug"
546) "16.989999999999998"
547) "Unbranded-Nature-Mug-1"
548) "16.989999999999998"
549) "Unbranded-Office-Mug"
550) "16.989999999999998"
551) "Unbranded-Trees-Shirt"
552) "16.989999999999998"
553) "Unbranded-Water-Mug-1"
554) "16.989999999999998"

This is just returning the “slug” and the “score” (a.k.a. price). That might be enough for your application, but it’s actually possible to get values from the hash while constraining your results by a zset. As you might have guessed, you’ll need to use a MULTI/EXEC block.

First up, we’ll make a copy of the entire zset. Out of the box, Redis doesn’t have a true “duplicate” function, but we can simulate one by taking a union with… nothing! Redis is very fast and this will be zippy on most data sets, but it might be slow on very large ones.

> ZUNIONSTORE temp 1 redishop:priceIndex

Using the ZUNIONSTORE command this way, you basically say “at the key temp, store the union of the 1 zset at key redishop:priceIndex.” After running it, the key temp will contain an identical copy of the contents of redishop:priceIndex. With that, we can start trimming out unneeded items with ZREMRANGEBYSCORE.

> ZREMRANGEBYSCORE temp -inf (15
> ZREMRANGEBYSCORE temp (17 +inf

Now, if you were to run:

ZRANGE temp 0 -1 WITHSCORES

the result would be equal to that of:

ZRANGEBYSCORE redishop:priceIndex 15 17 WITHSCORES
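You can convince yourself of that equivalence with a small in-memory model of a zset (an array of [member, score] pairs; my own illustration with made-up data, not Redis internals):

```javascript
// A zset modeled as an array of [member, score] pairs kept sorted by score.
const priceIndex = [
  ['Awesome-Car-Shirt', 9.99],
  ['Handcrafted-Trees-Mug', 10.99],
  ['Awesome-Books-Shirt', 15.99],
  ['Unbranded-NASA-Mug', 16.99],
  ['Fancy-Steel-Mug', 18.49],
];

// ZRANGEBYSCORE key min max (both bounds inclusive).
function zrangebyscore(zset, min, max) {
  return zset.filter(([, score]) => score >= min && score <= max);
}

// ZUNIONSTORE over a single zset: an identical copy.
const temp = priceIndex.map((pair) => [...pair]);

// ZREMRANGEBYSCORE twice: trim away everything scored outside [15, 17].
const trimmed = temp.filter(([, score]) => score >= 15 && score <= 17);

// The trimmed copy equals the direct range query.
console.log(
  JSON.stringify(trimmed) === JSON.stringify(zrangebyscore(priceIndex, 15, 17))
); // → true
```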

Finally, we use SORT on the temp key as before and then clean up the temp key.

> MULTI
> ZUNIONSTORE temp 1 redishop:priceIndex
> ZREMRANGEBYSCORE temp -inf (15
> ZREMRANGEBYSCORE temp (17 +inf
> SORT temp BY redishop:items:*->price GET # GET redishop:items:*->price
> DEL temp
> EXEC

This will deliver five nested responses — one for each command in the MULTI/EXEC block. We only really care about the response to the SORT. Now, you might look at that and think: “You’re moving quite a bit around; is there a better way?” That depends on what you need. If this type of read doesn’t need to be atomic, you could just do the original ZRANGEBYSCORE, bring the response into your application, and re-request the values with HGETs. It’s more round trips, but it might be better, depending on the size of your data. Another option is to use a better key name for temp and not immediately delete it (maybe letting it EXPIRE if not refreshed) — this would be especially handy if you’re doing something like paging.

In a script

SQL can be, and sometimes is, used as a tool by itself, or it can be used as the storage engine for an application. Redis, on the other hand, is designed to be used as storage behind another application. With that in mind, let’s see how to translate the above techniques into code. In this case, we’ll be using Node.js and the node_redis module. The node_redis module stays very close to the Redis API, so it’s fairly evident what is going on even if you’ve only just been introduced to the API.

All of the scripts require you to run them with an argument called “credentials”, which is the path to a JSON file that contains the node_redis connection information (you should specify a server password, at the very minimum).

First, let’s add the items to Redis.

In this script, we start the Redis client, read in the sample items from JSON, then execute a series of atomic MULTI/EXEC blocks consisting of HMSET, SADD and ZADD. We’re doing this one full “item” at a time. I’m using async.each, which doesn’t actually wait for each call to return, but since Redis is single-threaded, it will effectively function as a series. After all the MULTI/EXEC calls have returned, the final function runs and we see the “Items added.” log. We also have to quit the Redis client or the script will just wait for more.

Next up, let’s take a look at sorting our list.

This script starts out very much like the previous one, but we have an extra dependency (the ever-useful lodash). The primary functionality of the script resides in the sortedValues function. We are executing a single command — SORT. SORT is about as complicated as the Redis API gets, as it has a series of “sub-commands” that change its behavior. In this example, we’re using the sub-command GET, which can retrieve values from external keys, and BY, which determines the sorting field. The client library treats each part of a sub-command as an individual argument, so I’ve grouped them logically on single lines.

To deal with the way Redis returns the values, I’m using the lodash chunk function. This splits up the long, flat array into nested arrays at specified intervals. Very useful! I’m also displaying the results in a text-based table, just for a little ease of use.

You might also notice the perf command-line argument. We’re measuring the speed of the subroutine in a couple of ways (see below).

When running this, you’ll get a ton of text (1,700+ rows in an ASCII table), so I suggest redirecting it to a file that you can review in a text viewer or editor.

Let’s take a look at the code that returns items in a specific price range. I’ve written this in both an atomic and non-atomic fashion so you can compare the two techniques.

In this code, we’re very much keeping the same structure as the sorted-items code, but we’re using a MULTI/EXEC block to create our temporary duplicate, make changes and read back the results. One thing of particular note here: allResponses holds the responses to all five commands in the MULTI/EXEC, but we don’t actually need the other four — just the response to the SORT is of importance.

The above code can be rewritten to be non-atomic. While atomic operations are generally considered preferable, there are cases where a non-atomic version would be useful (reducing server load, applying special app-side logic, etc.). Let’s see how it’s structured:

In this process, we first fetch a range from a zset by score, then return those partial keys to the app logic. At that point, we turn around and request the specific fields from the Redis server. A glaring disadvantage is that this pushes quite a bit of data out of Redis into Node.js and back into Redis. This incurs overhead both for the network (even if it is localhost) and for serializing and deserializing the data.

Relative Performance

You might have noticed that the above code contains console.time / console.timeEnd as well as argv.perf. These are there to measure the relative performance of these three related examples. To measure the results, I repeated each run 25 times through a bash script. Both the Redis server and the Node.js process were on the same machine (a normal MacBook Pro development machine). The results may surprise you.

The sort example returns 1,740 rows, and the between-range examples (atomic and non-atomic) returned a smallish response of 433 rows. All examples returned the same three fields — slug, name and price.

First, let’s look at atomic vs non-atomic. To be honest, I had assumed that with a data set this size, the non-atomic example would win out, just due to the complexity of the atomic example (which needs to duplicate, trim and sort).

Between Range: Atomic

Minimal Redis Response Time (Mean): 8.81ms
Minimal Redis Response Time (Median): 8.49ms

After Node.js Response Time Processing (Mean): 12.30ms
After Node.js Response Time Processing (Median): 11.87ms

Between Range: Non-atomic

Minimal Redis Response Time (Mean): 17.41ms (1.97x slower)
Minimal Redis Response Time (Median): 17.40ms

After Node.js Response Time Processing (Mean): 20.68ms (1.68x slower)
After Node.js Response Time Processing (Median): 20.62ms

Very interesting! A huge difference — it’s much faster to do it all in one atomic operation than with two calls (non-atomic).

Now, let’s see how our sort call performs. SORT, while having multiple “sub-commands,” is actually a single call for Redis, and it gets the fields from the hash as part of that one call. I expected it to be speedy but, on the other hand, it has to send a huge amount of data — it’s not filtering anything, so we get all the results for the whole dataset.

The other downfall to non-atomic operations, which I haven’t included in this comparison, is that a non-trivial chance exists that between getting the range and returning the data, the data may have been deleted or changed. This can take many shapes; minimally, you need to make sure that you are not returning null results (or that your end logic can handle them).

Sort

Minimal Redis Response Time (Mean): 13.64ms (1.54x slower)
Minimal Redis Response Time (Median): 13.39ms

After Node.js Response Time Processing (Mean): 20.65ms (1.67x slower)
After Node.js Response Time Processing (Median): 20.28ms

Amazing! Despite returning more than 4x the amount of data, SORT is faster than the non-atomic between-range version. You can, however, see quite a bit of overhead in processing all that data with Node.js.


What have we learned?

In this chapter, we’ve learned how some of the basic concepts of a SQL table can be roughly mapped to a Redis environment. In Redis, a table doesn’t map to a single structure, but rather to a more flexible and precise group of structures.

We also took a look at how you would implement these commands in a Node.js environment. We saw how the Redis API is represented in the fluent syntax of the node_redis module. In addition, we touched on the asynchronous implications of making multiple calls.

Finally, we saw how the different techniques perform under different conditions. In this benchmark, we saw how making multiple round trips to the Redis server can be much slower than making a single call or using a MULTI/EXEC block.

In the next installment we’ll take a look at how we can make one-to-many relationships in Redis.

You can pick up the examples from this chapter at the GitHub repo.

Read Chapter 2: One to Many Relationships.