Redis and Node.js Slug It Out: In-process variables and remotely stored data

Kyle
7 min read · Sep 7, 2015


This is part 15 of my Node / Redis series. The previous part was Love, functional programming, Node.js, Lua and Redis.

Word association time. Look at the word below and note the first five things that come to mind (leave a comment if you would like):

REDIS

I’m willing to bet one of those five things is memory, as in RAM. It’s one of the defining features of Redis. Since everything is stored in RAM, and RAM is scarce, people tend to think in terms of “Will my data fit in my server’s RAM?” It is a valid concern, but what if I showed you how Redis can actually save your server(s) RAM?

Node.js, unlike, say, PHP, keeps the whole app resident in server RAM. In a LAMP environment, you can host literally thousands of unused web apps on a single server and your real limit is disk space. Since Node.js incorporates the server into the app logic, your app will take up a baseline of server RAM even if it is idle, so you can’t host thousands of unused web apps unless you have a loooooot of RAM. So, like Redis, Node.js can cause some RAM anxiety.

What’s interesting is that Node.js is less efficient at data storage than Redis. Take, for example, a string. In JavaScript, each character is stored as a 16-bit unsigned integer. For whatever reason, most coders are under the impression that 1 character = 1 byte; in the case of JavaScript you are literally half right. I was reading Redis RAM Ramifications the other day and took note of how Redis stores strings. Essentially, Redis has roughly 9 bytes of overhead per string. In theory, if your string is longer than 8 characters or so, you’ll be better off storing it in Redis than in-process in Node. Of course, this ignores things like client overhead, keyspace size, and so on, but, at a large scale, long strings are more efficient in Redis. And this is just a simple example; when you start comparing an object in JavaScript to a hash in Redis, things tilt even further in favor of Redis.
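
To make the back-of-the-envelope math concrete, here is a tiny sketch. The 2-bytes-per-character and 9-bytes-of-overhead figures are the rough numbers from above, not exact measurements:

// Rough estimates only: real usage varies with V8's internal string
// representation and with how Redis encodes the value.
function estimateNodeBytes(str) {
  return str.length * 2; // ~2 bytes per character for an in-process JS string
}

function estimateRedisBytes(str) {
  return str.length + 9; // ~1 byte per character plus ~9 bytes of overhead
}

var title = 'This is my awesome blog entry!';
console.log(estimateNodeBytes(title));  // 60
console.log(estimateRedisBytes(title)); // 39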

This is all fine and good, but we know that not every app is going to be rewritten to take advantage of what is, at this point, a micro-optimization. Does this space saving have a real application?

Onchidiidae

The same day I (re)read Redis RAM Ramifications, I came across 40 NPM Modules We Can’t Live Without. Most of these were well liked and already known to me, but one triggered a stress flashback: Slug.

I wrote a closed-source blog engine based on Node.js and Redis a few years ago. It works great, but early in the development I would mysteriously see large-ish RAM jumps. I was running everything I could on a single Digital Ocean droplet at the time (Digital Ocean is great, by the way) and RAM was a real stress for the server, and for me. I thought I had introduced some sort of memory leak somewhere in my code; hours of poring over the code led me to Slug.

Slug is a seemingly simple module that converts the title of a post (“This is my awesome blog entry!”) into a URL-safe identifier (“this-is-my-awesome-blog-entry”). Should be simple, right? Well, it is, but out of the box it also supports unicode symbols. So, “I ❤ Redis” becomes “i-heart-redis”. I had actually touted this to others, so I was kind of stuck with the feature. Slug accomplishes this by using the NPM module unicode. The unicode module, on install, actually downloads the (huge) UnicodeData.txt file, parses it and converts it into Node modules for the different unicode categories. Slug, in turn, requires the symbol table and uses it to replace unicode symbols with slightly tweaked versions of each character’s written description.
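
In use, it looks something like this (the outputs shown are the post’s own examples; exact results can vary with the slug version and its options):

// Example usage of the slug module. Outputs are the examples above and may
// differ slightly depending on the slug version and configuration.
var slug = require('slug');

console.log(slug('this is my awesome blog entry!')); // 'this-is-my-awesome-blog-entry'
console.log(slug('I ❤ Redis'));                      // 'i-heart-redis'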

Since it uses require, loading the table is a synchronous process. The module that contains the symbols is not stripped of extraneous data, and each character is an object whose key is the decimal representation of its char code:
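
Roughly, the generated category file looks like this (the shape and field names are reconstructed from memory rather than copied from the file):

// Approximate shape of unicode/category/So.js: keyed by decimal char code,
// with far more data per entry than a slug generator will ever need.
module.exports = {
  9731:  { value: '2603', name: 'SNOWMAN', category: 'So' /* , ... */ },
  9762:  { value: '2622', name: 'RADIOACTIVE SIGN', category: 'So' /* , ... */ },
  11043: { value: '2B23', name: 'HORIZONTAL BLACK HEXAGON', category: 'So' /* , ... */ },
  12320: { value: '3020', name: 'POSTAL MARK FACE', category: 'So' /* , ... */ }
  // ...thousands more entries
};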

The symbol category JavaScript file is 1.9mb on disk, which balloons to about 9mb once it is actually represented in-process. I came to this number by evaluating the heap before and after requiring the symbol module.

Let’s face it: 9mb is big. And think about this at scale. If you have many instances of this script running, each one holds its own copy of the same table; you’re literally wasting resources. CPU time is also being sacrificed, since loading that gigantic JS file is blocking, so how long is your server dead in the water?

On my machine, I was getting 160,637,042ns, aka 160.63 ms. Blocking for 160ms is not okay, even if it is at process start.
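
For reference, here is roughly how I measured both numbers (a sketch; the module path assumes the unicode package’s generated category file for symbols):

// Measure heap growth and blocking time caused by the require.
// The numbers will vary by machine and by Node/V8 version.
var heapBefore = process.memoryUsage().heapUsed;
var start = process.hrtime();

var symbols = require('unicode/category/So'); // the big generated symbol table

var elapsed = process.hrtime(start);
var heapAfter = process.memoryUsage().heapUsed;

console.log('heap grew by ~' + ((heapAfter - heapBefore) / 1048576).toFixed(1) + 'mb');
console.log('require blocked for ' + (elapsed[0] * 1e9 + elapsed[1]) + 'ns');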

This is no indictment of the author of slug (dodo) or the package; it’s clever and useful, but we can do better given better tools (Redis). Obviously, we will store the values in Redis once, then retrieve them many times, regardless of when the process was started. A hash seems like a good fit: it is tidier than a pile of individual string keys and more memory efficient. Another nice benefit of storing the unicode table in Redis is that one copy can be shared amongst several Node processes.

Another thing we can do is store only what we need. I get why the original slug module stored everything: going back through and deleting everything that is unneeded would be inefficient at runtime (how many times will a user ever submit a slug with 〠 in it?) and cause garbage collection nightmares. By storing the unicode category in Redis, we can do all of that processing ahead of time so it never touches our app code. To create a slug, all you really need is the “name” property. The rest is unneeded.

Another optimization is to use the unicode character itself as the field in the hash. The integer representation makes sense in JS, but we can shave a few bytes off by making the field the character itself. It’ll make fetching easier, too.

Finally, the original slug module removed a few words from the names (‘sign’, ‘cross’, ‘of’, ‘symbol’, ‘staff’, ‘hand’, ‘black’, ‘white’). We can do this at the time we store the table, so that retrieval is as quick as possible.
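
Putting those decisions together, a one-time loader might look something like the sketch below. The hash key and stop-word list come from above; node_redis is assumed for the client, and my actual port differs in the details:

// One-time loader: run it once, after which any number of Node processes can
// share the single copy of the table stored in Redis.
var redis = require('redis');
var client = redis.createClient();
var symbols = require('unicode/category/So'); // only ever loaded here, never in the app

var STOP_WORDS = /\b(sign|cross|of|symbol|staff|hand|black|white)\b/g;

Object.keys(symbols).forEach(function (code) {
  // Use the character itself as the hash field (fine for BMP characters;
  // astral code points would need surrogate-pair handling).
  var character = String.fromCharCode(Number(code));
  var name = symbols[code].name
    .toLowerCase()
    .replace(STOP_WORDS, '') // strip the noise words ahead of time
    .replace(/\s+/g, ' ')
    .trim();

  client.hset('unicode', character, name);
});

client.quit(); // QUIT is queued after the HSETs, so they all get sent first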

So, what does this all look like in Redis? Here is an example:

> hlen unicode
(integer) 5137
> hget unicode ⬣
"horizontal hexagon"
> hget unicode 〠
"postal mark face"

How does it size up? I used redis-rdb-tools to get the actual size of the hash in bytes — 694,405 bytes. Yup. 694k down from 9mb. I’d say that is more than worth the time.

You may be saying: what about the overhead of the Redis client module and the new module that fetches the unicode characters? Using a similar heap / garbage collection analysis, that overhead comes to about 625k. So, all in all, you’re talking about 1.3mb for everything you need. Still beats 10mb.

What about speed?

Here is the not-so-great news: it is quite a bit slower. I’m not exactly sure what is going on inside V8, but repeat calls to the original slug function seem to get faster; I assume this is some sort of internal optimization of the object lookup.
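
The in-process benchmark looked roughly like this (a reconstruction, not the original gist; the test titles match the results below):

// Time each call to the original, in-process slug() in nanoseconds.
var slug = require('slug');

var titles = ['this is ☃', 'this is ⬣', 'this is ☢'];

for (var i = 0; i < 10; i++) {
  titles.forEach(function (title) {
    var start = process.hrtime();
    var result = slug(title);
    var elapsed = process.hrtime(start);
    console.log(elapsed[0] * 1e9 + elapsed[1], result);
  });
}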

This yields (execution time in nanoseconds first, then the resulting slug):

2480486 'this-is-snowman'
222408 'this-is-horizontal-hexagon'
66785 'this-is-radioactive'
54675 'this-is-snowman'
40021 'this-is-horizontal-hexagon'
38546 'this-is-radioactive'
35885 'this-is-snowman'
48876 'this-is-horizontal-hexagon'
35714 'this-is-radioactive'
42454 'this-is-snowman'
35159 'this-is-horizontal-hexagon'
88004 'this-is-radioactive'
43775 'this-is-snowman'
176392 'this-is-horizontal-hexagon'
136660 'this-is-radioactive'
44393 'this-is-snowman'
35619 'this-is-horizontal-hexagon'
34767 'this-is-radioactive'
37674 'this-is-snowman'
34977 'this-is-horizontal-hexagon'
69380 'this-is-radioactive'
34660 'this-is-snowman'
38141 'this-is-horizontal-hexagon'
34601 'this-is-radioactive'
34083 'this-is-snowman'
39259 'this-is-horizontal-hexagon'
38534 'this-is-radioactive'
34388 'this-is-snowman'
34684 'this-is-horizontal-hexagon'
36622 'this-is-radioactive'

So, the times range from 0.034ms to 2.4ms, and only the very first (cold) call lands above 1ms. Very quick. Now, for the Redis version, the benchmark was slightly more convoluted due to the nature of async (this is quick and dirty; I’m sure there is a more elegant way of doing this):
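
Something along these lines; slugRedis() below is a simplified, hypothetical stand-in for the Redis-backed port (it is not the real module), included only so the harness is self-contained:

var redis = require('redis');
var client = redis.createClient();

// Hypothetical stand-in for the Redis-backed slug port: it only handles a
// single trailing symbol, replacing it with its name from the hash via HGET.
function slugRedis(title, callback) {
  var symbol = title.slice(-1);
  client.hget('unicode', symbol, function (err, name) {
    var text = title.slice(0, -2) + ' ' + (name || symbol);
    callback(err, text.trim().toLowerCase().replace(/\s+/g, '-'));
  });
}

var titles = [];
for (var i = 0; i < 10; i++) {
  titles.push('this is ☃', 'this is ⬣', 'this is ☢');
}

var start = process.hrtime();
var results = [];
var pending = titles.length;

// Fire every lookup at once and record each reply's elapsed time from the
// common start, which is why the numbers below climb steadily.
titles.forEach(function (title) {
  slugRedis(title, function (err, result) {
    var elapsed = process.hrtime(start);
    results.push([elapsed[0] * 1e9 + elapsed[1], result]);
    if (--pending === 0) {
      console.log(results);
      client.quit();
    }
  });
});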

This results in (times in nanoseconds):

[[ 1136310, 'this-is-snowman' ],
[ 1599767, 'this-is-radioactive' ],
[ 1639816, 'this-is-horizontal-hexagon' ],
[ 1673846, 'this-is-snowman' ],
[ 1700406, 'this-is-horizontal-hexagon' ],
[ 1729918, 'this-is-radioactive' ],
[ 1754942, 'this-is-snowman' ],
[ 1783307, 'this-is-radioactive' ],
[ 1808572, 'this-is-horizontal-hexagon' ],
[ 1835932, 'this-is-snowman' ],
[ 1860991, 'this-is-horizontal-hexagon' ],
[ 1888511, 'this-is-radioactive' ],
[ 1912959, 'this-is-snowman' ],
[ 1940373, 'this-is-radioactive' ],
[ 1965451, 'this-is-horizontal-hexagon' ],
[ 1992372, 'this-is-snowman' ],
[ 2017291, 'this-is-horizontal-hexagon' ],
[ 2041762, 'this-is-radioactive' ],
[ 2069287, 'this-is-snowman' ],
[ 2096610, 'this-is-radioactive' ],
[ 2121864, 'this-is-horizontal-hexagon' ],
[ 2145713, 'this-is-snowman' ],
[ 2173316, 'this-is-horizontal-hexagon' ],
[ 2203681, 'this-is-radioactive' ],
[ 2231493, 'this-is-snowman' ],
[ 2256035, 'this-is-radioactive' ],
[ 2283552, 'this-is-horizontal-hexagon' ],
[ 2307464, 'this-is-snowman' ],
[ 2334892, 'this-is-horizontal-hexagon' ],
[ 2384459, 'this-is-radioactive' ] ]

The best time was 1,136,310ns (1.1ms) and the worst was 2,384,459ns (2.4ms). Honestly, this surprised me a bit; I was hoping for sub-millisecond times.

So, there you have it. Redis is definitely slower than Node’s in-process memory, but it is more space efficient, especially when you figure in the memory sharing between processes and/or servers. The question remains: is it worth the extra millisecond or so to use Redis in a situation like this? I think it is; you’re setting yourself up to scale quickly. As a bonus, by loading your unicode tables ahead of time, you’ll also shave that 160ms hit off your process start time.

I’ve set up a GitHub repo for the port of slug to use Redis. I’m an artist, so it has a very creative name.
