Redis, Express and Streaming with Node.js and Classic Literature(?)

Kyle
5 min readMay 24, 2015

--

This is part 6 of my Node/Redis series. You can read Part 1, Dancing around strings in Node.js and Redis, Part 2, Store Javascript objects in Redis with Node.js the right way, Part 3 Using the Redis multi object in Node.js for fun and profit, Part 4 Keeping track of account subscriptions with Redis and Node.js, Part 5 Managing modularity and Redis connections in Node.js, Part 7 Untangling Redis sort results with Node.js and lodash and Part 8 Redis, set, node.

Work with Node.js long enough and you’ll eventually run into the concept of streaming. For some reason, I didn’t feel like I needed to stream anything for a long time. Despite being actually quite simple, it initially confused me and I stayed away for months (maybe I found backpressure spooky). I eventually figured it out and did some work with AWS S3 and streaming a while back.

Recently, I had a project that stored a whole wad of information in a Redis string. I didn’t have to do any processing, so I just pushed it back out through the response in the callback. Using Express, it looked something like this:

app.get('/my-route', function (req, res, next) {
  client.get('my-data', function (err, theData) {
    if (err) {
      next(err);
    } else {
      res.send(theData);
    }
  });
});

While it was fast, it took just a bit longer than I felt it should, maybe 5-10ms slower than I expected. It occurred to me that what I was doing was pretty inefficient in both time and memory: this function waits until my-data has been completely received and stored in a variable, and only then sends it out to the user. Could it be better?

Streaming Support

I expected to find some magical Node module that would be a drop-in replacement for node_redis and stream all the responses. That didn’t happen. I did find redis-rstream by @jeffbski.

What redis-rstream does is fetch the value in chunks using the Redis GETRANGE command. GETRANGE was actually new to me; I had never noticed it before. The command allows you to get a portion of a string value by key. It is more or less similar to string.slice in JavaScript. So redis-rstream slices the string value up into chunks and exposes them as a readable stream that you can pipe anywhere.
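To make the slicing concrete, here is a rough sketch of the idea (not redis-rstream’s actual code); chunkRanges is a hypothetical helper that works out which GETRANGE offset pairs cover a value:

```javascript
// Hypothetical helper (not part of redis-rstream): compute the inclusive
// [start, end] offsets a GETRANGE-based reader would request for a value
// of `length` bytes, read `chunkSize` bytes at a time.
function chunkRanges(length, chunkSize) {
  var ranges = [];
  for (var start = 0; start < length; start += chunkSize) {
    // GETRANGE uses inclusive offsets, so the last byte is end, not end + 1
    ranges.push([start, Math.min(start + chunkSize, length) - 1]);
  }
  return ranges;
}

// A 10-byte value read 4 bytes at a time needs three GETRANGE calls:
console.log(chunkRanges(10, 4)); // [ [ 0, 3 ], [ 4, 7 ], [ 8, 9 ] ]
```

Each pair would then map to one GETRANGE round trip, with the returned chunks emitted in order on the readable stream.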

It isn’t exactly what I was hoping for, but if you have a huge amount of data stored in Redis as a string, this will let you stream it.

Test setup

In principle, I knew that streaming usually yields at least a lower-latency response. However, sometimes things are so fast anyway that the savings are insignificant. I needed to test my assumption.

For testing, I knew I needed something fairly large. In my use case, I wasn’t storing anything binary. A large text file — where do you turn? For me, it’s usually Project Gutenberg (my wife has a PhD in Literature; some things just tend to rub off).

First up is Moby Dick, which weighs in at 1.2 MB of pure text. I put the text into Redis under the key white-whale. Here is the quick Express server I put together:

var
  redis = require('redis'),
  rstream = require('redis-rstream'),
  express = require('express'),

  client = redis.createClient(null, null, { detect_buffers: true }),
  app = express(),
  server;

app.get('/normal', function (req, res, next) {
  client.get('white-whale', function (err, mobyDick) {
    if (err) {
      next(err);
    } else {
      res.send(mobyDick);
    }
  });
});
app.get('/stream', function (req, res, next) {
  rstream(client, 'white-whale').pipe(res);
});
server = app.listen(4321, function () {
  console.log('server started');
});

After starting this server, you can access the non-streamed version at http://localhost:4321/normal and the streaming version at http://localhost:4321/stream. Both worked fine when tested in a browser, but I needed clean results over a large number of tries. First, I wrote a formatter for cURL and saved it as formatter.txt:

%{time_total}\n

This will give me the total time it takes to retrieve a URL. I put this together with a bash script to loop over it 1,000 times:

#!/bin/bash
for i in `seq 1 1000`;
do
  curl -w "@formatter.txt" -o /dev/null -s "http://0.0.0.0:4321/stream"
done

By using -o /dev/null I’m just tossing the results and avoiding any CPU cycles spent displaying them in the terminal. The -w "@formatter.txt" applies the above-mentioned formatter. I switched out the URL to test the streamed vs. the normal response as needed. I conducted the test over localhost to avoid network interference.

Here are the results:
Normal: 0.01726s average
Stream: 0.01484s average

Cool. So it is faster: 16 percent faster, to be exact. But that is only a couple of milliseconds and likely imperceptible.

Next up, I decided to increase the chunk size to see if that made a difference. By passing chunkSize : 128 in the options to redis-rstream, I’m doubling the size of each chunk being sent. That gave me an average of 0.01299s: a difference, but still small.

Bigger test

Understanding that this type of operation will likely be sensitive to the size of the source, I decided to swap out Moby Dick for something larger. Don Quixote is roughly twice the size at 2.2 MB.

I pushed the Project Gutenberg version of Don Quixote into Redis and ran the same tests: normal vs. streaming, looping 1,000 times. The results are more interesting:

Normal: 0.04251s Average
Streaming: 0.02278s Average
Streaming w/ 128kb Chunks: 0.02259s Average

Wow — that’s big. The streaming version is 86% faster; the normal response takes nearly twice as long. The difference is so large that I’m doubting my results, and I suspect I’m actually running into some sort of Node or Redis limit here. I also ran the test with 128kb chunks, but that was only minutely faster, so again, there must be some sort of limit in the chain.

One final test: what happens if I actually lower the chunk size? I tested the same route with the chunk size set to 32kb. With ol’ Don I got an average response of 0.02287s, so a little worse, but not significant.

Conclusions

Node is fast. Redis is fast. Streaming is efficient. Node streaming + Redis streaming is fast and efficient, but maybe only useful when you’re pushing a lot of data. To be fair, I think most of us are using the more complex data types in Redis, so these findings may have limited utility beyond retrieving extra-large cached values.

Here is the data for the tests in a Google Spreadsheet.


Kyle

Developer of things. Node.js + all the frontend jazz. Also, not from Stockholm, don’t do UX. Long story.