That syncing feeling

Redis without callbacks in Node.js using pub/sub and getters/setters

This is part 26 of my Node / Redis series. The previous part was The Wonderful World of HyperLogLog.

For some programmers, the concept of asynchrony is really hard. It’s not their fault, and it doesn’t represent some sort of lack of intelligence or anything like that. I feel like programmer education stresses many things but assumes that line 47 will always execute after line 45. At any rate, Node.js is absolutely chock-a-block with asynchrony, and it gets even hairier when you start throwing in a data store. Let’s see what we can do to craft code that is straightforward to write and understand while still being able to leverage Redis.

asynchrony, man. asynchrony.

Redis, in contrast with many other data stores, is fast. If your entire Redis round-trip is taking 1 ms it’s hard to “feel” this as a programmer. If you’re working with MySQL, you might high-five your team for writing a query that only takes 250ms; at this scale you can actually perceive the performance cost of a query when testing. Still, even with a crazy-fast Redis round-trip, it isn’t even close to being able to return a value before the VM gets to the next line in code, so you’re stuck with callbacks or promises.

Ford River Rouge plant, circa 1927 (Wikimedia — Public Domain)

The River Rouge Experience.

The Ford River Rouge Complex in Dearborn, Michigan was famous for taking raw materials and converting them into cars. Imagine stuff like rock (iron ore) and tree sap (raw latex) being offloaded and, at the other end, a Model T driving away. Of course, automobile manufacturing doesn’t work that way anymore, but I would have liked to have seen it and maybe have worked there. Why? Because with that range of activity you could have learned so much. Attaching a fender and boiling latex are vastly different, yet the plant offered the opportunity to absorb these disparate skills in one location. I don’t know if that’s how it really worked, but that’s how I think of it.

Doing full-stack development of web apps is somewhat akin to River Rouge. You’re forced to learn front-end and back-end, database and styling (I could go on). Occasionally, I’ll learn something in one aspect of full-stack work and think “this is so useful, why doesn’t X have it?”

Recently, working on an Angular 1.x project, I was thinking about how seamlessly the view and the controller share data. Nothing is really manual: if you set a variable in the controller, it’s present in the view (and vice versa). No manual polling, no requests, it just “is.” What if communicating between a Node.js server and Redis was like this?

Tree Sap to Tires

The big difference between what Angular is doing and what I’d like to do is that the view and the controller are much “closer.” The DOM and JS exist on the same machine in the same thread (generally) and are highly integrated. This is not the case with Redis and Node.js. They may exist on the same machine or they may not, but they are connected over TCP. This means there is a round-trip time, which complicates things ever so much.

My first thought was using getters and setters. Getters and setters let you define functions that do the work of reading and writing a property of a JavaScript object. If you define a getter and a setter, you can use the normal assignment and comparison syntax:

myObj.getterSetterProperty = 45; //no problemo
if (myObj.getterSetterProperty === 45) { console.log('this works too'); }

In this case getterSetterProperty can have a function that calculates the value for both setting and getting. The problem with this is that it’s inherently synchronous. There is no room for a callback and, once you start thinking about it, it would be strange if there were. With no callback, how can you leverage it to GET or SET something in Redis?

To set up a getter / setter, you can use Object.defineProperty, which comes from ES5. If you’ve been reading my stuff for long, you might have noticed that I tend to write JavaScript without many of the new whiz-bang syntax features. I do this not because I’m a Luddite, but because I want to write very understandable and clear example code. I also try to stay away from transpilers in my writing; it’s an extra step and only a slice of the community might be using one. Now, do I use them sometimes? Sure. Just not here. In the case of Object.defineProperty, it’s just ES5 and this is a Node.js script, so run something newer than Node.js 0.10, which I think pretty much everyone is (and should be!) running in late 2016. Here is a short example of how it works:

Object.defineProperty(
  yourObject,
  yourFieldName,
  {
    set: function(assignedValue) {
      /* do something with assignedValue */
    },
    get: function() {
      return something;
    }
  }
);
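To make that skeleton concrete, here is a minimal runnable sketch. The names makeTracked, backingValue and writes are mine and purely illustrative; the only point is that the getter and setter run on ordinary property reads and assignments.

```javascript
// A plain object whose `value` property is backed by a closure variable.
// makeTracked and its field names are illustrative, not part of any library.
function makeTracked() {
  var backingValue;          // where the real value lives
  var obj = { writes: 0 };   // counts how often the setter ran

  Object.defineProperty(obj, 'value', {
    enumerable: true,
    set: function (assignedValue) {
      obj.writes += 1;       // side effect runs on every assignment
      backingValue = assignedValue;
    },
    get: function () {
      return backingValue;
    }
  });

  return obj;
}

var tracked = makeTracked();
tracked.value = 45;                  // calls the setter
console.log(tracked.value === 45);   // calls the getter
console.log(tracked.writes);
```

Notice that neither call site gives you anywhere to hang a callback, which is exactly the constraint the rest of this article works around.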

So, thinking about this and Redis, you could run a Redis command in the set function and also store the value in a local variable, but this would only replicate the state of your script to Redis. If another script (or another instance of the same script) uses the same technique, the two processes would happily go about ignoring each other, since each is writing but never reading. So, we need a way to transparently get the Redis value and keep our object property up-to-date with it.

Yup. Pub/Sub. So, if we use keyspace notifications and Pub/Sub, we can keep the values up-to-date. So, when you define the property, the script can subscribe to that key’s changes. When the Redis value changes, you change the value in your script.

This does introduce a snag though: you get, in effect, eventual consistency. Eventual consistency is most often seen in distributed setups. In short, you can’t rely on the value you just wrote being immediately available or up-to-date. This is a side effect of the Redis round-trip time (however short it may be). It’s scary at first, but, in most cases, you can design around it.
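Here is a rough sketch of that timing, using an in-memory stand-in instead of a real Redis server so it runs anywhere. FakeStore, attach and deliver are names I made up for this illustration; they are not HyalineHash's API, and the real module may behave differently in the details. The setter only writes to the store, and the local copy changes only when the change notification arrives.

```javascript
// In-memory stand-in for Redis plus keyspace notifications. Everything here
// (FakeStore, attach, deliver) is hypothetical and exists only to show timing.
function FakeStore() {
  this.data = {};        // the "Redis" copy of the hash
  this.pending = [];     // notifications still "on the wire"
  this.listeners = [];   // one subscriber per attached object
}

// A write lands in the store at once, but subscribers hear about it later.
FakeStore.prototype.hset = function (field, value) {
  this.data[field] = String(value);
  this.pending.push(field);
};

// Simulates the round trip finishing: every subscriber gets each change.
FakeStore.prototype.deliver = function () {
  var self = this;
  this.pending.splice(0).forEach(function (field) {
    self.listeners.forEach(function (listener) {
      listener(field, self.data[field]);
    });
  });
};

// Builds an object whose reads are local and whose writes go to the store.
function attach(store, field) {
  var local;             // this process's cached copy
  var obj = {};
  store.listeners.push(function (changedField, value) {
    if (changedField === field) { local = value; }
  });
  Object.defineProperty(obj, field, {
    enumerable: true,
    set: function (value) { store.hset(field, value); },
    get: function () { return local; }
  });
  return obj;
}

var store = new FakeStore();
var processA = attach(store, 'headline1');
var processB = attach(store, 'headline1');

processA.headline1 = 'Redis: A Fast Data Store';
var readBeforeDelivery = processA.headline1;   // still the old (undefined) value

store.deliver();                               // the notification arrives
var readAfterDelivery = processB.headline1;    // both objects now agree
```

The read right after the write is stale, and the second object picks up the change without ever being told about it directly. That is the trade you are making.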

In our example, we would be in a world of hurt if we tried to do a for loop with a variable that was managed this way. However, if you had something that was being displayed to the client and it wasn’t super critical to have the most up-to-date information absolutely immediately after it changed, it’s a big win.

One other thing we need to manage is error handling. With callbacks or promises, it’s easy to feel like you’re dealing with error handling more than success handling (sometimes I feel like Node.js’ motto should be “The programming language that puts errors first!” I’ll be here all week, tip your waitresses). You could, uh, ignore all the errors, which would be optimistic to put it lightly, but what you really need is a way to communicate those errors back to the script. When I started toying with the idea, I was just logging them to the console, which is not much better than ignoring them. But if you keep eventual consistency in mind, you must always assume that the data is out of date, so we can be a little more fast and loose with the handling. We can use Node’s built-in event emitter to handle errors more generally than in callback form. I set up a few types of errors based on the Redis operations (getting, setting, connecting, etc.). You can handle them as you need for your context.
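As a sketch of the idea, here is error reporting through Node's EventEmitter. The event names set-error and get-error and the helper reportSetFailure are assumptions I picked for the example, not necessarily what HyalineHash emits.

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical error channel; the event names below are assumptions,
// not necessarily the ones HyalineHash emits.
var errors = new EventEmitter();
var seen = [];

errors.on('set-error', function (err) {
  seen.push('set: ' + err.message);
});
errors.on('get-error', function (err) {
  seen.push('get: ' + err.message);
});

// Inside a setter there is no callback to hand the error to,
// so the failure is reported on the emitter instead.
function reportSetFailure(err) {
  errors.emit('set-error', err);
}

reportSetFailure(new Error('connection lost'));
console.log(seen);
```

I avoided the plain event name error on purpose: in Node, emitting error on an emitter with no listener for it throws, which is rarely what you want from a background sync.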

I packaged all of this up into a module called HyalineHash. Hyaline is from the Greek for transparent and to simplify things I’m basically mapping an Object to a hash. There is no internal (re)typecasting, so if you pass in a number, you’re getting back a string.
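The string caveat is easy to trip over, so here is a tiny sketch of it. String() stands in for the trip through a Redis hash, where values are stored as strings; no Redis is involved in this snippet.

```javascript
// Redis hash values are strings, so a number written through the hash
// comes back as a string. String() stands in for the round trip here.
var written = 45;
var readBack = String(written);

console.log(readBack === 45);           // strict comparison against a number fails
console.log(Number(readBack) === 45);   // convert explicitly before comparing
```

If you need numbers, convert on the way out rather than relying on loose comparison.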

A practical example

Imagine you ran a busy news site. Most news sites have a front page that follows a simple pattern — “above the fold” they have a handful of news stories for whatever is happening in the world. Visitors hit the homepage to check the top stories and often bounce out. 50–60% bounce rate is a pretty realistic figure — so if you have 1 million visitors to your site in a day only 400k–500k visitors move on to the next page. So, your home page is critical — it should be always up to date and it should move as fast-as-possible so it can serve the next visitor.

Normally, if you’re using Redis in this type of environment your information resides in a few keys and you would request the information from a number of different Redis keys, wait for the response and push the information back out to the user. This is a good technique, but it has some drawbacks:

  • Round-trip network latency
  • Multiple callback complexity
  • Burden on the Redis server

In a high-traffic situation you are also more than likely to have multiple servers, so you need to have everything up-to-date across multiple processes serving pages. Back to our million-hit/day homepage. Say that you’re getting a rather beefy hash from Redis and it takes the server 6ms to grab that. Redis is, for all intents and purposes, single threaded. So, Redis can’t do anything else during that 6ms (again, not exactly right but close enough). Let’s look at the math:

1,000,000 hits x 6ms = 6,000,000ms
6,000,000ms === 1hr 40 minutes

So, just for the homepage you’re out for a decent part of a day just slinging the data for the homepage. Once you start dishing out the data for all the other pages, you don’t have much room left. Ouch.

Now, let’s assume that you make 48 changes to the home page a day (2 an hour, just an estimate). And, maybe behind some sort of load balancer or proxy you have 10 Node.js instances serving the web pages (the number of Node.js servers is irrelevant in the first example). Using this method of syncing with hyalineHash, you’re only requesting the full hash when it changes. How does this math work out?

48 updates x 10 servers x 6ms = 2,880ms
2,880ms === 2.88 seconds
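If you want to plug in your own traffic numbers, the two estimates reduce to a few lines. The variable names are mine, and the 6ms figure is the same rough assumption as above.

```javascript
var MS_PER_FETCH = 6;   // assumed cost of fetching the hash once

// Fetch on every request: Redis time scales with site traffic.
var perRequestMs = 1000000 * MS_PER_FETCH;

// Fetch only on change: Redis time scales with updates times servers.
var perChangeMs = 48 * 10 * MS_PER_FETCH;

console.log(perRequestMs / 60000 + ' minutes of Redis time per day');
console.log(perChangeMs / 1000 + ' seconds of Redis time per day');
console.log('roughly ' + Math.round(perRequestMs / perChangeMs) + ' times less');
```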

Of course, you’ve got overhead for the pub/sub and keyspace notifications and more overhead on the Node.js side to coordinate the syncing. But it’s a drastic difference and your Redis traffic is not a function of website traffic, but rather of the amount of updating you’re doing (which will never match the traffic in this case). So, let’s make a callback-free, distributed homepage server in Node.

First things first: configure your Redis server (or go to RedisLabs to get a free 30MB Redis Cloud instance) to use keyspace notifications on at least hashes and non-type-specific commands. You can do this through your redis.conf file or in redis-cli by running the command:

config set notify-keyspace-events AKE

(you can get away with less than all key activity [AKE], but in testing, I always do the whole shebang)
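For reference, with the K flag enabled Redis publishes changes to a key on a channel named `__keyspace@<db>__:<key>`, and the message is the name of the command (for example hset). The helpers below are a small sketch for building and parsing that channel name; keyspaceChannel and keyFromChannel are names I made up, and database 0 is an assumption.

```javascript
// Channel that carries every event for one key (requires the K flag).
function keyspaceChannel(db, key) {
  return '__keyspace@' + db + '__:' + key;
}

// Pulls the key back out of a keyspace channel name.
// The first colon always follows the "__keyspace@<db>__" prefix.
function keyFromChannel(channel) {
  return channel.slice(channel.indexOf(':') + 1);
}

var channel = keyspaceChannel(0, 'top-stories');
console.log(channel);
console.log(keyFromChannel(channel));
```

A subscriber connection would subscribe to that channel and re-read the hash whenever a message arrives.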

You’ll need two clients for hyalineHash: subClient and client. They have the same configuration (pulled from a JSON file specified in the command line arguments; this file contains a node_redis config object), but you need both because you can’t have normal Redis traffic and pub/sub on the same connection.

To generate the HTML, we’re using the templating language pug (formerly known as Jade, a much better but evidently trademark-infringing name). For this example the data is stored in a hash with six fields: headline1, headline2, headline3, summary1, summary2 and summary3. It’s neither ideal nor flexible, but it gets the job done as a proof-of-concept.

To get started, go ahead and grab the GitHub repo and run:

$ npm install
$ node index.js --redisconfig /path/to/your/config.json --port 3050

This will start up an Express server on port 3050.

With Express, we can render an object through a templating engine with res.render without anything else. The beauty of this system is that because the object exists as a variable that is synced automatically with Redis you don’t have to do anything else but pass it directly to the res.render function. Your route callback is a single-liner.

app.get('/', function(req, res) {
  res.render('news', topStories);
});

Let’s put some data in — with the example server running, go into redis-cli and enter:

> hset top-stories headline1 "Redis: A Fast Data Store"
> hset top-stories summary1 "In-memory is the next big thing..."

Load the page by visiting http://localhost:3050/ . You should see something like this:

Your first news story

Now that you’ve got the server up and running, let’s explore how we can add more news items. While it’s a neat trick that the above route didn’t need a callback to read from Redis, let’s also write to Redis without a callback. As with any web form-based system, we need a couple of routes: one route to draw the form (an HTTP GET) and another to save the data and display the form (a POST). The GET is a straightforward template + data technique:

app.get('/news-edit', function(req, res) {
  res.render('news-edit', topStories);
});

Very similar to the first route. Now, let’s write the POST. With the POST, we need to interpret the POST data, save the information and then display the form again. There are several steps here. First, we’ll use the Express middleware body-parser to handle the POST data and inject it into the req object. Next up, we need to save that data back to Redis. Unfortunately, you can’t just reassign the topStories variable to req.body. Doing so would cause you to lose the special syncing feature. hyalineHash has a special method called replaceWith that first removes the old object, then takes the enumerable properties of the passed object and applies them to the hyalineHash object. Now, this is where the eventual consistency can bite you: all the replaceWith work is done within Redis (a DEL followed by an HMSET in a MULTI). If you immediately retrieve the topStories object, you’ll almost certainly get the old value. In this case, we can code around it by just passing the req.body object back to the renderer.
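To picture what that DEL plus HMSET looks like, here is a sketch that turns an object into the two commands. buildReplaceCommands is a hypothetical helper of mine and is not taken from HyalineHash's source; it only shows the shape of the work, and it assumes the object has at least one field.

```javascript
// Builds the command list for a DEL followed by an HMSET.
// buildReplaceCommands is a hypothetical helper, not HyalineHash's code.
function buildReplaceCommands(key, newObject) {
  var hmset = ['hmset', key];
  // Object.keys returns only the object's own enumerable properties.
  Object.keys(newObject).forEach(function (field) {
    hmset.push(field, String(newObject[field]));
  });
  return [['del', key], hmset];
}

var commands = buildReplaceCommands('top-stories', {
  headline1: 'Redis: A Fast Data Store!',
  summary1: 'In-memory is the next big thing...'
});
console.log(JSON.stringify(commands));
```

Sending both commands inside a MULTI means other clients never observe the hash in its deleted, half-replaced state. The local object still only catches up once the notification comes back, which is why the route renders req.body directly.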

Sidenote: replaceWith is only needed when you want to sub in an entirely new object. If you want to replace single values, you can use the normal assignment syntax (myObj.foo = "abc").

Visiting http://localhost:3050/news-edit will bring up your editing panel. This is, of course, only to illustrate the subject at hand and has no authentication nor validation. But, you can see how clicking the “Submit” button saves the data to Redis and, internally, we aren’t using any callbacks to store it, just a simple replaceWith. Going on and clicking the “View the news” button will take you back to the news page, updated.

An observant reader might remember that the port (--port 3050) was set in the arguments when starting index.js. What’s neat about this method is that you’re also effectively sharing this hash between two Node processes; both will treat the object as if it were in the scope of the process. In two different terminal windows, run the following commands:

Window 1:

node example.node.js --redisconfig ../dis-local-test.json --port 3050

Window 2:

node example.node.js --redisconfig ../dis-local-test.json --port 3051

So, now, running side-by-side you can see how Redis is syncing the object between processes without using callbacks (note the different ports in the URL).

The first headline needed a little more oomph so I added an exclamation point.

While using hyalineHash is not appropriate for all situations, it can provide a nifty way to cut code complexity and to reduce the bandwidth between a web server and Redis. HyalineHash also illustrates the power of Redis pub/sub and JavaScript getters / setters, allowing you to use both Redis and JavaScript in less-than-traditional ways.
