The Road to Redis: Chapter 3

Searching through your data

Kyle
9 min readJun 21, 2017

This is the third part in the Road to Redis sub-series of the Node / Redis Series. You should read the previous part before starting this one, it’s about relationships in your data.

If you’re building a web app, then sooner or later, you’re going to want to do some sort of search of your stored data. I’ve found project managers kind of casually throw this feature in: “Of course, we want our data searchable.” Which is a valid request, but generally search is hard. If it were easy, there would be a new Google every few weeks :)

Let’s back up a little bit and consider what happens when you search data. If you imagine all your data in a text editor, you hit CTRL-F and you have a very rudimentary search. The search term “boat” will find “boats”, but “boats” will not find “boat” and it’s going to be pretty slow. If you don’t believe me, open up a huge text file (say from Project Gutenberg) and see how long it takes to search through the data. Even on cutting edge hardware, it’s going to take a relatively long time. This data is non-indexed — the software has to evaluate each character of the huge string vs the search subject. I’m sure better text editors do some optimizations in this area, but searching huge volumes of data (in this case, text) is something that can be surprisingly taxing.

Re-thinking language to get a better search

Note: Stay with me. I believe that one should understand how things work before using them. Consequently, I’m going to dive fairly deep on how a particular Node.js module works with Redis. Then I’m going to show you the fairly simple API of the module.

So, the heart of a full-text search is indexing. We touched on it in part 2, but let’s examine the mechanisms needed to search. Take the following sentence (in the index, we’ll refer to it as “bobs-day”):

Bob runs fast, Bob is a runner. Run bob!

First thing you’re going to want to do is break it down into parts. This is pretty easy — we’ll split it into an array:

['Bob','runs','fast','Bob','is','a','runner','run','bob']

Notice that I also removed the punctuation. That’s usually not relevant in an index. Next up, you’re going to want to somehow dispose of the letter case (either go upper or lower).

['bob','runs','fast','bob','is','a','runner','run','bob']

Now, ‘is’ and ‘a’ aren’t really useful outside the syntax of a sentence and would just pollute our index. We call these stop words. Filtering those out yields:

['bob','runs','fast','bob','runner','run','bob']

Our sample sentence has both ‘run’ and ‘runs.’ These two words are said to have the same “stem.” Likely if you search for “run” you’ll want both ‘runs’ and ‘run’ to be returned.

['bob','run','fast','bob','runner','run','bob']

We need to find the frequency of each word as it occurs in our sample. We’ll count them up and assign them to an object.

{
bob : 3,
run : 2
fast : 1,
runner : 1
}

People aren’t perfect and sometimes they misspell words. Usually these misspellings don’t quite match the dictionary, but if you were to sound out the letters, they’d make roughly the same sound. So, let’s also take unique values and convert them into a phonetic (metaphone) representation.

{
bob : 'BB',
run : 'RN',
fast : 'FST',
runner : 'RNR'
}

We’ve got quite a bit of data already — the words, their counts, and the phonetic representation. How then do we represent them in Redis?

Zsets, my friend, zsets are all you need. Each word/metaphone would be structured using the number count of each word and the name of the item being indexed (bobs-day). Working with the word “run we’d structure it like this:

zadd word:RN 2 bobs-day

The first argument is just the key colon concatenated with the metaphone. The second is the count of occurrences and the final argument is the item.

We also need to reverse it a bit and add the data to the other zset which is actually keyed by the index.

zadd object:bobs-day 2 RN

This might seem like storing the same information in two places, but each has distinct purposes that become more clear when we consider more than one item in the index. Let’s say that we add a new item (“sues-day”) that looks something like this:

Sue runs too. Look at Sue run.

This item would need these commands:

...
zadd word:RN 2 sues-day
...
zadd object:sues-day 2 RN
...

What if, in the future, you wanted to remove an item from the index? That would be a nightmare to go back and find all the word:* entries and remove them. This is where the object:* zsets come into play — with these you can remove them by finding the members of the object and referencing them to the correct zset and deleting the corresponding data.

We’ve now established how everything is stored in Redis, how do we turn this into a search feature?

We can (thankfully) reuse many of the same structures. Essentially, we need to turn the search string into metaphones and count them just like we would if we were indexing. If our search terms were:

Bob run

We would use the same metaphone routine and prefixing them with ‘word:’ to create an array like this:

['word:BB','word:RN']

Then we would run a few Redis commands:


ZINTERSTORE temp word:BB word:RN 2
ZREVRANGE temp 0 -1

This might seem a bit complex, but understandable if you break it down:

  • ZINTERSTORE is an intersection of two zsets (arguments 2 and 3) with the results stored at ‘temp’. The last argument is just the number of keys.
  • ZREVRANGE takes the first argument (‘temp’) and orders the zset results from the highest to the lowest score. The last two arguments are requesting the first element (o) to the last element (-1).

Then you would dispose of your temp key. All this is wrapped in a MULTI/EXEC block. When we execute it, the only result we’re really interested in is the 2nd. You should get back your item names as you indexed them, with the most relevant by keyword first.

Now with less pain

Now that you’ve seen the inner workings of how you can implement a search, let’s move on to how you probably will actually do it — with a Node.js module. The module in question is Reds (Redis search). Reds has been around since 2011 and is widely used. It was originally written by TJ Holowaychuk. A few years ago TJ left Javascript to write in Go and left his open sourced projects entrusted with a variety of people. I happened to pickup the torch for the Reds module. Let’s take a look at how the above process was is expressed using Reds:

...var search = reds.createSearch('bob-and-sue');search
.index('Bob runs fast, Bob is a runner. Run bob!','bobs-day')
.index('Sue runs too. Look at Sue run.','sues-day', function(err) {
if (err) { throw err; }
console.log('indexed!');
});

The argument on reds.createSearch is just a prefixing for the keys — in our example, all keys would start with ‘bob-and-sue.’ Searching through your index looks like this:

search
.query('Bob run')
.end(function(err,searchResults) {
if (err) { throw err; }
console.log('results');
console.log(searchResults);
});

A little more manageable, eh?

Let’s integrate this into our e-commerce web app. First, you’re going to need to put it into the script that adds our items. To setup the search, we’ll do this:

reds.setClient(client);
search = reds.createSearch(rk(keyRoot,'search'));

reds.setClient tells the Node.js module to use the already established client object. Then we’re going to create our search, prefixing all the search-related keys with ‘redishop:search’.

When adding the individual items, we’ll do the adding procedure from the previous installment then we’ll run the indexing function, since reds has it’s own MULTI/EXEC block.

search.index([
anItem.description,
anItem.name
].join(' '),
anItem.slug,
function(err) {
...
}
);

Pretty straight forward. We’re indexing the item descriptions and the name (joined by a space to make sure words don’t “runtogether”) to the slug. That’s all we really need to do!

An aside on indexing: This is a powerful opportunity. You can do all sorts of manipulation to how your items are indexed. In our example, we’re keeping it simple with just the description and name. You could, however, do some really sophisticated faked “intelligence.” For example, in our data set we have “mugs” and “shirts.” Let’s say you index as-is but realize that your users are searching for “coffee cup” and “tee” and getting no results. You could just re-index your items with some tiny logic to add in these synonyms where relevant.

On our server, we’re going to add an additional route to serve up the search. The route is /api/search/items?search=[url encoded search string]. This, too, is pretty straight forward. Reds uses a fluent syntax which means you can chain functions together so you’d run search.query([search string]).end([callback]). It will end up looking something like this:

search
.query(req.query.search)
.end(function(err,searchResults) {
if (err) { throw err; }
res.send({ items : searchResults });
});

We’re containing the searchResults in a object at the property items so we reuse our logic. Since everything is indexed by slug, the search string will just return those. The client side will then request the items in bulk (with the POST /api/companies route). Redis can really quickly return each item detail since it’s key referenced data. Double gain — it’s playing towards Redis’ strengths and it keeps your code super reusable.

To make the demo code exhibit the search functionality, there has been some more modifications done in Angular:

  • Popup modal so you can see the item descriptions.
  • Search box at the top of the screen that accepts search strings
  • Active search message between the search box and the items.

It’s really fun to see how the search box works. See if you can find some metaphone collisions — two words spelled differently but yielding the same results.

There is more to the reds module too — by default, the search is an “and” search — each term in the search is required to be present in the item. You can, however, change this to an “or” search. Take our previous example — switching the type is as easy as adding in an additional fluent function:

search
.query(req.query.search)
.type('or')
.end(function(err,searchResults) {
if (err) { throw err; }
res.send({ items : searchResults });
});

Internally, this swaps out the ZINTERSTORE for a ZUNIONSTORE. Personally, I think most “and” searches are better, but you might have a different use-case.

Additionally, you can actually wrench around in how reds processes the search strings. This is a feature that was recently added and it’s pretty important if you don’t want to index English, or, for sure, western content. When creating the search, you can pass a second options argument and set the nlpProcess property to a function. The nlpProcess function takes in the string and item key and returns the object completely processed into individual words, counts, word/metaphone mapping and the raw metaphone keys.

Finally, don’t be afraid of playing around with the sorted sets generated by reds. There are a lot of things that aren’t covered directly in the module, but examining the [root]:word:[metaphone] zsets can yield some creative off-book possibilities.

Another path

As mentioned above Reds has been around for literally years. It has some real strengths but it isn’t without complications. At the time when it was built, Redis had an intentionally limited set of commands. These commands were (and still are) amazingly flexible . However, at Redisconf 2016, the Redis modules system was introduced. The modules system extends the command set and allows for a whole new world of data manipulation. One of the earliest modules to be introduced was the RediSearch module.

RediSearch employees an entirely unique data structure and doesn’t rely on zsets which has some distinct advantages in speed and size over Reds. RediSearch has quite a few of the same components — it stems and has stop words just like Reds and even has some capabilities to do fuzzy searches using the Levenshtein distance (but it doesn’t do metaphone searches).

The API for RediSearch isn’t quite the same as Reds, but it looks to be pretty close (if not RediSearch being more full featured). Currently, at time of writing the module system is only available in Redis 4+ (not quite in production stage yet) and RediSearch is in pre-release. So, if you’re starting a greenfield development project, you should seriously consider RediSearch. Either way, you’ve got some great options. Stay tuned for a future entry on RediSearch itself.

--

--

Kyle

Developer of things. Node.js + all the frontend jazz. Also, not from Stockholm, don’t do UX. Long story.