Geek Culture
Published in

Geek Culture

Searching Redis

When GET by key just won’t do

Photo by Warren Wong on Unsplash

In my last article, I described some adventures in scaling a SCAN heavy redis workload. I ended intending to explore Redis Enterprise. I hoped to outsource the problem to the team behind redis.

It turns out SCAN is irredeemable and Obi-Wan Kebobi cannot save us from it. But, Redis Enterprise supports the RediSearch module, which enables an elegant solution.

Problem and Naive Approach

Let’s back up. The workload in question comes from a car wash analytics platform. Here are example keys stored in redis. These are composite keys formed from a prefix, a location id, and a license plate. The value stored at each key is a session id.

“lp-sess:loc:3d5a6232–43b4–4cd8-a423–5f322d05cd85:lp:JT4344”
“lp-sess:loc:84bba9e7–0a8a-4af5–8b23-f1a16844c7bb:lp:HXS290”
“lp-sess:loc:7eae9995–23dd-4f0d-be0c-175979bfdc5b:lp:GMP1936”
“lp-sess:loc:ea9ef92b-1814–4a24-b4b9-ec94a496f301:lp:944101”
“lp-sess:loc:741100c2–4a1c-45b5-a44f-c315b63aa30b:lp:IKR069”
“lp-sess:loc:a1981989-aabf-4e52–9925–78e3b93d5d2e:lp:AJT138”
“lp-sess:loc:f31077cc-0bb5–4157–939c-adc08941e7ed:lp:KDC3600”
“lp-sess:loc:72cf7004-abef-47c5–9d8e-22d32e22f73b:lp:9J1252”
“lp-sess:loc:d76f798e-4653–4240-bb11-ec1981f1f228:lp:P556P”
“lp-sess:loc:a36c0c25–4c68–46bc-962c-47f55c2198c6:lp:M275079”

When a car comes into view at a given camera, we would like to know if we have seen that car before. This information is in redis. Whenever a before unseen car comes into view, we add it to the database. The complicating factor is that license plate matching should be fuzzy. We tolerate some character discrepancy between plates.

Due to this fuzzy rule, an O(1) redis GET became an O(n) redis SCAN.

Here is the offending code.

for key in self._redis.scan_iter(match=key_mask):
_, lp_candidate = key.decode("utf-8").rsplit(":", 1)
sim = Similarity.lp_similarity_ratio(plate, lp_candidate)
if Similarity.is_similar(sim, self.__lp_sim_threshold) and (best_match is None or best_match[1] < sim):
best_match = LicensePlate(lp_candidate), sim

The key_mask above will filter SCAN results to a given location. We iterate over plates at that location. For each, we apply a similarity function against our candidate to identify best_match.

I had a rosy, faulty view of how SCAN works in a multi-node redis cluster. I constructed a shard key that would co-locate all keys for a given location. I assumed that with careful sharding and appropriate match filter, I could limit a SCAN operation to a single node. I was wrong. A SCAN operation iterates all keys in the database. In a multi-node cluster, SCAN is a distributed operation. Multiply that operation by a high throughput system, and you have a recipe for a performance bottleneck.

Alternative #1: Sets

Redis offers several useful data structures. For example, the set. We can create sets with location as key and add plates as members. Voila: a DIY secondary index.

redis set operations

A problem with this approach is that the TTL applies to the whole set, not the individual members. We would like to expire license plates after two hours. We don’t want to expire an entire location, just a single plate at a location.

It may be possible to work around this limitation by managing timeouts in code and removing members explicitly. This is not so elegant, though.

Alternative #2: RediSearch

Redis Enterprise is a managed redis solution from redislabs. It offers redis cluster as a service and several add-in modules. One of these add-ins is RediSearch. RediSearch allows you to set up a search index in your redis database. It facilitates complex queries.

You can try out RediSearch locally by running it as a docker container.

docker run -it --rm --name redis-search \
-p 6379:6379 \
redislabs/redisearch:2.0.0

You can connect to your local redis-search using the redis cli, or a GUI client like redisinsight.

Let’s start by creating a couple of hashes with loc, lp, and sess fields.

Use hset to create hash key and fields
Browsing hash values in redisinsight

Next, let’s create a search index. The index includes keys with prefix lp-sess and we index on the loc and lp fields.

Create a search index in RediSearch

Let’s query the index for a location.

A location-based search query

This is already much better. We can use the search index to lookup all plates at a given location.

Further, we can add a plate filter to our query. And we can make it a fuzzy match with the % operator, which applies a Levenshtein distance constraint. A Levenshtein distance of one corresponds to a mismatch of one character. The filter tolerance corresponds to the number of % symbols.

A composite search query with fuzzy plate matching

Hello, RediSearch. Here’s an example repo with a Python-based implementation.

Here’s the impact on search performance for the car wash analytics platform.

99th percentile fuzzy license plate search [s]

At the 99th percentile, plate search time decreased from 10s with SCAN to 40ms with RediSearch.

This journey was enabled by some very wise counsel from the folks at redislabs. Thank you, Obi-Wan Kenobi!

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store