How I Sped Up Route Lookup 200x with Radix Trees
Once upon a time, while doing dishes, I thought why not use a radix tree for an http routing mechanism in a project I’m working with. First of all, this project doesn’t use any frameworks, and route lookup logic was as simple as iterating through each of them and checking whether current request matches it. Of course, there is no shortage of routers based on a radix tree (aka patricia tree, or prefix tree), but I liked the idea of implementing it myself, from scratch. I wrote it on php, but the concept applies to other languages. So, here is how I did it.
What is a Radix tree?
Wikipedia does a great job describing it already, but for the sake of this post to be autonomous, I’ll give a short informal description.
Suppose you have lots of string keys which point to some values. You need a data structure that allows to store and look them up. First option is a hash table: it’s the fastest way to look up the keys, but each key is a byte sequence of a fixed length. With a Radix tree, you can utilize the fact that some strings have a common prefix.
For instance, suppose you have three strings: vasya
, vasily
, and vasilevich
. They correspond to the values of 1
, 2
, and 3
. Those strings have a common prefix of vas
. After it’s extracted, the last two strings have an il
common prefix. Now imagine how much space you save when utilize this approach at scale, when you have billions of strings.
Internal nodes contain those common prefixes, and each prefix points either no the next internal node, or a leaf node with a value:
Radix tree is a great way to avoid O(n)
complexity — where n
is a number of routes. Instead, it becomes O(m)
, where m
is a length of a searched key.
Why use Radix tree for route lookup?
First of all, there is nothing better than a hash table lookup: nothing beats O(1)
. But routes can have placeholders: /transactions/:id/refund
, for instance, and this is the case I’m optimizing for in this post. If there are a couple of dozen million transactions, I need some persistent storage for storing all possible route values. Besides, I should write a logic for dynamically adding new routes as new transactions are created. But I don’t want to add an extra network hop for route lookup. So, I’m not taking this path.
I need a data structure which can be used locally and, hence, doesn’t take too much space. Besides, this structure should deal nicely with placeholders. And, finally, I want to avoid O(n)
complexity. So, I think Radix tree is worth the try.
Visual overview of a Radix tree
First, I consider where I wanted to land. I make it visual, and I give no implementation details — hopefully, it can serve as a mental model and helps to build up heuristics. Then, I’ll proceed with ways to get there.
Keys with no placeholders
I’ve already given an example for this case in a “What is a Radix tree?” section. So let’s skip this part and move on to the next scenario.
Keys with placeholders
What if I want to add a key with a placeholder, for example, vasily/:id/hey
with a value 1
? I can’t think of a nice semantic way to form a static prefix out of a dynamic one. So I introduce a new entity — “internal node with a placeholder prefix”:
I think of it this way: if I can’t represent a dynamic prefix statically in a nice way, I’m left with the single option — not to represent it at all. This placeholder is hidden inside an Internal-node-with-a-placeholder-prefix-thing.
Now, let’s add another key to a radix tree above: vasily/:id/heyday
, and its value is 2
. In an innermost internal node, their common prefix is /hey
. But what should I do with the first key? It ends with /hey
, so the evident option is to add an empty string into the new innermost Internal node. To be honest, I don’t think it’s such a bad idea, but at the time of writing code I thought it would be more semantic to introduce a new entity called “Internal node with leaf node”. Here is how the result could look like:
And finally, let’s consider the last and most interesting case.
Keys with and without placeholders which conflict with each other
Now, what if I want to build a radix tree with two keys: vasily/:id/hey
= 1
and vasily/hey/:id
= 2
? They contain a dynamic and a static prefix on the same positions, thus conflicting with each other. Why do I use a word “conflict”? Because placeholders and static keys can’t be represented at the same time in a single internal node. Hence, I need to introduce a new node — I called it a “Conflict-resolving node”. Basically, it contains two internal nodes: the first one with fixed, static prefixes, and the second one with placeholder prefix. Mind the leaf node corresponding to value 2
. Its key ends with a placeholder, thus I represent this fact with a “Leaf node with a terminating placeholder” entity:
For me, the only sensible way of key lookup in this case is to first check the static key, and if there is no match, move on to placeholder node. Why? I think of a fixed key as a special case of a placeholder. It can come in handy when I decide to override a generic key with some special key. For instance, if I lookup a key vasily/hey/hey
in an aforementioned tree, I will always end up with the vasily/hey/:id
key.
Visual overview of Radix tree building
Let’s now consider the process of key adding. Here are the examples; they represent the same cases I outlined in the previous section, and contain some extra ones.
Keys with no placeholders
First example
Let’s add the first key — vasily
. It’s quite straightforward. The resulting tree looks like that:
Now, let’s add a key vasya
. Is there some prefix of this key which is already present in a tree? Yes, it’s vas
:
Now, I need to turn a leaf node with key vasily
into an internal node with a key vas
:
Finally, I can add a key ya
in an innermost internal node — the one corresponding to vas
prefix:
We’re done. This is probably the simplest example, but it demonstrates important concepts which allow me to come up with a draft of a general higher-level solution for key adding. It goes like that, referencing the previous example:
- Find the deepest internal node where to add a key. It’s the single internal node with the single key,
vasily
. - Then, find the left-hand common part. It’s
vas
. This leaves me with the keyya
that I should add. But I don’t have a internal node where I can add it. - That means that I should split an existing key, using the common prefix,
vas
, as a key, so that I end up with an internal node. - Finally, I can add a key
ya
in a newly created internal node.
Second example
Now, let’s add the third key, vassal
, in this tree. The common prefix is vas
:
This key, vas
, is already present in a tree. Let’s dive in further then, and cut the prefix off a key:
Great, empty common prefix is a condition indicating that it’s time to add a key in an internal node. An entire tree now looks like that:
At this point I can refine a general algorithm a bit:
- Find the deepest internal node where I should a key, the suitable key in this node, and a common prefix. Do so recursively, diving deeper into a tree on each run.
- If a common part is empty — great, add a key into an internal node.
- If not, I should modify a tree’s structure prior to adding a key. Those modifications can take several forms. You’ve already seen the first one, where an existing key is split. I’ll describe some others in the examples below.
- When it’s done, add a key.
Let’s consider some more examples.
Keys with placeholders
First example
Suppose we have a tree with a single key: vasily/:id/hey
. It can be represented visually like this:
Now, let’s add a key vasily/:id/heyday
. Following an algorithm I’ve outlined above, I end up with the following deepest internal node, along with the single suitable key:
Why is it suitable? Because it has a left-hand common part with a key to add:
Thus, I need to modify an existing structure so that I can add a key day
later:
I’ve replaced a leaf node with an “internal node with a leaf node” entity. Internal-node-part is empty, while the previous leaf node migrated here. Now, everything is ready for adding the last suffix of an added key:
Second example
Now what if it’s the other way round: there is a vasily/:id/heyday
key in a tree, and I want to add vasily/:id/hey
? First, I’m going to split heyday
into hey
and day
:
Then I’m ready to add a new key. But I’m adding it as a leaf of an internal-node-with-a-leaf entity which replaces an existing internal node:
This results into the same tree I arrived at in the previous example.
Keys with and without placeholders which conflict with each other
That’s the last case, and probably the most interesting one. Let’s consider we have a tree with a single key vasily/:id/hey
. It looks like that:
Now I want to add a key vasily/hey/:id
. Apparently, there is a conflict: prefix vasily/
is followed by an :id
placeholder in the first key, and a static string hey
in the second key. Hence, I run a structure-modifying step: wrap an existing node-with-placeholder-prefix with a conflict-resolving node, and pass an empty internal node where I will add a new key a few moments later:
And finally I add a new key in a previously created internal node. Now, the whole tree looks like the following:
You can find code implementing these concepts and building a radix tree here.
How to output a Radix tree as code
Now, what I need is effectively a memory dump of a tree into plain-text code. Besides, I want nice formatting reflecting my aesthetic preferences. This implementation boils down to traversing a tree: checking each node’s class and outputting a current node based on it. Its argument might be another node, so this whole procedure is recursive in nature. For example, our radix tree contains the following eight keys: /vasily/:lebov
= 0
, /vasilev/:anton/vasily/:belov
= 1
, /vasilev/:anton/anton/:petrovich/anton/:vasilyev
= 2
, /vasilev/baton/:vakhlakov/vakhlakov
= 3
, /vasilevs/:antons/tabakovs/:matskyavichus
= 4
, /vasile/:anton
= 5
, /vasily/belov
= 6
, /vasilev/baton/vakhlakov/:vakhlakov
= 7
. Its output looks like that:
new DefaultInternalNode(
[6],
[
'/vasil' =>
new DefaultInternalNode(
[1, 2],
[
'y/' =>
new DefaultConflictResolvingNode(
new DefaultInternalNode(
[5],
[
'belov' => new DefaultLeafNode(6),
]
),
new LeafNodeWithTerminatingPlaceholder(0)
),
'e' =>
new DefaultInternalNode(
[1],
[
'v' =>
new DefaultInternalNode(
[1, 2],
[
'/' =>
new DefaultConflictResolvingNode(
new DefaultInternalNode(
[6],
[
'baton/' =>
new DefaultConflictResolvingNode(
new DefaultInternalNode(
[10],
[
'vakhlakov/' => new LeafNodeWithTerminatingPlaceholder(7),
]
),
new InternalNodeWithPlaceholderPrefix(
[10],
[
'/vakhlakov' => new DefaultLeafNode(3),
]
)
),
]
),
new InternalNodeWithPlaceholderPrefix(
[1],
[
'/' =>
new DefaultInternalNode(
[6, 7],
[
'vasily/' => new LeafNodeWithTerminatingPlaceholder(1),
'anton/' =>
new InternalNodeWithPlaceholderPrefix(
[7],
[
'/anton/' => new LeafNodeWithTerminatingPlaceholder(2),
]
),
]
),
]
)
),
's/' =>
new InternalNodeWithPlaceholderPrefix(
[10],
[
'/tabakovs/' => new LeafNodeWithTerminatingPlaceholder(4),
]
),
]
),
'/' => new LeafNodeWithTerminatingPlaceholder(5),
]
),
]
),
]
);
The first argument of each internal node is an array with key lengths, so that a key lookup logic always knows exactly how many symbols to cut off from an input string.
This radix tree has sped up route lookup tenfold, comparing to linear search. To be honest, I was expecting even more significant boost. So, since I’m elbow-deep in this enterprise already, why not combine this approach with the one Nikita Popov suggested back in 2014 — to collect all routes in a single regular expression?
How to output a Radix tree as a regular expression
What I need is just another class for representing a tree, this time in a form of a regular expression. Conceptually, it’s not that different from a memory-dump version. The most interesting part here is a way to map a found pattern to an original key. For example, I have the following regular expression: ^(a|b)$
. There are two strings that match it: “a” and “b”. But, how do know which one matches in each case? Here is what the MARK verb does: it tags a path that a regexp engine has taken. Now, you can write the previous regular expression this way: ^(a(*:1)|b(*:2))$
. Given you have a way to identify each match by a tag, you’re done. For instance, if you know in advance that tag 1 corresponds to a string a
, and tag 2 corresponds to a string b
, all you’re left to figure out is a way to get that tag. To do that, you can access a MARK
key of an array you pass in the third argument of a preg_match
procedure.
The route set from the previous section corresponds to this regular expression — in a readable form with tabs:
(?|/vasil
(?|
y/
(?|
(?|
belov(*:6))|([^/]+)(*:0))|
e
(?|
v
(?|
/
(?|
(?|
baton/
(?|
(?|
vakhlakov/([^/]+)(*:7))|
([^/]+)(?|/vakhlakov(*:3))))|
([^/]+)(?|/
(?|
vasily/([^/]+)(*:1)|
anton/
([^/]+)(?|/anton/([^/]+)(*:2)))))|
s/
([^/]+)(?|/tabakovs/([^/]+)(*:4)))|
/([^/]+)(*:5))))
Or, in a more compact form:
#(?|/vasil(?|y/(?|(?|belov(*:6))|([^/]+)(*:0))|e(?|v(?|/(?|(?|baton/(?|(?|vakhlakov/([^/]+)(*:7))|([^/]+)(?|/vakhlakov(*:3))))|([^/]+)(?|/(?|vasily/([^/]+)(*:1)|anton/([^/]+)(?|/anton/([^/]+)(*:2)))))|s/([^/]+)(?|/tabakovs/([^/]+)(*:4)))|/([^/]+)(*:5))))#
That’s more concise than a radix tree memory dump. Here is a regular expression for 69 highly-conflicting routes — pretty neat.
This optimization gave another whopping 20x boost to route lookup speed. Who knew regular expressions are that fast?! Crazy.
In Closing
It was fun for real implementing this logic. Being mostly a business applications developer, I don’t do such things very often. Route lookup speed increased two orders of magnitude, which saves us some CPUs. For a typical I/O-bound application with decent load, it’s not life-changing but noticeable still.
And, as it turned out, Symfony already has it this way! Kudos to Nicolas Grekas. I’ve never been a big fan of frameworks in general, but I always considered that people involved in Symfony ecosystem are brilliant.