Busying Oneself With B-Trees
If you spend enough time thinking about and learning computer science, you’ll start to notice that everything is linked. The history of computing can often feel like a linked list of events, where one event, person, or invention inevitably is linked to the next.
If we really think about it, this is the case with nearly all of human history. Human beings tend to build their own creations based on the work of those they admire. We draw inspiration and take ideas from the masters who came before us, often reviving or reinventing old ideas and reconfiguring them to become our own. And along the way, we add another another node the linked list of history, which ends up being another stepping stone for someone else later down the road.
As we learned last week, this was the case with the inventors of the red-black tree, who based their theories and research on the work of another computer scientist who came before them. The creators of the red-black tree adapted their work from the foundations of someone named Rudolf Bayer, rand stood on his shoulders in order to derive their own data structure.
So what was so inspirational about Bayer’s work? How did his work become the cornerstone for the data structures that we use today? It’s time for us to take a traverse one step back in the linked list of history and finally find out.
Dissecting 2–3 Trees
Throughout the course of this series, we’ve been adding to our knowledge base of the tree data structure. Most recently, we dove into self-balancing trees, including AVL trees and red-black trees. As it turns out, B-trees fall into this category of trees, too.
A B-tree is a self-balancing tree that is a variation on a binary search tree in that it allows for more than 2 child nodes. Similar to other self-balancing tree structures, B-trees also have a set of rules that they must follow; however, they are unique in that they are particularly adept at handling data that cannot fit into a machine or program’s main memory. In other words, they’re different from AVL and red-black trees because they can handle large amounts of data in ways that neither of these other self-balancing trees really can.
As it turns out — surprise, surprise! — B-trees have their own interesting history. Since we’re trying to understanding what B-trees are, and why and how they were created, it’s worth understanding where B-trees even came from!
B-trees are nothing more than a generalization of a 2–3 tree, which was invented in 1970 by a computer scientist named John Hopcroft. A 2–3 tree is a tree data structure where the nodes of every tree contain data in the form of keys, as well as potential child nodes. The interesting thing about the data of a node in a 2–3 tree is that it can contain more than one key, which is unlike any tree structures that we’ve covered in this series thus far.
However, there are some constraints that the nodes of a 2–3 tree must follow; for example, a single node cannot contain an unlimited number of keys.
The general rules of a 2–3 tree are as follows:
- The internal (parent) nodes of a 2–3 tree must contain either one key/piece of data, and a reference to two child nodes
- Alternatively, the internal (parent) nodes can contain two keys/pieces of data, and a reference to three child nodes
No matter what, this ratio must be followed for every single internal node in a 2–3 tree in order for it to be valid. And what about when it comes to leaf nodes? Well, since by definition a leaf node cannot have any children (because that’s what makes it a leaf), the leaves of a 2–3 tree can either contain one or two keys.
Since tree data structures are best explained with a visual example, let’s draw one out to see these rules in action.
In the example shown here, we have a valid 2–3 tree. We’ll notice that some nodes contain only one key, while other nodes contain two keys. Remember: a node is different from a key! Just like a normal binary search tree, a node contains data. In the case of a 2–3 tree, there may be two pieces of data — two keys.
Notice that the left child of
37, the root node, contains two keys:
29. Since a node that contains two keys must have three children, we can be sure that this left node will have three child nodes, which it does! Similarly, the right child of the root node is a node with only one key,
44. Thus, it is limited to have just two children.
A 2–3 tree has a variety of unique characteristics, all of which we will see again in a moment when we get back to the topic of B-trees.
In the illustration shown here, the keys of every node have been replaced with the values
1-12. Once we rewrite the values to be consecutive integers, a pattern starts to emerge:
the keys of a 2–3 tree are sorted as an in-order traversal that moves up and down the structure, from left child, to the root, to the right child.
We can also start to see another abstraction start to come into play, provided we look closely enough: if there are n keys in a node, then that node will always have n+1 children.
Similarly, we can notice another pattern, where all the children nodes in the left subtree of a node are smaller than the key that references them. For example, in the right subtree of this 2–3 tree, the keys
10 are smaller than the key
11, and are to the left (in the left subtree) of their parent node. However, the key
12 is bigger than it’s parent node
11, so it’s is to the right (in the right subtree) of it’s parent.
At it’s core, the 2–3 tree isn’t all that complicated. Even without actually going through the steps of searching, inserting, or deleting from a 2–3 tree, we can probably guess how that would go. Since we can quickly determine that that this structure is very similar to a binary search tree, we can draw on our knowledge of BSTs and recall that searching, inserting, and deleting from a 2–3 tree would end up taking logarithmic time, or O(log n). The only main difference here is that a node could contain three children rather than just two, and that internal nodes in this tree must fall into one of two categories when it comes to how many keys and children in can have.
However, the rules that define a 2–3 tree are crucial for understanding how a B-tree actually functions. Now that we understand Hopcroft’s work on the 2–3 tree, it’s time to apply some of the patterns and abstractions we just noticed!
To B-tree or not to B-tree
When looking at the patterns of a 2–3 tree, we noticed that the number of n keys in a node correlated to how many children (n+1) that node could reference. As it turns out, that abstraction is exactly what the inventors of the B-tree drew upon when they derived a B-tree from the 2–3 tree.
In 1971, two researchers named Rudolf Bayer and Edward M. McCreight were working on a paper at Boeing Research Labs. This paper, entitled “Organization and Maintenance of Large Ordered Indexes”, outlined a data structure that was based on a generalized version of a binary search tree that was optimized for reading from and writing to large portions of data at a time. The main thesis of the paper was that one could use the branching factor in order to determine how many possible children a node could have, as well as how many keys it could contain.
The branching factor represents the bound, or the maximum and minimum number of children per node in the tree. For our purposes, we’ll refer to the branching factor as the variable B, but remember that this is just a variable! The branching factor is also sometimes referred to as n, or another variable letter.
So, how can we use the branching factor to determine what a B-tree looks like and how it is structured? Why, by using a formula of course!
The number of children per node in a B-tree can be described as: B ≤ x < 2B, where B is the branching factor, and x represents the number of children per node. Notice that there is a strict upper bound for the number of children per node: we’re using <, not ≤.
Similarly, the number of keys per node in a B-tree can be described as: B-1 ≤ y < 2B-1, where B is the branching factor again, and y is the number of keys/the data that can be contained in a node.
If we think back to 2–3 trees, this makes sense: remember how the number of keys was represented by n and the number of children was abstracted to be n+1? We’re doing the reverse of that here: the number of possible keys is effectively the same as the number of possible children, minus 1.
In fact, let’s try this formula with a 2–3 tree and see if it works. In a 2–3 tree, the branching factor, B, is equal to 2. If
B = 2, then we know that the number of children nodes must be 2 ≤ x < 2(2). Because of the strict upper bound, this basically means that x can only ever be 2 or 3. By the same token, the number of keys must be 2-1 ≤ y < 2(2)-1, meaning that y must be greater than or equal to 1, but less than 3. Thus, using this formula, we can confirm that 2–3 tree can only have 1 or 2 keys per node.
In addition to these formulas, there are a few quick facts to keep in mind when working with B-trees:
- A B-tree (including 2–3 trees) is often described by referring to its bounds as parameters. A B-tree is described as a “(B) — (2B-1) tree”. For example, when
B = 2, we refer to that three as a “2 — (2(2)-1) tree”), which is where the naming for a “2–3 tree” comes from.
- The root node defies one of the two restrictions in that it has no lower bound when it comes to how many keys it must have. For example, this means that a 5–9 tree’s root could have many children, but might only contain only one key, which is totally acceptable.
- All of the leaves in a B-tree must be at the same depth, always. Similar to how the root node breaks the key restriction, leaf nodes break the children rule, since, by definition, a leaf node can never have any children.
Before we get into basic B-tree operations, let’s remind ourselves what a single node in a B-tree looks like.
It’s important to be comfortable with how a node looks because we’re going to be dealing with changing pointers and references quite a bit in the next section. Here are some key things to remember about the inner workings of a single node in a B-tree:
- A node contains at least one, and possibly many keys as the data that it holds.
- A node references one or many children, unless it is a leaf node.
We won’t walk through how to search through a B-tree, since it is very similar to performing a search operation in a BST. The one important thing to remember is that when we search through a B-tree, we will iterate and compare the keys of each node that we look at; it’s helpful that the keys are in sorted order, so we can compare one key at a time and determine whether we need to move left, right, or down the middle of the tree as we continue to search.
However, things can get slightly tricker with insertion and deletion. We’ll work our way into the more complicated stuff, but to start, let’s look at the simplest versions of both of these processes.
Inserting into a tree is basically the same as searching; we start from the root, and walk down the tree as necessary until we find the correct home for the key that we’re trying to insert. In the example below, we’re inserting the key
25 into a 2–3 tree. We start at the root node, and compare the value of the one key at the root,
37, to the value to be inserted,
25 is smaller than
37, we move down to the left subtree.
Next, we’ll look at the keys at the left child, the node that is the parent of the left subtree. This node has values
25 falls between these two values, we’ll move to the middle subtree, between these two keys. The node at the middle contains one key, so in this scenario, all we need to do is add another key into this node, since it hasn’t filled up to its full capacity; we add
25 into the node that contains the key
Sometimes, however, even seemingly easy insertion may involve an extra step. For example, if we needed to insert the key
6 into this same tree that we’ve been working with, we’d follow the same flow as before: start at the root node, and make our way down the correct subtree.
It turns out that the key
6 needs to be inserted between
8. However, as soon as insert this key, this node will overflow! So, what do we do? We can split the overflowing node to make room for the value needs to be inserted.
We’ll take the middle key from the overflowing node, which happens to be
6, and move it up to the parent node, in the correct place. If the parent node overflows, we can simply repeat the process of splitting again, all the way up until we reach the root node.
Deletion from a B-tree is easy to do if the node that we’re deleting from has another key. But many times, the node that we’re deleting from has only one key, in which case, we’re left in a problematic scenario — namely, that we have other nodes pointing to a node that has no data in it! The best technique for dealing with this situation is through a rotation.
In the example tree shown here, we have a section of a B-tree that we’ll be focusing on (in practice, this would have to be a subtree, and part of a much larger tree data structure in order to be valid).
Imagine that we want to remove the key
13 from this B-tree tree. We can do this pretty easily, except that as soon as we remove this key, we have an empty node with no data, which is a direct violation of our B-tree rules!
In order to fix this, we can rotate a key from the parent node down since it has one to give. But, then we need to make sure that we fill that empty data point in the parent node with some value; remember that a parent node must have 2 keys if it is to have 3 children, which means we need to add one more key to the parent node. In a rotation, we can take nearest in-order sibling element and replace the empty key in the parent node with that key. After deleting the key
13, we rotate down
9 to take its place, and move up the key of
7 to fill the empty spot in the parent node.
Not too terribly complex, right? Don’t worry, I’m about to complicate it a bit further.
Buzzing around wild B-trees
Okay, so we have a few handy tools at our disposal when it comes to dealing with trees: we can split the keys of a nodes or even rotate around the values inside of them. But sometimes neither of these two options is quite enough to do the job.
Let’s look at an example of where this would be the case.
In the B-tree below — which just so happens to be a 2–3 tree — we want to delete the key
40 from the structure. However, even from just looking at this illustration, a problem starts to become super evident: if we delete
40, we need to do something with its children nodes.
The easiest thing to do is to replace the key we deleted with one of the values from the children; we can choose the leftmost element of the right child node here, since this node has two keys,
50. This should be an easy choice because it has two keys, which means it has one key to spare, and can easily donate this key “up” to its parent. We can propagate up the key
48 from the child node into its parent, and use the child key to fill the data of the parent node so that no rules are being violated.
But what if we wanted to delete a key in a much more precarious spot? Let’s say that now we needed to delete the key
This is going to be a bit trickier, since deleting the key
48 will orphan its two child nodes,
50! Uh oh. Well, let’s do what we know will work to start: we’ll move the left in-order child up to take the place of its parent.
But, now that we’ve moved
39 up to be the parent node, the left child that used to be
39 is now empty.
A good way to handle this is by merging these two nodes together, and propagate the empty, referenced key up so that the “keyless”, empty node is now the parent.
Now we’re finally in a good position to push the “keyless” node up to the parent node that contains
36, and rotate the value of
36 down to the child node. Since we can’t have two nodes with one value and two children, we need to do something about
36. The easiest thing to do is, of course, to merge them together.
Effectively, what we’re doing in this entire process is combining the middle and the right subtrees together into one single right subtree.
All of this might seem like a whole lot of work at first glance, but consider this: even doing the work of searching, inserting, and deleting — no matter how many steps we have to do to rearrange pointers and move around keys — is still logarithmic, which means that all of this is still super efficient.
So, why would anyone ever use a B-tree rather than any other self-balancing tree? Because of the sorted order of keys and the ability to sequentially traverse through the structure, B-trees are super efficient for storing data in an indexed database, like MySQL, for example. It becomes very easy for a database to read a section of a disk at a time because most B-trees have a B value (branching factor) that is equivalent to the size of a block of data. Caches are often mapped to entire portions/sections of the B-tree at a time, which means that a database could pull an entire section out at once and read whatever value(s) it needs from it without having to make tons of additional calls again and again. Super clever, eh?!
After all of this, we’re still left wondering one thing: why the “B”? Well, as the is the case with the most interesting moments in history: it all depends on who you ask. There has never been an explicit explanation or answer to this question. Some people have speculated that the “B” stands for “balanced”, or “broad”, or perhaps even “Boeing”, where both McCreight and Bayer were working at the time of writing their research paper.
Perhaps the best answer any of us will ever get comes from the 24th Annual Symposium on Combinatorial Pattern Matching in 2013, when Edward M. McCreight explained the fuzzy history behind the “B” in B-trees:
Bayer and I were in a lunchtime where we get to think [of] a name. And … B is, you know … We were working for Boeing at the time, we couldn’t use the name without talking to lawyers. So, there is a B. [The B-tree] has to do with balance, another B. Bayer was the senior author, who [was] several years older than I am and had many more publications than I did. So there is another B. And so, at the lunch table we never did resolve whether there was one of those that made more sense than the rest. What really lives to say is: the more you think about what the B in B-trees means, the better you understand B-trees.”
At the end of the day, all we can be sure of is this — if the question is whether to B-tree or not to B-tree, the answer is: it depends, of course! The answer is almost always, it depends on what we want to do. But at least we know how to do it now!
B-trees are particularly easy to read and research since they are the foundations for databases, and are so widely-used. However, there are so many resources out there that it can be tricky to find the really helpful ones. Here are some of my favorites to help you continue learning about B-trees!