# The Bkd Tree

## A Dynamic Disk Optimized BSP Tree

When searching for a solution to a recent problem I accidentally stumbled upon Bkd Trees. Haven’t heard of them? You’re probably not alone. A quick Google search for the term “Bkd Tree” will lead you to the original research paper and a patch to utilize them for certain spatial queries in Apache Lucene but not much else. In fact, at the time of writing, the second result Google gives me is the Wikipedia article for the similarly named K-D-B Tree. Bkd Trees are so unpopular that Google assumes you meant to type something else.

This is unfortunate because they are actually pretty great.

Bkd trees are a type of tree used for searching multidimensional data. This data can be anything from points in physical space to colors in a very large palette. Bkd trees are pretty good at their job too. Compared to other trees in it’s class, Bkd trees are often faster and more space efficient than K-D-B trees and simpler than R-Trees.

There isn’t as much as a diagram or Wikipedia article on Bkd trees. This article serves to correct that and explain in plain English how they work and some of their unique properties.

# BSP Trees

Bkd Trees belong to a family of trees called BSP Trees, short for Binary Space Partitioning Trees. BSP trees have a wide range of applications in computer science. Apart from standard map and set operations, BSP trees also generally make certain other operations more efficient. Examples are range queries (Give me all points within this area) and nearest neighbor queries (What point is the closest to the one I provide).

These trees are conceptually similar to binary search trees. Instead of searching through the entire space, we divide the space into chunks. BSP’s do this by slicing the space using a plane. As an example, imagine we have a 100 x 100 x 100 cube full of points and are searching for the specific point (10, 20, 30). One can divide the cube into two sections: One where x is greater than 50 and one where it is less. Because we are looking for a point where x is equal to 10, we can immediately throw away the second half of the cube. BSP Trees recursively subdivide the structure into regions. The following is a great visualization of this:

## k-d Trees

k-d Trees are one of the most common type of BSP Tree. k-d trees and the variants mentioned below allow keys to include an arbitrary number of dimensions. This is potentially advantageous to other BSP trees like Quadtrees or Octrees which are only usable for 2D and 3D space respectively. A k-d tree is implemented using a binary tree is a similar manner to a BST. The key difference is that the key comparison cycles through different dimensions at every level of the tree. The following is an example of this in action for a two dimensional tree.

Like a Binary Search Tree, if a k-d tree is balanced, we are guaranteed a time complexity of O(log n) since every node divides the set in half. The key caveat is that this is only guaranteed when the tree is balanced. Imagine adding the points (1,1) and then (0,0) to the tree. Both would go to the immediate left of the tree and could cause it to be unbalanced. Making things more complicated is that we can’t use standard balancing techniques like tree rotation with a k-d tree. The only thing that can really be done is to completely rebuild the subtree to be balanced, akin to what the Scapegoat Tree does.

While standard k-d Trees don’t function particularly well when being dynamically updated, they are excellent for cases in which the data is static. An example could be an asset in a game.

## K-D-B Trees

Next up is the K-D-B Tree. This tree is what you would get if you mixed the k-d Tree and a B+ Tree. Like the standard k-d Tree, an interior node divides the space up into several regions. Unlike the k-d Tree, interior nodes are not themselves points. Regions are defined by two points in space. The first contains the minimum positions in each dimension for the region while the second point defines the maximums. The following example shows a K-D-B Tree where an interior node is composed of 3 regions and leaves are composed of three points. Note these two sizes do not have to be the same. Generally interior nodes contain less regions than leaves contain points. The regions in each node are sorted along a single axis. Unlike the k-d tree this axis does not cycle between levels of the tree.

Because the tree is structured as a B-Tree, it will work well on a disk. This is because the increased fanout of each node makes the nodes larger and the trees more shallow. Disks generally have high latency, but high throughput. This means it is advantageous to read large chunks of data since reading small chunks will take a similar amount of time due to latency. The shallow depth also means less non-local reads will need to happen. Generally on disk B-Trees will size a node so that it is at least as large as a page (generally 4kB), often times larger. Because of this it is typical for a node to have hundreds of children.

Like other B-Tree variants, the tree is guaranteed to be balanced. This is guaranteed by the insertion procedure. If an inserted element is in the region of a leaf and the leaf is not full, it is added to the leaf. If the leaf is full, it is split. Instead of adding to the height of the tree like you might imagine, this just adds another region to the parent node. Things get slightly more complicated if a region node runs out of space.

Imagine needing to make a vertical cut in the lower left region. Because there are now 4 regions, the parent region must also be split. This means we are now splitting the entire left hand region of the space. This is one of the major disadvantages of the K-D-B trees. Splitting a region often requires splitting children of the region. This modifies large portions of the tree and is especially slow when needing to write to disk.

Another disadvantage is memory efficiency. Because there are no constraints on how full a node must be, large amounts of space may be wasted. This impacts not only size on disk but also performance since less pages may be kept in the page cache.

## hB Trees

It is worth noting another descendant of the K-D-B Tree exists. hB Trees or Holey Brick Trees attempt to solve the issues by not cutting through entire regions and instead creating non-rectangular regions with holes in them. They are beyond the scope of this article but interesting to read about and much better documented than Bkd Trees.

# Bkd Trees

Bkd Trees manage to solve the space and insert efficiency problems. Bkd Trees are made up of multiple modified k-d Trees and unique methods for insertion. The following is a diagram I created to show the basic structure of one of these trees.

Edit: There is a small error in this tree. In the rightmost leaf the point (110, 8) should be (110, 80)

The tree combines a Binary Tree and B+ Tree. More specifically, the interior nodes must form a complete binary tree while the leaves are stored identically to those in a K-D-B tree.

Several factors mitigate the use of the Binary Tree on disk. Because this is a complete binary tree, nodes do not need to store pointers to their children, instead multiplication may be used. Given a 1-indexed node at position i, the node's left child is always 2i and the right child it 2i + 1. Since the leaves also contain the points, nodes need not contain any value themselves. Smaller nodes mean more can be kept in cache. Adding to this is the large amount of points held in the leaves. This reduces the height of the tree.

An even bigger factor, and one of the most interesting parts about Bkd trees are that these internal trees are never modified. Bkd Trees use a clever scheme to add new points.

First there is a buffer of size M elements. In the research paper this was kept in memory. This buffer may be a simple array or something better for performance since it may also be queried. The paper did not specify the optimal size for the buffer but intuitively it should be at least as large as the node size of the modified k-d Tree.

Given the Bkd tree has N elements, there is a set of log2(N/M) modified k-d Trees. Each tree doubles the size of the last. Elements inserted into the Bkd tree go first into the buffer. Once the buffer is full, the first nonexistent tree is located. The elements of the buffer and all trees prior to the empty one are combined in order to form a new fully balanced tree. A fast method for doing this is described in the paper.

This initially seems very expensive but upon further assessment can be very efficient. First some of the results. As a quick note I do not believe the tested code fsyncs data to disk immediately after write operations, this was not explicitly stated however.

# Real Performance

For insertions, Bkd Trees are more than two orders of magnitude faster than K-D-B Trees! The average insertion time into a set of 120 million points was 50 microseconds. This is impressive by its own but is even more astounding knowing the hardware used for this.

We used a dedicated Dell PowerEdge 2400 workstation with one Pentium III/500MHz processor, running FreeBSD 4.3. A 36GB SCSI disk (IBM Ultrastar 36LZX) was used to store all necessary files: the input points, the data structure, and the temporary files. The machine had 128MB of memory, but we restricted the amount of memory that TPIE could use to 64MB.

Put simply, this thing is fast. It’s not magic though. When we look at insertion operations, most go directly into the buffer. This will be zippy, hitting RAM if not the CPU cache. The more interesting part is the tree construction and writing.

This step is very expensive. It is full of sequential disk operations which is optimal for a hard drive but cannot avoid the massive amount of data being moved around. In fact, given the example above, there must exist a tree with at least 60 million nodes. How expensive is this to create?

Based on the benchmarks provided, one of the inserts would take over 10 minutes due to this process. This seems like a deal breaker but it is important to remember a few things.

First: This is incredibly old hardware. Modern processors are much faster and SSDs completely change the game.

Second: This is incredibly rare. The more expensive the operation is the less often it happens. As shown above, the actual throughput is generally excellent.

Third: Data may be read from the tree while this insert is happening. The insert doesn’t actually alter any of the existing tree structures until it has finished constructing the new tree. Reads will most likely be slower due to I/O and cache contention but they necessarily are not blocked by this.

If I had to guess, the spiky nature of inserts is not going to be a problem on modern hardware unless you require the ability to do inserts in realtime. I plan on updating this article with my own benchmarks once I implement this structure to confirm this.

Querying the tree is easy, but slightly inefficient. A query must be performed on each modified k-d Tree as well as the buffer. This is slower than the equivalent in a K-D-B tree but not by an order of magnitude since the trees are each smaller. The following is for a very large range query (1% of the entire space or roughly a 400 million by 400 million window).

Simple queries should be dramatically faster than this but are not included in the research paper. If range query performance is your top concern the Bkd tree is probably not the optimal structure.

Finally, we have the issue of space efficiency. We should expect this to be near perfect and it is. This chart illustrates space efficiency with a uniform random and real life data set.

A concern is that the paper doesn’t specify a way to compact deleted points. As more points are deleted from leaves space efficiency will be reduced. One could certainly implement their own scheme for this however.

In conclusion, Bkd trees are a unique data structure with theoretically great amortized write performance and space efficiency. They certainly have many potential interesting applications. In the next article I will describe actual implementation of this data structure and give a modern day comparison of this.

Software Engineer in Training

## More from Nick Gerleman

Software Engineer in Training