Understanding Data Structures: Binary Search Trees

Rylan Bauermeister
The Startup
12 min read · Jul 23, 2019


What This Guide Is

After last week’s post on linked lists, a tree is the next logical data structure to examine. Consisting of a series of nodes connected by pointers, trees are a top-down data structure with an N-to-1 child-to-parent ratio: a parent may have many children, but every child has exactly one parent.

This guide will examine what a tree is and expand that idea into the binary search tree. The code herein will be in JavaScript, but it will be written in such a way that it should be usable by programmers working in any language.

What Is a Tree?

As previously mentioned, a tree is a node-and-pointer data structure. A basic tree can be thought of as looking something like this:

Fig 1: a binary tree

What we see here is called a binary tree: a tree consisting of a series of nodes, each of which can have up to two children. The binary tree in Fig 1 is something of an overcomplicated linked list; it does all the same things, just with a great deal more effort. At this point, while pretty, it’s hard to be convinced of the usefulness of this structure.

A Case for the Binary Tree

So, why care? Well, by having each node have access to two children, we can set up paths through the tree based on true/false conditions. The most common case of this is the binary search tree, the data structure we’ll be examining in this article.

Here’s an image of one:

Fig 2: a binary search tree

It looks very similar, but now you can notice the following pattern: every node to the right of its parent contains a value larger than the parent’s value, and every node to the left holds a smaller value. The same is true for all subtrees. This left/right split could be based on other factors as well: any true/false comparison against the parent can be used to generate a binary search tree.

What does this gain us? Well, now when we search the tree, we can use our knowledge of how it is laid out. In the tree above, if we wanted to find out whether six is in the tree, we would start at the root. Six is greater than four, less than seven, and greater than five. By following that path we do find six, and we do so by checking only four of the eight values. This is why lookup in a binary search tree has an average time of O(log n).

Fig 3: visualization of binary search

Operation Timing

Fig 4: Operational Timing of a BST

As was previously noted, BSTs have an average timing of O(log n) for most operations. However, it’s worth noting that they also have a worst-case scenario of O(n). If you’re wondering how this could happen, consider this perfectly valid BST:

Fig 5: a worst case BST

As you can see, we started with a small root and did nothing but push incrementally larger values into this tree. As a result, the tree is more of a line, and in order to find the largest value in the tree we would need to examine every single node. We’d also need to pass over every node to insert a new value, or remove the largest value.

Under What Circumstances Is a Binary Search Tree Useful?

Binary search trees are useful when you need both storage and hierarchy. They are great for implementing priority queues, where you will often need to insert elements into the “middle” of the queue, but also need easy access to the ends. This way we have quick access to both the elements and their position in the hierarchy relative to the other elements in the set.

Similar to linked lists, binary search trees are also pointer based data structures, meaning they don’t have an upper bound on storage (aside from the physical limits of hardware). They can be grown or shrunk indefinitely, as they simply allocate memory for a new node when they need to grow.

Finally, they also form a useful data structure for representing anything that has a hierarchy. Think of folders on a computer — they naturally have parents and children. While this is not an example of a binary search tree (it is an example of a basic tree), it’s quick to see how a tree could be useful for organizing data so that users could access that data in a way that is intuitive. Like many data structures, trees can model real world objects in a way that makes people better able to work with their data. This is an attribute that should never be discounted.

Tree Terminology

There are a few terms anyone working with trees should be familiar with. They are listed here, along with their definitions.

Root

The top node of the tree, with no parent.

Leaf

A leaf (or leaf node) is a node with no children. As with a real tree, leaves serve as the extremities of the tree.

Row

A set of children. If we were looking at the tree as a family tree, we could consider each row to represent a generation. The parents are one row, the children the next row, the grandchildren the next.

Fig 6: the anatomy of a tree

Binary Tree Types

Full

A tree is considered “full” if every node either has both a left and a right child, or is a leaf.

Complete

A tree is “complete” if every row is full, save for possibly the last, which is filled from left to right.

Balanced

A tree is balanced if, for any node, the depths of its two subtrees never differ by more than 1. Every complete binary tree is balanced; the reverse is not true.

Degenerate

The opposite of a full tree, a degenerate tree is one in which every node has only one child, save for the single leaf at the end.

Fig 7: different types of binary tree

Common Operations

There are a number of common operations that will need to run on a tree.

Searching (Breadth First Search)

Let’s backtrack for a moment, to our tree that had no organizational structure. Say you need to search a tree like this, or at least get every node in row order. How would we go about this?

Simply put, we use a queue (or first-in-first-out array). We start by adding the root node of the tree. Then we shift off the front of the queue, and add its children to the back. By doing this until the queue is empty, we add each row in order.

Here’s the code:

Fig 8: BFS in a BST
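As a rough sketch, assuming each node exposes val, left, and right (the shape of the Node class we build later), it might look something like this:

```javascript
// A sketch of breadth first search over a binary tree.
// Assumes nodes expose val, left, and right.
function breadthFirstSearch(root, visit) {
  if (!root) return;
  const queue = [root];                      // start with the root in the queue
  while (queue.length > 0) {
    const node = queue.shift();              // take the node at the front
    visit(node.val);                         // do something with its value
    if (node.left) queue.push(node.left);    // add its children to the back,
    if (node.right) queue.push(node.right);  // so rows come out in order
  }
}
```

Calling it with console.log as the visit callback prints the tree one row at a time.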

Searching (Binary Search)

If, on the other hand, we want to search a binary search tree, we can do so by tracing down the tree. If our input value is lower than the current node’s value, we go left; if it’s larger, we go right. We repeat this until we find the value, or until we’re directed to a null node.

Code for you:

Fig 9: binary search in a BST
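A minimal sketch of that lookup, written here as a standalone function taking the node to start from (the original figure may attach it to the tree class instead):

```javascript
// A sketch of binary search: walk down from `root`, going left for
// smaller values and right for larger ones.
function getNode(root, val) {
  let current = root;
  while (current) {
    if (val === current.val) return current;  // found it
    current = val < current.val ? current.left : current.right;
  }
  return null;                                // walked off the tree: not present
}
```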

Get Min/Max Values (Leftmost/Rightmost)

We’ll commonly need to be able to find the min/max value in a binary search tree (or subtree) based on a root. We can do this by going as far to the left or right as possible. When the node we are searching has no child in the direction we are searching, we’ve found our terminal node.

NOTE: the node we end at may not be a leaf. It is entirely possible for a min or max node to have a child on the side opposite the direction we were searching. We are still guaranteed that nothing in that child’s subtree will be lesser/greater than our node, because of how binary search trees are inherently organized.

The code is very simple:

Fig 10: get min/max value in BST
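Sketched here as standalone helpers (the names are my own):

```javascript
// Keep walking in one direction until there is no child on that side.
function getMinNode(node) {
  let current = node;
  while (current.left) current = current.left;   // leftmost node holds the minimum
  return current;
}

function getMaxNode(node) {
  let current = node;
  while (current.right) current = current.right; // rightmost node holds the maximum
  return current;
}
```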

Implementing a Binary Search Tree

The Node Class

As with all node-based data structures, we start with a Node class. In this case, our node will need to have access to four things:

  1. A val. It will use this to compare to other nodes.
  2. A right node. Can point to another node with a greater val or null.
  3. A left node. Same as right, only contains a node with a lesser val.
  4. A parent node. While not strictly necessary, this will save a great deal of headache when we get to removing nodes later on.

So, without further ado:

Fig 11: a tree node
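One plausible version of that node (the exact property names in the original figure may differ):

```javascript
class Node {
  constructor(val, parent = null) {
    this.val = val;       // the value used to compare against other nodes
    this.left = null;     // child holding a lesser (or equal) value, or null
    this.right = null;    // child holding a greater value, or null
    this.parent = parent; // back-pointer to the parent; useful for removal
  }
}
```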

This class will serve as our tree node. As has been noted in my other guides, you don’t technically need a class like this in JavaScript. This example exists to help readers whose languages don’t allow them to simply graft properties onto anything they wish. If you want a purer JavaScript version, you can omit the Node class entirely and just use plain objects.

The Tree Class

Next we need the skeleton of our BST class. This, too, is pretty simple.

Fig 12: the skeleton of a BST class.
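A minimal sketch:

```javascript
class BST {
  constructor() {
    this.root = null; // the only thing the tree itself tracks
  }
  // insert, remove, and the other operations go here
}
```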

This skeleton is far simpler than that of a doubly linked list. Trees only track one thing: their root. All the rest of the logic falls on the node properties to handle.

Let’s get into the two big operations: insert and remove.

Insert

Inserting into a binary tree is relatively straightforward, but a sizable chunk of code. Check it out:

Fig 13: insert into BST
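A sketch of what that insert might look like as a method on the BST class, assuming the Node class above:

```javascript
// Sketch: insert a value into the tree.
insert(val) {
  const newNode = new Node(val);
  if (!this.root) {                     // empty tree: the new node becomes the root
    this.root = newNode;
    return this;
  }
  let current = this.root;
  while (true) {
    if (val <= current.val) {           // lesser or equal goes left
      if (!current.left) {
        current.left = newNode;         // found an empty slot
        newNode.parent = current;
        return this;
      }
      current = current.left;
    } else {                            // greater goes right
      if (!current.right) {
        current.right = newNode;
        newNode.parent = current;
        return this;
      }
      current = current.right;
    }
  }
}
```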

Let’s break this down. Lines 4–7 (in the figure) are an error check: if the tree is empty, we need a special case that simply initializes a new node as the root, and those lines take care of that.

From there, we move on to searching the tree. We use a binary search like the one laid out in Figure 9: we check whether the value is lesser (or equal) and go left, or greater and go right. When we try to move, we check if the spot we’re moving into is null. If it is, we put our new node where that null was.

NOTE: we’re using less than or equal to for moving left, but that’s a personal preference. Doing greater or equal for moving right is equally valid — what is important is that you are consistent. If you do this backwards in even one place, it can break the whole structure in nasty ways. It’s extremely important to be careful with design choices like this!

Removing

Removing from a BST can be overwhelming. To make it simpler, let’s break it down into smaller pieces.

1. Locate the node to delete

Fig 14: find node to delete

Here we use our getNode function to check whether the value we want to remove is actually in the tree. If we find it, we delete it; otherwise, we just return false.
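A sketch of that wrapper, reusing the getNode function sketched earlier (the delete helper is what we build next):

```javascript
// Sketch: remove a value by locating its node, then handing it to delete.
remove(val) {
  const node = getNode(this.root, val); // the binary search from Fig 9
  if (!node) return false;              // value isn't in the tree
  this.delete(node);                    // delete is fleshed out below
  return true;
}
```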

2. Determine the condition of the node

When our delete function is passed a node, the first thing it needs to do is determine what type of node it has. The node can be a leaf, have one child, or be full. Here’s what it looks like to check for that:

Fig 15: delete based on state
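A sketch of that dispatch; the helper names (deleteLeaf, deleteOneChild, deleteFullNode) are my own shorthand for the three branches we fill in below:

```javascript
// Sketch: route the node to the right removal strategy.
delete(node) {
  if (!node.left && !node.right) {
    this.deleteLeaf(node);        // case 1: no children
  } else if (!node.left || !node.right) {
    this.deleteOneChild(node);    // case 2: exactly one child
  } else {
    this.deleteFullNode(node);    // case 3: two children (a "full" node)
  }
}
```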

So far so good. We can easily fill out the first if statement: if the node is a leaf, we can simply set it to null. However, we’ll also need to set the parent’s pointer to null. Remember, in pointer-based data structures, clearing the pointer that references a piece of data is often more important than clearing the data itself.

Fig 16: how to delete a leaf
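A sketch of that branch, written as the deleteLeaf helper named above:

```javascript
// Sketch: remove a leaf by clearing its parent's pointer to it.
deleteLeaf(node) {
  if (node === this.root) {   // deleting the root: there is no parent to update
    this.root = null;
    return;
  }
  if (node.parent.right === node) {
    node.parent.right = null; // we were the right child
  } else {
    node.parent.left = null;  // we were the left child
  }
}
```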

Lines 3 and 4 (in the figure) make sure we aren’t deleting the root; if we are, there is no parent pointer to clear. Otherwise, we check whether we are the right or left child of the parent and clear that pointer. Clean and simple.

Moving on, what if our node has a single child? We can’t simply delete the node in that case — we’d lose the entire branch! Luckily, it’s not too hard to deal with. Since we’re removing a single node, and that node only has a single child, we can just graft the child onto the node’s parent, or (more easily) overwrite our current node with the information of its child, effectively deleting it. Let’s look at the code:

Fig 17: how to delete a node with one child
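A sketch of that branch, following the “overwrite in place” approach described below (written as the deleteOneChild helper named earlier):

```javascript
// Sketch: replace the node's contents with its lone child's.
deleteOneChild(node) {
  if (node.left) {
    const child = node.left;                   // save off the lone child
    node.val = child.val;                      // copy everything except parent
    node.left = child.left;
    node.right = child.right;
    if (node.left) node.left.parent = node;    // tell the grandchildren where
    if (node.right) node.right.parent = node;  // their parent now lives
  } else {
    const child = node.right;                  // mirror image of the branch above
    node.val = child.val;
    node.left = child.left;
    node.right = child.right;
    if (node.left) node.left.parent = node;
    if (node.right) node.right.parent = node;
  }
}
```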

These two statements are practically identical, so we’ll only go through the first one. The first thing we do is save off the lone child node. This ensures that we can’t lose it if we, say, overwrite the pointer on the node we’re deleting.

Then, we overwrite all the properties of the node we’re deleting with those from the child except for the parent pointer. We keep the original parent pointer, effectively attaching the child to the parent.

However, since the node now lives in a new place in memory, we’ll need to make sure its children know where their new parent is. For left and right, we check if the child is there, then attach it to our overwritten node if so.

Fig 18: visualization of node deletion where node has one child

Now, our only remaining case is the most daunting: deleting a node with two children. We can’t simply bump a child up. Relative to the deleted node’s parent, both children sit on the same side (both lesser or both greater), so they would both need to occupy the same child slot of that parent. No, this case requires a bit more finesse. Let’s start with the code, then explain what it’s doing:

Fig 19: deleting a full node
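A sketch of that branch, written as the deleteFullNode helper and reusing getMinNode from earlier:

```javascript
// Sketch: replace the node's value with its in-order successor,
// then delete the duplicate further down the right subtree.
deleteFullNode(node) {
  const successor = getMinNode(node.right); // leftmost node of the right subtree
  node.val = successor.val;                 // overwrite only the value
  this.delete(successor);                   // successor has at most one child
}
```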

Short code, but dense. We’ll go through it line by line.

Line 2: we start by finding the leftmost node of the right child. This one is worth thinking about for a second, but it makes perfect sense. As we discussed earlier, the leftmost value of a binary search tree is the minimum value of that tree. As we also noted, any subtree of a binary search tree is itself a binary search tree. By finding the leftmost value of the right subtree, we find the smallest value that is still greater than the node we’re deleting, which makes it the natural replacement.

Line 3: we then overwrite the current node’s value (and only its value) with that leftmost value. We have now effectively removed the value we wanted to get rid of, but the leftmost value is now duplicated further down the right subtree. We need to do something about that.

Line 4: we finish by recursively calling delete on that leftmost node. This bears thinking about for a second: because it is the leftmost node of its subtree, it can never have a left child, so it is either a leaf (and is simply plucked) or has a single right child (which gets shifted up into its place). Either way, the recursion terminates, and the tree is stitched back together correctly.

Fig 20: Deletion of a full node

Major operations aside, we can now write some fun utility functions. You can build whatever utilities you want; programming is all about independence and building the tools you need. Here are a few ideas to get you started:

Remove the Minimum/Maximum

Fig 21: removing highest and lowest (push/pop for trees)
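Sketches of both, reusing the helpers from earlier (the method names are my own):

```javascript
// Sketch: pop-style helpers that remove and return the extreme values.
removeMin() {
  if (!this.root) return null;
  const min = getMinNode(this.root); // leftmost node holds the minimum
  this.delete(min);
  return min.val;
}

removeMax() {
  if (!this.root) return null;
  const max = getMaxNode(this.root); // rightmost node holds the maximum
  this.delete(max);
  return max.val;
}
```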

Now that we can delete by node, all we need to do to remove the largest or smallest value from a tree is call delete on the rightmost or leftmost node, respectively.

To Array

Fig 22: converting a tree to an array
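A sketch of that conversion, reusing the breadth first idea from Fig 8:

```javascript
// Sketch: collect every value in row order.
toArray() {
  const result = [];
  if (!this.root) return result;
  const queue = [this.root];
  while (queue.length > 0) {
    const node = queue.shift();
    result.push(node.val);
    if (node.left) queue.push(node.left);
    if (node.right) queue.push(node.right);
  }
  return result;
}
```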

Using a breadth first search, we can put all the elements of a tree into an array in row order. This is pretty useful for testing, as it gives you a much simpler comparison case than having to manually check the whole tree every time.

Conclusion

Binary search trees are a convenient way of storing information: they not only store data, they order it as well. This is important. Linked lists store information; so do hash tables (which offer O(1) lookup). Trees store information too, but in doing so they also provide a hierarchy. This is what makes them powerful.

For those interested, my complete code (with a testing suite) is in the sources. Feel free to check it out, play around, and add some functions of your own. Or, better yet, code one of these up from scratch. There’s no substitute for experience.

Sources
