Data Structures 101: Binary Search Trees

How to combine the efficiency of insertion of a Linked List and the quick search of an ordered array.

Kevin Turney
We’ve moved to freeCodeCamp.org/news
13 min readOct 10, 2018

--

“leafless tree on the hill” by Fabrice Villard on Unsplash

What is a Binary Search Tree?

Let’s start with basic terminology so we may share the same language and investigate related concepts. First, what are the principles that define a Binary Search Tree?

* From here on out I will use “BST” for brevity

A BST is considered a data structure made up of nodes, like Linked Lists. These nodes are either null or have references (links) to other nodes. These ‘other’ nodes are child nodes, called a left node and right node. Nodes have values. These values determine where they are placed within the BST.

Similarly to a linked list, each node is referenced by only one other node, its parent (except for the root node). So we can say that each node in a BST is in itself a BST. Because further down the tree, we reach another node and that node has a left and a right. Then depending on which way we go, that node has a left and a right and so on.

1. The left node is always smaller than its parent.

2. The right node is always greater than its parent.

3. A BST is considered balanced if every level of the tree is fully filled with the exception of the last level. On the last level, the tree is filled left to right.

4. A Perfect BST is one in which it is both full and complete (all child nodes are on the same level and each node has a left and a right child node).

Why would we use this?

What are some real-world examples of BST’s? Trees are often used in search, game logic, autocomplete tasks, and graphics.

Speed. As mentioned earlier, the BST is an ordered data structure. Upon insertion, the nodes are placed in an orderly fashion. This inherent order makes searching fast. Similar to binary search (with an array that is sorted), we cut the amount of data to sort through by half on each pass. For example, suppose we are looking for a small node value. On each pass, we keep moving along the leftmost node. This eliminates half the greater values automatically!

Also, unlike an array, the data is stored by reference. As we add to the data structure, we create a new chunk in memory and link to it. This is faster than creating a new array with more space and then inserting the data from the smaller array to the new, larger one.

In short, inserting, deleting and searching are the all-stars for a BST

Now that we understand the principles, the benefits, and the basic components of a BST, let’s implement one in javascript.

The API for a BST consists of the following: Insert, Contains, Get Min, Get Max, Remove Node, Check if Full, Is Balanced, and the types of Search — Depth First (preOrder, inOrder, postOrder), Breadth First Search, and lastly Get Height. That’s a big API, just take it one section at a time.

Implementation

The constructor

The BST is made up of nodes, and each node has a value.

The BST constructor is made up of a root node.

… so far so good.

Insertion

Let’s add some more values.

For a cool visualization Go Here!!

Let’s unpack this.

  1. First, we pass a value and create a new node
  2. Check if there is a root, if not, set this newly created node to the root node
  3. If there is a root node, we create a variable declared “current”, and set its value to the root node
  4. If the newly created node.value is smaller than the root node, we will move left
  5. We keep comparing this node.value to left nodes.
  6. If the value is small enough and we reach a point where there are no more left nodes, we place this item here.
  7. If the node.value is greater we repeat the same steps as above except we move along the right.
  8. We need the break statements because there is no count step to terminate the while loop.

Contains

This is a pretty straightforward approach.

Get Min and Get Max.

Keep traversing left to the smallest value or right for the largest.

Removal

Removing a node is the trickiest operation, because nodes have to be reordered to maintain the properties of a BST. There is a case if a node has only one child and a case if there is both a left and a right node. We use the larger helper function to do the heavy lifting.

It works like this…

Unlike deleteMin and deleteMax, where we can just traverse all the way left or all the way right and pick off the last value, we have to take out a node and then replace it with something. This solution was developed in 1962 by T. Hibbard. We account for the case where we can delete a node with only one child or none, that’s minor. If no children, no problem. If a child is present, that child just moves up one.

But with a node scheduled to be removed that has two children, which child takes its place? Certainly, we can’t move a larger node down. So what we do is replace it with its successor, the next kingpin. We have to find the smallest right child on the right that is larger than the left child.

  1. Create a temp value and store the smallest node on its right. What this does is satisfy the property that values to the left are still smaller and values to the right are still greater.
  2. Reset the node’s value to this temp variable
  3. Remove the right node.
  4. Then we compare values on the left and the right and determine the assigned value.

This is best explained with a picture:

Searching

There are two types of search, Depth First and Breadth First. Breadth First is simply stopping at each level on the way down. It looks like this: we start at the root, then the left child, then the right child. Move to the next level, left child then right child. Think of this as moving horizontally. We employ, I should say simulate, a queue to help order the process. We pass a function, because many times we want to operate on a value.

Depth First Search involves moving down the BST in a specified manner, either, preOrder, inOrder, or postOrder. I’ll explain the differences shortly.

In the spirit of concise code, we have a basic traverseDepthFirst function and we pass a function and a method. Again the function implies that we want to do something to the values along the way, while the method is the type of search we wish to perform. In the traverseDFS, we have a fallback: preOrder search in place.

Now, how is each one different? First, let’s dispatch inOrder. It should be self-explanatory but it isn’t. Do we mean in order of insertion, in order of highest to lowest or lowest to highest? I just wanted you to consider these things beforehand. In this case, yes, it does mean lowest to highest.

preOrder can be thought of as Parent, Left Child, then Right child.

postOrder as Left Child, Right Child, Parent.

Check if the BST is full

Remember from earlier, a BST is full if every node has Zero or Two children.

Get Height of BST

What does it mean to get the height of a tree? Why is this important? This is where Time Complexity (aka Big O) comes into play. Basic operations are proportional to the height of a tree. So as we alluded to earlier, if we search for a particular value, the number of operations we have to do is halved on each step.

That means if we have a loaf of bread and cut it in half, then cut that half in half, and keep doing that till we get the exact piece of bread we want.

In computer science, this is called O(log n). We start with an input size of some sort, and over time that size gets smaller (kind of flattening out). A straight linear search is denoted as O(n), as the input size increases so does the time it takes to run operations. O(n) conceptually is a 45-degree line starting at origin zero on a chart and moving right. The horizontal scale represents the size of an input and the vertical scale represents the time it takes to complete.

Constant time is O(1). No matter how large or small the input size is, the operation takes place in the same amount of time. For example, push() and pop() off of an array are constant time. Looking up a value in a HashTable is constant time.

I will explain more about this in a future article, but I wanted to arm you with this knowledge for now.

Back to height.

We have a recursive function, and our base case is: ‘if we have no node then we start at this.root’. This implies that we can start at values lower in the tree and get tree sub-heights.

So if we pass in this.root to start, we recursively move down the tree and add the function calls to the execution stack (other articles here). When we get to the bottom, the stack is filled. Then the calls get executed and we compare the heights of the left and the heights of the right and increment by one.

Lastly, Is Balanced

What we are doing is checking if the tree is filled at every level, and on the last level, if it is filled left to right.

Print

Use this to visualize all the methods you see, especially depth first and breadth first traversals.

Our Friend Console.log!! Play around and experiment.

Time Complexity

1. Insertion O(log n)
2. Removal O(log n)
3. Search O(log n)

Wow, that is indeed a lot of information. I hope the explanations were as clear and as introductory as possible. Again, writing helps me solidify concepts and as Richard Feynman said, “When one person teaches, two learn.”

Resources

Probably the best resource for visualizing, definitely use it:

--

--

Kevin Turney
We’ve moved to freeCodeCamp.org/news

Software Engineer, extreme DIY'er, husband, father, I build things.... pretty much anything.