Data Structures for Dummies: Binary Search Trees

Ramy Zhang
9 min readMar 29, 2018

--

Let’s say you were programming a database to hold the basic information of a country’s citizens — so about 100 million data entries. If you were to use a linked list or an array to store this data, it would take linear time O(n) to search for an entry, constant time O(1) to insert a new entry, and linear time O(n) to remove an entry. (For more information on how big O notation works, read my previous article in this series on Linked Lists! I also cover most of the basic concepts that will be used in this article as well.)

Let’s think about the cost of performing a search function on a sorted array (sorted means that the data has been arranged in chronological order). If it takes you about a second to compare 1 million entries, that means that in the worst case scenario, it’ll take you about 100 seconds to search the entire database. This is much too long for a basic action that will probably be performed many times daily. How do we shorten this time?

In this article we’ll explore binary search trees, their basic structure, their characteristics, and how you can code one yourself. We’ll also try to understand how they can be a possible solution to this problem.

Structure

Example of a binary search tree. Source

A binary search tree is a binary tree in which all the children nodes on the left subtree of a root node (which is the first node at the top of the tree) have lesser values, and all the children nodes on the right subtree of a root node have greater values.

So following the above example, we see that our root node is 25. The left child node has a value of 20, so less than 25, and the right child node has a value of 36, so more than 25. As we continue down the tree, all the child nodes on the left side must have a smaller value than their mother node, and all the child nodes on the right side of each node must have a bigger value than their mother node. Note that each node can only have a maximum of two child nodes. The last nodes at the bottom of the tree that have no children are called leaf nodes. Pretty simple so far!

Searches

Let’s start off with a simple example.

Using the same binary search tree as in the image above, if we wanted to search for the value 38, we’d start by asking ourselves: is 38 bigger or smaller than 25? Since it’s bigger, we can cut out the entirety of the left subtree under the root node of 25 (because those nodes will all be smaller than 25) and proceed directly to the child node of 36. Then we ask ourselves again: is 38 bigger or smaller than 36? Since it’s bigger, we continue again to the right child node and cut out the entire left subtree under the child node of 30. We continue this until we find 38.

Searching a binary search tree vs. searching a sorted array. Source

Because of the way binary search trees are organized with the left subtree smaller than the root, and the right subtree bigger than the root, at each step you’ll find yourself cutting out about half of the entire database. The amount of data you’ll need to search at each step therefore decreases logarithmically, as compared to linked lists where each new step will only decrease your search space by one node! You can just imagine how much faster it is.

Insertions

Inserts work very similarly to how searches work! Take a look at this example:

Insertions in a binary search tree. Source

As always, we start by comparing the value we wish to insert to our root node, checking whether it’s bigger or smaller. For example, if we want to insert 28, we see that it is bigger than the root of 21, therefore we’ll set it as the child node to the right of 21 (seeing as the root doesn’t have any child nodes yet). Continuing on, if we want to insert 14, we see that it is smaller than the root of 21, therefore it’ll be set as its left child node. As the tree grows, the inserts will continue to be made in this fashion, checking the value we want to insert against each node to see whether it’s bigger or smaller, and taking the next step accordingly. For this reason as well, insertions will take a logarithmically shorter time of O(log n).

It’s important to make sure the tree is balanced on both sides of the root node for this reason, i.e. there is roughly the same amount of nodes on either side. If you continue to pile on values that are bigger than the root node, for example, then your tree will become heavily skewed to the right side and will look more like a linked list than anything else. This will invalidate the O(log n) insertion and search time; at each step you are no longer cutting out half of the search space because the tree isn’t “split in half” at the center. However, fear not; there are a couple of algorithms out there that will be able to help you re-balance your tree when the time comes!

Traversals

You can think of traversals as “walking” through the tree. When it comes to binary search trees, there are three ways to do this: pre-order traversals, in-order traversals, and post-order traversals.

The three different kinds of traversals. Source

In pre-order traversals, you start with your root node, then you print the left node, and then after that the right node. So in this way, you’ll have the median value print first, then the smallest value, and finally the biggest value.

Post-order traversals go the opposite way; you print the left child value first, then the right child value, and lastly the root node. So this means that we’ll go from the smallest value, to the greatest value, and end with the median.

In-order traversals are used the most often because they print the values in a binary search tree in chronological order, as they go from the leftmost child node, to the root node, and then finally to the right node. Therefore you start with the smallest value and end with the largest value.

Code it!

So now that we’ve got the basics down, as promised let’s code a binary search tree ourselves to better understand the mechanisms behind it!

class Node {
Node left, right;
int data;
public Node(int data) {
this.data = data;
}
}

Remember how I said before in the Linked Lists article that nodes were essentially structs that contained two variables; your data and a special pointer towards the next node? That is exactly what we’re creating here. So in the second line, Node left, right indicates that we’re creating a pointer towards the left child node and the right child node. Afterwards, we have our variable that holds data of type integer (this will be the data in the root node), as well as a constructor function that will allow us to set data values more easily later on.

class Node {
Node left, right;
int data;
public Node(int data) {
this.data = data;
}

public void insert(int value) {
if (value <= data) {
if (left == null) {
left = new Node(value);
} else {
left.insert(value);
}
} else {
if (right == null) {
right = new Node(value);
} else {
right.insert(value);
}
}
}
}

In this new chunk of code, we’re detailing an insert method. The function takes in a certain value that we want to insert. We then check if the value is less than the one in the node we’re comparing it against. If it is smaller, then we move into the next step; we check whether the left child node is empty or not (remember, the left child nodes always contain the lesser values, the right child nodes always contain the greater values!). In the case that the left node doesn’t exist yet (null), we create a new node in that position with the value we wanted to insert. If it does exist (else), then you push it down the tree on the left side and you ask the left child node to continue the job of inserting your value.

Then, in the case that the value we want to insert is actually greater than that of the node we’re comparing it against, we want to check if the right child node exists or not. We continue the same process; if the right node does not exist (is null), then we create a new node in that position with our value. If not, we push it down the tree on our right side and ask the right node to continue inserting the value.

class Node {
Node left, right;
int data;
public Node(int data) {
this.data = data;
}

public void insert(int value) {
if (value <= data) {
if (left == null) {
left = new Node(value);
} else {
left.insert(value);
}
} else {
if (right == null) {
right = new Node(value);
} else {
right.insert(value);
}
}
}
public boolean contains(int value) {
if (value == data) {
return true;
} else if (value <= data) {
if (left == null) {
return false;
} else {
return left.contains(value);
}
} else {
if (right == null) {
return false;
} else {
return right.contains(value);
}
}
}

}

The next method is a search method, which we will call contains(). Firstly, we check if the value is equal to that of the node we’re looking at right now. If yes, then simply return true; the node you’re looking at corresponds to the value you’re looking for! Otherwise, if the value you’re looking for is smaller than that of the node you’re checking, we then see if the left child node under it exists. If it doesn’t and is null, then return false; the value you’re looking for doesn’t exist in the tree! If the left child node does exist, then simply ask the left child node what the answer is. Since this is a recursive data structure, the process will continue on through this function until you finally hit the value you’re looking for.

We repeat the exact same process when the function tells us that our value is greater than that of the node we’re looking at; except this time we continue on through the right child node instead.

class Node {
Node left, right;
int data;
public Node(int data) {
this.data = data;
}

public void insert(int value) {
if (value <= data) {
if (left == null) {
left = new Node(value);
} else {
left.insert(value);
}
} else {
if (right == null) {
right = new Node(value);
} else {
right.insert(value);
}
}
}
public boolean contains(int value) {
if (value == data) {
return true;
} else if (value <= data) {
if (left == null) {
return false;
} else {
return left.contains(value);
}
} else {
if (right == null) {
return false;
} else {
return right.contains(value);
}
}
}
public void printInOrder() {
if (left != null) {
left.printInOrder() {
}
System.out.printInOrder(data);
if (right != null) {
right.printInOrder() {
}
}

}

The last method is an in-order traversal print of all the values! This method is actually super simple; we start by making sure the leftmost child node exists, and we proceed to print it. Then, we print our data (in other words, the root node in this situation), and lastly, we make sure that the rightmost child node exists and we print it.

Congratulations! You’ve coded your very own binary search tree. If you haven’t done so yet, be sure to also check out the previous article in this series on Linked Lists for some more valuable foundational knowledge as well!

Thank you for reading! If you found this helpful, here’s some next steps you can take:

  1. Send some claps my way!
  2. Follow me on Medium and connect with me on LinkedIn to be updated when next article of this series is out!
  3. Get some more practice by doing some problems with binary search trees!
  4. Check out the next article in this series on stacks and queues!

--

--