Binary Search Tree Operations In Data Structures
In a Binary Search Tree (BST), we first evaluate the root of the nodes. The node key values are greater on the Right Subtree, while the Left Subtree are lesser in value. This will depend on what the root key value is set to. In the example given, let us assume that the root key value of the node is 50 or R(50). Any key value to the Left Subtree is < 50. The key values on the Right Subtree are > 50. This can be represented in the following example illustration.
With a root of 50, any new key value < 50 will fall under the Left Subtree and any new value > 50 will fall under the Right Subtree. If a new key value was 40, according to the rule, the node will fall under the Left Subtree since it is:
N < 50 -> LS
This creates a data structure with an organization among nodes. The root is the ancestor of the leaf nodes that are connected to it. In the example illustration, the node N(40) is related to R(50) by way of N(30). It is a leaf node or children nodes of N(30) as well.
According to definition:
“The order of nodes in a BST means that each comparison skips about half of the remaining tree, so the whole lookup takes time proportional to the binary logarithm of the number of items stored in the tree.”
In terms of Big O Notation time complexity for BST, when it comes to operations like Search, Insert and Delete the average or best case is O(Log n) and the worst case would be O(n). Logarithmic time or O(Log n) is actually a more efficient and faster way to perform operations in large data sets. With a root R(50) let us say in a 100 node key value data set, you are reducing the search requirements in half by knowing under which part of the tree to lookup a given value. From there you can iterate the node values until you find the one you are looking for.
This puts the key value of the nodes in an organized manner. This is for faster lookups to find a key value among N number of nodes. Let us say we have 1,000 nodes and root value was 500, therefore in a BST if you were to search for the node key value of 300, it would fall under the Left Subtree as a leaf among the N nodes.
The following steps are the methodology for BST insertion of a key value:
- Start by evaluating the key value from the root.
- If it is less than the root, then recurse to the left.
- If it is greater than the root, then recurse to the right.
- Finally, insert the node where you recurse.
Here is a simple program written in Python for inserting key values:
# Create a class that represents an individual node in a BSTclass Node:def __init__(self, key):self.left = Noneself.right = Noneself.val = key
# Function to insert a new node with the given keydef insert(root, key):if root is None:return Node(key)else:if root.val == key:return rootelif root.val < key:root.right = insert(root.right, key)else:root.left = insert(root.left, key)return root
# Function to do inorder tree traversaldef inorder(root):if root:inorder(root.left)print(root.val)inorder(root.right)
# Test the functionsr = Node(50)r = insert(r, 30)r = insert(r, 20)r = insert(r, 40)r = insert(r, 70)r = insert(r, 60)r = insert(r, 80)# Print the node valuesinorder(r)
The output of the program should sort the node key values according to a sorted order list:
The Left Subtree is represented in the list by all values before 50, while all Right Subtree values are at the bottom of 50. Likewise, it can be written as:
/ \ / \
20 40 60 80
You can search and delete node values following a similar methodology. First compare the root with the given value, then iterate or look at the subtree parents of the given value if it is a leaf node.
Going back to time complexity, the worst cases scenario of search and insert operations is O(h) where h is the height of the BST. It may require having to travel from root to the deepest leaf node, which can take much longer to do. It may also become linear to the O(n), where n is the total number of nodes to iterate within a data set. For example, even if you were able to halve the data set, you may not find a key value fast enough if it is not efficiently organized.
Let us take another illustrated example that exemplifies the worst case.
If we are searching for the key value of N(10), we would be able to find it in the following steps:
- Start from R(50)
- Look under Left Subtree N(30)
- From parent N(40)
- Move from left to write (unsorted) to search for node value (O(n)) starting from N(15) … until N(10).
R(50) -> N(30) -> N(40) -> N(15) -> N(22) -> N(10)
We have to iterate each child of N(40) or its leaf nodes to reach N(10). Now let us change it to somehow make it more efficient to search for N(10).
By re-arranging the order, it takes a lesser number of steps to N(10). You will just need to travel from R(50) to N(30) to N(40) to N(45). Once you reach N(45) your search will return the lookup for N(10). Previously you had to travel from R(50) to N(30) to N(40), then do 3 iterations from N(15) to N(22) before reaching N(10).
R(50) -> N(30) -> N(40) -> N(45) -> N(10)
It also depends on the number of elements under the parent or child node. Of course, this arrangement will not be necessary if N(10) were the only child of N(40). The new arrangement saved us 1 extra step, but if you look at this at a grander scale of thousands or millions of elements, then any way of shortening the process will be worth it.
Here is the fastest way to lookup N(10) in the next illustration.
We further reduce the process from the root to just 3 steps total.
R(50) -> N(30) -> N(40) -> N(10)
A BST can also be used for data verification, for provability. The key here is the root (ancestor) and parent of a key value of the leaf nodes. When you can trace the origin of a node to an ancestor, that proves the data is correct (e.g. a family tree). Such applications can be used to quickly verify if a particular node belongs to a data set.