Augmented Binary Search Trees
K-th Smallest Element
Unless you have been living under a rock, you must have heard of the term Augmented Reality. But do you know the exact meaning of this term?
In computer terminology, “augment” translates to addition of data or information.
Augmented reality is the addition of data to reality for some specific purpose be it navigation, education or just entertainment. Similarly, we can add some data to every node in a binary search tree to make some algorithm run faster than on a normal BST. The binary search tree along with this new data is known as an Augmented Binary Search Tree. Here, we’ll learn about one such tree.
What is a Binary Search Tree?
Binary Search Tree is a type of data structure which stores data in the form of nodes and linkages. Each node may be connected to two other nodes, one on the left and one on the right. The linkages always satisfy the property that the left node is always smaller and the right node is always greater in value than their parent node. You can read more about them here.
Understanding the Problem
I give you an unsorted array of numbers and ask you to find the 5th smallest element and then the 9th and then the 17th and so on.
Now, I ask you to tell me if 45 is a value stored in the array and if it is, tell me how many elements are smaller than it? How would you do it?
The naive solution would be to sort the array and search the element in it. This way you have each element and its rank but insertion or deletion would take time proportional to the number of elements. There’s a smarter way to go about it by using augmentation. Augment each node with the size of its left sub-tree. Now, both the problems can be solved in O(logn) time. Let us discuss how?
Class Structure
I will be coding in C++ programming language. I will also use templates in our code to make it more generalized for different data types. Our tree class will have a subclass called node which as the name suggests, defines all the nodes.
- The node class will have three node pointers namely, left, right and parent, same as that in a normal BST.
- It will also store a key and an integer nleft for storing the size of the left sub-tree.
- The tree class will just have a node pointer root and an integer n for the size of the tree.
- We also add the public functions which will call their private utilities.
Insertion Utility
Recursion makes this piece of code really simple. We will create a function that will return the head node after insertion. Similar to a BST, when you reach a NULL pointer return the newly created node; if the inserted value is less than the key of the current node, insert in the left sub-tree else insert in the right sub-tree. The only change is when you insert, trace back to the root, whenever insertion is in the left sub-tree, increase nleft by one.
Remove Utility
Similar to the insert utility, this is a recursive function which returns the head node after deletion. Search for the node to be deleted, if found, check its children. If any of the children do not exist, simply bypass the node while if both the children exist, replace the key with that of the leftmost node in the right sub-tree and delete that node. After deletion, retrace the path to root and decrease the value of nleft whenever remove utility was called on the left sub-tree.
Kth Smallest Element
Ahh… Finally, the functions for which we went through all this trouble for. Do you get an intuition for why storing the size of left sub-tree helps in finding the rank of an element?
See, nleft signifies the number of elements that had followed the same path as the current node while insertion but are smaller than the current node’s value. While searching, whenever we move to the left sub-tree, we know that the ranks of the elements in that tree are less than nleft of the current node. Also, when moving to the right sub-tree, we know that there are nleft smaller elements and ranking in the right tree would start from nleft+1. This is it: Do nothing while going left and subtract nleft+1 from k while going right. Return when rank becomes nleft+1.
The rank of an Element
Similar to the previous function, we can use the same concept to get the rank of an element. While searching for the element maintain a running sum of nleft+1 of the nodes you had to move to the right sub-tree on. When you find your node, nleft + 1 + running_sum is the rank. This comes from the fact that you have counted every element that is smaller than the element being searched for.
Closing Thoughts
The algorithm discussed above brought down the time complexity of finding the rank from linear to logarithmic by just storing one integer per node. This is just one example of the numerous types of augmented binary search trees which stores different data for each node, such as the AVL trees store node’s height, Red-Black trees store node’s color, etc. The possibilities are limitless. You can download the code for this article from my Github repository. Also, check out this AVL tree implementation, which explains the basic working of that tree.
