Tree-Set data structure in C++

Image for post
Image for post

Do you know what is a binary search tree and the concept of set in mathematics? In this article we are going to build a binary tree with a set properties, self balanced (AVL) and it will be able to carry any data type (from built-in to your own objects) using the super feature in C++ templates.

Trees can be complex data structures, most of functions that can be performed on trees require recursive calls. This exercise will help us to train our brain to think recursively, understand how a binary tree class is built, how an AVL tree is self balanced and the different reasons to use a binary tree over other data structures including a simple the complexity analysis.

Image for post
Image for post
Complete Binary Tree

A tree is considered a Binary Tree when each node has at most two sub-nodes (children) in the entire tree, so that, one sub-node that does not exists normally is pointed to NULL. A complete binary tree is when all levels of the tree are completely filled but the last level is not. Some terms of Trees are: Edge (link between two nodes), Child and Parent node, Root (topmost node and parent of all child nodes) and Leaf (node with no child node).

Balanced Binary Search Tree AVL

To begin, we are going to define the class DoublyNode for binary search tree, it means that each node can have at most 2 child, in other words each node of the tree is composed by 2 pointers (left:prev or right:next) and a data value <Type> , and its member variables and function definitions.

Image for post
Image for post
Double pointer Node

In the DoublyNode class, a header file “DoublyNode.h” will contain #pragma once and a template to make the node flexible to carry different data types template <class Type>, the class definition with a private member variable Type data and two public pointers Type (left and right) that will be initialized to NULL (nullptr for C++11) in the aggregate C++11 form “Type *left { nullptr }, *right{ nullptr };”.

Image for post
Image for post
Header DoublyNode.h class declaration

Why using #pragma once or #ifndef DOUBLYNODE_H #def DOUBLYNODE_H #endif? this is added to include once in a single compilation the file. All the member functions are public, two constructors (one with no parameters and one with a Type value parameter), a setter to set a value to the node and a getter of the node key value following OOP encapsulation concept.

The class DoublyNode and the .cpp file in lines 5–13 there are the two constructors, in this case we only have a constructor with parameter (const Type &data) definition, where the data value that is passed as an argument is assigned to the key of the node, and then the pointers right and left are assigned to { nullptr }.

Image for post
Image for post
Source File DoublyNode.cpp definition

In the class definition (DoublyNode.cpp file) the member function definitions are very simple and easy to understand, these methods are: getters, setters, and finally, a destructor that assigns both left and right pointers to NULL in order to avoid pointer dangling.

The Tree-set implements a Binary Search Tree (BST). Why a Binary Search Tree? BST are collections that maintain a dynamically changing data-set in sorted order, unlike a sorted array that its elements can’t be inserted and removed efficiently, a BST is useful for many tasks because it enables binary search to efficiently locate elements and insert elements in sorted order. BSTs have O(h) worst-case run-time complexity where “h” is the height of the tree.

Image for post
Image for post
Binary Search Tree diagram

The height (h) of the tree is the number of edges between the tree’s root and its furthest leaf. For the image above case the complexity run-time for searching, inserting and deleting is O(log n) log base 2 (binary) [log 7 = 2].

A BST have four properties: each node of the tree contains one and only one key, the key values of the left-subtree are less than its common parent key value and the key values of the right-subtree are more than its common parent key value (recursively defined), and ultimately, duplicate key values are not allowed, following the mathematics definition of SET{}.

The class declaration of our tree is found in the file TreeSet.h, where we can find first a pre-compiler file declaration #pragma once, template <class Type> declaration, member methods, and finally, member variables declarations. To begin, the member variables are private (lines 9 and 10): int length (stores the number of nodes in the Tree), DoublyNode<Type> Root (head of the Tree).

Image for post
Image for post
Header file TreeSet.h class declaration

In the public member methods we can find the basic data structure methods to perform basic operations (insert, delete, search and clear). The private member methods are recursive methods that are performed internally in the Tree. Why do we need recursion in a Tree-Set? a tree is a recursive data structure, this means that a Tree is partially composed of smaller or simpler instances of the same data structure.

The first function to define is INSERTION. The method “void insert(const Type & data)” or “void insert(const DoublyNode<Type> & newNode)” inserts a new node in the tree and is public, in other words, it is accessible from outside of the class, the parameter can be a value of the Type TreeSet<Type> or a Node which was already created. Inside the method there is a call of a private recursive method (insertRecursively), this method iterates over the tree comparing the data value with nodes and moving according to the BST property of distribution(less than move to left, more than move to right), until it hits a leaf node, and then the new node is added as a child node of the leaf node.

Image for post
Image for post
insertion of 40 in a BST

In the image above, to insert 40, it must transverse the Tree comparing its value with the other nodes: 1) 40 is less than 100, move to the left, 2) 40 is more than 20 move to he right, 3) 40 is more than 30, and 30 is a leaf node, add 40 to the right. Note that all values on the left side of the root are less than 100.

Image for post
Image for post
Insert methods in the Tree-Set

This process is done recursively passing the node and the data to be compare and calling the method again depending if it is less than or more than to move over the tree. The worst case time complexity of search and insert operations is O(h) where h is height of Binary Search Tree.

Image for post
Image for post
Recursive insertion of a new node in the Tree-Set

In the image above we can find the recursive method to insert a new node, the base case is when the current node position is equal to NULL, and the new node is added, if not, there are two more recursive calls, by using this calls we transverse the tree comparing the current node with the new node and move forward according BST properties.

The insertion method of a Binary Search Tree can have worst case complexity of searching, insertion and deletion O(n), this happens when a BST is not balanced and it’s referred as a degenerate Binary Search Tree.

Image for post
Image for post
Inefficient BSTs

In the image above, all those binary search trees are inefficient because they are not a complete binary tree, in other words, every node or most of them have only one child, to prove this, we choose the first BTS in the image, to insert 5, it will take O(h), and the height is 5 same number of nodes, meaning that the process of insertion, deletion and searching for this case is O(n) (similar to a Linked list).

The main reason of having a BST is that every operation is performed by at most O(log n) time complexity, to accomplish this we have to define and implement the idea of balanced-Binary Search Tree.

Image for post
Image for post
Balanced BST

A balanced-binary search tree is a tree that the difference of the heights of its right child(s) and left child(s) is no more than 1. As I mentioned before, a tree is a recursive data structure and the balance BST concept applies to every node and sub-node in the tree. a balanced BST keeps its height small and every root-leaf path is done in log(n) steps.

Image for post
Image for post
AVL Tree definition k = height

As a last approach to accomplish our data structure Tree-Set (AVL), we need to balance our tree and implement the concept of AVL Trees.

An AVL ( Adel’son-Vel’skii and E.M. Landis )Tree are simple self balancing binary trees, it requires that the height of the left and right sub-trees of very node to differ by at most 1.

AVL trees perform rotations every time the tree is unbalanced due an insertion or deletion of a node.

There are two types of rotations, left-rotation and right rotations. Both left and right rotations are performed recursively, making rotations from the base case until the root of the Tree.

Image for post
Image for post
Recursive rotations

It is important to calculate the height of every sub-tree and the Tree in order to compare heights of the sub-trees and then perform the rotations, this method depthRecursively(const DoublyNode<Type> * node) calculates the height of a given node recursively and comparing left and right sub-trees, its base case when the current node is NULL, it returns 0.

Image for post
Image for post
Heights calculation and comparable

Another method is used to compare left and right heights of the sub-trees calling the method: depthRecursively(const DoublyNode<Type> * node) . this comparable method returns the difference between heights and is called getBalance(const DoublyNode<Type> * node).

Rotation operations take constant time because few pointers are being changed, so the time complexity of an AVL insert remains as a balanced BST O(log n).

Image for post
Image for post
left-left rotation AVL tree

To make sure that a BST implements AVL concept after every modification of the tree, some re-balancing must be performed. This process consists of four different cases: Left-Left case, Left-Right case, Right-Right case and Right-Left case.

Image for post
Image for post
LEFT-LEFT case

For the first case, the tree rotates to the right once, in the example above, node 50 is assigned as new root of the tree and its right pointer to the left now points to node 100, on the other hand, the left pointer of node 100 now points to node 65.

Image for post
Image for post

The Right-Right case is similar to the Left-Left case, the tree rotates to the left once getting a new root and moving pointers in constant time complexity.

Image for post
Image for post
LEFT-RIGHT case

The case above is more complex, but if we look at it closer, it is just a left rotation in a sub-tree and then a right rotation in the tree getting a new root. The same happens virtually in the Right-Left case.

Image for post
Image for post
RIGHT-LEFT case

The idea of these cases is to implement them in our recursive method of insertion.

Image for post
Image for post
Complete recursive insertion method

The next operation is SEARCH in the Tree, this one is pretty simple, the method declaration in the TreeSet class header isInSet(const Type &data) calls the recursive private method searchRecursively(DoublyNode<Type> *node, const Type &data), this method works in same way than insertion but it returns a bool whether data is found in the tree or not.

the base case of search is when the current node is NULL, it returns false. if the value that is being search is found, then it returns true and it is assigned to the other calls.

Image for post
Image for post
Search recursively and isInSet methods

Note that if you download the code from GitHub, both methods are in different locations in the TreeSet.cpp file.

Image for post
Image for post
Search method instance 75

To download the file go to my Github.

M.S. Computer Science.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store