# Tree-Set data structure in C++

Do you know what is a ** binary search tree** and the concept of

**? In this article we are going to build a binary tree with a set properties, self balanced (**

*set in mathematics*

*AVL***and it will be able to carry any data type (from built-in to your own objects) using the**

*)***.**

*super feature in C++ templates*Trees can be complex data structures, most of functions that can be performed on trees require **recursive calls**. This exercise will help us to train our brain to think recursively, understand how a binary tree class is built, how an *AVL* tree is self balanced and the different reasons to use a binary tree over other data structures including a simple the complexity analysis.

A tree is considered a Binary Tree when each node has at most two sub-nodes (children) in the entire tree, so that, one sub-node that does not exists normally is pointed to NULL. A complete binary tree is when all levels of the tree are completely filled but the last level is not. Some terms of Trees are: **Edge** (link between two nodes), **Child** and **Parent** node, **Root** (topmost node and parent of all child nodes) and **Leaf** (node with no child node).

To begin, we are going to define the class DoublyNode for binary search tree, it means that each node can have at most 2 child, in other words each node of the tree is composed by 2 pointers (** left:prev or right:next**) and a data

**, and its member variables and function definitions.**

*value <Type>*In the DoublyNode class, a header file *“DoublyNode.h”* will contain *#pragma* once and a template to make the node flexible to carry different data types **template <class Type>**, the class definition with a private member variable *Type data* and two public pointers Type (*left and right*) that will be initialized to NULL (** nullptr for C++11**) in the aggregate C++11 form

**“Type *left { nullptr }, *right{ nullptr };”**.

Why using ** #pragma** once or

**#ifndef DOUBLYNODE_H #def DOUBLYNODE_H #endif**? this is added to include once in a single compilation the file. All the member functions are public, two constructors (one with no parameters and one with a

**Type value**parameter), a

*setter*to set a value to the node and a

*getter*of the node key value following

**.**

*OOP encapsulation concept*The class DoublyNode and the *.cpp* file in lines 5–13 there are the two constructors, in this case we only have a constructor with parameter (const Type &data) definition, where the data value that is passed as an argument is assigned to the key of the node, and then the pointers right and left are assigned to { **nullptr** }.

In the class definition (DoublyNode.cpp file) the member function definitions are very simple and easy to understand, these methods are: *getters*, *setters,* and finally, a *destructor* that assigns both left and right pointers to NULL in order to avoid pointer dangling.

The Tree-set implements a **Binary Search Tree** **(BST).** Why a Binary Search Tree? BST are collections that maintain a dynamically changing data-set in sorted order, unlike a sorted array that its elements can’t be inserted and removed efficiently, a BST is useful for many tasks because it enables binary search to efficiently locate elements and insert elements in sorted order. BSTs have O(h) worst-case run-time complexity where “h” is the height of the tree.

The *height (h)* of the tree is the number of edges between the tree’s root and its furthest leaf. For the image above case the complexity run-time for searching, inserting and deleting is **O(log n)** *log base 2 (binary) [log 7 = 2]*.

A BST have four properties: each node of the tree contains one and only one key, the key values of the *left-subtree* are **less** than its common parent key value and the key values of the *right-subtree* are **more** than its common parent key value (recursively defined), and ultimately, duplicate key values are not allowed, following the mathematics definition of **SET{}**.

The class declaration of our tree is found in the file TreeSet.h, where we can find first a pre-compiler file declaration *#pragma once, *** template <class Type> **declaration, member methods, and finally, member variables declarations. To begin, the member variables are private (lines 9 and 10): int length (stores the number of nodes in the Tree),

**(head of the Tree).**

*DoublyNode<Type> Root*In the public member methods we can find the basic data structure methods to perform basic operations (insert, delete, search and clear). The private member methods are recursive methods that are performed internally in the Tree. Why do we need *recursion* in a Tree-Set? a tree is a ** recursive data structure**, this means that a Tree is partially composed of smaller or simpler instances of the same data structure.

The first function to define is INSERTION. The method ** “void insert(const Type & data)”** or

**inserts a new node in the tree and is public, in other words, it is accessible from outside of the class, the parameter can be a value of the Type TreeSet<Type> or a Node which was already created. Inside the method there is a call of a private recursive method (**

*“void insert(const DoublyNode<Type> & newNode)”***), this method iterates over the tree comparing the data value with nodes and moving according to the BST property of distribution(less than move to left, more than move to right), until it hits a leaf node, and then the new node is added as a child node of the leaf node.**

*insertRecursively*In the image above, to insert 40, it must transverse the Tree comparing its value with the other nodes: **1)** *40 is less than 100, move to the left,* **2)** *40 is more than 20 move to he right,* **3)** *40 is more than 30, and 30 is a leaf node, add 40 to the right*. Note that all values on the left side of the root are less than 100.

This process is done recursively passing the node and the data to be compare and calling the method again depending if it is less than or more than to move over the tree. The worst case time complexity of search and insert operations is O(h) where h is height of Binary Search Tree.

In the image above we can find the recursive method to insert a new node, the base case is when the current node position is equal to NULL, and the new node is added, if not, there are two more recursive calls, by using this calls we transverse the tree comparing the current node with the new node and move forward according BST properties.

The insertion method of a Binary Search Tree can have worst case complexity of searching, insertion and deletion O(n), this happens when a BST is not balanced and it’s referred as a degenerate Binary Search Tree.

In the image above, all those binary search trees are inefficient because they are not a complete binary tree, in other words, every node or most of them have only one child, to prove this, we choose the first BTS in the image, to insert 5, it will take O(h), and the height is 5 same number of nodes, meaning that the process of insertion, deletion and searching for this case is O(n) (similar to a Linked list).

The main reason of having a BST is that every operation is performed by at most O(log n) time complexity, to accomplish this we have to define and implement the idea of balanced-Binary Search Tree.

A balanced-binary search tree is a tree that the difference of the heights of its right child(s) and left child(s) is no more than 1. As I mentioned before, a tree is a recursive data structure and the balance BST concept applies to every node and sub-node in the tree. a balanced BST keeps its height small and every root-leaf path is done in log(n) steps.

As a last approach to accomplish our data structure ** Tree-Set (AVL)**, we need to balance our tree and implement the concept of AVL Trees.

An AVL ( ** Adel’son-Vel’skii and E.M. Landis **)Tree are simple self balancing binary trees, it requires that the height of the left and right sub-trees of very node to differ by at most 1.

AVL trees perform rotations every time the tree is unbalanced due an insertion or deletion of a node.

There are two types of rotations, left-rotation and right rotations. Both left and right rotations are performed recursively, making rotations from the base case until the root of the Tree.

It is important to calculate the height of every sub-tree and the Tree in order to compare heights of the sub-trees and then perform the rotations, this method ** depthRecursively(const DoublyNode<Type> * node)** calculates the height of a given node recursively and comparing left and right sub-trees, its base case when the current node is NULL, it returns 0.

Another method is used to compare left and right heights of the sub-trees calling the method: ** depthRecursively(const DoublyNode<Type> * node) . **this comparable method returns the difference between heights and is called

*getBalance(const DoublyNode<Type> * node).*Rotation operations take constant time because few pointers are being changed, so the time complexity of an AVL insert remains as a balanced BST O(log n).

To make sure that a BST implements AVL concept after every modification of the tree, some re-balancing must be performed. This process consists of four different cases: Left-Left case, Left-Right case, Right-Right case and Right-Left case.

For the first case, the tree rotates to the right once, in the example above, node 50 is assigned as new root of the tree and its right pointer to the left now points to node 100, on the other hand, the left pointer of node 100 now points to node 65.

The Right-Right case is similar to the Left-Left case, the tree rotates to the left once getting a new root and moving pointers in constant time complexity.

The case above is more complex, but if we look at it closer, it is just a left rotation in a sub-tree and then a right rotation in the tree getting a new root. The same happens virtually in the Right-Left case.

The idea of these cases is to implement them in our recursive method of insertion.

The next operation is SEARCH in the Tree, this one is pretty simple, the method declaration in the TreeSet class header ** isInSet(const Type &data)** calls the recursive private method

**this method works in same way than insertion but it returns a bool whether data is found in the tree or not.**

*searchRecursively(DoublyNode<Type> *node, const Type &data),*the base case of search is when the current node is NULL, it returns false. if the value that is being search is found, then it returns true and it is assigned to the other calls.

Note that if you download the code from GitHub, both methods are in different locations in the **TreeSet.cpp** file.