Demystifying Trees in Data Structures — Part 1

karthik Rajkumar
16 min readMar 30, 2020

--

Let’s see some of the different types of trees available, its implementation, its complexity, and its real-world use cases.

Note: we can split this article into parts, so we can avoid ending up into a bigger article.

Types of Trees available

  1. Binary Tree
  2. Binary Search Tree
  3. Red Black Tree
  4. AVL Tree
  5. Splay Tree
  6. Heap
  7. B Tree
  8. B+ Tree
  9. 2–3 Search Trees
  10. 2–3–4 Search Trees or (2–4 Search Trees)
  11. N-ary (K-ary, M-ary) Tree
  12. k-D Tree
  13. Treap (Binary Search Tree + Heap)
  14. Van Embe Boas Trees
  15. Trie (Digital Tree or Prefix Tree)
  16. Suffix Trees
  17. Huffman Tree
  18. R Tree
  19. Counted B Tree
  20. Merkel Tree
  21. Decision Tree
  22. Fenwick Tree (Binary Index Tree)
  23. Range Trees

1. Binary Tree

Binary Tree is a tree that has at most 2 children (a left node and a right node). The tree node has data and pointer to the left node and pointer to the right node.

Node.java
BinaryTree.java
Implementing Binary Tree

There are 3 different types of binary Tree

Full Binary Tree

A Binary tree is said to be a full binary tree if the node has 0 or 2 children.

Here all four representations are said to be ‘full binary tree’

Complete Binary Tree

A binary tree is said to be a complete binary tree if all the levels are completely filled except possibly the last level.

Complete Binary Tree.

Here both the representations denote ‘complete binary tree’.

Perfect Binary Tree

A binary tree is said to be ‘perfect binary tree’ if all the leaves are at the same level.

Perfect Binary Tree

Here both the representations denote ‘perfect binary tree’

Skewed Tree

A Tree where every node has only one child then it is termed as a skewed tree.

Skewed Binary Tree

To find the height of the binary tree.

Height of the Binary Tree
Size of the Binary Tree

Properties of Binary Trees

Property 1: The number of nodes n in the fully binary tree is at least n = 2h+1 and at most n = 2^(h+1) -1 where h is the height of the tree.

a. Let’s determine the number of nodes at the root level.

2*h+1 => 2*0 + 1 = 1 (Number of nodes available in the root node is 1 as height is 0)

b. Let’s determine the maximum number of nodes at the height 3

2^(h+1) -1 => 2⁴ -1 => 16 - 1 = 15 nodes.

Property 2: The maximum number of nodes at any level ‘L’ in a binary tree = 2^L

a. Let’s determine the maximum number of nodes at level 2.

2² = 4 (Maximum number of nodes at level 2)

property 3: The number of nodes in the binary tree is (2^l) -1, where l represents leaf nodes.

If the binary tree has 3 leaf nodes then the number of nodes available 2³ -1 = 7 nodes.

2. Binary Search Tree

A ‘Binary Search’ algorithm falls under the typical Divide and Conquer paradigm, and the data structure associated with it is called the Binary Search Tree. We can use the search tree for both the dictionary and as a priority queue.

A tree is said to be a Binary Search Tree (BST) if it satisfies 3 conditions.

  1. A BST must be a binary tree (with at most 2 nodes)
  2. The value of the node in the left is lesser than or equal to the parent node. Left Child ≤ Parent
  3. The value of the node in the right is greater than or equal to the parent node. Right Child > Parent

We can perform the following basic operations in BST with O(h) time complexity. O(h) is Big O of the Height of the Binary Search Tree.

Performing MINIMUM, MAXIMUM, SEARCH, SUCCESSOR, PREDECESSOR, INSERT, DELETE and TRAVERSALS (In-order, Pre-order, Post-order). Let’s look into those operations with the code.

i) MINIMUM

Since it is the Binary Search Tree and we know that the value of the key which is lower than the root node will go to the left node based on the conditions we saw above. So iterating left nodes in the binary tree will eventually get the lowest value in the Binary Search Tree.

Finding Minimum in the Binary Search Tree

Here, the above diagram represents BST. To find the minimum is if you navigate to the left and will find the key ‘5’ which is the minimum in the BST. Now we shall look at some code associated with finding a minimum. For every operation, Node has 3 attributes left, right and parent.

Node.java
Finding Minimum (recursive)

This method will walk down the BST towards the left to finds its minimum value.

ii) MAXIMUM

Finding the highest number in the BST is similar to finding the lowest but we will have to walk down the BST towards the right node. Since it is the BST the highest value will always be the right node of any given node.

Finding the maximum in the BST.

Here in the above example, the key 15 is the highest and we can find the maximum by the following code.

Finding Maximum

iii) SEARCH

Searching a key in the BST is similar to the Binary Search algorithm in an array. The search starts from the root, compares the given key with the root if the key is lesser than the key associated to the root then we shall navigate to the left of the root and vice versa till we find the key.

Finding a key in BST.

Before we look into Successor, Predecessor, Insert and Delete operations, we shall see the traversals. There are three traversals In-order, Pre-order, Post-order

iv) IN-ORDER Traversal

In order traversal prints the left node, then the root node and then the right node.

v) PRE-ORDER Traversal

Prints the root node then the left node and then the right node.

vi) POST-ORDER Traversal

Prints the left node, right node and then the root node.

Now let's see the code for all the three traversals.

Traversals of BST

vii) SUCCESSOR

In order to find a successor in a BST, there are 2 possible scenarios. We shall see one by one. Finding the successor of the node can be determined by the In-order tree walk.

Scenario 1:

If there is a right child to a node, then walking down the subtree of the right child in the left direction will get you a successor.

Scenario 2:

If there is no right child to a node, then we should find the ancestors and the condition is the node should not be its parent's right child, if yes then go up, if not then that node is the given node’s successor. Let's look into the diagram for better clarity.

Finding the successor for node 15 — Scenario 1

In the above image, we are going to find the successor for node 15, this is scenario 1, where the right node is available.

Finding the successor for node 13 — Scenario 2

Here in the above image, we are going to find the successor of the node with the value 13, here for the node with the key 13 there are no children so we go for the scenario 2 which is walking up the node (parent) and check if the node is the parent’s right node. If yes then proceed to one step higher if not that node will be the successor.

viii) PREDECESSOR

Same as the Successor, for Predecessor also there are 2 scenarios. Finding the predecessor of a node can be determined by the In-Order tree walk.

Scenario 1:

If the node has a left subtree, then finding the maximum in the left subtree becomes the predecessor for the given node.

Scenario 2:

If there is no left subtree, then we have to walk down from the root to find the given node and the predecessor becomes a node from which we took the last right during traversal. Let’s look in detail.

Scenario 1: Finding the predecessor for the node with the key 15

Here we are going to find the predecessor for the node 15. There is a left subtree available. We navigate to left subtree and now we shall find the maximum from the left subtree.

Scenario 2: Finding the predecessor of node 78 which is 70

Here the In-Order (traversal) Predecessor of the node with the key 78 is the node with the key 70. we start from the root node and navigate from the top to find our given node, now the predecessor node will be from which node it took the last right (node with the key 70).

Code for the in-order Predecessor

ix) INSERT

Now we can see how we can insert an element into the BST. This will change or modify the binary search tree but it should change by not affecting the properties of the BST.

Basically inserting is a two-step process

  1. Finding the exact location
  2. Inserting the node

x) DELETE

Delete Operation is a bit complicated and it involves four different possibilities. We are going to modify the data structure and make sure that the conditions of the BST are not changed.

Scenario 1.

If there is no left subtree and we need to delete the node Z then simply delete the node Z and replace it with the node Y.

Scenario 2

If there is no right subtree for node Z to be deleted then simply replace the left node Y to its node to be deleted.

Scenario 3

In this scenario, if we want to delete the node Z which has an empty left subtree, then replace the Z with Y and make sure the left, right and the parent pointers are aligned properly.

Scenario 4

If we are going to replace the node Y which parent is not Z then this scenario kicks in.

In order to achieve these 4 scenarios, we need to write a transplant subroutine.

Here there are 2 nodes to be replaced u and v.

Here in this scenario Node Y -> u and Node X -> V.

Line 2 and 3 if u.parent is empty, here in this condition it is not empty as u.parent is r.

Line 4 and 5 if u is u.parent.left which is true then we assign u.parent.left = v, this gives nodes R.left = X

Line 10 v.parent = u.parent, we already made r.left is x and now we are making x.parent is r

Deleting a node in BST

Lines 2 and 3 represent scenario 1, where we replace the right node to its parent.

Lines 4 and 5 represent scenario 2, where we replace the left node to its parent.

Lines 7 to 18 represent scenarios 3 and 4.

If the right subtree is available, then find the minimum of the right subtree (node Y).

Lines 16–18, we replace Z as a child of its parent by Y and replace Y’s left child by Z’s left child.

Line 10–13 represents the final scenario, replaces Y as a child of its parent by Y’s right child and turns Z’s right child to Y’s right child.

All these operations are performed in O(h) — the height h of the binary tree.

3. Red-Black Tree

In the previous section, we saw the operations of Binary Search Tree which runs on the O(h) time complexity (height of the binary search tree). If the height of the tree is small, then there is no problem, but if the height of the tree is large then the operation will cost more.

A red-black tree is a binary search tree, where the operations will be performed at O(log n) time. Since it gets constrained with the color on any simple path from the root to leaf, it ensures that no path is twice as long as any other path. So it is ‘approximately balanced’

Red Black tree was used for collision avoidance (instead of buckets which is linked list) in HashMap in Java 8.0 onwards. We shall see the collision avoidance (in Hash Tables) in a separate post.

There are 5 properties when it comes to red-black trees.

  1. The node of the red-black tree must be either in RED or BLACK in color.
  2. The root color must be black
  3. Every leaf is black.
  4. If a node is red then both the children must be black.
  5. For each node, all simple paths from the node to its descendant leaves contain an equal number of black nodes.
Red Black Tree with all the 5 properties defined.

if you see in the above diagram, all the five properties have been defined. Here unlike in BST all the leaf children instead of having null reference we have referenced it to NIL object (an object of a Node). By this, it gives better handling of edge cases. All the sentinel (NIL) nodes can be considered as external nodes and all other nodes are considered as internal nodes.

Now let's look into its operations. we shall see the operations that change from BST.

i) Rotations.

while performing insertion or deletion, we may change the property of the red-black tree. By performing the local operations like rotation we can maintain the RB Properties.

Left Rotation

Left Rotation on the node x. Since we are going to modify the pointers of the 2 nodes, we shall not worry about the color.

Left Rotation
Node.java for the red-black tree.
Left Rotation of the red-black tree.

Right Rotation

Similarly, the operation is vice versa of Left Rotation.

Right rotation of red-black tree node

ii) Insertion

Inserting a node is much similar to inserting a node in BST (discussed above) except

  1. We initialize the node to nil instead of null.
  2. Point Z left and right child to nil
  3. Color Z to red.
Inserting a node in the red-black tree

Before we look into the ‘rbInsertFixup’ method, we shall see what violations of the red-black tree we could have brought by inserting a node.

Property 1 will hold as the node is either red or black since we paint the color to red.

Property 3 also holds as we paint the Z’s left and right child to black color.

Property 5 states that from the node to its descendants, the number of black nodes should be equal, and this property is also not violated as we are replacing the sentinel (black) with Z (red) which has sentinel children.

So by inserting a child seems to affect property 2 and 4 which is

Property 2, the root should be black in color and Property 4, if Z’s parent is red.

we now introduce a method, which takes care of the violation of the property 2 and 4.

Transformation of the nodes during insert with rbInsertFixup Method
Color fixup in the Insert method for the red-black tree

Let’s consider that the node to be inserted is with the value 4 as shown in the above diagram. And from the ‘insert’ method we know that we have inserted the node ‘Z’ which has the color red. The whole method runs only when the node ‘Z’s parent’s color is red.

Line 5, is setting the node Y (z.parent.parent.right) which is the uncle of node Z. Line 6 –12 is the condition when the color of the node Y is red. Then we change the color z.parent to black, y.color to black and we change z.parent.parent color to red. The reason for these changes is we know from the properties that a red color node then the children must be black. Since Z is red and its parent is also red we change the color of z.parent, z.parent.parent and y. we change the pointer of z to z.parent.parent.

Line 13 –16 checks if z is in z.parent.right then change the pointer from z to z.parent and do a left rotate on z. These changes are because of nodes with the value 2 and 4 ( z and z.parent) both were red in color.

Line 17–19 changes z.parent.color to black and z.parent.parent.color to red color and perform a right rotate. These changes are again because of nodes with the 2 and 7 (z and z.parent) both were red in color.

Line 23–36, similar to the above implementation but if y was initially in z.parent.parent.right.

Line 41, according to property 2, we change the root color to black. All these changes are happening in O(log n) logarithmic time complexity.

Explanation:

This while loop executes only in case 1 (z and z.parent.color are red). Post that we move the z pointer 2 nodes up hence the running time complexity is O(log n).

iii) Deletion

Like other basic operations, tree delete also takes place in O(log n) time.

Let’s look into the transplant method.

The transplant method is similar to the BST’s transplant method with 2 changes, we compare the u.parent with nil instead of null in BST. And the second one is we assign u.parent to v.parent even if v is a sentinel node.

The delete method is similar to BST except for

  1. We maintain the node y as it is either removed or moved internally within the red-black tree. Line 3, We assign z to node y. When the node z has fewer than 2 children it will be removed. And when z has 2 children, we set y to z’s successor. Like in the BST we move y to z’s original position.
  2. Because y’s color might change, we maintain y’s original color. For lines 6–12, node z has fewer than 2 children so we maintain z’s color as y’s color and if z has 2 children then y’s color will be z’s successor color and z is not equaled to y then y moves to z’s original position. Maintaining the y’s original color is important and will be checked during the end of the procedure. If the y’s original color is black then it would have introduced some violations.
  3. We need to keep track of the node x also as x moves to y’s original position. Line number 7, 10, 16 we set x to point to either y’s only child or if y has no children, the sentinel (nil).
  4. Since x is moved to node y’s original position, the attribute x.parent is always set to y.parent except when z is y’s original parent.
  5. if z is y’s original parent then refer to line 17. x.parent is set as y.
  6. Finally if y.color is black then we might have introduced some violations so in-order to fix it we will call the method ‘rbDeleteFixup’ at the end of the code. If node y is red, then red-black property still holds when y is removed or moved for the following reasons.
  • No black heights in the tree have been changed.
  • Since y could not have been the root, the root color would have remained black.

Color Fix for the Delete method.

color fix after deleting the node in the red-black tree

There are 4 cases to be considered for fixing the color during the delete operation, we shall see in detail.

Case 1 and 2 for the delete fix-up method.

In the above images, Lighter red colors represent either red or black. α, β … ζ are the arbitrary subtrees

Let's look into 2 cases first.

Case 1: Node x’s sibling w is red

Line 8 — 3, interchanges the color of x.parent and w and performs the left rotate on x.parent.

After the left-rotate, assign ‘w’ as x.parent.right. Now from the outcome of the case 1, we get into case 2 or case 3 or case 4.

Case 2: Node x’s sibling w is black and w’s children were also black

This is from the continuation of the case 1 if the new ‘w’ has 2 black children. Since w’s children are black and w is also black, we remove black from either x or w. And we change the node w’s color to red and we leave the x color to black. Since we removed the black from w, we wanted to add a black to x.parent, which was originally red or black. we do it by repeating the while loop with x.parent as x, please refer lines 14–16

Now if we have come from case 1 to case 2, previously the node x.parent was red and when we transform x to x.parent, the new x has become red color which terminates the while loop. Line 61 converts the x.color to black which retains all the red-black tree properties.

Case 3 and 4 for the delete fix-up method.

In the above images, Lighter red colors represent either red or black. c and c’ are considered as red or black. α, β … ζ are the arbitrary subtrees

Case 3: x’s sibling w is black and w’s left child is red and w’s right child is black

If the x’s sibling w is black and its left child is red, then interchange the color w to its left child and perform a right-rotate on w. And re-assign w as x.parent.right. This from case 3 is directed to case 4.

Case 4: x’s sibling w is black and w’s right child is red.

Now by making some color changes, and performing left-rotate on x.parent, we remove the extra black on x and making the node x singly black. Making the x as root will terminate the while loop.

Analysis

The running time complexity of the delete method is O(log n) because of the height of the tree.

The running time complexity of the delete fix-up method also takes O(log n) because case 1, case 3 and case 4 run constantly except for case 2 as it can run in the loop by changing the pointer of x, but the pointer moves up the tree with the time of O(log n). So the running time complexity is O(log n) for delete method.

References

  1. https://en.wikipedia.org/wiki/Binary_tree#Properties_of_binary_trees
  2. https://www.geeksforgeeks.org/binary-tree-set-1-introduction/

Next Part

We shall see AVL Trees, Splay Trees, and Heap in the next part.

--

--

karthik Rajkumar

Technical Architect, Deep Learning enthusiast, a speaker, and a continuous learner.