Learning Tree Data Structure

Trees are so beautiful: my draw when I was a young boy

When we first start learning to code, it’s common to learn arrays as the “main data structure”. Eventually, we learn hash table too. If you are pursuing a Computer Science degree, you have a Data Structure class, and you also learn Linked Lists, Queues, and Stacks. Those data structures are all called “linear” data structures because they all have a logical start and a logical end.

When we start learning trees and graphs, it gets really confused. We don’t store data in a linear way. Both data structures have a specific way to store data.

This post is an attempt to we better understand the Tree Data Structure and clarify any doubts about it. We will learn about what is a tree, examples of it, its terminology, how it works, and a technical implementation (a.k.a code!)

Let’s start this learning journey! :)


Definition

As we said early when we start programming, it is common to understand better the linear data structures than data structures like trees and graphs.

Trees are well known as a non-linear Data Structure. It doesn’t store data in a linear way. It organizes data in a hierarchical way.


Let’s dive in real life examples

What do I mean about hierarchical way? Imagine a family tree with all generation relationship: grandparents, parents, children, siblings, etc. We commonly organize it in a hierarchical way.

My family tree

This is my Tree Family. Tossico, Akikazu, Hitomi and Takemi are all my grandparents. Toshiaki and Juliana are my parents. TK, Yuji, Bruno and Kaio are the children of my parents (Me and my brothers, duh!)

Company hierarchy is another example.

A Simple Company Hierarchy

If you are a web developer, you probably understand how the DOM (Document Object Model) works. It works as a tree.

The html tag has all other tags. So we have a head tag and a body tag. Those tags contains specific elements. The head tag has meta and title tags. And the body tag has all elements that show in the UI like h1, a, li, etc.


A technical definition and its terminology

So a Tree is a collection of entities called node connected by edges. Each node contains a value or data and it can also have a child node (or not).

The first node of the tree is called the root. If this root node is connected by another node, the root is a parent node and the connected node is a child.


Tree nodes are all connected by links called edges. It’s an important part of trees, because it’s how we manage relationship between nodes.


Leafs are the “last nodes" from the tree. Or nodes without children. Like real trees. We have the root, branches, and finally the leafs.


Other important concepts to understand are height and depth.

The height of a tree is the length of the longest path to a leaf.

The depth of a node is the length of the path to its root.


Terminology Summary

  • Root: the topmost node of the tree
  • Edge: the link between 2 nodes
  • Child: a node that has a parent node
  • Parent: a node that has an edge to a child node
  • Leaf: a node that does not have a child node in the tree
  • Height: The height of a tree is the length of the longest path to a leaf.
  • Depth: The depth of a node is the length of the path to its root.

Binary Tree

Now we will discuss a specific type of tree. We call it Binary Tree.

“In computer science, a binary tree is a tree data structure in which each node has at most two children, which are referred to as the left child and the right child.” — Wikipedia

So let’s see an example of Binary Tree.


Binary Tree: Coding Mode On

The first thing we need to have in mind when implement a Binary Tree is: it is a collection of nodes, and each node has three attributes: value, left_child, and right_child.

How do we implement a simple Binary Tree initializing with these three properties? Let’s see.

So here it is. Our Binary Tree class. When we instantiate an object, we pass the value (the node‘s data) as a parameter. And look at the left_child and the right_child. Both are set to None. Why? Because when we create our node, it doesn’t have any children. We just have the node data.

Testing it:

That’s it. We can pass the stringa“ as the value to our Binary Tree node. If we print the value, left_child, and right_child, we can see the values.

Let’s go to the insertion part! What do we need to do here? We will implement a method to insert a new node to the right and to the left. What’s the rule?

  • If the current node doesn’t have a left child, we just create a new node and set it to the current node’s left_child.
  • If it does have the left child, we create a new node and put it in the current left child‘s place. Allocate this left child node to the new node‘s left child.

Should we draw it? :)

Let’s see the code.

Again, if the current node doesn’t have a left child, we just create a new node and set it to the current node’s left_child. Else we create a new node and put it in the current left child‘s place. Allocate this left child node to the new node‘s left child.

And we do the same thing to to insert a right child node.

Done! :) But not entirely! We need to test it! Let’s build this tree:

  • So here the a node will be the root of our binary Tree.
  • a left child is b node.
  • a right child is c node.
  • b right child is d node (b node doesn’t have a left child).
  • c left child is e node.
  • c right child is f node.
  • Both e and f nodes don’t have children.

So here it is:


Insertion is done! Now we start to think about tree traversal. We have 2 options here: DFS and BFS. WHAT?

  • DFS: Depth-First Search — “is an algorithm for traversing or searching tree data structure. One starts at the root and explores as far as possible along each branch before backtracking.” — Wikipedia
  • BFS: Breadth-First Search — “is an algorithm for traversing or searching tree data structure. It starts at the tree root and explores the neighbor nodes first, before moving to the next level neighbours.” — Wikipedia

So let’s dive into each tree traversal type.

DFS — Depth-First Search

DFS explores a path all the way to a leaf before backtracking and exploring another path. Let’s see an example of this type of traversal.

So the result of this algorithm will be: 1–2–3–4–5–6–7. Why?

  1. We start with the root (1). Print it.
  2. Go to the left child (2). Print it.
  3. Go to the left child (3). Print it. (This node doesn’t have any children)
  4. Backtrack and go the right child (4). Print it. (This node doesn’t have any children)
  5. Backtrack to the root node and go the right child (5). Print it.
  6. Go to the left child (6). Print it. (This node doesn’t have any children)
  7. Backtrack and go the right child (7). Print it. (This node doesn’t have any children)
  8. Done.

This is what we call DFS algorithm. Going deep to the leaf and backtrack.

Now that we are familiar with this traversal algorithm, we will understand 3 types of DFS: pre-order, in-order, and post-order.

  • Pre-order: This is exactly what we did above. Print the current node’s value. Go to the left child and print it. Backtrack. Go to the right child and print it.
  1. Print node‘s value
  2. Go to the left child and print it. If, and only if, it has a left child.
  3. Go to the right child and print it. If, and only if, it has a right child.

  • In-order: We go way down to the left child and print it first. Backtrack and print it. And go way down to the right child and print it.

So the result of the in order algorithm (for this tree example) will be: 3–2–4–1–6–5–7. The left first, the middle second, and the right last.

Coding time!

  1. Go to the left child and print it. If, and only if, it has a left child.
  2. Print node‘s value
  3. Go to the right child and print it. If, and only if, it has a right child.

  • Post-order: We go way down to the left child and print it first. Backtrack. Go way down to the right child. Print it second. Backtrack and print it.

So the result of the post order algorithm (for this tree example) will be: 3–4–2–6–7–5–1. The left first, the right second, and the middle last.

Coding time!

  1. Go to the left child and print it. If, and only if, it has a left child.
  2. Go to the right child and print it. If, and only if, it has a right child.
  3. Print node‘s value

BFS — Breadth-First Search

BFS algorithm traverses the tree level by level. Depth by depth.

Here an example to better understand this algorithm.

So we traverse level by level. In this example, the result is 1–2–5–3–4–6–7.

  • Level/Depth 0: only node with value 1
  • Level/Depth 1: nodes with value 2 and 5
  • Level/Depth 2: nodes with value 3, 4, 6, and 7

Let’s code it!

To implement a BFS algorithm we use the queue data structure as a helper. How does it work?

  1. So first we add the root node into the queue with the put method.
  2. We will iterate while the queue is not empty.
  3. Get the first node in the queue, print its value
  4. Add both left and right children into the queue (if the current node has children).
  5. Done. We will print each node‘s value level by level with our queue helper.

Binary Search Tree

A Binary Search Tree is sometimes called ordered or sorted binary trees and it keeps its values in sorted order, so that lookup and other operations can use the principle of binary search — Wikipedia

One important property of a Binary Search Tree is “A Binary Search Tree node‘s value is larger than the value of any descendant of its left child, and smaller than the value of any descendant of its right child”.

  • A: it is inverted. The subtree 7–5–8–6 needs to be on the right side, and the subtree 2–1–3 needs to be on the left.
  • B: it is the only corrected option. It satisfy the Binary Search Tree property.
  • C: it has just one problem, the node with value 4. It needs to be on the left side of the root, because it is smaller than 5.

Binary Search Tree: Coding Mode On

Now it’s coding time! What will we see here? Insertion, search for a value, deletion, and balance the tree. Let’s the game begin

Insertion: adding new nodes to our tree

Imagine we have an empty tree and we want to start adding new nodes with these values: 50, 76, 21, 4, 32, 100, 64, 52. In that order.

The first thing we need to know is 50 is the root of our tree.

So now we can start inserting node by node.

  • 76: greater than 50, insert on the right side.
  • 21: smaller than 50, insert on the left side.
  • 4: smaller than 50. Node with value 50 has a left child (21). 4 is smaller than 21, insert it on the left side of this node.
  • 32: smaller than 50. Node with value 50 has a left child (21). 32 is greater than 21, insert it on the right side of this node.
  • 100: greater than 50. Node with value 50 has a right child (76). 100 is greater than 76, insert it on the right side of this node.
  • 64: greater than 50. Node with value 50 has a right child (76). 64 is smaller than 76, insert it on the left side of this node.
  • 52: greater than 50. Node with value 50 has a right child (76). 52 is smaller than 76. Node with value 76 has a left child (64). 52 is smaller than 64, insert it on the left side of this node.

Did you notice a pattern here? Let’s break it down.

  1. Ask “Is the new node value greater or smaller than the current node?”
  2. Is the new node value greater than the current node? Go to the right subtree. If the current node doesn’t have a right child, insert it there. Else backtrack to rule number 1.
  3. Is the new node value smaller than the current node? Go to the left subtree. If the current node doesn’t have a left child, insert it there. Else backtrack to rule number 1.
  4. Special case we didn’t handle: when new node value is equal to the current node‘s value. Just use the rule number 3. Consider equal values to insert in the left subtree part.

Now the code! yey!

It seems very simple. The powerful part of this algorithm is the recursion part: line 9 and line 13. Both lines of code call the insert_node method, for its left and right children, respectively. Lines 11 and 15 are the ones that we do the insertion for each child.

Searching: let’s find the node value. Or not

The algorithm that we will build now is all about searching. For a given value (integer number), we will say if our Binary Search Tree has or hasn’t that value.

One important part is how we defined the tree insertion algorithm. First we have our root node. All the left subtree nodes will have smaller values than the root node. And all the right subtree nodes will have greater values than the root node.

Example time! Imagine we have this tree.

Now we want to know if we find a node based on a given value 52.

Let’s break it down.

  1. We start with the root node as our current node. Is the given value smaller than the current node value? If yes, then we will search it on the left subtree.
  2. Is the given value greater than the current node value? If yes, then we will search it on the right subtree.
  3. If rules number 1 and number 2 are both false, we can compare the current node value and the given value if they are equal. If the return of the comparison is true then “Yeah! Our tree has the given value”. Else “Nooo. It hasn’t”.

Coding time!

  • Lines 8 and 9 are the rule number 1.
  • Lines 10 and 11 are the rule number 2.
  • Line 13 is the rule number 3.

How do we test it?

Let’s create our Binary Search Tree initializing the root node with the value 15.

And now we will insert a lot of new nodes there.

For each inserted node we will test if our find_node method really works.

Yeah, it works for these given values! Let’s test for a value that doesn’t exist in our Binary Search Tree.

Oh yeah! Searching… done!

Deletion: removing and organizing

Deletion is a more complex algorithm, because we need to handle different cases. For a given value, we need to remove the node with this value. Imagine that this node has not children, or a single child, or two children.

  • Case 1: A node with no children (leaf node).

If the node we want to delete has no children, we simply delete it. The algorithm doesn’t need to reorganize the tree.

  • Case 2: A node with just one child (left or right child).

To cover the second case, our algorithm needs to make the node’s parent points to the node‘s child. If the node is the left child of its parent, we make the node’s parent left child points to the node‘s child. If the node is the right child of its parent, we make the node’s parent right child points to the node‘s child.

  • Case 3: A node with two children.

When the node has 2 children, we need to find the node with the minimum value starting from the node‘s right child. We will put this node with minimum value in the place of the node we want to remove.

Coding time!

  1. First thing: the parameters. value and parent. We want to find the node that has this value and the node’s parent is a important part to remove the node.
  2. Second thing: the returning value. Our algorithm will return a boolean value. True if it found the node and removed it. Otherwise False.
  3. From line 2 to line 9: we start searching for the node that has the value we are looking for. If the value is smaller than the current node value we go to the left subtree, recursively (if and only if the current node has a left child). If the value is greater, go to the right subtree, recursively.
  4. Line 10: we start to think about the remove algorithm.
  5. From line 11 to line 13: we cover the node with no children and it is the left child from its parent. We remove the node by setting the parent’s left child to None.
  6. Lines 14 and 15: we cover the node with just no children and it is the right child from its parent. We remove the node by setting the parent’s right child to None.
  7. Clear node method: I will show the clear_node code below. But basically it set the node left child, right child, and its value to None.
  8. From line 16 to line 18: we cover the node with just one child (left child) and it is the left child from its parent. We set the parent's left child to the node’s left child (the only child it has).
  9. From line 19 to line 21: we cover the node with just one child (left child) and it is the right child from its parent. We set the parent's right child to the node’s left child (the only child it has).
  10. From line 22 to line 24: we cover the node with just one child (right child) and it is the left child from its parent. We set the parent's left child to the node’s right child (the only child it has).
  11. From line 25 to line 27: we cover the node with just one child (right child) and it is the right child from its parent. We set the parent's right child to the node’s right child (the only child it has).
  12. From line 28 to line 30: we cover the node with both left and right children. We get the node with the smallest value (code below!) and set it to the current node‘s value. Finish it by removing the smallest node.
  13. Line 32: if we found the node we are looking for, we need to return True. From line 11 to line 31 we handle this case. So just return True and that’s it!

The clear_node method: set None value to all three attributes (value, left_child, and right_child)

The find_minimum_value method: go way down to the left. If we can’t anymore, we found the smallest node.

Testing time!

We will use this tree to test our remove_node algorithm:

Let’s remove the node with value 8. It’s a node with no child.

Now let’s remove the node with value 17. It’s a node with just one child.

Finally we will remove a node with two children. The root of our tree.

Tests Done! :)


That’s it!

We learned a lot here. Congrats on finishing this dense piece of content. It’s really tough to understand a new concept we are not used to. But you did it! :)

This is one more step forward in my journey on learning and mastering algorithms and data structures. You can see the documentation of my complete journey here on my Renaissance Developer publication.


Have fun, keep learning, and always coding.

My Instagram, Twitter, Github & LinkedIn. ☺