Learning Tree Data Structure
This article was originally published at iamtk.co.
When we first start learning to code, it’s common to learn arrays as the “main data structure”. Eventually, we learn hash table
too. If you are pursuing a Computer Science degree, you have a Data Structure class, and you also learn Linked Lists
, Queues
, and Stacks
. Those data structures are all called “linear” data structures because they all have a logical start and a logical end.
When we start learning trees
and graphs
, it gets really confusing. We don’t store data in a linear way. Both data structures have a specific way to store data.
This post is an attempt to we better understand the Tree
Data Structure and clarify any doubts about it. We will learn about what is a tree
, examples of it, its terminology, how it works, and a technical implementation (a.k.a code!
)
Let’s start this learning journey! :)
Definition
As we said early when we start programming, it is common to understand better the linear data structures than data structures like trees
and graphs
.
Trees
are well known as a non-linear Data Structure. It doesn’t store data in a linear way. It organizes data in a hierarchical way.
Let’s dive in real life examples
What do I mean about hierarchical way? Imagine a family tree with all generation relationship: grandparents, parents, children, siblings, etc. We commonly organize it in a hierarchical way.
This is my Tree Family
. Tossico
, Akikazu
, Hitomi
and Takemi
are all my grandparents. Toshiaki
and Juliana
are my parents. TK
, Yuji
, Bruno
and Kaio
are the children of my parents (Me and my brothers, duh!)
Company hierarchy is another example.
If you are a web developer, you probably understand how the DOM (Document Object Model)
works. It works as a tree
.
The html
tag has all other tags. So we have a head
tag and a body
tag. Those tags contains specific elements. The head
tag has meta
and title
tags. And the body
tag has all elements that show in the UI like h1
, a
, li
, etc.
A technical definition and its terminology
So a Tree
is a collection of entities called node
connected by edges
. Each node
contains a value
or data
and it can also have a child node
(or not).
The first node
of the tree
is called the root
. If this root node
is connected by another node
, the root
is a parent node
and the connected node
is a child
.
Tree nodes
are all connected by links called edges
. It’s an important part of trees
, because it’s how we manage relationship between nodes
.
Leafs
are the “last nodes
" from the tree
. Or nodes
without children. Like real trees. We have the root
, branches
, and finally the leafs
.
Other important concepts to understand are height
and depth
.
The height
of a tree
is the length of the longest path to a leaf
.
The depth
of a node
is the length of the path to its root
.
Terminology Summary
- Root: the topmost
node
of thetree
- Edge: the link between 2
nodes
- Child: a
node
that has aparent node
- Parent: a
node
that has anedge
to achild node
- Leaf: a
node
that does not have achild node
in thetree
- Height: The
height
of atree
is the length of the longest path to aleaf
. - Depth: The
depth
of anode
is the length of the path to itsroot
.
Binary Tree
Now we will discuss a specific type of tree
. We call it Binary Tree
.
“In computer science, a binary tree is a tree data structure in which each node has at most two children, which are referred to as the
left child
and theright child
.” — Wikipedia
So let’s see an example of Binary Tree
.
Binary Tree: Coding Mode On
The first thing we need to have in mind when implement a Binary Tree
is: it is a collection of nodes
, and each node
has three attributes: value
, left_child
, and right_child
.
How do we implement a simple Binary Tree
initializing with these three properties? Let’s see.
So here it is. Our Binary Tree
class. When we instantiate an object, we pass the value
(the node
‘s data
) as a parameter. And look at the left_child
and the right_child
. Both are set to None
. Why? Because when we create our node
, it doesn’t have any children. We just have the node data
.
Testing it:
That’s it. We can pass the string
“a
“ as the value
to our Binary Tree node
. If we print the value
, left_child
, and right_child
, we can see the values.
Let’s go to the insertion part! What do we need to do here? We will implement a method to insert a new node
to the right
and to the left
. What’s the rule?
- If the current
node
doesn’t have aleft child
, we just create a newnode
and set it to the currentnode
’sleft_child
. - If it does have the
left child
, we create a newnode
and put it in the currentleft child
‘s place. Allocate thisleft child node
to the newnode
‘sleft child
.
Should we draw it? :)
Let’s see the code.
Again, if the current node
doesn’t have a left child
, we just create a new node
and set it to the current node
’s left_child
. Else we create a new node
and put it in the current left child
‘s place. Allocate this left child node
to the new node
‘s left child
.
And we do the same thing to to insert a right child node
.
Done! :) But not entirely! We need to test it! Let’s build this tree
:
- So here the
a
node
will be theroot
of ourbinary Tree
. a
left child
isb
node
.a
right child
isc
node
.b
right child
isd
node
(b
node
doesn’t have aleft child
).c
left child
ise
node
.c
right child
isf
node
.- Both
e
andf
nodes
don’t have children.
So here it is:
Insertion is done! Now we start to think about tree
traversal. We have 2 options here: DFS and BFS. WHAT?
- DFS: Depth-First Search — “is an algorithm for traversing or searching tree data structure. One starts at the root and explores as far as possible along each branch before backtracking.” — Wikipedia
- BFS: Breadth-First Search — “is an algorithm for traversing or searching tree data structure. It starts at the tree root and explores the neighbor nodes first, before moving to the next level neighbours.” — Wikipedia
So let’s dive into each tree
traversal type.
DFS — Depth-First Search
DFS explores a path all the way to a leaf before backtracking and exploring another path. Let’s see an example of this type of traversal.
So the result of this algorithm will be: 1–2–3–4–5–6–7. Why?
- We start with the
root
(1). Print it. - Go to the
left child
(2). Print it. - Go to the
left child
(3). Print it. (Thisnode
doesn’t have any children) - Backtrack and go the
right child
(4). Print it. (Thisnode
doesn’t have any children) - Backtrack to the
root
node
and go theright child
(5). Print it. - Go to the
left child
(6). Print it. (Thisnode
doesn’t have any children) - Backtrack and go the
right child
(7). Print it. (Thisnode
doesn’t have any children) - Done.
This is what we call DFS algorithm. Going deep to the leaf and backtrack.
Now that we are familiar with this traversal algorithm, we will understand 3 types of DFS: pre-order
, in-order
, and post-order
.
- Pre-order: This is exactly what we did above. Print the current
node
’s value. Go to theleft child
and print it. Backtrack. Go to theright child
and print it.
- Print
node
‘s value - Go to the
left child
and print it. If, and only if, it has aleft child
. - Go to the
right child
and print it. If, and only if, it has aright child
.
- In-order: We go way down to the
left child
and print it first. Backtrack and print it. And go way down to theright child
and print it.
So the result of the in order
algorithm (for this tree
example) will be: 3–2–4–1–6–5–7. The left first, the middle second, and the right last.
Coding time!
- Go to the
left child
and print it. If, and only if, it has aleft child
. - Print
node
‘s value - Go to the
right child
and print it. If, and only if, it has aright child
.
- Post-order: We go way down to the
left child
and print it first. Backtrack. Go way down to theright child
. Print it second. Backtrack and print it.
So the result of the post order
algorithm (for this tree
example) will be: 3–4–2–6–7–5–1. The left first, the right second, and the middle last.
Coding time!
- Go to the
left child
and print it. If, and only if, it has aleft child
. - Go to the
right child
and print it. If, and only if, it has aright child
. - Print
node
‘s value
BFS — Breadth-First Search
BFS algorithm traverses the tree
level by level. Depth by depth.
Here an example to better understand this algorithm.
So we traverse level by level. In this example, the result is 1–2–5–3–4–6–7.
- Level/Depth 0: only
node
with value 1 - Level/Depth 1:
nodes
with value 2 and 5 - Level/Depth 2:
nodes
with value 3, 4, 6, and 7
Let’s code it!
To implement a BFS algorithm we use the queue
data structure as a helper. How does it work?
- So first we add the
root
node
into thequeue
with theput
method. - We will iterate while the
queue
is not empty. - Get the first
node
in thequeue
, print its value - Add both
left
andright
children
into thequeue
(if the currentnode
haschildren
). - Done. We will print each
node
‘s value level by level with ourqueue
helper.
Binary Search Tree
A Binary Search Tree is sometimes called ordered or sorted binary trees and it keeps its values in sorted order, so that lookup and other operations can use the principle of binary search — Wikipedia
One important property of a Binary Search Tree
is “A Binary Search Tree
node
‘s value is larger than the value of any descendant of its left child
, and smaller than the value of any descendant of its right child
”.
- A: it is inverted. The
subtree
7–5–8–6 needs to be on the right side, and thesubtree
2–1–3 needs to be on the left. - B: it is the only corrected option. It satisfy the
Binary Search Tree
property. - C: it has just one problem, the
node
with value 4. It needs to be on the left side of theroot
, because it is smaller than 5.
Binary Search Tree: Coding Mode On
Now it’s coding time! What will we see here? Insertion, search for a value, deletion, and balance the tree
. Let’s the game begin
Insertion: adding new nodes to our tree
Imagine we have an empty tree
and we want to start adding new nodes
with these values: 50, 76, 21, 4, 32, 100, 64, 52. In that order.
The first thing we need to know is 50 is the root
of our tree
.
So now we can start inserting node
by node
.
- 76: greater than 50, insert on the right side.
- 21: smaller than 50, insert on the left side.
- 4: smaller than 50.
Node
with value 50 has aleft child
(21). 4 is smaller than 21, insert it on the left side of thisnode
. - 32: smaller than 50.
Node
with value 50 has aleft child
(21). 32 is greater than 21, insert it on the right side of thisnode
. - 100: greater than 50.
Node
with value 50 has aright child
(76). 100 is greater than 76, insert it on the right side of thisnode
. - 64: greater than 50.
Node
with value 50 has aright child
(76). 64 is smaller than 76, insert it on the left side of thisnode
. - 52: greater than 50.
Node
with value 50 has aright child
(76). 52 is smaller than 76.Node
with value 76 has aleft child
(64). 52 is smaller than 64, insert it on the left side of thisnode
.
Did you notice a pattern here? Let’s break it down.
- Ask “Is the new
node
value greater or smaller than the currentnode
?” - Is the new
node
value greater than the currentnode
? Go to the rightsubtree
. If the currentnode
doesn’t have aright child
, insert it there. Else backtrack to rule number 1. - Is the new
node
value smaller than the currentnode
? Go to the leftsubtree
. If the currentnode
doesn’t have aleft child
, insert it there. Else backtrack to rule number 1. - Special case we didn’t handle: when new
node
value is equal to the currentnode
‘s value. Just use the rule number 3. Consider equal values to insert in the leftsubtree
part.
Now the code! yey!
It seems very simple. The powerful part of this algorithm is the recursion part: line 9
and line 13
. Both lines of code call the insert_node
method, for its left
and right
children
, respectively. Lines 11
and 15
are the ones that we do the insertion for each child
.
Searching: let’s find the node value. Or not
The algorithm that we will build now is all about searching. For a given value (integer number), we will say if our Binary Search Tree
has or hasn’t that value.
One important part is how we defined the tree
insertion algorithm. First we have our root
node
. All the left subtree
nodes
will have smaller values than the root
node
. And all the right subtree
nodes
will have greater values than the root
node
.
Example time! Imagine we have this tree
.
Now we want to know if we find a node based on a given value 52.
Let’s break it down.
- We start with the
root
node
as our currentnode
. Is the given value smaller than the currentnode
value? If yes, then we will search it on the leftsubtree
. - Is the given value greater than the current
node
value? If yes, then we will search it on the rightsubtree
. - If rules number 1 and number 2 are both false, we can compare the current
node
value and the given value if they are equal. If the return of the comparison istrue
then “Yeah! Ourtree
has the given value”. Else “Nooo. It hasn’t”.
Coding time!
- Lines 8 and 9 are the rule number 1.
- Lines 10 and 11 are the rule number 2.
- Line 13 is the rule number 3.
How do we test it?
Let’s create our Binary Search Tree
initializing the root
node
with the value 15.
And now we will insert a lot of new nodes
there.
For each inserted node
we will test if our find_node
method really works.
Yeah, it works for these given values! Let’s test for a value that doesn’t exist in our Binary Search Tree
.
Oh yeah! Searching… done!
Deletion: removing and organizing
Deletion is a more complex algorithm, because we need to handle different cases. For a given value, we need to remove the node
with this value. Imagine that this node
has not children
, or a single child
, or two children
.
- Case 1: A
node
with nochildren
(leaf
node
).
If the node
we want to delete has no children, we simply delete it. The algorithm doesn’t need to reorganize the tree
.
- Case 2: A
node
with just one child (left
orright
child).
To cover the second case, our algorithm needs to make the node
’s parent points to the node
‘s child
. If the node
is the left child
of its parent, we make the node
’s parent left child
points to the node
‘s child
. If the node
is the right child
of its parent, we make the node
’s parent right child
points to the node
‘s child
.
- Case 3: A
node
with two children.
When the node
has 2 children, we need to find the node
with the minimum value starting from the node
‘s right child
. We will put this node
with minimum value in the place of the node
we want to remove.
Coding time!
- First thing: the parameters.
value
andparent
. We want to find thenode
that has thisvalue
and thenode
’s parent is a important part to remove thenode
. - Second thing: the returning value. Our algorithm will return a boolean value.
True
if it found thenode
and removed it. OtherwiseFalse
. - From line 2 to line 9: we start searching for the
node
that has thevalue
we are looking for. If thevalue
is smaller than thecurrent node
value
we go to theleft subtree
, recursively (if and only if thecurrent node
has aleft child
). If thevalue
is greater, go to theright subtree
, recursively. - Line 10: we start to think about the
remove
algorithm. - From line 11 to line 13: we cover the
node
with nochildren
and it is theleft child
from itsparent
. We remove thenode
by setting theparent
’sleft child
toNone
. - Lines 14 and 15: we cover the
node
with just nochildren
and it is theright child
from itsparent
. We remove thenode
by setting theparent
’sright child
toNone
. - Clear node method: I will show the
clear_node
code below. But basically it set thenode
left child
,right child
, and itsvalue
toNone
. - From line 16 to line 18: we cover the
node
with just onechild
(left child
) and it is theleft child
from itsparent
. We set theparent
'sleft child
to thenode
’sleft child
(the only child it has). - From line 19 to line 21: we cover the
node
with just onechild
(left child
) and it is theright child
from itsparent
. We set theparent
'sright child
to thenode
’sleft child
(the only child it has). - From line 22 to line 24: we cover the
node
with just onechild
(right child
) and it is theleft child
from itsparent
. We set theparent
'sleft child
to thenode
’sright child
(the only child it has). - From line 25 to line 27: we cover the
node
with just onechild
(right child
) and it is theright child
from itsparent
. We set theparent
'sright child
to thenode
’sright child
(the only child it has). - From line 28 to line 30: we cover the
node
with bothleft
andright
children. We get thenode
with the smallestvalue
(code below!) and set it to thecurrent node
‘svalue
. Finish it by removing the smallestnode
. - Line 32: if we found the
node
we are looking for, we need to returnTrue
. From line 11 to line 31 we handle this case. So just returnTrue
and that’s it!
The clear_node
method: set None
value to all three attributes (value
, left_child
, and right_child
)
The find_minimum_value
method: go way down to the left. If we can’t anymore, we found the smallest node
.
Testing time!
We will use this tree
to test our remove_node
algorithm:
Let’s remove the node
with value
8. It’s a node
with no child.
Now let’s remove the node
with value
17. It’s a node
with just one child.
Finally we will remove a node
with two children. The root
of our tree
.
Tests Done! :)
That’s it!
We learned a lot here. Congrats on finishing this dense piece of content. It’s really tough to understand a new concept we are not used to. But you did it! :)
This is one more step forward in my journey on learning and mastering algorithms and data structures. You can see the documentation of my complete journey here on my Renaissance Developer publication.
Have fun, keep learning, and always coding.
Resources
- Introduction to Tree Data Structure by mycodeschool
- Tree by Wikipedia
- How To Not Be Stumped By Trees by the talented Vaidehi Joshi
- Intro to Trees Lecture by Professor Jonathan Cohen
- Intro to Trees Lecture by Professor David Schmidt
- Intro to Trees Lecture by Professor Victor Adamchik
- Trees with Gayle Laakmann McDowell
- Binary Tree implementation and tests by TK
- Coursera Course: Data Structures by University of California, San Diego
- Coursera Course: Data Structures and Performance by University of California, San Diego
- Binary Search Tree concepts and implementation by Paul Programming
- Binary Search Tree implementation and tests by TK
- Tree Traversal by Wikipedia
- Binary Search Tree remove node algorithm by GeeksforGeeks
- Binary Search Tree remove node algorithm by Algolist
- Learning Python From Zero to Hero