# Learning Tree Data Structure

This article was originally published at iamtk.co.

When we first start learning to code, it’s common to learn arrays as the “main data structure”. Eventually, we learn `hash table`

too. If you are pursuing a Computer Science degree, you have a Data Structure class, and you also learn `Linked Lists`

, `Queues`

, and `Stacks`

. Those data structures are all called “linear” data structures because they all have a logical start and a logical end.

When we start learning `trees`

and `graphs`

, it gets really confusing. We don’t store data in a linear way. Both data structures have a specific way to store data.

This post is an attempt to we better understand the `Tree`

Data Structure and clarify any doubts about it. We will learn about what is a `tree`

, examples of it, its terminology, how it works, and a technical implementation (a.k.a `code!`

)

Let’s start this learning journey! :)

# Definition

As we said early when we start programming, it is common to understand better the linear data structures than data structures like `trees`

and `graphs`

.

`Trees`

are well known as a non-linear Data Structure. It doesn’t store data in a linear way. It organizes data in a hierarchical way.

# Let’s dive in real life examples

What do I mean about hierarchical way? Imagine a family tree with all generation relationship: grandparents, parents, children, siblings, etc. We commonly organize it in a hierarchical way.

This is my `Tree Family`

. `Tossico`

, `Akikazu`

, `Hitomi`

and `Takemi`

are all my grandparents. `Toshiaki`

and `Juliana`

are my parents. `TK`

, `Yuji`

, `Bruno`

and `Kaio`

are the children of my parents (Me and my brothers, duh!)

Company hierarchy is another example.

If you are a web developer, you probably understand how the `DOM (Document Object Model)`

works. It works as a `tree`

.

The `html`

tag has all other tags. So we have a `head`

tag and a `body`

tag. Those tags contains specific elements. The `head`

tag has `meta`

and `title`

tags. And the `body`

tag has all elements that show in the UI like `h1`

, `a`

, `li`

, etc.

# A technical definition and its terminology

So a `Tree`

is a collection of entities called `node`

connected by `edges`

. Each `node`

contains a `value`

or `data`

and it can also have a `child node`

(or not).

The `first node`

of the `tree`

is called the `root`

. If this `root node`

is connected by another `node`

, the `root`

is a `parent node`

and the connected `node`

is a `child`

.

`Tree nodes`

are all connected by links called `edges`

. It’s an important part of `trees`

, because it’s how we manage relationship between `nodes`

.

`Leafs`

are the “last `nodes`

" from the `tree`

. Or `nodes`

without children. Like real trees. We have the `root`

, `branches`

, and finally the `leafs`

.

Other important concepts to understand are `height`

and `depth`

.

The` height`

of a `tree`

is the length of the longest path to a `leaf`

.

The `depth`

of a `node`

is the length of the path to its `root`

.

# Terminology Summary

**Root**: the topmost`node`

of the`tree`

**Edge**: the link between 2`nodes`

**Child**: a`node`

that has a`parent node`

**Parent**: a`node`

that has an`edge`

to a`child node`

**Leaf**: a`node`

that does not have a`child node`

in the`tree`

**Height**: The`height`

of a`tree`

is the length of the longest path to a`leaf`

.**Depth**: The`depth`

of a`node`

is the length of the path to its`root`

.

# Binary Tree

Now we will discuss a specific type of `tree`

. We call it `Binary Tree`

.

“In computer science, a

binary treeis a tree data structure in which each node has at most two children, which are referred to as the`left child`

and the`right child`

.” —Wikipedia

So let’s see an example of `Binary Tree`

.

# Binary Tree: Coding Mode On

The first thing we need to have in mind when implement a `Binary Tree`

is: it is a collection of `nodes`

, and each `node`

has three attributes: `value`

, `left_child`

, and `right_child`

.

How do we implement a simple `Binary Tree`

initializing with these three properties? Let’s see.

So here it is. Our `Binary Tree`

class. When we instantiate an object, we pass the `value`

(the `node`

‘s `data`

) as a parameter. And look at the `left_child`

and the `right_child`

. Both are set to `None`

. Why? Because when we create our `node`

, it doesn’t have any children. We just have the `node data`

.

Testing it:

That’s it. We can pass the `string`

“`a`

“ as the `value`

to our `Binary Tree node`

. If we print the `value`

, `left_child`

, and `right_child`

, we can see the values.

Let’s go to the insertion part! What do we need to do here? We will implement a method to insert a new `node`

to the `right`

and to the `left`

. What’s the rule?

- If the current
`node`

doesn’t have a`left child`

, we just create a new`node`

and set it to the current`node`

’s`left_child`

. - If it does have the
`left child`

, we create a new`node`

and put it in the current`left child`

‘s place. Allocate this`left child node`

to the new`node`

‘s`left child`

.

Should we draw it? :)

Let’s see the code.

Again, if the current `node`

doesn’t have a `left child`

, we just create a new `node`

and set it to the current `node`

’s `left_child`

. Else we create a new `node`

and put it in the current `left child`

‘s place. Allocate this `left child node`

to the new `node`

‘s `left child`

.

And we do the same thing to to insert a `right child node`

.

Done! :) But not entirely! We need to test it! Let’s build this `tree`

:

- So here the
`a`

`node`

will be the`root`

of our`binary Tree`

. `a`

`left child`

is`b`

`node`

.`a`

`right child`

is`c`

`node`

.`b`

`right child`

is`d`

`node`

(`b`

`node`

doesn’t have a`left child`

).`c`

`left child`

is`e`

`node`

.`c`

`right child`

is`f`

`node`

.- Both
`e`

and`f`

`nodes`

don’t have children.

So here it is:

Insertion is done! Now we start to think about `tree`

traversal. We have **2 options** here: **DFS** and **BFS**. WHAT?

**DFS**: Depth-First Search —*“is an algorithm for traversing or searching tree data structure. One starts at the root and explores as far as possible along each branch before backtracking.” —**Wikipedia***BFS**: Breadth-First Search —*“is an algorithm for traversing or searching tree data structure. It starts at the tree root and explores the neighbor nodes first, before moving to the next level neighbours.” —**Wikipedia*

So let’s dive into each `tree`

traversal type.

## DFS — Depth-First Search

**DFS** explores a path all the way to a leaf before **backtracking** and exploring another path. Let’s see an example of this type of traversal.

So the result of this algorithm will be: 1–2–3–4–5–6–7. Why?

- We start with the
`root`

(1). Print it. - Go to the
`left child`

(2). Print it. - Go to the
`left child`

(3). Print it. (This`node`

doesn’t have any children) - Backtrack and go the
`right child`

(4). Print it. (This`node`

doesn’t have any children) - Backtrack to the
`root`

`node`

and go the`right child`

(5). Print it. - Go to the
`left child`

(6). Print it. (This`node`

doesn’t have any children) - Backtrack and go the
`right child`

(7). Print it. (This`node`

doesn’t have any children) - Done.

This is what we call **DFS** algorithm. Going deep to the leaf and backtrack.

Now that we are familiar with this traversal algorithm, we will understand 3 types of **DFS**: `pre-order`

, `in-order`

, and `post-order`

.

**Pre-order**: This is exactly what we did above. Print the current`node`

’s value. Go to the`left child`

and print it. Backtrack. Go to the`right child`

and print it.

- Print
`node`

‘s value - Go to the
`left child`

and print it. If, and only if, it has a`left child`

. - Go to the
`right child`

and print it. If, and only if, it has a`right child`

.

**In-order**: We go way down to the`left child`

and print it first. Backtrack and print it. And go way down to the`right child`

and print it.

So the result of the `in order`

algorithm (for this `tree`

example) will be: 3–2–4–1–6–5–7. The left first, the middle second, and the right last.

Coding time!

- Go to the
`left child`

and print it. If, and only if, it has a`left child`

. - Print
`node`

‘s value - Go to the
`right child`

and print it. If, and only if, it has a`right child`

.

**Post-order**: We go way down to the`left child`

and print it first. Backtrack. Go way down to the`right child`

. Print it second. Backtrack and print it.

So the result of the `post order`

algorithm (for this `tree`

example) will be: 3–4–2–6–7–5–1. The left first, the right second, and the middle last.

Coding time!

- Go to the
`left child`

and print it. If, and only if, it has a`left child`

. - Go to the
`right child`

and print it. If, and only if, it has a`right child`

. - Print
`node`

‘s value

## BFS — Breadth-First Search

**BFS** algorithm traverses the `tree`

level by level. Depth by depth.

Here an example to better understand this algorithm.

So we traverse level by level. In this example, the result is 1–2–5–3–4–6–7.

- Level/Depth 0: only
`node`

with value 1 - Level/Depth 1:
`nodes`

with value 2 and 5 - Level/Depth 2:
`nodes`

with value 3, 4, 6, and 7

Let’s code it!

To implement a **BFS** algorithm we use the `queue`

data structure as a helper. How does it work?

- So first we add the
`root`

`node`

into the`queue`

with the`put`

method. - We will iterate while the
`queue`

is not empty. - Get the first
`node`

in the`queue`

, print its value - Add both
`left`

and`right`

`children`

into the`queue`

(if the current`node`

has`children`

). - Done. We will print each
`node`

‘s value level by level with our`queue`

helper.

# Binary Search Tree

A Binary Search Tree is sometimes called

orderedorsorted binary treesand it keeps its values in sorted order, so that lookup and other operations can use the principle of binary search — Wikipedia

One important property of a `Binary Search Tree`

is *“A **Binary Search Tree** **node**‘s value is larger than the value of any descendant of its **left child**, and smaller than the value of any descendant of its **right child**”.*

**A**: it is inverted. The`subtree`

7–5–8–6 needs to be on the right side, and the`subtree`

2–1–3 needs to be on the left.**B**: it is the only corrected option. It satisfy the`Binary Search Tree`

property.**C**: it has just one problem, the`node`

with value 4. It needs to be on the left side of the`root`

, because it is smaller than 5.

# Binary Search Tree: Coding Mode On

Now it’s coding time! What will we see here? Insertion, search for a value, deletion, and balance the `tree`

. Let’s the game begin

## Insertion: adding new nodes to our tree

Imagine we have an empty `tree`

and we want to start adding new `nodes`

with these values: 50, 76, 21, 4, 32, 100, 64, 52. In that order.

The first thing we need to know is 50 is the `root`

of our `tree`

.

So now we can start inserting `node`

by `node`

.

- 76: greater than 50, insert on the right side.
- 21: smaller than 50, insert on the left side.
- 4: smaller than 50.
`Node`

with value 50 has a`left child`

(21). 4 is smaller than 21, insert it on the left side of this`node`

. - 32: smaller than 50.
`Node`

with value 50 has a`left child`

(21). 32 is greater than 21, insert it on the right side of this`node`

. - 100: greater than 50.
`Node`

with value 50 has a`right child`

(76). 100 is greater than 76, insert it on the right side of this`node`

. - 64: greater than 50.
`Node`

with value 50 has a`right child`

(76). 64 is smaller than 76, insert it on the left side of this`node`

. - 52: greater than 50.
`Node`

with value 50 has a`right child`

(76). 52 is smaller than 76.`Node`

with value 76 has a`left child`

(64). 52 is smaller than 64, insert it on the left side of this`node`

.

Did you notice a pattern here? Let’s break it down.

- Ask “Is the new
`node`

value greater or smaller than the current`node`

?” - Is the new
`node`

value greater than the current`node`

? Go to the right`subtree`

. If the current`node`

doesn’t have a`right child`

, insert it there. Else backtrack to rule number 1. - Is the new
`node`

value smaller than the current`node`

? Go to the left`subtree`

. If the current`node`

doesn’t have a`left child`

, insert it there. Else backtrack to rule number 1. - Special case we didn’t handle: when new
`node`

value is equal to the current`node`

‘s value. Just use the rule number 3. Consider equal values to insert in the left`subtree`

part.

Now the code! yey!

It seems very simple. The powerful part of this algorithm is the recursion part: `line 9`

and `line 13`

. Both lines of code call the `insert_node`

method, for its `left`

and `right`

`children`

, respectively. Lines `11`

and `15`

are the ones that we do the insertion for each `child`

.

## Searching: let’s find the node value. Or not

The algorithm that we will build now is all about searching. For a given value (integer number), we will say if our `Binary Search Tree`

has or hasn’t that value.

One important part is how we defined the `tree`

**insertion algorithm**. First we have our `root`

`node`

. All the left `subtree`

`nodes `

will have smaller values than the `root`

`node`

. And all the right `subtree`

`nodes `

will have greater values than the `root`

`node`

.

Example time! Imagine we have this `tree`

.

Now we want to know if we find a node based on a given value 52.

Let’s break it down.

- We start with the
`root`

`node`

as our current`node`

. Is the given value smaller than the current`node`

value? If yes, then we will search it on the left`subtree`

. - Is the given value greater than the current
`node`

value? If yes, then we will search it on the right`subtree`

. - If rules number 1 and number 2 are both false, we can compare the current
`node`

value and the given value if they are equal. If the return of the comparison is`true`

then “Yeah! Our`tree`

has the given value”. Else “Nooo. It hasn’t”.

Coding time!

- Lines 8 and 9 are the rule number 1.
- Lines 10 and 11 are the rule number 2.
- Line 13 is the rule number 3.

How do we test it?

Let’s create our `Binary Search Tree`

initializing the `root`

`node`

with the value 15.

And now we will insert a lot of new `nodes`

there.

For each inserted `node`

we will test if our `find_node`

method really works.

Yeah, it works for these given values! Let’s test for a value that doesn’t exist in our `Binary Search Tree`

.

Oh yeah! Searching… done!

## Deletion: removing and organizing

Deletion is a more complex algorithm, because we need to handle different cases. For a given value, we need to remove the `node`

with this value. Imagine that this `node`

has not `children`

, or a single `child`

, or two `children`

.

**Case 1**: A`node`

with no`children`

(`leaf`

`node`

).

If the `node`

we want to delete has no children, we simply delete it. The algorithm doesn’t need to reorganize the `tree`

.

**Case 2**: A`node`

with just one child (`left`

or`right`

child).

To cover the second case, our algorithm needs to make the `node`

’s parent points to the `node`

‘s `child`

. If the `node`

is the `left child`

of its parent, we make the `node`

’s parent `left child`

points to the `node`

‘s `child`

. If the `node`

is the `right child`

of its parent, we make the `node`

’s parent `right child`

points to the `node`

‘s `child`

.

**Case 3**: A`node`

with two children.

When the `node`

has 2 children, we need to find the `node`

with the minimum value starting from the `node`

‘s `right child`

. We will put this `node`

with minimum value in the place of the `node`

we want to remove.

Coding time!

: the parameters.*First thing*`value`

and`parent`

. We want to find the`node`

that has this`value`

and the`node`

’s parent is a important part to remove the`node`

.: the returning value. Our algorithm will return a boolean value.*Second thing*`True`

if it found the`node`

and removed it. Otherwise`False`

.: we start searching for the*From line 2 to line 9*`node`

that has the`value`

we are looking for. If the`value`

is smaller than the`current node`

`value`

we go to the`left subtree`

, recursively (if and only if the`current node`

has a`left child`

). If the`value`

is greater, go to the`right subtree`

, recursively.: we start to think about the*Line 10*`remove`

algorithm.: we cover the*From line 11 to line 13*`node`

with no`children`

and it is the`left child`

from its`parent`

. We remove the`node`

by setting the`parent`

’s`left child`

to`None`

.: we cover the*Lines 14 and 15*`node`

with just no`children`

and it is the`right child`

from its`parent`

. We remove the`node`

by setting the`parent`

’s`right child`

to`None`

.: I will show the*Clear node method*`clear_node`

code below. But basically it set the`node`

`left child`

,`right child`

, and its`value`

to`None`

.: we cover the*From line 16 to line 18*`node`

with just one`child`

(`left child`

) and it is the`left child`

from its`parent`

. We set the`parent`

's`left child`

to the`node`

’s`left child`

(the only child it has).: we cover the*From line 19 to line 21*`node`

with just one`child`

(`left child`

) and it is the`right child`

from its`parent`

. We set the`parent`

's`right child`

to the`node`

’s`left child`

(the only child it has).: we cover the*From line 22 to line 24*`node`

with just one`child`

(`right child`

) and it is the`left child`

from its`parent`

. We set the`parent`

's`left child`

to the`node`

’s`right child`

(the only child it has).: we cover the*From line 25 to line 27*`node`

with just one`child`

(`right child`

) and it is the`right child`

from its`parent`

. We set the`parent`

's`right child`

to the`node`

’s`right child`

(the only child it has).: we cover the*From line 28 to line 30*`node`

with both`left`

and`right`

children. We get the`node`

with the smallest`value`

(code below!) and set it to the`current node`

‘s`value`

. Finish it by removing the smallest`node`

.: if we found the*Line 32*`node`

we are looking for, we need to return`True`

. From line 11 to line 31 we handle this case. So just return`True`

and that’s it!

The `clear_node`

method: set `None`

value to all three attributes (`value`

, `left_child`

, and `right_child`

)

The `find_minimum_value`

method: go way down to the left. If we can’t anymore, we found the smallest `node`

.

Testing time!

We will use this `tree`

to test our `remove_node`

algorithm:

Let’s remove the `node`

with `value`

8. It’s a `node`

with no child.

Now let’s remove the `node`

with `value`

17. It’s a `node`

with just one child.

Finally we will remove a `node`

with two children. The `root`

of our `tree`

.

Tests Done! :)

# That’s it!

We learned a lot here. Congrats on finishing this dense piece of content. It’s really tough to understand a new concept we are not used to. But you did it! :)

This is one more step forward in my journey on learning and mastering algorithms and data structures. You can see the documentation of my complete journey here on my **Renaissance Developer publication**.

Have fun, keep learning, and always coding.

# Resources

- Introduction to Tree Data Structure by
**mycodeschool** - Tree by
**Wikipedia** - How To Not Be Stumped By Trees by
**the talented****Vaidehi Joshi** - Intro to Trees Lecture by Professor
**Jonathan Cohen** - Intro to Trees Lecture by Professor
**David Schmidt** - Intro to Trees Lecture by Professor
**Victor Adamchik** - Trees with
**Gayle Laakmann McDowell** - Binary Tree implementation and tests by
**TK** - Coursera Course: Data Structures by
**University of California, San Diego** - Coursera Course: Data Structures and Performance by
**University of California, San Diego** - Binary Search Tree concepts and implementation by
**Paul Programming** - Binary Search Tree implementation and tests by
**TK** - Tree Traversal by
**Wikipedia** - Binary Search Tree remove node algorithm by
**GeeksforGeeks** - Binary Search Tree remove node algorithm by
**Algolist** - Learning Python From Zero to Hero