# Learning Tree Data Structure

When we first start learning to code, it’s common to learn arrays as the “main data structure”. Eventually, we learn `hash tables` too. If you are pursuing a Computer Science degree, you take a Data Structures class, where you also learn about `linked lists`, `queues`, and `stacks`. These data structures are all called “linear” data structures, because they all have a logical start and a logical end.

When we start learning about `trees` and `graphs`, it can get really confusing. We don’t store data in a linear way anymore; each of these data structures has its own specific way of storing data.

This post is an attempt to better understand the `tree` data structure and to clarify any doubts about it. We will learn what a `tree` is, see examples of it, go through its terminology and how it works, and walk through a technical implementation (a.k.a. `code!`).

Let’s start this learning journey! :)

# Definition

As we said earlier, when we start programming it is common to understand the linear data structures better than data structures like `trees` and `graphs`.

`Trees` are well known as a non-linear data structure. They don’t store data in a linear way; they organize it hierarchically.

# Let’s dive into real-life examples

What do I mean by a hierarchical way? Imagine a family tree with the relationships across all generations: grandparents, parents, children, siblings, etc. We commonly organize a family tree hierarchically.

This is my family tree. `Tossico`, `Akikazu`, `Hitomi`, and `Takemi` are my grandparents. `Toshiaki` and `Juliana` are my parents. `TK`, `Yuji`, `Bruno`, and `Kaio` are the children of my parents (me and my brothers, duh!).

A company hierarchy is another example.

If you are a web developer, you probably know how the `DOM (Document Object Model)` works. It works as a `tree`.

The `html` tag contains all the other tags. So we have a `head` tag and a `body` tag, and those tags contain specific elements. The `head` tag has the `meta` and `title` tags, and the `body` tag has all the elements that show up in the UI, like `h1`, `a`, `li`, etc.

# A technical definition and its terminology

So a `tree` is a collection of entities called `nodes`, connected by `edges`. Each `node` contains a `value` or `data`, and it may or may not have a `child node`.

The `first node` of the `tree` is called the `root`. If this `root node` is connected to another `node`, the `root` is a `parent node` and the connected `node` is a `child`.

`Tree nodes` are all connected by links called `edges`. Edges are an important part of `trees`, because they are how we manage the relationships between `nodes`.

`Leaves` are the “last `nodes`” of the `tree`: the `nodes` without children. As with real trees, we have the `root`, the `branches`, and finally the `leaves`.

Other important concepts to understand are `height` and `depth`.

The `height` of a `tree` is the length of the longest path to a `leaf`.

The `depth` of a `node` is the length of the path from that `node` to the `root`.

# Terminology Summary

- **Root**: the topmost `node` of the `tree`
- **Edge**: the link between two `nodes`
- **Child**: a `node` that has a `parent node`
- **Parent**: a `node` that has an `edge` to a `child node`
- **Leaf**: a `node` that does not have a `child node` in the `tree`
- **Height**: the length of the longest path from the `root` to a `leaf`
- **Depth**: the length of the path from a `node` to its `root`

# Binary Tree

Now we will discuss a specific type of `tree`: the `binary tree`.

“In computer science, a binary tree is a tree data structure in which each node has at most two children, which are referred to as the `left child` and the `right child`.” — Wikipedia

So let’s see an example of a `binary tree`.

# Binary Tree: Coding Mode On

The first thing we need to keep in mind when implementing a `binary tree` is that it is a collection of `nodes`, and each `node` has three attributes: `value`, `left_child`, and `right_child`.

How do we implement a simple `binary tree` that initializes these three properties? Let’s see.
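In Python (the language this post’s examples use), a minimal sketch of that class could look like this:

```python
class BinaryTree:
    def __init__(self, value):
        # A new node starts with a value and no children
        self.value = value
        self.left_child = None
        self.right_child = None
```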

So here it is: our `Binary Tree` class. When we instantiate an object, we pass the `value` (the `node`’s `data`) as a parameter. Now look at the `left_child` and the `right_child`: both are set to `None`. Why? Because when we create our `node`, it doesn’t have any children yet. We just have the `node`’s data.

Testing it:
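A quick sketch of that test (the class is repeated so the snippet runs on its own):

```python
class BinaryTree:
    def __init__(self, value):
        self.value = value
        self.left_child = None
        self.right_child = None

# Create a node holding the string 'a' and inspect its attributes
tree = BinaryTree('a')

print(tree.value)        # a
print(tree.left_child)   # None
print(tree.right_child)  # None
```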

That’s it. We can pass the `string` “`a`” as the `value` to our `binary tree` `node`. If we print the `value`, `left_child`, and `right_child`, we can see the values.

Let’s get to the insertion part! What do we need to do here? We will implement a method to insert a new `node` to the `right` and to the `left`. What’s the rule?

- If the current `node` doesn’t have a `left child`, we just create a new `node` and set it as the current `node`’s `left_child`.
- If it does have a `left child`, we create a new `node`, put it in the current `left child`’s place, and allocate that old `left child` as the new `node`’s `left child`.

Should we draw it? :)

Let’s see the code.
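A sketch of the left insertion described above (the method name `insert_left` is my choice here; the post’s prose only describes the behavior):

```python
class BinaryTree:
    def __init__(self, value):
        self.value = value
        self.left_child = None
        self.right_child = None

    def insert_left(self, value):
        if self.left_child is None:
            # No left child yet: just attach the new node
            self.left_child = BinaryTree(value)
        else:
            # Put the new node in the left child's place and push
            # the old left child one level down
            new_node = BinaryTree(value)
            new_node.left_child = self.left_child
            self.left_child = new_node
```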

Again: if the current `node` doesn’t have a `left child`, we just create a new `node` and set it as the current `node`’s `left_child`. Otherwise, we create a new `node`, put it in the current `left child`’s place, and allocate the old `left child` as the new `node`’s `left child`.

And we do the same thing to insert a `right child node`.
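The right insertion mirrors the left one; under the same assumptions:

```python
class BinaryTree:
    def __init__(self, value):
        self.value = value
        self.left_child = None
        self.right_child = None

    def insert_right(self, value):
        if self.right_child is None:
            # No right child yet: just attach the new node
            self.right_child = BinaryTree(value)
        else:
            # Put the new node in the right child's place and push
            # the old right child one level down
            new_node = BinaryTree(value)
            new_node.right_child = self.right_child
            self.right_child = new_node
```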

Done! :) Well, not entirely. We still need to test it! Let’s build this `tree`:

- The `a` `node` will be the `root` of our `binary tree`.
- `a`’s `left child` is the `b` `node`.
- `a`’s `right child` is the `c` `node`.
- `b`’s `right child` is the `d` `node` (the `b` `node` doesn’t have a `left child`).
- `c`’s `left child` is the `e` `node`.
- `c`’s `right child` is the `f` `node`.
- Both the `e` and `f` `nodes` don’t have children.

So here it is:
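Building that tree might look like this (a sketch assuming the `insert_left`/`insert_right` methods described above):

```python
class BinaryTree:
    def __init__(self, value):
        self.value = value
        self.left_child = None
        self.right_child = None

    def insert_left(self, value):
        if self.left_child is None:
            self.left_child = BinaryTree(value)
        else:
            new_node = BinaryTree(value)
            new_node.left_child = self.left_child
            self.left_child = new_node

    def insert_right(self, value):
        if self.right_child is None:
            self.right_child = BinaryTree(value)
        else:
            new_node = BinaryTree(value)
            new_node.right_child = self.right_child
            self.right_child = new_node

# 'a' is the root; 'b' and 'c' are its children
a_node = BinaryTree('a')
a_node.insert_left('b')
a_node.insert_right('c')

# 'b' only has a right child, 'd'
b_node = a_node.left_child
b_node.insert_right('d')

# 'c' has two children, 'e' and 'f'
c_node = a_node.right_child
c_node.insert_left('e')
c_node.insert_right('f')
```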

Insertion is done! Now we have to think about `tree` traversal. We have **2 options** here: **DFS** and **BFS**. WHAT?

**DFS**: Depth-First Search — *“is an algorithm for traversing or searching tree data structures. One starts at the root and explores as far as possible along each branch before backtracking.”* — Wikipedia

**BFS**: Breadth-First Search — *“is an algorithm for traversing or searching tree data structures. It starts at the tree root and explores the neighbor nodes first, before moving to the next-level neighbours.”* — Wikipedia

So let’s dive into each `tree` traversal type.

## DFS — Depth-First Search

**DFS** explores a path all the way to a leaf before **backtracking** and exploring another path. Let’s see an example of this type of traversal.

So the result of this algorithm will be: 1–2–3–4–5–6–7. Why?

- We start with the `root` (1). Print it.
- Go to the `left child` (2). Print it.
- Then go to the `left child` (3). Print it. (This `node` doesn’t have any children.)
- Backtrack and go to the `right child` (4). Print it. (This `node` doesn’t have any children.)
- Backtrack to the `root` `node` and go to the `right child` (5). Print it.
- Go to the `left child` (6). Print it. (This `node` doesn’t have any children.)
- Backtrack and go to the `right child` (7). Print it. (This `node` doesn’t have any children.)
- Done.

This is what we call the **DFS** algorithm: go deep to a leaf and backtrack.

Now that we are familiar with this traversal algorithm, we will look at three types of **DFS**: `pre-order`, `in-order`, and `post-order`.

**Pre-order**: this is exactly what we did above. Print the current `node`’s value. Go to the `left child` and print it. Backtrack. Go to the `right child` and print it.

- Print the `node`’s value.
- Go to the `left child` and print it, if, and only if, it has a `left child`.
- Go to the `right child` and print it, if, and only if, it has a `right child`.

**In-order**: we go way down to the `left child` and print it first. Then we backtrack and print the `node` itself. And finally we go way down to the `right child` and print it.

So the result of the `in-order` algorithm (for this `tree` example) will be: 3–2–4–1–6–5–7. The left first, the middle second, and the right last.

Coding time!

- Go to the `left child` and print it, if, and only if, it has a `left child`.
- Print the `node`’s value.
- Go to the `right child` and print it, if, and only if, it has a `right child`.

**Post-order**: we go way down to the `left child` and print it first. Backtrack. Then we go way down to the `right child` and print it second. Finally, we backtrack and print the `node` itself last.

So the result of the `post-order` algorithm (for this `tree` example) will be: 3–4–2–6–7–5–1. The left first, the right second, and the middle last.

Coding time!

- Go to the `left child` and print it, if, and only if, it has a `left child`.
- Go to the `right child` and print it, if, and only if, it has a `right child`.
- Print the `node`’s value.

## BFS — Breadth-First Search

The **BFS** algorithm traverses the `tree` level by level, depth by depth.

Here is an example to better understand this algorithm.

So we traverse level by level. In this example, the result is 1–2–5–3–4–6–7.

- Level/Depth 0: only the `node` with value 1
- Level/Depth 1: the `nodes` with values 2 and 5
- Level/Depth 2: the `nodes` with values 3, 4, 6, and 7

Let’s code it!

To implement a **BFS** algorithm, we use the `queue` data structure as a helper. How does it work?

- First we add the `root` `node` into the `queue`, with the `put` method.
- Then we iterate while the `queue` is not empty.
- We get the first `node` in the `queue` and print its value.
- We add both its `left` and `right` `children` into the `queue` (if the current `node` has `children`).
- Done. We print each `node`’s value, level by level, with our `queue` helper.

# Binary Search Tree

“A binary search tree is sometimes called an ordered or sorted binary tree, and it keeps its values in sorted order, so that lookup and other operations can use the principle of binary search.” — Wikipedia

An important property of a `Binary Search Tree` is: *“A **Binary Search Tree** **node**’s value is larger than the value of any descendant of its **left child**, and smaller than the value of any descendant of its **right child**.”*

- **A**: this tree is inverted. The `subtree` 7–5–8–6 needs to be on the right side, and the `subtree` 2–1–3 needs to be on the left.
- **B**: this is the only correct option. It satisfies the `Binary Search Tree` property.
- **C**: this one has just one problem: the `node` with value 4. It needs to be on the left side of the `root`, because it is smaller than 5.

# Binary Search Tree: Coding Mode On

Now it’s coding time! What will we see here? Insertion, searching for a value, deletion, and balancing the `tree`. Let the games begin!

## Insertion: adding new nodes to our tree

Imagine we have an empty `tree` and we want to add new `nodes` with these values: 50, 76, 21, 4, 32, 100, 64, 52, in that order.

The first thing we need to know is that 50 is the `root` of our `tree`.

Now we can start inserting `node` by `node`.

- 76: greater than 50, so insert it on the right side.
- 21: smaller than 50, so insert it on the left side.
- 4: smaller than 50. The `node` with value 50 has a `left child` (21). Since 4 is smaller than 21, insert it on the left side of that `node`.
- 32: smaller than 50. The `node` with value 50 has a `left child` (21). Since 32 is greater than 21, insert it on the right side of that `node`.
- 100: greater than 50. The `node` with value 50 has a `right child` (76). Since 100 is greater than 76, insert it on the right side of that `node`.
- 64: greater than 50. The `node` with value 50 has a `right child` (76). Since 64 is smaller than 76, insert it on the left side of that `node`.
- 52: greater than 50. The `node` with value 50 has a `right child` (76). 52 is smaller than 76. The `node` with value 76 has a `left child` (64). Since 52 is smaller than 64, insert it on the left side of that `node`.

Did you notice a pattern here? Let’s break it down.

1. Ask: “Is the new `node`’s value greater or smaller than the current `node`’s?”
2. If the new `node`’s value is greater than the current `node`’s, go to the right `subtree`. If the current `node` doesn’t have a `right child`, insert it there; otherwise go back to rule number 1.
3. If the new `node`’s value is smaller than the current `node`’s, go to the left `subtree`. If the current `node` doesn’t have a `left child`, insert it there; otherwise go back to rule number 1.
4. There is a special case we didn’t handle: when the new `node`’s value is equal to the current `node`’s value. We just use rule number 3, and consider equal values to go into the left `subtree`.

Now, the code! Yay!
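A sketch of the insertion, following the four rules above (equal values go to the left subtree):

```python
class BinarySearchTree:
    def __init__(self, value):
        self.value = value
        self.left_child = None
        self.right_child = None

    def insert_node(self, value):
        # Smaller (or equal) values belong in the left subtree
        if value <= self.value and self.left_child:
            self.left_child.insert_node(value)   # recursion!
        elif value <= self.value:
            self.left_child = BinarySearchTree(value)
        # Greater values belong in the right subtree
        elif value > self.value and self.right_child:
            self.right_child.insert_node(value)  # recursion!
        else:
            self.right_child = BinarySearchTree(value)

# Build the example tree from this section
bst = BinarySearchTree(50)
for value in [76, 21, 4, 32, 100, 64, 52]:
    bst.insert_node(value)
```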

It seems very simple. The powerful part of this algorithm is the recursion: the two branches that call the `insert_node` method again, for the `left` and the `right` `children` respectively. The other two branches are the ones where we do the actual insertion for each `child`.

## Searching: let’s find the node value. Or not

The algorithm that we will build now is all about searching. For a given value (an integer number), we will say whether our `Binary Search Tree` does or doesn’t have that value.

One important part is how we defined the `tree` **insertion algorithm**. First we have our `root` `node`. All the left `subtree` `nodes` have smaller values than the `root` `node`, and all the right `subtree` `nodes` have greater values than the `root` `node`.

Example time! Imagine we have this `tree`.

Now we want to know whether we can find the node with the value 52.

Let’s break it down.

1. We start with the `root` `node` as our current `node`. Is the given value smaller than the current `node`’s value? If yes, we search for it in the left `subtree`.
2. Is the given value greater than the current `node`’s value? If yes, we search for it in the right `subtree`.
3. If rules number 1 and 2 are both false, we compare the current `node`’s value with the given value. If they are equal, then “Yeah! Our `tree` has the given value”; otherwise, “Nooo. It doesn’t”.

Coding time!
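A sketch of the search, following the three rules above:

```python
class BinarySearchTree:
    def __init__(self, value):
        self.value = value
        self.left_child = None
        self.right_child = None

    def find_node(self, value):
        # Rule 1: a smaller value can only live in the left subtree
        if value < self.value and self.left_child:
            return self.left_child.find_node(value)
        # Rule 2: a greater value can only live in the right subtree
        if value > self.value and self.right_child:
            return self.right_child.find_node(value)
        # Rule 3: nowhere left to go, so compare directly
        return value == self.value
```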

- The first condition handles rule number 1.
- The second condition handles rule number 2.
- The final comparison is rule number 3.

How do we test it?

Let’s create our `Binary Search Tree`, initializing the `root` `node` with the value 15.

Now we will insert a lot of new `nodes` there.

For each inserted `node`, we will test whether our `find_node` method really works.

Yeah, it works for these given values! Now let’s test a value that doesn’t exist in our `Binary Search Tree`.
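A sketch of those tests (the class is repeated so the snippet runs on its own; the inserted values are illustrative, since the original figures aren’t reproduced here):

```python
class BinarySearchTree:
    def __init__(self, value):
        self.value = value
        self.left_child = None
        self.right_child = None

    def insert_node(self, value):
        if value <= self.value and self.left_child:
            self.left_child.insert_node(value)
        elif value <= self.value:
            self.left_child = BinarySearchTree(value)
        elif value > self.value and self.right_child:
            self.right_child.insert_node(value)
        else:
            self.right_child = BinarySearchTree(value)

    def find_node(self, value):
        if value < self.value and self.left_child:
            return self.left_child.find_node(value)
        if value > self.value and self.right_child:
            return self.right_child.find_node(value)
        return value == self.value

# Root is 15; insert a handful of nodes
bst = BinarySearchTree(15)
for value in [10, 8, 12, 20, 17, 25, 19]:
    bst.insert_node(value)

print(bst.find_node(15))  # True
print(bst.find_node(8))   # True
print(bst.find_node(17))  # True
print(bst.find_node(0))   # False
```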

Oh yeah! Searching… done!

## Deletion: removing and organizing

Deletion is a more complex algorithm, because we need to handle different cases. For a given value, we need to remove the `node` with that value. This `node` may have no `children`, a single `child`, or two `children`.

**Case 1**: a `node` with no `children` (a `leaf` `node`).

If the `node` we want to delete has no children, we simply delete it. The algorithm doesn’t need to reorganize the `tree`.

**Case 2**: a `node` with just one child (a `left` or a `right` child).

To cover this case, our algorithm needs to make the `node`’s parent point to the `node`’s `child`. If the `node` is the `left child` of its parent, we make the parent’s `left child` point to the `node`’s `child`. If the `node` is the `right child` of its parent, we make the parent’s `right child` point to the `node`’s `child`.

**Case 3**: a `node` with two children.

When the `node` has two children, we need to find the `node` with the minimum value, starting from the `node`’s `right child`. We will put this minimum-value `node` in the place of the `node` we want to remove.

Coding time!
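A sketch of the method described below, following the prose case by case. The `clear_node` and `find_minimum_value` helpers it relies on are covered later in this section, but they are included here so the snippet runs. Note that, as described, the no-children and one-child cases assume the `node` being removed is not the `root`:

```python
class BinarySearchTree:
    def __init__(self, value):
        self.value = value
        self.left_child = None
        self.right_child = None

    def remove_node(self, value, parent):
        # Search for the node that holds the value
        if value < self.value and self.left_child:
            return self.left_child.remove_node(value, self)
        elif value < self.value:
            return False
        elif value > self.value and self.right_child:
            return self.right_child.remove_node(value, self)
        elif value > self.value:
            return False
        else:
            # Found it: handle each removal case
            if self.left_child is None and self.right_child is None and self == parent.left_child:
                parent.left_child = None
                self.clear_node()
            elif self.left_child is None and self.right_child is None and self == parent.right_child:
                parent.right_child = None
                self.clear_node()
            elif self.left_child and self.right_child is None and self == parent.left_child:
                parent.left_child = self.left_child
                self.clear_node()
            elif self.left_child and self.right_child is None and self == parent.right_child:
                parent.right_child = self.left_child
                self.clear_node()
            elif self.right_child and self.left_child is None and self == parent.left_child:
                parent.left_child = self.right_child
                self.clear_node()
            elif self.right_child and self.left_child is None and self == parent.right_child:
                parent.right_child = self.right_child
                self.clear_node()
            else:
                # Two children: copy the smallest value of the right
                # subtree here, then remove the node we copied it from
                self.value = self.right_child.find_minimum_value()
                self.right_child.remove_node(self.value, self)
            return True

    def clear_node(self):
        self.value = None
        self.left_child = None
        self.right_child = None

    def find_minimum_value(self):
        if self.left_child:
            return self.left_child.find_minimum_value()
        return self.value
```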

- **First thing**: the parameters, `value` and `parent`. We want to find the `node` that has this `value`, and the `node`’s `parent` is an important part of removing the `node`.
- **Second thing**: the return value. Our algorithm will return a boolean value: `True` if it found the `node` and removed it, otherwise `False`.
- **Searching**: we start by searching for the `node` that has the `value` we are looking for. If the `value` is smaller than the `current node`’s `value`, we go to the `left subtree`, recursively (if, and only if, the `current node` has a `left child`). If the `value` is greater, we go to the `right subtree`, recursively. Once the `node` is found, we can start to think about how to `remove` it.
- **A node with no children that is the `left child` of its `parent`**: we remove the `node` by setting the `parent`’s `left child` to `None`.
- **A node with no children that is the `right child` of its `parent`**: we remove the `node` by setting the `parent`’s `right child` to `None`.
- **The clear node method**: I will show the `clear_node` code below. Basically, it sets the `node`’s `left child`, `right child`, and `value` to `None`.
- **A node with just one child (a `left child`) that is the `left child` of its `parent`**: we set the `parent`’s `left child` to the `node`’s `left child` (the only child it has).
- **A node with just one child (a `left child`) that is the `right child` of its `parent`**: we set the `parent`’s `right child` to the `node`’s `left child` (the only child it has).
- **A node with just one child (a `right child`) that is the `left child` of its `parent`**: we set the `parent`’s `left child` to the `node`’s `right child` (the only child it has).
- **A node with just one child (a `right child`) that is the `right child` of its `parent`**: we set the `parent`’s `right child` to the `node`’s `right child` (the only child it has).
- **A node with both `left` and `right` children**: we get the `node` with the smallest `value` from the right `subtree` (code below!), set its `value` as the `current node`’s `value`, and finish by removing that smallest `node`.
- **Finally**: if we found the `node` we were looking for, we return `True`, and that’s it!

The `clear_node` method: set the `None` value to all three attributes (`value`, `left_child`, and `right_child`).

The `find_minimum_value` method: go way down to the left. When we can’t go any further, we have found the smallest `node`.
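Both helpers are small; a standalone sketch:

```python
class BinarySearchTree:
    def __init__(self, value):
        self.value = value
        self.left_child = None
        self.right_child = None

    def clear_node(self):
        # Reset all three attributes
        self.value = None
        self.left_child = None
        self.right_child = None

    def find_minimum_value(self):
        # Keep going left; when we can't anymore,
        # we have found the smallest node
        if self.left_child:
            return self.left_child.find_minimum_value()
        return self.value
```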

Testing time!

We will use this `tree` to test our `remove_node` algorithm:

Let’s remove the `node` with the `value` 8. It’s a `node` with no child.

Now let’s remove the `node` with the `value` 17. It’s a `node` with just one child.

Finally, we will remove a `node` with two children: the `root` of our `tree`.
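Those three removals can be sketched like this (the tree values are illustrative, since the original figure isn’t reproduced here; the class repeats the methods from this section so the snippet runs on its own):

```python
class BinarySearchTree:
    def __init__(self, value):
        self.value = value
        self.left_child = None
        self.right_child = None

    def insert_node(self, value):
        if value <= self.value and self.left_child:
            self.left_child.insert_node(value)
        elif value <= self.value:
            self.left_child = BinarySearchTree(value)
        elif value > self.value and self.right_child:
            self.right_child.insert_node(value)
        else:
            self.right_child = BinarySearchTree(value)

    def remove_node(self, value, parent):
        if value < self.value and self.left_child:
            return self.left_child.remove_node(value, self)
        elif value < self.value:
            return False
        elif value > self.value and self.right_child:
            return self.right_child.remove_node(value, self)
        elif value > self.value:
            return False
        else:
            if self.left_child is None and self.right_child is None and self == parent.left_child:
                parent.left_child = None
                self.clear_node()
            elif self.left_child is None and self.right_child is None and self == parent.right_child:
                parent.right_child = None
                self.clear_node()
            elif self.left_child and self.right_child is None and self == parent.left_child:
                parent.left_child = self.left_child
                self.clear_node()
            elif self.left_child and self.right_child is None and self == parent.right_child:
                parent.right_child = self.left_child
                self.clear_node()
            elif self.right_child and self.left_child is None and self == parent.left_child:
                parent.left_child = self.right_child
                self.clear_node()
            elif self.right_child and self.left_child is None and self == parent.right_child:
                parent.right_child = self.right_child
                self.clear_node()
            else:
                self.value = self.right_child.find_minimum_value()
                self.right_child.remove_node(self.value, self)
            return True

    def clear_node(self):
        self.value = None
        self.left_child = None
        self.right_child = None

    def find_minimum_value(self):
        if self.left_child:
            return self.left_child.find_minimum_value()
        return self.value

# An illustrative tree: 15 is the root
bst = BinarySearchTree(15)
for value in [10, 8, 12, 20, 17, 25, 19]:
    bst.insert_node(value)

print(bst.remove_node(8, None))   # True: 8 was a leaf
print(bst.remove_node(17, None))  # True: 17 had just one child (19)
print(bst.remove_node(15, None))  # True: the root had two children
print(bst.value)                  # 19, the smallest value that was in the right subtree
```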

Tests Done! :)

# That’s it!

We learned a lot here. Congrats on finishing this dense piece of content. It’s really tough to understand a new concept we are not used to. But you did it! :)

This is one more step forward in my journey on learning and mastering algorithms and data structures. You can see the documentation of my complete journey here on my **Renaissance Developer publication**.

Have fun, keep learning, and always coding.

# Resources

- Introduction to Tree Data Structure by **mycodeschool**
- Tree by **Wikipedia**
- How To Not Be Stumped By Trees by the talented **Vaidehi Joshi**
- Intro to Trees Lecture by Professor **Jonathan Cohen**
- Intro to Trees Lecture by Professor **David Schmidt**
- Intro to Trees Lecture by Professor **Victor Adamchik**
- Trees with **Gayle Laakmann McDowell**
- Binary Tree implementation and tests by **TK**
- Coursera Course: Data Structures by **University of California, San Diego**
- Coursera Course: Data Structures and Performance by **University of California, San Diego**
- Binary Search Tree concepts and implementation by **Paul Programming**
- Binary Search Tree implementation and tests by **TK**
- Tree Traversal by **Wikipedia**
- Binary Search Tree remove node algorithm by **GeeksforGeeks**
- Binary Search Tree remove node algorithm by **Algolist**
- Learning Python From Zero to Hero