Decision Tree

Hengky Sanjaya
Apr 27, 2020

Week #8: Intelligent Systems

What Is a Decision Tree?
A decision tree is a diagram or chart that people use to determine a course of action or show a statistical probability.

There are several algorithms for building a decision tree:

  • CART (Classification and Regression Trees) → uses the Gini index (for classification) as its metric.
  • ID3 (Iterative Dichotomiser 3) → uses the entropy function and information gain as its metrics.
  • C4.5 and regression trees, plus tree-based ensembles: bagging methods such as random forest and boosting methods such as gradient boosting and AdaBoost.
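
As a quick, concrete illustration of the CART side, scikit-learn's DecisionTreeClassifier lets you switch the split metric between the Gini index and entropy through its criterion parameter. A minimal sketch, assuming scikit-learn is installed and using a tiny dataset invented purely for illustration:

```python
# Minimal sketch: switching the split metric in a CART-style tree (scikit-learn).
from sklearn.tree import DecisionTreeClassifier

# Toy dataset, invented for illustration: two numeric features, two classes.
X = [[0, 0], [1, 1], [1, 0], [0, 1]]
y = [0, 1, 1, 0]  # here the label simply equals the first feature

gini_tree = DecisionTreeClassifier(criterion="gini")        # CART default: Gini index
entropy_tree = DecisionTreeClassifier(criterion="entropy")  # ID3/C4.5-style: entropy

gini_tree.fit(X, y)
entropy_tree.fit(X, y)

print(gini_tree.predict([[1, 1]]))     # [1]
print(entropy_tree.predict([[0, 0]]))  # [0]
```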

ID3 (Iterative Dichotomiser 3)

ID3 Algorithm:

  • Calculate the entropy and the information gain for each attribute.
  • Split on the attribute with the highest information gain and build the tree from it.
  • Repeat the process until every branch of the decision tree ends in a leaf.

Entropy

Entropy is the amount of uncertainty or randomness associated with a particular variable.

Entropy is an indicator of how messy your data is.

Let’s say we are about to tidy our bedroom. Usually, we use a subjective judgment to decide whether the room is messy or not, because “messy” depends on the observer’s perspective: one person might see the room as organized while another sees it as disorganized.

We know that the objects should be on the shelves or grouped together: books with books, toys with toys, and so on.

But instead of relying on subjective measurements, we can take a more mathematical approach to our data.

That’s where entropy comes in. We use entropy to measure the impurity of our dataset.

Here is the formula for calculating the entropy of a given dataset:

Entropy(S) = −Σᵢ pᵢ · log₂(pᵢ)

where pᵢ is the proportion of examples in S that belong to class i.

For a set with two classes, positive and negative, this becomes:

Entropy(S) = −p₊ · log₂(p₊) − p₋ · log₂(p₋)

So, the closer the entropy is to zero, the more similar the samples are to each other. On the other hand, the closer it is to one, the more evenly the samples are divided between the classes.

Let’s try this:

Which row has the highest entropy?

And the answer is…

The top row has the lowest entropy because all of its data is identical, so its entropy is zero.

The bottom row has the highest entropy because the data is split evenly between the two groups, 50% to 50%, so its entropy is one.
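
To make this concrete, here is a small Python sketch of the entropy calculation applied to two example lists of labels; the pure set comes out as 0 and the 50/50 set comes out as 1:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = - sum over classes of p_i * log2(p_i)."""
    total = len(labels)
    return sum(-(count / total) * math.log2(count / total)
               for count in Counter(labels).values())

print(entropy(["yes"] * 8))               # all identical -> 0.0
print(entropy(["yes"] * 4 + ["no"] * 4))  # 50/50 split   -> 1.0
```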

Information Gain

The information gain is based on the decrease in entropy after a dataset
is split on an attribute.

Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches).

The formula:

Gain(S, A) = Entropy(S) − Σᵥ (|Sᵥ| / |S|) · Entropy(Sᵥ)

where the sum runs over the values v of attribute A, and Sᵥ is the subset of examples for which A takes the value v.

So, let’s say we have the PlayTennis dataset: 14 weather observations described by Outlook, Temperature, Humidity and Wind, each labelled with whether tennis was played (9 “yes” and 5 “no”).

To get the entropy of the PlayTennis labels, we use this formula:

Entropy(S) = −(9/14) · log₂(9/14) − (5/14) · log₂(5/14) ≈ 0.94

  • 9 is the number of “yes” examples in the data.
  • 5 is the number of “no” examples.
  • 14 is the total number of examples.
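
Plugging those counts into Python with a base-2 logarithm reproduces the 0.94 figure:

```python
import math

p_yes, p_no = 9 / 14, 5 / 14
entropy_s = -p_yes * math.log2(p_yes) - p_no * math.log2(p_no)
print(round(entropy_s, 2))  # 0.94
```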

Next, we will try to find the root of the tree.

  • First, we will try to find the Gain of Humidity

We take the entropy we calculated before (0.94) and plug it into the information gain formula.

  • Repeat the same calculation for the remaining attributes (Outlook, Wind, Temperature).

Finally, after calculating the gain of every attribute, we compare the values and choose the highest one, which is Outlook.
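
As a check on this step, here is a minimal Python sketch that computes all four gains. It assumes the classic 14-day PlayTennis table (Quinlan/Mitchell), which matches the 9 “yes” / 5 “no” counts used above; the exact numbers depend on that assumption, but the procedure is the same:

```python
import math
from collections import Counter

# Assumed dataset: the classic 14-day PlayTennis table (9 "yes", 5 "no").
# Columns: Outlook, Temperature, Humidity, Wind, PlayTennis (the label).
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy(labels):
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total)
               for c in Counter(labels).values())

def information_gain(rows, attr_index):
    """Gain(S, A) = Entropy(S) - sum(|Sv|/|S| * Entropy(Sv)) over the values v of A."""
    base = entropy([row[-1] for row in rows])
    total = len(rows)
    remainder = 0.0
    for value in {row[attr_index] for row in rows}:
        subset = [row[-1] for row in rows if row[attr_index] == value]
        remainder += len(subset) / total * entropy(subset)
    return base - remainder

for index, name in enumerate(ATTRIBUTES):
    print(f"Gain(S, {name}) = {information_gain(DATA, index):.3f}")
# Outlook comes out highest (about 0.247), so it becomes the root.
```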

At this point, we have found the root of our decision tree.

Then, for the next steps, we apply the same procedure we used to find the root (Outlook) to each of its branches, Sunny, Overcast and Rain, to find their child nodes.

In the end, we get the final decision tree:
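
On the classic PlayTennis data, the finished tree has Outlook at the root, a Humidity test under the Sunny branch, a plain “Yes” leaf under Overcast, and a Wind test under the Rain branch. As a sketch (again assuming that standard dataset), the result can be written down as a nested dictionary and used to classify new days:

```python
# Sketch of the finished tree as nested dictionaries (assumed classic PlayTennis result):
# an inner dict maps an attribute to its values, and a plain string is a leaf (the prediction).
TREE = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"Wind": {"Weak": "Yes", "Strong": "No"}},
    }
}

def classify(tree, example):
    """Walk the nested dict until a leaf (a plain class label) is reached."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))                # e.g. "Outlook"
        tree = tree[attribute][example[attribute]]  # follow the branch for this example
    return tree

print(classify(TREE, {"Outlook": "Sunny", "Humidity": "Normal"}))  # Yes
print(classify(TREE, {"Outlook": "Rain", "Wind": "Strong"}))       # No
```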

And we are done! I hope this article helps you better understand decision trees.

Thank you.
