Decision Tree

Hengky Sanjaya
Apr 27, 2020

Week #8: Intelligent Systems

What Is a Decision Tree?
A decision tree is a diagram or chart that people use to determine a course of action or show a statistical probability.

There are several algorithms for building a decision tree:

  • CART (Classification and Regression Trees) → uses the Gini index (for classification) as its metric.
  • ID3 (Iterative Dichotomiser 3) → uses the entropy function and information gain as its metrics.
  • C4.5 and regression trees, plus tree-based ensembles: bagging methods such as random forest and boosting methods such as gradient boosting and AdaBoost.
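
As a quick, concrete illustration of the CART side, scikit-learn's DecisionTreeClassifier lets you switch the split metric between the Gini index and entropy through its criterion parameter. A minimal sketch, assuming scikit-learn is installed and using a tiny dataset invented purely for illustration:

```python
# Minimal sketch: switching the split metric in a CART-style tree (scikit-learn).
from sklearn.tree import DecisionTreeClassifier

# Toy dataset, invented for illustration: two numeric features, two classes.
X = [[0, 0], [1, 1], [1, 0], [0, 1]]
y = [0, 1, 1, 0]  # here the label simply equals the first feature

gini_tree = DecisionTreeClassifier(criterion="gini")        # CART default: Gini index
entropy_tree = DecisionTreeClassifier(criterion="entropy")  # ID3/C4.5-style: entropy

gini_tree.fit(X, y)
entropy_tree.fit(X, y)

print(gini_tree.predict([[1, 1]]))     # [1]
print(entropy_tree.predict([[0, 0]]))  # [0]
```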

ID3 (Iterative Dichotomiser 3)

ID3 Algorithm:

  • Calculate the entropy and the information gain for each attribute.
  • Split on the attribute with the highest information gain and build the tree from it.
  • Repeat the process until every branch of the decision tree ends in a leaf.

Entropy

Entropy is the amount of uncertainty or randomness associated with a particular variable.

Entropy is an indicator of how messy your data is.

Let’s say we are about to tidy our bedroom. Usually, we use a subjective judgment to decide whether the room is messy or not, because “messy” depends on the observer’s perspective: one person might see the room as organized while another sees it as disorganized.

We know that the objects should be on the shelves or grouped together: books with books, toys with toys, and so on.

But instead of relying on subjective measurements, we can take a more mathematical approach to our data.

That’s where entropy comes in. We use entropy to measure the impurity of our dataset.

Here is the formula for calculating the entropy of a given dataset:

Entropy(S) = −Σᵢ pᵢ · log₂(pᵢ)

where pᵢ is the proportion of examples in S that belong to class i.

For a set with two classes, positive and negative, this becomes:

Entropy(S) = −p₊ · log₂(p₊) − p₋ · log₂(p₋)

So, the closer the entropy is to zero, the more similar the samples are to each other. On the other hand, the closer it is to one, the more evenly the samples are divided between the classes.

Let’s try this:

Which row has the highest entropy?

And the answer is…

The top row has the lowest entropy because all of its data is identical, so its entropy is zero.

The bottom row has the highest entropy because the data is split evenly between the two groups, 50% to 50%, so its entropy is one.
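
To make this concrete, here is a small Python sketch of the entropy calculation applied to two example lists of labels; the pure set comes out as 0 and the 50/50 set comes out as 1:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = - sum over classes of p_i * log2(p_i)."""
    total = len(labels)
    return sum(-(count / total) * math.log2(count / total)
               for count in Counter(labels).values())

print(entropy(["yes"] * 8))               # all identical -> 0.0
print(entropy(["yes"] * 4 + ["no"] * 4))  # 50/50 split   -> 1.0
```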

Information Gain

The information gain is based on the decrease in entropy after a dataset
is split on an attribute.

Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches).

The formula:

Gain(S, A) = Entropy(S) − Σᵥ (|Sᵥ| / |S|) · Entropy(Sᵥ)

where the sum runs over the values v of attribute A, and Sᵥ is the subset of examples for which A takes the value v.

So, let’s say we have the PlayTennis dataset: 14 weather observations described by Outlook, Temperature, Humidity and Wind, each labelled with whether tennis was played (9 “yes” and 5 “no”).

To get the entropy of the PlayTennis labels, we use this formula:

Entropy(S) = −(9/14) · log₂(9/14) − (5/14) · log₂(5/14) ≈ 0.94

  • 9 is the number of “yes” examples in the data.
  • 5 is the number of “no” examples.
  • 14 is the total number of examples.
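
Plugging those counts into Python with a base-2 logarithm reproduces the 0.94 figure:

```python
import math

p_yes, p_no = 9 / 14, 5 / 14
entropy_s = -p_yes * math.log2(p_yes) - p_no * math.log2(p_no)
print(round(entropy_s, 2))  # 0.94
```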

Next, we will try to find the root of the tree.

  • First, we will try to find the Gain of Humidity

We take the entropy we calculated before (0.94) and plug it into the information gain formula.

  • Repeat the same calculation for the remaining attributes (Outlook, Wind, Temperature).

Finally, after calculating the gain of every attribute, we compare the values and choose the highest one, which is Outlook.
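
As a check on this step, here is a minimal Python sketch that computes all four gains. It assumes the classic 14-day PlayTennis table (Quinlan/Mitchell), which matches the 9 “yes” / 5 “no” counts used above; the exact numbers depend on that assumption, but the procedure is the same:

```python
import math
from collections import Counter

# Assumed dataset: the classic 14-day PlayTennis table (9 "yes", 5 "no").
# Columns: Outlook, Temperature, Humidity, Wind, PlayTennis (the label).
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy(labels):
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total)
               for c in Counter(labels).values())

def information_gain(rows, attr_index):
    """Gain(S, A) = Entropy(S) - sum(|Sv|/|S| * Entropy(Sv)) over the values v of A."""
    base = entropy([row[-1] for row in rows])
    total = len(rows)
    remainder = 0.0
    for value in {row[attr_index] for row in rows}:
        subset = [row[-1] for row in rows if row[attr_index] == value]
        remainder += len(subset) / total * entropy(subset)
    return base - remainder

for index, name in enumerate(ATTRIBUTES):
    print(f"Gain(S, {name}) = {information_gain(DATA, index):.3f}")
# Outlook comes out highest (about 0.247), so it becomes the root.
```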

At this point, we have found the root of our decision tree.

Then, for the next steps, we apply the same procedure we used to find the root (Outlook) to each of its branches, Sunny, Overcast and Rain, to find their child nodes.

In the end, we get the final decision tree:
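
On the classic PlayTennis data, the finished tree has Outlook at the root, a Humidity test under the Sunny branch, a plain “Yes” leaf under Overcast, and a Wind test under the Rain branch. As a sketch (again assuming that standard dataset), the result can be written down as a nested dictionary and used to classify new days:

```python
# Sketch of the finished tree as nested dictionaries (assumed classic PlayTennis result):
# an inner dict maps an attribute to its values, and a plain string is a leaf (the prediction).
TREE = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"Wind": {"Weak": "Yes", "Strong": "No"}},
    }
}

def classify(tree, example):
    """Walk the nested dict until a leaf (a plain class label) is reached."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))                # e.g. "Outlook"
        tree = tree[attribute][example[attribute]]  # follow the branch for this example
    return tree

print(classify(TREE, {"Outlook": "Sunny", "Humidity": "Normal"}))  # Yes
print(classify(TREE, {"Outlook": "Rain", "Wind": "Strong"}))       # No
```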

And we are done! I hope this article helps you better understand decision trees.

Thank you.
