Decision Tree Classifier, Explained

Lujing Chen
Bite-sized Machine Learning
6 min read · Dec 1, 2018

This blog will cover four parts to explain the decision tree classifier:

  1. What is a decision tree?
  2. How does a decision tree work?
  3. How does a decision tree choose where to split?
  4. How to implement a decision tree in Python

1. What is a Decision Tree?

Imagine you want to predict whether it is going to snow tomorrow, so that you can decide whether to wake up early to shovel the snow. ❄🚗❄
To predict this event, you use the historical weather data shown here.

As you can see in the plot, all the red data points (the snow instances) occur when the temperature is below 30 and the humidity is above 70. We can translate this scatter plot into a decision tree diagram like the one on the left.

For this snow dataset, you can perfectly predict whether it is going to snow by asking just two questions: "Is the temperature below 30?" and "Is the humidity above 70?"
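
In code, this tree is nothing more than nested if/else checks on those two questions. Here is a minimal sketch (the function name and labels are just for illustration):

def will_it_snow(temperature, humidity):
    # First question: Is the temperature below 30?
    if temperature < 30:
        # Second question: Is the humidity above 70?
        if humidity > 70:
            return 'snow'
        return 'no snow'
    return 'no snow'

print(will_it_snow(25, 80))  # snow
print(will_it_snow(40, 80))  # no snow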

2. How Does the Decision Tree Algorithm Work?

After getting a sense of what a decision tree is, let’s dive into how the algorithm works. Decision trees can be used for both regression (continuous target variable) and classification (categorical target variable) tasks. For simplicity, this blog will only touch on the latter, the decision tree classifier.

The decision tree classification algorithm consists of three steps: grow the tree, prune the tree, and assign the class.

If you are like me, you may ask: what is prune? 🙄
prune: to cut or lop off superfluous or undesired twigs, branches, or roots from a tree.

Okay, intuitively:

  1. Grow the tree: split the feature space by setting rules
  2. Prune the tree: remove the unnecessary splits
  3. Assign the class: use the class with the majority vote in each leaf as the prediction (see the sketch below)
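
For step 3, "majority vote" literally means counting the classes of the training samples that land in a leaf and predicting the most common one. A minimal sketch (the leaf contents here are hypothetical):

from collections import Counter

# Hypothetical training samples that ended up in one leaf
leaf_samples = ['snow', 'snow', 'no snow', 'snow']

# The leaf predicts its majority class
prediction = Counter(leaf_samples).most_common(1)[0][0]
print(prediction)  # snow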

This blog will focus on the rationale behind growing the tree. If you are interested in pruning the tree, please refer to the link in the further reading.

3. How Does a Decision Tree Choose Where to Split?

A decision tree splits based on three key concepts:

  1. Pure and Impure
  2. Impurity measurement
  3. Information Gain

Let’s explain these three concepts one by one, like you are five.

1. Pure and Impure

A tree node is pure if it contains only a single class, e.g. snow days.
A tree node is impure if it contains more than one class, e.g. snow and no-snow days.
So, which tree’s nodes are purer, A or B?

  • A, right! Both the left and the right node under Tree A contain only one class: PURE.
  • Tree B is impure because both the left and the right node under Tree B contain two classes: IMPURE.

Hold on a second: if you want to predict snow or no snow, which question would you ask, A or B?

Your answer is probably A, even if you don’t know what a decision tree is, because Question B basically does not give you any useful information, while asking Question A clearly gets you your answer.

That is exactly the decision tree’s underlying logic: find the RIGHT questions to ask, the ones that lead to the purest nodes (or the least impure nodes).

So, could we quantify the impurity of a node instead of just roughly eyeballing it? Yes, we can.

2. Impurity measurement

The two most common impurity functions are entropy and the Gini index.

Some properties of impurity functions:

  • The range of entropy is from 0 to 1 (for a two-class node), while the range of the Gini index is from 0 to 0.5.
  • The lower the value of the impurity function, the purer the node is. 😀
  • The higher the value of the impurity function, whether entropy or Gini, the more impure the node is. 🙃

Let’s first apply the entropy formula, H = -Σ pᵢ log₂(pᵢ), where pᵢ is the fraction of samples in the node that belong to class i, to our two examples above.

Let’s then calculate the Gini index, G = 1 - Σ pᵢ², for the same two examples:
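
A minimal sketch of both calculations in Python (the class counts mirror the pure node from Tree A and the 50/50 node from Tree B; the exact sample counts are illustrative):

import math

def entropy(class_counts):
    # H = sum of -p_i * log2(p_i) over the classes present in the node
    total = sum(class_counts)
    probs = [c / total for c in class_counts if c > 0]
    return sum(-p * math.log2(p) for p in probs)

def gini(class_counts):
    # G = 1 - sum of p_i ** 2 over the classes in the node
    total = sum(class_counts)
    return 1 - sum((c / total) ** 2 for c in class_counts)

# Node 1 (pure: e.g. 10 snow days, 0 no-snow days)
print(entropy([10, 0]), gini([10, 0]))  # 0.0 0.0
# Node 2 (impure: e.g. 5 snow days, 5 no-snow days)
print(entropy([5, 5]), gini([5, 5]))    # 1.0 0.5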

Takeaways from these two calculations:

  • The entropy/Gini index of node 1 is 0, which is the minimum value those two measurements can take, and corresponds to the purest split. The decision tree will want to split using Question A.
  • The entropy/Gini of node 2 is 1 and 0.5 respectively, which are the maximum values those measurements can take, and correspond to the most impure split. The decision tree will not want to split using Question B.

3. Information Gain

Information gain, the new information you gain through splitting, is the impurity of the parent node minus the combined (sample-weighted) impurities of the children nodes. Maximizing the information gain at each split is the key to growing a decision tree.

Either entropy or the Gini index can be used to calculate the impurity for information gain.
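
As a rough sketch of that formula, here is information gain computed from class counts, with entropy as the impurity function (the function names and the example split counts are just for illustration, not the numbers from the figure):

import math

def entropy(class_counts):
    total = sum(class_counts)
    return sum(-c / total * math.log2(c / total) for c in class_counts if c > 0)

def information_gain(parent_counts, children_counts):
    # IG = impurity(parent) - sum over children of
    #      (child's share of the samples) * impurity(child)
    n = sum(parent_counts)
    weighted_children = sum(sum(child) / n * entropy(child)
                            for child in children_counts)
    return entropy(parent_counts) - weighted_children

# Hypothetical split: a 50/50 parent split into two mostly-pure children
print(information_gain([5, 5], [[4, 1], [1, 4]]))  # ~0.278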

Let’s use the Question B tree as an example: when we split the tree by asking Question C, how much information do we gain? Entropy is used as the impurity function here.

From this calculation, splitting the tree using Question C gives an information gain of 0.3974.

Note that the calculation of information gain is not limited to categorical/binary attributes. When a decision tree is trying to find the best threshold at which to split a continuous variable, information gain is calculated in the same fashion, as sketched below.
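
For instance, one simple way to search for a threshold on a continuous attribute is to scan the observed values as candidate cut points and keep the one with the highest information gain. A minimal sketch under that assumption (the temperature data is made up; real implementations are more optimized):

import math

def entropy(labels):
    total = len(labels)
    return sum(-labels.count(c) / total * math.log2(labels.count(c) / total)
               for c in set(labels))

def best_threshold(values, labels):
    # Try each observed value as a candidate threshold (x <= v vs. x > v)
    # and keep the split with the highest information gain
    best_gain, best_t = 0.0, None
    for v in sorted(set(values)):
        left = [l for x, l in zip(values, labels) if x <= v]
        right = [l for x, l in zip(values, labels) if x > v]
        if not left or not right:
            continue
        weighted = (len(left) * entropy(left)
                    + len(right) * entropy(right)) / len(labels)
        gain = entropy(labels) - weighted
        if gain > best_gain:
            best_gain, best_t = gain, v
    return best_t, best_gain

temps = [20, 25, 28, 35, 40, 45]  # made-up temperatures
labels = ['snow', 'snow', 'snow', 'no snow', 'no snow', 'no snow']
print(best_threshold(temps, labels))  # (28, 1.0): a perfect split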

4. Decision Tree Classifier Implementation using Sklearn

Step 1: Load the data

from sklearn import datasets
# load the classic iris dataset: 150 samples, 4 features, 3 classes
iris = datasets.load_iris()
X = iris.data
y = iris.target

Step 2: Split the data

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

Step 3: Train the model

from sklearn import tree
# DecisionTreeClassifier uses the Gini index as its splitting criterion by default
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X_train, y_train)
pred = clf.predict(X_test)

Step 4: Evaluate the model performance using the test set

from sklearn.metrics import accuracy_score
acc = accuracy_score(y_test, pred)
print('accuracy rate', acc)

Step 5: Print the decision tree

import graphviz
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=iris.feature_names,
                                class_names=iris.target_names,
                                filled=True, rounded=True,
                                special_characters=True)
graph = graphviz.Source(dot_data)
graph
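
One note on the last line: in a Jupyter notebook, the bare graph displays the diagram inline. If you run this as a plain script, it will not display anything; calling graph.render('iris_tree') instead writes the rendered tree (a PDF by default) to disk.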

Overall, we can see that this decision tree splits on the attribute thresholds that lead to the lowest Gini index, e.g. gini = 0 in the pure leaf nodes.

In case you are still confused, I will illustrate the calculation for the nodes once more below.
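
As a quick sketch of that arithmetic, here is the Gini index computed by hand for one hypothetical node; the class counts [0, 49, 5] are illustrative, and the counts in your rendered tree may differ:

# Gini index for a node whose diagram line reads value = [0, 49, 5]
counts = [0, 49, 5]
total = sum(counts)
gini = 1 - sum((c / total) ** 2 for c in counts)
print(gini)  # ~0.168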

Further reading: https://en.wikipedia.org/wiki/Decision_tree_pruning

Please feel free to leave a comment, question, or suggestion if you have any thoughts related to this post.

Please clap if you feel the content is useful! :) Thank you!
