Tree for Decision

Data Science: Decision Tree

Anjani Kumar
Analytics Vidhya


Decision Tree Explained Easily

What is a Decision Tree?

A decision tree is a supervised machine learning algorithm used for classification and regression problems.

Classification deals with predicting a discrete class, like 0/1 or whether someone has a disease or not, whereas regression deals with predicting continuous values like a person's salary, a house price, etc.

Intuition

As the name suggests, it uses a tree-like model to make decisions. A decision tree is drawn upside down, with its root at the top. It also consists of internal nodes and leaf nodes, which we will discuss further.

It decides much like a human brain would. In the picture below, to check whether a person is fit or not, the tree checks certain criteria and, based on them, decides and returns the result. In machine learning we can do this with a Decision Tree Classifier.

Age = root node; Eats pizza / Exercises = internal nodes; Fit/Unfit = leaf nodes

Decision Tree (source unknown)
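To make the intuition concrete, here is a minimal Python sketch of the tree above written as nested if/else checks. The feature names and the age threshold are illustrative assumptions, not from a trained model:

```python
# Hypothetical hand-written version of the "fit or unfit" tree above.
# A trained Decision Tree Classifier learns equivalent rules from data.
def is_fit(age: int, eats_pizza: bool, exercises: bool) -> str:
    if age < 30:                 # root node: Age
        return "Unfit" if eats_pizza else "Fit"   # internal node: Eats pizza?
    else:
        return "Fit" if exercises else "Unfit"    # internal node: Exercises?


print(is_fit(25, eats_pizza=False, exercises=False))  # Fit
print(is_fit(45, eats_pizza=False, exercises=True))   # Fit
```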

Splitting Criteria in a Decision Tree:

A key problem is choosing the feature that best splits the tree, so that we reach the leaf nodes used for decision making in as few steps as possible. There are various techniques for deciding which feature should be used as the root node and which as internal nodes.

Classification Techniques:

1. CHAID (Chi-square Automatic Interaction Detection) uses the chi-square test as its metric.

2. CART (Classification and Regression Trees) uses the Gini index (Gini impurity) as its metric.

3. ID3 (Iterative Dichotomiser 3) uses entropy and information gain as its metrics.

We will discuss the CART technique here, which uses the Gini index (Gini impurity).

Gini Index or Gini Impurity:

The Gini index, or Gini impurity, measures the impurity of a node in the decision tree. A node is 100% pure if all its records belong to the same class (of the dependent variable). In binary classification, a node is 100% impure if half of its records belong to one class and half to the other.

The feature with the lowest Gini index is chosen for the split. The Gini index is calculated by subtracting the sum of the squared class probabilities from one. It favors larger partitions.

The formula for the Gini index of a node is:

Gini = 1 - Σᵢ (pᵢ)²

where pᵢ is the probability (relative frequency) of class i at that node.

The Gini of a split is computed as the weighted average of the Gini of each child node, weighted by the fraction of records that falls into that node:

Gini(split) = Σⱼ (nⱼ / n) × Gini(nodeⱼ)
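As a quick sketch of both formulas (these helper functions are illustrative, not from the original article), the Python below computes the Gini impurity of a node from its class counts and the weighted Gini of a split:

```python
def gini_impurity(class_counts):
    """Gini = 1 - sum of squared class probabilities at a node."""
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)


def gini_of_split(children_counts):
    """Weighted average of the child nodes' Gini impurities."""
    total = sum(sum(child) for child in children_counts)
    return sum(sum(child) / total * gini_impurity(child)
               for child in children_counts)


print(gini_impurity([5, 5]))    # 0.5 -> maximally impure node (binary case)
print(gini_impurity([10, 0]))   # 0.0 -> pure node
```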

Let's walk through an example with a small data set to make this easy to follow.

Our data set has 10 records with three features, 'Past Trend', 'Open Interest' and 'Trading Volume'. The column 'Return' (Up/Down) is our target variable. Let's understand with this simple example how the Gini index works.

Let us start by calculating the Gini index for 'Past Trend'.

Note: 'P' here means probability.

P(Past Trend=Positive): 6/10

P(Past Trend=Negative): 4/10

When (Past Trend = Positive & Return = Up), then probability = 4/6

When (Past Trend = Positive & Return = Down), then probability = 2/6

Gini index = 1 - ((4/6)² + (2/6)²) = 0.45

When (Past Trend = Negative & Return = Up), then probability = 0

When (Past Trend = Negative & Return = Down), then probability = 4/4

Gini index = 1 - ((0)² + (4/4)²) = 0

Weighted average of the Gini Indices can be calculated as follows:

Gini Index for Past Trend = (6/10) × 0.45 + (4/10) × 0 = 0.27

Calculation of Gini Index for Open Interest

P(Open Interest=High): 4/10

P(Open Interest=Low): 6/10

When (Open Interest = High & Return = Up), then probability = 2/4

When (Open Interest = High & Return = Down), then probability = 2/4

Gini index = 1 - ((2/4)² + (2/4)²) = 0.5

When (Open Interest = Low & Return = Up), then probability = 2/6

When (Open Interest = Low & Return = Down), then probability = 4/6

Gini index = 1 - ((2/6)² + (4/6)²) = 0.45

Weighted average of the Gini indices can be calculated as follows:

Gini Index for Open Interest = (4/10) × 0.5 + (6/10) × 0.45 = 0.47

Calculation of Gini Index for Trading Volume

P(Trading Volume=High): 7/10

P(Trading Volume=Low): 3/10

When (Trading Volume = High & Return = Up), then probability = 4/7

When (Trading Volume = High & Return = Down), then probability = 3/7

Gini index = 1 - ((4/7)² + (3/7)²) = 0.49

When (Trading Volume = Low & Return = Up), then probability = 0

When (Trading Volume = Low & Return = Down), then probability = 3/3

Gini index = 1 - ((0)² + (1)²) = 0

Weighted average of the Gini indices can be calculated as follows:

Gini Index for Trading Volume = (7/10) × 0.49 + (3/10) × 0 = 0.34

Comparing the three features, the weighted Gini index is 0.27 for 'Past Trend', 0.47 for 'Open Interest' and 0.34 for 'Trading Volume'. 'Past Trend' has the lowest Gini index, and hence it is chosen as the root node of our decision tree.
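Using the hypothetical gini_impurity/gini_of_split helpers sketched earlier, the class counts from the calculations above reproduce the same three values:

```python
# (Up, Down) counts in each child node of the three candidate splits
past_trend     = [[4, 2], [0, 4]]   # Positive node, Negative node
open_interest  = [[2, 2], [2, 4]]   # High node, Low node
trading_volume = [[4, 3], [0, 3]]   # High node, Low node

for name, split in [("Past Trend", past_trend),
                    ("Open Interest", open_interest),
                    ("Trading Volume", trading_volume)]:
    print(name, round(gini_of_split(split), 2))
# Past Trend 0.27, Open Interest 0.47, Trading Volume 0.34
```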

Similarly, we repeat the same procedure to determine the sub-nodes (branches) of the decision tree, until we reach the leaf nodes or the maximum depth defined.

Entropy:

Entropy is another way to measure the impurity, disorder or uncertainty of a node in a decision tree. For a two-class problem, its value lies between 0 and 1.

A node is completely impure if it contains an equal number of records from each class (the same number of 'Yes' and 'No' in a two-class classification problem).

A node is completely pure if it contains only one of the class labels (all 'Yes' and zero 'No', or all 'No' and zero 'Yes').

Information gain (IG) measures how much “information” a variable or feature gives us about the class.

The information gain is based on the decrease in entropy after the data set is split on a variable/feature.

Why is it important?

  1. Information gain is the main criterion decision tree algorithms use to construct a decision tree.
  2. Information gain internally uses entropy to calculate the impurity of a node.
  3. The feature/variable with the highest information gain is used to split the tree first.

Equation of Entropy:

Entropy = - Σᵢ pᵢ log₂(pᵢ), where pᵢ is the probability of class i at the node.

Equation of Information Gain:

Information Gain = Entropy(parent) - Σⱼ (nⱼ / n) × Entropy(childⱼ), i.e. the entropy of the parent node minus the weighted average entropy of its child nodes after the split.

The information gain of each feature/variable in the data set is calculated, and the feature with the highest information gain is used to split the tree first.
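Here is a minimal Python sketch of both equations (the helper names are illustrative; the counts reuse the 'Past Trend' split from the Gini example above):

```python
import math


def entropy(class_counts):
    """Entropy = -sum(p_i * log2(p_i)) over the classes at a node."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total)
                for c in class_counts if c > 0)


def information_gain(parent_counts, children_counts):
    """Parent entropy minus the weighted average entropy of the children."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child)
                   for child in children_counts)
    return entropy(parent_counts) - weighted


# 'Return' column: 4 Up / 6 Down overall; splitting on Past Trend gives
# child nodes with counts [4, 2] and [0, 4], as in the Gini example.
print(round(information_gain([4, 6], [[4, 2], [0, 4]]), 3))  # ~0.42
```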

Code for Reference:
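As a reference, here is a minimal scikit-learn sketch of training and evaluating a Decision Tree Classifier. It uses the built-in Iris data rather than the article's example data, purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load a small sample data set and hold out a test split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# criterion='gini' is the default; 'entropy' switches to information gain
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
clf.fit(X_train, y_train)

print("Test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```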

Decision Tree Classifier Visualization:

You need scikit-learn version 0.21 or higher for this kind of visualization. The steps are shown in the code below.
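A minimal sketch of this visualization, assuming a classifier clf fitted as in the reference code above, using sklearn's plot_tree (added in version 0.21):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import plot_tree   # requires scikit-learn >= 0.21

iris = load_iris()
plt.figure(figsize=(12, 8))
plot_tree(clf,                        # the classifier fitted above
          feature_names=iris.feature_names,
          class_names=list(iris.target_names),
          filled=True)
plt.show()
```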

Decision Tree Advantages:

  1. It is computationally very fast.
  2. A decision tree does not require normalization/scaling of the data set, because splits are based on class probabilities rather than on any distance metric.
  3. Missing values in the data do not affect the process of building the tree to any considerable extent.
  4. It is easy to explain.
  5. The Gini index is often preferred over entropy because it is computationally faster: unlike entropy, it does not have to calculate the 'log' of any value.

Disadvantages:

  1. Decision tree learners can create over-complex trees that do not generalize well. This is called overfitting. (Overfitting happens when the model's training accuracy is high but its test accuracy is low.)
  2. Decision trees can be unstable, because a small variation in the data might result in a completely different tree being generated. This is called variance, and it can be lowered by methods like bagging or boosting.
  3. Decision tree learners create biased trees if some class (of the target feature) dominates. It is therefore recommended to balance the data set before fitting the decision tree.

Conclusion: This is all about the decision tree with the Gini index. There is another technique based on entropy and information gain, for which I plan to add a worked example later. Hope you enjoyed reading. If you like my article and want to see more, please post 50 claps and follow my blog.

Want to connect?

LinkedIn: https://www.linkedin.com/in/anjani-kumar-9b969a39/

If you like my posts here on Medium and would like me to continue doing this work, consider supporting me on Patreon.
