Entropy, Gini Impurity & Information Gain

Criterion used in Constructing Decision Tree

Deeksha Singh
Published in Geek Culture · 4 min read · Mar 8, 2022


Decision trees are versatile machine learning algorithms capable of performing both regression and classification tasks, and they can handle complex, non-linear datasets. Their decisions are easy to interpret: a tree is built by splitting at nodes on one feature of the dataset, using a set of if-then-else decision rules. But what criterion does the algorithm use to split at a particular node? How do we quantify the quality of a split? To answer these questions we will look at three important criteria and how they are used for constructing a decision tree. These are

  1. Entropy
  2. Gini Impurity
  3. Information gain

1. Entropy: Entropy measures the amount of randomness or disorder at a node. In a decision tree, it helps the model select the feature to split on at a node by measuring the purity of the split. If,

  1. Entropy = 0 means a pure split, i.e., all instances at the node belong to a single class.
  2. Entropy = 1 means a completely impure split, i.e., equal numbers of instances (50%–50%) of the two classes at the node, causing maximum disorder.

The entropy H_i at node i is given by the equation:

H_i = − Σ (k = 1..n) p(i,k) · log2 p(i,k), taken over the classes with p(i,k) ≠ 0

Source: O'Reilly's Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow

where p(i,k) is the ratio of class-k instances among the training instances at node i,

and n is the number of distinct classes present at that node.

For a two-class node (with log base 2), the entropy H ranges between 0 and 1.
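To make these two boundary cases concrete, here is a minimal Python sketch of the entropy formula (the entropy helper and its counts-based interface are just an illustration, not part of any library):

```python
import numpy as np

def entropy(counts):
    """Entropy (log base 2) of a node, given its per-class instance counts."""
    p = np.asarray(counts, dtype=float) / sum(counts)
    p = p[p > 0]                       # skip classes with zero probability
    return -np.sum(p * np.log2(p))

print(entropy([50, 0]))    # pure split          -> 0.0
print(entropy([25, 25]))   # 50/50 two-class node -> 1.0
```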


Let's understand this by doing a quick calculation at one node in the example below. Using a decision tree classifier to classify Iris flowers into 3 categories on the basis of petal length and petal width, the decision tree looks like this:

(Decision tree for the Iris dataset, trained on petal length and width. Source: O'Reilly's Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow)
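If you want to reproduce a tree like this yourself, a minimal scikit-learn sketch is below (assuming the book's setup of a depth-2 tree on the two petal features; exact node counts can vary slightly with the library version):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X = iris.data[:, 2:]          # petal length and petal width only
y = iris.target

tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X, y)

# Text rendering of the fitted tree, node by node
print(export_text(tree_clf, feature_names=["petal length (cm)", "petal width (cm)"]))
```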

For the left node at depth 2 (54 instances: 0 setosa, 49 versicolor, 5 virginica), the entropy H_i is:

H_i = −(49/54) · log2(49/54) − (5/54) · log2(5/54) ≈ 0.445

(Worked example from O'Reilly's Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow)
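The same number can be checked numerically, reusing the entropy helper from the sketch above (the class counts [0, 49, 5] at this node are taken from the book's figure):

```python
# Depth-2 left node: 54 instances split as [0, 49, 5] across the three classes
print(entropy([0, 49, 5]))   # ~0.445
```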

2. Gini Impurity: Gini impurity also measures the purity of the split at the nodes of a decision tree. The Gini impurity G_i at the i-th node is given by:

G_i = 1 − Σ (k = 1..n) p(i,k)²

where p(i,k) is again the ratio of class-k instances at node i.

Calculating G_i for the left node at depth 2:

G_i = 1 − (0/54)² − (49/54)² − (5/54)² ≈ 0.168

Unlike entropy, the Gini impurity of a two-class node varies between 0 and 0.5. A node is pure when its Gini impurity is 0, i.e., all instances belong to the same class.
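A quick numerical check, mirroring the entropy sketch above (the gini helper is again just an illustration):

```python
import numpy as np

def gini(counts):
    """Gini impurity of a node, given its per-class instance counts."""
    p = np.asarray(counts, dtype=float) / sum(counts)
    return 1.0 - np.sum(p ** 2)

print(gini([0, 49, 5]))    # depth-2 left node            -> ~0.168
print(gini([25, 25]))      # worst case for two classes   -> 0.5
```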


So, should we use Entropy or Gini Impurity?

Most of the time they don't make much difference and lead to similar trees. The advantage of Gini impurity is that it is computationally cheaper than entropy, because entropy involves logarithmic calculations, which take more time.
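In scikit-learn this choice is exposed through the criterion parameter of DecisionTreeClassifier, so switching between the two is a one-line change (a minimal sketch; "gini" is the default):

```python
from sklearn.tree import DecisionTreeClassifier

# Gini impurity is the default criterion; entropy is opt-in
gini_tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=42)
entropy_tree = DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=42)
```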

Entropy and Gini impurity measure the purity of a single node, but to construct a fully grown tree the algorithm has to compare candidate splits: at each node it evaluates the possible features and thresholds and selects the split that yields the largest information gain. So let's understand what information gain is.

3. Information Gain: Information gain represents how much entropy is removed by splitting at a node. The higher the information gain, the more entropy is removed; hence, while training a decision tree, the split with the maximum information gain is chosen at each node.

The mathematical equation for information gain is:

IG = Entropy(T) − Σ_v (|T_v| / |T|) · Entropy(T_v)

where IG is the information gain, Entropy(T) is the entropy at the node before the split (parent node), and Entropy(T_v) are the entropies after the split (child nodes). |T| is the total number of instances before the split and |T_v| is the number of instances in child v after the split.
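A small sketch of this formula in Python, again working from per-class instance counts (the information_gain helper is an illustration, not a library function, and it reuses the entropy helper from the earlier sketch):

```python
def information_gain(parent_counts, child_counts_list):
    """IG = parent entropy minus the size-weighted average entropy of the children.

    Uses the entropy(counts) helper defined in the earlier sketch.
    """
    total = sum(sum(c) for c in child_counts_list)
    weighted = sum(sum(c) / total * entropy(c) for c in child_counts_list)
    return entropy(parent_counts) - weighted
```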

For the example above, let's calculate the information gain for the split at the right-hand node of depth 1 (the node with 100 instances, taken from the tree in O'Reilly's Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow).

Entropy for the parent node (100 instances: 50 versicolor and 50 virginica):

Entropy(T) = −(50/100) · log2(50/100) − (50/100) · log2(50/100) = 1

Entropy at the child nodes:

Entropy(T_v) = −(49/54) · log2(49/54) − (5/54) · log2(5/54) ≈ 0.445 for the left child (54 instances)

Entropy(T_v) = −(1/46) · log2(1/46) − (45/46) · log2(45/46) ≈ 0.1511 for the right child (46 instances)

So, for this case, Entropy(T) = 1, |T| = 100, Entropy(T_v) = 0.445 for |T_v| = 54, and Entropy(T_v) = 0.1511 for |T_v| = 46. The information gain is therefore:

IG = 1 − (54/100) × 0.445 − (46/100) × 0.1511 ≈ 1 − 0.240 − 0.070 ≈ 0.690
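The same result falls out of the information_gain helper sketched above, fed with the per-class counts of the parent and child nodes (counts taken from the book's figure):

```python
# Right-hand node at depth 1: parent [0, 50, 50] splits into [0, 49, 5] and [0, 1, 45]
ig = information_gain([0, 50, 50], [[0, 49, 5], [0, 1, 45]])
print(round(ig, 3))   # ~0.690
```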

So this is all about the criteria used for constructing fully grown trees. Thanks for reading. Happy data-driven learning!

References:

  1. https://scikit-learn.org/stable/modules/tree.html
  2. O’Reilly Media, Inc. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow.
