Decision tree and its split logic: Understanding Entropy

Anju Rajbangshi · Published in Analytics Vidhya · Apr 22, 2020 · 6 min read


A decision tree is one of the simplest machine learning algorithms and a very popular model for prediction. It is a good choice for both regression and classification problems. The model is hierarchical in nature and is built by repeatedly splitting the data.

Here, we will focus on understanding the splitting criteria for a decision tree.

A decision tree can use any one of the following four criteria to decide the split.

· ID3 (Iterative Dichotomiser 3): this algorithm uses entropy and information gain as its default method for deciding the split.

· Gini index: when a decision tree is built with the rpart library in R, the Gini index is the default splitting criterion.

· Chi-square.

· Reduction in variance.

In this article, we will focus on calculating the information gain via the entropy method.

The feature with the highest information gain is the one on which the decision tree splits first.

Entropy formula:

Entropy = − Σ p(x) log p(x)

where p(x) is the proportion of samples belonging to class x, and the sum runs over the classes. (The calculations below use base-10 logarithms; the choice of base only rescales the entropy values and does not change which feature has the higher information gain.)

Information gain formula:

Information Gain = Entropy(parent) − Σ (nᵢ / n) × Entropy(childᵢ)

where nᵢ is the number of samples in child node i and n is the total number of samples in the parent node.
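As a rough sketch of how these two formulas translate into code (the helper names entropy and information_gain are my own, and base-10 logarithms are used to match the numbers worked out below):

```python
import math

def entropy(class_counts, base=10):
    """Entropy of a node, computed from the count of samples in each class."""
    total = sum(class_counts)
    return -sum((count / total) * math.log(count / total, base)
                for count in class_counts if count > 0)

def information_gain(parent_counts, children_counts, base=10):
    """Parent entropy minus the weighted average entropy of the child nodes."""
    total = sum(parent_counts)
    weighted_children = sum((sum(child) / total) * entropy(child, base)
                            for child in children_counts)
    return entropy(parent_counts, base) - weighted_children
```

For the example worked out below, information_gain([3, 1], [[2, 1], [1, 0]]) gives roughly 0.04 and information_gain([3, 1], [[0, 1], [3, 0]]) gives roughly 0.24.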

Now let us take a simple example. Suppose we have two independent features, ‘Health Status’ and ‘Gender’, and the response variable is ‘CoronaAffected’.

CoronaAffected is a binary class variable (1 = affected, 0 = not affected).

Now we will try to figure out on which of these two independent features the decision tree should split first.
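The original post shows the example data in a table that is not reproduced here, so below is a small pandas DataFrame (column names are my own) consistent with the counts used in all the calculations that follow: two affected females, one unaffected female (the only person with ‘Good’ health status), and one affected male.

```python
import pandas as pd

# A reconstruction of the example data, consistent with the counts described below:
# 3 samples of class 1 and 1 of class 0; 3 females and 1 male;
# 1 person with 'Good' health status and 3 with 'Bad'.
data = pd.DataFrame({
    "Gender":         ["F",   "F",   "F",    "M"],
    "HealthStatus":   ["Bad", "Bad", "Good", "Bad"],
    "CoronaAffected": [1,      1,     0,      1],
})
print(data)
```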

Step 1: Look at the output class, CoronaAffected, which has three samples of class 1 (affected) and one sample of class 0 (not affected).

Hence,

P(1) = number of samples with class 1 / total number of samples

= 3/4

P(0) = number of samples with class 0 / total number of samples

= 1/4

Step 2: Calculating the entropy:

Entropy(parent) = − Σ p(x) log p(x)

= − [P(1) log P(1) + P(0) log P(0)]

= − [3/4 log(3/4) + 1/4 log(1/4)]

= − [0.75 × log(0.75) + 0.25 × log(0.25)]

= − [(0.75 × −0.125) + (0.25 × −0.602)]

= − [−0.094 − 0.150]

= − [−0.244]

= 0.244 ≈ 0.24
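A quick sanity check of this number in Python (plain math, no particular library; the base-10 log matches the article’s arithmetic):

```python
import math

# Parent node: 3 samples of class 1 and 1 sample of class 0.
p1, p0 = 3 / 4, 1 / 4
parent_entropy = -(p1 * math.log10(p1) + p0 * math.log10(p0))
print(round(parent_entropy, 3))   # 0.244
```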

Now we will calculate the information gain of each feature and then check which feature has the highest information gain.

Information Gain = Entropy(parent) − weighted average entropy of the children

Feature 1: Gender.

Step 3: Calculate the entropy for each gender (Female/Male) based on the output class.

As we can see in the data above, out of the three females, two were affected (class 1) and one was not affected (class 0).

The one male was affected (class 1).

Step 4: Splitting on Gender gives one parent node and two child nodes: a left node (Female) and a right node (Male).

We need to calculate the entropy of the left node and the right node.

· The right node’s entropy is 0, as all of its samples belong to the same class.

· If a node is completely homogeneous, i.e. all of its samples belong to the same class, its entropy is 0. If the samples are split equally between two classes (for example, two samples of class 0 and two of class 1), entropy reaches its maximum (1 when log base 2 is used). If a node contains a mix of classes, we calculate its entropy as shown below.

· The left node has two different classes, 0 and 1, so we need to calculate its entropy. We always calculate the entropy of a child node when it contains mixed classes.

· Left node: P(1) = 2/3 and P(0) = 1/3

Entropy(left node) = − [P(1) log P(1) + P(0) log P(0)]

= − [2/3 log(2/3) + 1/3 log(1/3)]

= − [0.667 × log(0.667) + 0.333 × log(0.333)]

= − [(0.667 × −0.176) + (0.333 × −0.477)]

= − [−0.117 − 0.159]

= 0.276 ≈ 0.28

In Python, we can perform the same calculation for the left node using the stats module from the SciPy library.
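One way to do that is sketched below; scipy.stats.entropy normalises raw class counts into probabilities, and base=10 matches the logarithms used above.

```python
from scipy import stats

# Left node (Female): 2 samples of class 1 and 1 sample of class 0.
left_node_entropy = stats.entropy([2, 1], base=10)
print(round(left_node_entropy, 3))   # 0.276
```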

Step 5: Calculate the weighted average entropy of the children (Gender).

Weighted average entropy of children (Gender) = [number of samples in the Female node / total samples × entropy of the left node (Female)] + [number of samples in the Male node / total samples × entropy of the right node (Male)]

= 3/4 × entropy(left node, Female) + 1/4 × entropy(right node, Male)

= 3/4 × 0.276 + 1/4 × 0

= 0.207 + 0

= 0.207 ≈ 0.21

Step 6: Calculate the information gain of Gender.

IG(Gender) = Entropy(parent) − weighted average entropy of children (Gender)

= 0.244 − 0.207

= 0.037 ≈ 0.04
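The same result can be reproduced end to end with SciPy (again with base-10 logarithms; the choice of base does not change which feature ranks higher):

```python
from scipy import stats

parent_entropy = stats.entropy([3, 1], base=10)    # ≈ 0.244
left_entropy = stats.entropy([2, 1], base=10)      # Female node, ≈ 0.276
right_entropy = 0.0                                # Male node is pure
weighted_children = (3 / 4) * left_entropy + (1 / 4) * right_entropy

ig_gender = parent_entropy - weighted_children
print(round(ig_gender, 2))   # ≈ 0.04
```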

Repeat the above process for the feature ‘Health Status’.

Feature 2: Health Status.

Step 1: As we can see in the data above, the one person with ‘Good’ health status was not affected (class 0).

Out of the three persons with ‘Bad’ health status, all three were affected (class 1).

Step 2: Splitting on Health Status gives one parent node and two child nodes: a left node (Good) and a right node (Bad).

We need to calculate the entropy of the left node and the right node.

· The entropy of both the left and the right node is 0, as each contains samples of only one class.

Step 3: Calculate the weighted average entropy of the children (Health Status).

Weighted average entropy of children (Health Status) = [number of samples in the Good node / total samples × entropy of the left node (Good)] + [number of samples in the Bad node / total samples × entropy of the right node (Bad)]

= 1/4 × entropy(left node, Good) + 3/4 × entropy(right node, Bad)

= 1/4 × 0 + 3/4 × 0

= 0 + 0

= 0

Step 4: Calculate the information gain of Health Status.

IG(Health Status) = Entropy(parent) − weighted average entropy of children (Health Status)

= 0.244 − 0

= 0.244 ≈ 0.24
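And the corresponding check for Health Status, where both child nodes are pure and so the weighted average entropy of the children is simply 0:

```python
from scipy import stats

parent_entropy = stats.entropy([3, 1], base=10)    # ≈ 0.244
weighted_children = 0.0                            # both the Good and the Bad node are pure
ig_health_status = parent_entropy - weighted_children
print(round(ig_health_status, 2))   # ≈ 0.24
```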

Final step: Since the information gain of Health Status (0.24) is greater than the information gain of Gender (0.04), the decision tree will split first on Health Status.

In general, the same process is executed recursively on each resulting node, over all the features, until the data is fully classified.

Conclusion: Generally, we do not have to do all these calculations manually. When we fit a decision tree model, the library performs these calculations for us in the background.
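For instance, scikit-learn’s DecisionTreeClassifier with criterion="entropy" performs the equivalent computation (it uses log base 2 internally, which does not affect the ranking) and, on this example, picks Health Status for the root split; the numeric encoding below is my own.

```python
from sklearn.tree import DecisionTreeClassifier

# Encode the four example samples: Gender (F=0, M=1), HealthStatus (Good=0, Bad=1).
X = [[0, 1],   # female, bad health
     [0, 1],   # female, bad health
     [0, 0],   # female, good health
     [1, 1]]   # male, bad health
y = [1, 1, 0, 1]   # CoronaAffected for each row above

clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

# Index of the feature used at the root split: 1 means the second column, HealthStatus.
print(clf.tree_.feature[0])   # 1
```

The printed value 1 confirms that, on this toy dataset, the tree splits first on Health Status, matching the manual calculation above.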
