Decision Tree - an intuitive understanding

Chaitra Naik
Published in Analytics Vidhya
5 min read · Sep 7, 2020

The decision tree algorithm is one of the most powerful tools in machine learning. A basic example of a decision tree is buying a car. When buying a car there are lots of questions to answer about mileage, number of airbags, price, anti-lock brakes, and so on. Once you have answered all these questions, you arrive at a decision about which car is the most suitable to buy.

A more formal definition: a decision tree is a tree-shaped diagram used to determine a course of action. Each branch of the tree represents a possible decision, occurrence or action.

Some of the advantages of the decision tree algorithm are that it is simple to interpret and visualize, it can handle both categorical and numerical data, and in most cases non-linear relationships between parameters do not affect its performance.

The decision tree algorithm can solve both classification and regression problems, but it is mostly used for classification. The most important step in building a decision tree is selecting the right attribute (feature) to split on at each node. To do this we use the following measures:

Entropy

Entropy is the measure of randomness or unpredictability in the data set. It helps us to measure the purity of the split.
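
For a node with two output classes, entropy can be written as:

Entropy(S) = -p(True) * log2(p(True)) - p(False) * log2(p(False))

where p(True) and p(False) are the fractions of True and False samples in the node (a term with probability 0 contributes nothing).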

Consider a tree with 7 nodes, built on a classification data set whose output column is either True or False. Suppose the root node N1 contains 9 True and 5 False. N1 is split into node N2 with 3 True and 2 False, and node N3 with 6 True and 3 False. Finally these nodes are split into N4 with 3 True, N5 with 2 False, N6 with 6 True and N7 with 3 False. The tree now needs to decide which of the child nodes, N2 or N3, gives the purer split. To decide this, entropy is calculated. So let's calculate entropy:
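
A minimal Python sketch of this calculation (the entropy helper and the node counts below are just for illustration, taken from the example above):

import math

def entropy(true_count, false_count):
    """Two-class entropy of a node, given its True/False counts."""
    total = true_count + false_count
    result = 0.0
    for count in (true_count, false_count):
        if count > 0:
            p = count / total
            result -= p * math.log2(p)
    return result

print(entropy(3, 2))  # node N2 (3 True, 2 False) -> about 0.971
print(entropy(6, 3))  # node N3 (6 True, 3 False) -> about 0.918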

After calculating the entropy for nodes N2 and N3, you'll notice that both values lie between 0 and 1. For a two-class problem the entropy always ranges from 0 to 1. The following are the best and worst cases of entropy:

Imagine a scenario where a node contains 3 True and 3 False. This is the worst case, because the probability is 50% True and 50% False, and in that situation the entropy is exactly 1. The best case is an entropy of 0, for example a node with 4 True and 0 False, because the probability of the output being True is 100%. In short, the closer a node's entropy is to 0, the purer it is, so the tree prefers that node over the other.
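
Plugging these two cases into the entropy formula above:

Entropy(3 True, 3 False) = -(0.5 * log2(0.5)) - (0.5 * log2(0.5)) = 0.5 + 0.5 = 1
Entropy(4 True, 0 False) = -(1.0 * log2(1.0)) = 0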

After entropy, let's move on to a new concept called information gain.

Information Gain

Information gain is the measure of decrease in entropy after the data set is split.

The formula to calculate information gain, and the parameters that appear in it, are given below.
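
In its usual form, with S the parent node, A the attribute used for the split, and S_v the subset of S that goes to child v:

Information Gain(S, A) = Entropy(S) - Σ_v ( |S_v| / |S| ) * Entropy(S_v)

where |S| is the number of samples in the parent node, |S_v| is the number of samples in child v, and the sum runs over all children created by the split.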

Information gain helps you find the best split for your decision tree. This basically means that if there are two candidate splits for a node, the information gain of each split is calculated and compared, and the split with the higher value is chosen as the best one, as in the sketch below.
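
A small sketch of that comparison, reusing the entropy helper defined earlier (the parent counts come from the example above; the two candidate splits are made up purely for illustration):

def information_gain(parent, children):
    """parent and each child are (true_count, false_count) tuples."""
    parent_total = sum(parent)
    weighted_child_entropy = sum(
        (sum(child) / parent_total) * entropy(*child) for child in children
    )
    return entropy(*parent) - weighted_child_entropy

# Two hypothetical ways of splitting a parent node with 9 True and 5 False:
split_a = [(3, 2), (6, 3)]   # the split from the earlier example
split_b = [(8, 1), (1, 4)]   # a made-up, much cleaner split
print(information_gain((9, 5), split_a))  # about 0.003
print(information_gain((9, 5), split_b))  # about 0.359 -> the better split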

In short, a lower entropy and a higher information gain are better.

There is one more concept involved in decision trees: Gini impurity.

Gini Impurity

Gini impurity does the same job as entropy, yet many machine learning engineers use it in place of entropy. Let's see why:

The formula for Gini impurity, and the parameters related to it, are given below.
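
In its usual form, with p_i the fraction of samples of class i in the node:

Gini(S) = 1 - Σ_i (p_i)²

which for a two-class node becomes Gini(S) = 1 - p(True)² - p(False)².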

Consider:

Consider a tree with 3 nodes, built on a classification data set whose output column is either True or False. Suppose node N1 contains 6 True and 3 False. N1 is split into node N2 with 3 True and 3 False and node N3 with 3 True and 0 False. You have probably already worked out that node N3 is a leaf node, since its entropy is 0.

If you look carefully, you will see that the entropy of node N2 is 1. This is the worst case, as the chances of True and False are both 50%, as discussed above. But if you calculate the Gini impurity of the same node, the value comes out to be 0.5.
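
The two values for node N2 work out as follows:

Entropy(N2) = -(0.5 * log2(0.5)) - (0.5 * log2(0.5)) = 1
Gini(N2) = 1 - (3/6)² - (3/6)² = 1 - 0.25 - 0.25 = 0.5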

Let’s plot a graph to understand Entropy and Gini Impurity better:

The graph shows how entropy and Gini impurity differ from each other. As P(+ve) increases towards 0.5, the entropy increases, reaching its maximum value of 1 at P(+ve) = 0.5, that is at 50%, and then it decreases again as P(+ve) approaches 1.

Gini impurity follows the same shape, but its maximum value at P(+ve) = 0.5, that is at 50%, is only 0.5. This shows that, for the same class distribution, the Gini impurity is always lower than (or equal to) the entropy.
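
A minimal sketch to reproduce such a graph, assuming NumPy and matplotlib are available:

import numpy as np
import matplotlib.pyplot as plt

# Probability of the positive class, excluding the endpoints where log2(0) is undefined
p = np.linspace(0.001, 0.999, 500)
entropy_values = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
gini_values = 1 - (p ** 2 + (1 - p) ** 2)

plt.plot(p, entropy_values, label="Entropy")
plt.plot(p, gini_values, label="Gini impurity")
plt.xlabel("P(+ve)")
plt.ylabel("Impurity")
plt.legend()
plt.show()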

The advantage of using Gini impurity over entropy is that it is computationally more efficient, so it takes less time to execute. This is because entropy involves logarithmic calculations, which are relatively expensive to compute, whereas Gini impurity uses only squares and sums. This is why many people prefer Gini impurity over entropy.

The main disadvantage of decision trees is the overfitting problem. Overfitting is the condition where the accuracy on the training data is high (low bias) while the accuracy on the test data is low (high variance); it occurs when the algorithm captures noise in the data. Limiting how deep the tree is allowed to grow is a common way to reduce it, as in the sketch below.
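
A hedged sketch of how this is often handled in practice with scikit-learn; the synthetic data set and the chosen max_depth value are arbitrary and only for illustration:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data, just for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# A fully grown tree tends to overfit: near-perfect training accuracy, lower test accuracy
deep_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
print(deep_tree.score(X_train, y_train), deep_tree.score(X_test, y_test))

# Restricting the depth usually narrows the gap between training and test accuracy
shallow_tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)
print(shallow_tree.score(X_train, y_train), shallow_tree.score(X_test, y_test))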

Thank you so much for reading this blog. If you like it then please give it a clap and share it with your friends, colleagues and family.
