Machine Learning: Decision Trees
This blog covers another interesting machine learning algorithm, Decision Trees, along with its mathematical underpinnings.
At every point in our lives, we make decisions in order to move forward. Similarly, this machine learning algorithm makes decisions on the dataset provided, figuring out the best split at each step so that the resulting model classifies more accurately. This, in turn, yields valuable results.
A decision tree is a machine learning algorithm that represents a hierarchical division of a dataset, forming a tree based on certain splitting criteria.
Now, let’s discuss some basic terminologies related to a decision tree.
- Root Node: The node from which the first split occurs is called the Root Node. In other words, it is the topmost node of the tree. In the above image, “work to do” is the root node.
- Internal Nodes: A node that denotes a test on an attribute is called an Internal Node. It does not hold a class label; it only enables further splitting on the way to the leaf nodes. In the above image, “Outlook?” is an internal node.
- Leaf Nodes: A node that holds a class label is termed a Leaf Node. After this node, no further splitting is done.
- Branch: A branch represents one outcome of the test performed at an internal node.
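To see these terms on an actual tree, here is a minimal sketch using scikit-learn (an assumption about tooling; the post does not prescribe a library). In the printed hierarchy, the first feature test is the root node, the nested tests are internal nodes, and the “class: …” lines are leaf nodes.

```python
# A minimal sketch of training and inspecting a decision tree with scikit-learn
# (library choice is an assumption; the post does not name one).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Load a small sample dataset and fit a shallow tree.
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

# Print the learned hierarchy: the topmost test is the root node,
# intermediate tests are internal nodes, and "class: ..." lines are leaves.
print(export_text(tree, feature_names=load_iris().feature_names))
```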
Mathematical Implementation
1. Entropy: Entropy refers to the degree of randomness (impurity) in the data. So, for a particular node, whether the root or an internal node, we calculate the entropy. The formula to calculate entropy is:

E = -Σ p_i log2(p_i), where p_i is the proportion of samples belonging to class i at that node.
Consider a binary classification “Yes/No”. For a node, there are 9 Yes and 5 No. So, the entropy is:
E = -(9/14) log2(9/14) - (5/14) log2(5/14)
E = 0.94
This indicates a high degree of randomness. For a binary classification, entropy ranges from 0 to 1: an entropy of 0 means a pure split (all samples belong to one class), while an entropy of 1 means a completely impure split (the classes are evenly mixed). Entropy should be as low as possible.
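To make the arithmetic concrete, here is a small Python sketch (the helper is mine, not from the post) that computes the entropy of a node from its class counts:

```python
import math

def entropy(counts):
    """Entropy of a node given its class counts, using log base 2."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# The 9 Yes / 5 No example from above:
print(round(entropy([9, 5]), 2))  # 0.94
```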
2. Information Gain: In simple words, information gain compares the entropy of the sample before and after the split. The formula to compute the information gain of a particular split is:

IG = E(parent) - Σ (n_k / n) * E(child_k), where n_k is the number of samples in the k-th child node and n is the number of samples in the parent.

So, the algorithm computes the information gain for all possible splits and picks the best one: the split with the highest information gain.
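The same idea as a rough Python sketch: the child counts below are made up purely for illustration, and the entropy() helper repeats the one shown earlier so the snippet is self-contained.

```python
import math

def entropy(counts):
    """Entropy of a node given its class counts, using log base 2."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, children_counts):
    """Entropy of the parent minus the weighted entropy of its children."""
    n = sum(parent_counts)
    weighted = sum(sum(child) / n * entropy(child) for child in children_counts)
    return entropy(parent_counts) - weighted

# Hypothetical split of the 9 Yes / 5 No node into two branches:
print(information_gain([9, 5], [[6, 1], [3, 4]]))
```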
3. Gini Impurity: Gini impurity, like entropy, is used to measure the purity of a split. Most of the time, Gini impurity is preferred over entropy because it is cheaper to compute (no logarithm) and, for a binary classification, it ranges from 0 to 0.5. The formula to compute Gini impurity is:

G = 1 - Σ p_i^2, where p_i is the proportion of samples belonging to class i at that node.

The comparison between Gini impurity and entropy can be seen in the graph below. A matching sketch for Gini impurity follows the graph.
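Here is the corresponding Python sketch for Gini impurity, applied to the same 9 Yes / 5 No node as before (again, the helper is mine, not from the post):

```python
def gini(counts):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# The same 9 Yes / 5 No node as before:
print(round(gini([9, 5]), 3))  # roughly 0.459
```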
So, a decision tree computes the overall gain of each candidate split using either entropy or Gini impurity. Since Gini impurity is computationally cheaper, it is usually preferred.
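As a small illustration of that choice (assuming scikit-learn as the tool, which the post does not specify), the split criterion can be set explicitly and the two measures compared on a toy dataset:

```python
# Comparing the two split criteria on a toy dataset with scikit-learn
# (parameter names are scikit-learn's; the tooling is an assumption).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0)
    scores = cross_val_score(clf, X, y, cv=5)
    print(criterion, scores.mean().round(3))
```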