Decision Tree Explained…

Harsh Tiwari · Published in Analytics Vidhya · Oct 16, 2020

Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the target by learning simple decision rules from the data features. Decision trees are also capable of performing multi-class classification on a dataset.

Before explaining how the decision tree algorithm works, I would like to introduce a few terms that play an important role in the algorithm.

Basic Structure of Decision Tree
  1. Entropy: It is a measure of impurity. It tells how impure a particular node is before or after splitting. For binary classification, its value lies between 0 and 1.

0 means the node is completely pure, while 1 means it is completely impure.

Entropy (E):

E(S) = −(p+) log₂(p+) − (p−) log₂(p−)

where:

S = sample of the given attribute

p+ = probability of the positive class

p− = probability of the negative class

The lower the entropy value, the purer the split.
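
To make this concrete, here is a minimal Python sketch (my own illustration, not code from the article) of the binary entropy defined above:

```python
import math

def entropy(p_pos):
    """Binary entropy of a node, given the probability of the positive class."""
    p_neg = 1.0 - p_pos
    if p_pos == 0.0 or p_neg == 0.0:
        return 0.0  # a pure node has zero entropy
    return -p_pos * math.log2(p_pos) - p_neg * math.log2(p_neg)

print(entropy(0.5))  # 1.0 -> completely impure node
print(entropy(1.0))  # 0.0 -> completely pure node
```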

2. Gini impurity: It is also a measure of impurity in the nodes, but it is much easier to calculate than entropy and is computationally more efficient. Its value lies between 0 and 0.5.

Gini Impurity (G.I):

G.I(S) = 1 − (p+)² − (p−)²
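
For comparison, a minimal sketch of the binary Gini impurity (again my own illustration), which only needs squares rather than logarithms:

```python
def gini_impurity(p_pos):
    """Binary Gini impurity of a node, given the probability of the positive class."""
    p_neg = 1.0 - p_pos
    return 1.0 - (p_pos ** 2 + p_neg ** 2)

print(gini_impurity(0.5))  # 0.5 -> maximum impurity for a binary node
print(gini_impurity(1.0))  # 0.0 -> completely pure node
```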

3. Information Gain: It tells how much “information” a particular feature provides for splitting. The higher the value, the more information the split gives us.

Information Gain (IG):

IG(S, A) = E(S) − Σᵥ (|S(v)| / |S|) · E(S(v))

where:

S(v) = sample after the split on value v of feature A

S = sample before the split
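
Here is a minimal sketch (my own illustration) that computes the information gain of one candidate split from the labels of the parent node and its children:

```python
import math

def entropy(labels):
    """Entropy of a list of binary labels (0/1)."""
    if not labels:
        return 0.0
    p_pos = sum(labels) / len(labels)
    p_neg = 1.0 - p_pos
    if p_pos == 0.0 or p_neg == 0.0:
        return 0.0
    return -p_pos * math.log2(p_pos) - p_neg * math.log2(p_neg)

def information_gain(parent_labels, child_label_groups):
    """Entropy of the parent minus the weighted entropy of the children."""
    n = len(parent_labels)
    weighted_child_entropy = sum(
        (len(child) / n) * entropy(child) for child in child_label_groups
    )
    return entropy(parent_labels) - weighted_child_entropy

# Example: a split that separates the classes fairly well.
parent = [1, 1, 1, 0, 0, 0]
children = [[1, 1, 1, 0], [0, 0]]
print(information_gain(parent, children))  # ~0.46 bits
```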

How the Decision Tree Algorithm Works

  1. First, it creates multiple decision stumps (trees of depth 1) to find the most appropriate feature to place at the root node. This is done by computing the information gain for every feature; the feature with the maximum information gain is chosen for the root node.
  2. After fixing the root node, the tree is split again in the same fashion: the remaining feature with the highest information gain is placed at the next internal node (see the sketch after this list).
  3. This splitting process continues until we get a completely pure split (all samples at a node are either “yes” or “no”) or until the selected maximum depth of the tree is reached.
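
To see these steps end to end, here is a minimal sketch using scikit-learn’s DecisionTreeClassifier on the Iris dataset (the library and dataset are my assumptions, not part of the article):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X, y = iris.data, iris.target

# criterion="entropy" makes the splitter rank features by information gain.
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

# Print the learned decision rules, starting from the root node.
print(export_text(clf, feature_names=iris.feature_names))
```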

Decision trees are prone to over-fitting the data. To counter this, we can prune the tree by requiring a minimum number of samples at each leaf node or by limiting the maximum depth of the tree.
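
Both of these pruning controls are exposed as constructor parameters in scikit-learn; a minimal sketch, again assuming the Iris data:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

pruned = DecisionTreeClassifier(
    max_depth=3,         # cap the depth of the tree
    min_samples_leaf=5,  # require at least 5 samples in every leaf
    random_state=0,
)
pruned.fit(X, y)
print(pruned.get_depth())  # will not exceed 3
```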

Decision trees are also prone to bias if one class dominates the dataset. It is therefore advisable to balance the dataset before fitting the decision tree.
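
One common alternative to resampling is to reweight the classes when fitting; scikit-learn’s class_weight option is one such knob (a minimal sketch with toy data of my own, not from the article):

```python
from sklearn.tree import DecisionTreeClassifier

# Toy imbalanced data: eight samples of class 0, two of class 1.
X = [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# class_weight="balanced" reweights classes inversely to their frequency,
# so the minority class is not ignored when choosing splits.
clf = DecisionTreeClassifier(class_weight="balanced", random_state=0)
clf.fit(X, y)
print(clf.predict([[9]]))  # [1]
```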

Decision Tree

ID3 vs CART

These are two algorithms that decision tree implementations use internally to grow the tree.

  1. ID3: It uses entropy (and information gain) for measuring the impurity of the nodes.
  2. CART: It uses Gini impurity for measuring the impurity of the nodes (a short scikit-learn sketch follows this list).
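
As a rough mapping onto scikit-learn (which grows a CART-style binary tree), the criterion parameter chooses which impurity measure drives the splits, loosely mirroring the ID3-vs-CART distinction above; this is my own illustration, not from the article:

```python
from sklearn.tree import DecisionTreeClassifier

id3_like = DecisionTreeClassifier(criterion="entropy")  # split by information gain
cart_like = DecisionTreeClassifier(criterion="gini")    # split by Gini impurity
```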

Decision trees are easy to interpret and require little data preparation. They can handle both numerical and categorical features, although splitting on continuous numerical features is slower because the algorithm has to search over candidate thresholds.

For more info on decision trees, please check: https://www.youtube.com/watch?v=7VeUPuFGJHk

If you like this article, don’t forget to hit the clap icon.

Happy Learning.

