Decision Tree and Its Mathematical Concept

Arpit Pathak · ML_with_Arpit_Pathak · Jun 17, 2020 · 6 min read

Hello readers, here is the next very interesting concept and algorithm in machine learning, called the Decision Tree. This blog covers an introduction to the Decision Tree and the mathematical concept behind how it is implemented.

What is a Decision Tree

A Decision Tree is a graphical representation, in the form of a hierarchical tree, of all the possible solutions for arriving at a decision based on some conditions.

Let us take an example. Assume we have to decide whether to go out and play with friends, go out and do cycling, stay at home and play chess with your sibling, or stay at home and play an online game. Let us see how the decision can be taken.

Decision Tree Example

So, in the above example image, we made a decision by converting the whole problem into a hierarchical tree of step-wise decisions or segments to arrive at a final solution. This tree-based decision making is called a decision tree.

Some Basic Terminologies

  1. Entropy: It is defined as the randomness or unpredictability in the data. The goal of the decision tree algorithm is to decrease the entropy of the data to zero, or close to zero, by classifying it on the basis of different features and conditions. (A small Python sketch of this calculation appears just after this list.)
  2. Root Node: The root node of the decision tree is the topmost node, to which the whole dataset is given. The data is then classified from the root node into sub-nodes or child nodes on the basis of some feature or condition. For example, in the above image of the decision tree, the weather type is the root node.
  3. Parent Node and Child Nodes: Every node that classifies the data on the basis of some condition or feature is called a parent node, and the sub-nodes of this parent node are known as its child nodes. For example, in the above image of the decision tree, “Weather type” is a parent node and “Go outside” and “Stay at home” are its child nodes. The same applies to “Go outside” as a parent node, with its sub-nodes “Play with friends” and “Do cycling” as its child nodes.
  4. Level of Classification: The level of classification tells the number of classifications that have been performed on the data. When the data is given to the root node, it is at Level 0 of classification.
  5. Information Gain: Theoretically, information gain is the amount by which the entropy decreases at each classification level, i.e. the decrease in randomness and increase in predictability of the data. The information gain is calculated at each level of classification by comparing the entropy of the data before and after the classification.
  6. Leaf or Terminal Node: The leaf node is the last node in the decision tree, after which no further classification can or needs to be performed. In other words, the leaf node is the node at which the least entropy of the data is achieved, and the data in the leaf node is homogeneous in nature.
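To make the entropy definition concrete, here is a minimal Python sketch of the calculation; the `entropy` helper and the label lists below are illustrative toy examples, not part of any library:

```python
# Minimal sketch of the entropy calculation described above.
# The label lists below are hypothetical toy data.
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()              # probability of each class
    return float(-np.sum(p * np.log2(p)))  # 0 when the data is homogeneous

print(entropy(["chess", "cycling", "chess", "game"]))  # mixed node: 1.5 bits
print(entropy(["chess", "chess", "chess", "chess"]))   # pure node: 0.0
```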

Decision Tree Concept

The Decision Tree is a supervised learning algorithm which models the dependency of a categorical dependent variable on other categorical independent variables, and then performs the classification by attaching a label, i.e. the unique category name, to each decision.

A decision tree takes the whole dataset and keeps partitioning it into subsets depending on certain conditions until each subset becomes homogeneous, i.e. all the values in the subset are similar to each other.
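As a rough, hypothetical sketch of this "partition until homogeneous" idea, the following Python function splits a list of rows on one feature at a time and stops as soon as a subset contains only one label (real trees choose the splitting feature by information gain, as explained later in this post):

```python
# Hypothetical sketch of recursive partitioning; `rows` are dicts with
# feature values plus a "label" key, e.g. {"wheels": 2, "label": "bike"}.
def partition(rows, features):
    labels = [row["label"] for row in rows]
    if len(set(labels)) <= 1 or not features:
        return labels                          # homogeneous subset: stop
    first, rest = features[0], features[1:]
    subsets = {}
    for row in rows:                           # group rows by feature value
        subsets.setdefault(row[first], []).append(row)
    return {value: partition(subset, rest)     # recurse into each subset
            for value, subset in subsets.items()}
```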

Let us consider the following data on vehicle types —

In this data, a decision is to be made to predict the vehicle name according to properties like the number of wheels, the height of the vehicle and the presence of an engine. The decision tree will take each of these properties in turn and classify the vehicles based on them until each classification is homogeneous, as follows —

At Level 0, the prediction of the vehicle type is not possible because the types of vehicles are very random, and hence the entropy or randomness of the data at Level 0 is very high.

At Level 1, the vehicles are classified on the basis of the number of wheels they have, and hence we can classify the vehicles into 2-wheeler, 3-wheeler and 4-wheeler. Since the randomness of the data has decreased, the entropy of this level has decreased w.r.t. Level 0. But there are still random values in each of the 3 wheel categories, so we move on to classify again on another basis.

At Level 2, the vehicles are again classified on the basis of whether the vehicle has an engine or not. Here, homogeneity, or similarity of the data in each category, is achieved in the 2-wheeler and 3-wheeler branches, and the entropy in these two categories becomes zero, so no further distribution is required. However, in the 4-wheeler branch, the entropy has not come down to zero after the engine split, and therefore we need to segment this data again.

At Level 3, finally, the whole data has become homogeneous and the overall entropy of the data is zero. Now the name of the vehicle can be easily and accurately predicted.
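This walkthrough can be reproduced with scikit-learn; the rows below are invented toy data standing in for the vehicle table, not the original values from the post:

```python
# Hypothetical recreation of the vehicle example with scikit-learn.
# Features: [number_of_wheels, has_engine]; the labels are invented.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[2, 1], [2, 0], [3, 1], [3, 0], [4, 1], [4, 0]]
y = ["motorbike", "bicycle", "auto-rickshaw", "cart", "car", "trailer"]

tree = DecisionTreeClassifier(criterion="gini").fit(X, y)
print(export_text(tree, feature_names=["wheels", "has_engine"]))
```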

Working of Decision Tree and Mathematical Concept

Now you might be wondering how the decision tree knows which partitioning feature to apply at which level.

In order to understand this, note that the decision tree first calculates the uncertainty or randomness in the dataset at a certain level by using a metric called Gini Impurity. Then it takes each feature, classifies the data, and compares the impurity of the next classified level with that of the previous level, a comparison called information gain. Whichever feature or condition gives the highest information gain is applied to the decision tree at that particular level.

  • Formula for calculating the Gini impurity is as follows —

Gini(T) = 1 − Σᵢ pᵢ²  (summed over the C classes)

Here, “C” is the number of classes of the data at a particular node and “pᵢ” stands for the probability of class i. So, the Gini impurity is the summation of the squares of the probabilities of each class or category in the dataset, subtracted from 1.
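A minimal Python sketch of this formula (the helper name and the demo labels are illustrative):

```python
# Gini impurity: G = 1 - sum over classes of p_i^2
import numpy as np

def gini_impurity(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()          # probability of each class
    return float(1.0 - np.sum(p ** 2))

print(gini_impurity(["car", "car", "bike", "bike"]))  # 0.5 for a 50/50 node
print(gini_impurity(["car", "car", "car", "car"]))    # 0.0 for a pure node
```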

  • Formula for calculating the Information gain is as follows —

Information Gain(T, X) = Entropy(T) − Entropy(T, X)

Here, Entropy(T) represents the entropy of the data at the current level “T” and Entropy(T, X) represents the entropy of the data after applying the condition “X” at level “T”, computed as the weighted average entropy of the resulting subsets.
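A minimal Python sketch of this formula, where Entropy(T, X) is taken as the weighted average entropy of the subsets produced by the split (the function names and the demo split are illustrative):

```python
# Information gain: IG(T, X) = Entropy(T) - Entropy(T, X)
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(parent_labels, child_subsets):
    n = len(parent_labels)
    # Entropy(T, X): weighted average entropy of the child subsets
    after = sum(len(s) / n * entropy(s) for s in child_subsets)
    return entropy(parent_labels) - after

# hypothetical split on "has engine": two subsets of labels
print(information_gain(["bike", "car", "cart", "cycle"],
                       [["bike", "car"], ["cart", "cycle"]]))  # 1.0 bit
```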

The Decision Tree concept works on the Classification and Regression Trees (CART) algorithm, which uses this Gini impurity metric as the mathematical tool for building the decision tree from the training dataset.
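For example, scikit-learn's `DecisionTreeClassifier`, used in the sketch above, implements an optimized version of CART and uses `criterion="gini"` by default; passing `criterion="entropy"` switches the split quality measure to information gain instead.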

That’s all about this blog. Hope it was an informative one. Thank you for reading…!!!
