A Deep Dive into Decision Trees

Soumo Chatterjee · Analytics Vidhya · Oct 21, 2019

A decision tree is a technique for categorizing data into various categories. We derive relevant information from the data and ask questions accordingly, so that the data is divided across several nodes, i.e. a series of questions and answers.

While performing any predictive analysis on our data using a decision tree, we first analyse the data and then, on the basis of various conditions, classify it into certain categories.

Structure of a decision tree

A decision tree is named so because it starts from a single root node and branches off into a number of decisions (the interior and leaf nodes shown in the figure above), just like a tree. The tree keeps growing with the number of decisions and conditions. For example, when we call customer care support, we are given a number of options to press on the phone dial pad so that we can reach the point where we can enquire about the information we want. The customer care line is, in effect, walking us down a decision tree to reach the right scenario.

Important terminology used in building a decision tree →

  1. Root Node → It is the base node from which the entire tree starts. It is the first node and represents the entire population of the data.
  2. Leaf Node → The nodes present at the ends (the last positions) of the tree; they are not split any further.
  3. Splitting → The process of dividing a node (starting with the root node) into several different sub-parts, i.e. interior and leaf nodes.
  4. Branch / Sub-tree → The section of the tree that is generated when we split a node, i.e. when we separate out any particular portion of the decision tree.
  5. Pruning → The opposite of splitting: the process of removing unwanted branches from the tree.
  6. Parent & Child Node → The root node is always a parent node, and the nodes connected to it are its child nodes. More generally, every node that gets split is the parent of the nodes derived from it, which are its children (a small code sketch of this structure follows this list).
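To make these terms concrete, here is a tiny decision tree written as a nested Python dictionary. This is my own illustration, not code from the article; the Outlook attribute and its sunny/overcast/rainy values come from the example used later in the post, while the humidity and windy attributes are assumed here purely for illustration. The outer key is the root node, nested dictionaries are branches/sub-trees, and plain strings are leaf nodes.

```python
# A toy decision tree: dict keys are the attributes we split on (parent nodes),
# nested dicts are branches/sub-trees, and plain strings are leaf nodes (decisions).
toy_tree = {
    "outlook": {                 # root node (parent of everything below)
        "overcast": "play",      # leaf node: no further questions needed
        "sunny": {
            "humidity": {        # interior node (child of the root, parent of leaves)
                "high": "don't play",
                "normal": "play",
            }
        },
        "rainy": {
            "windy": {
                True: "don't play",
                False: "play",
            }
        },
    }
}

def predict(tree, sample):
    """Walk from the root down to a leaf, following the sample's attribute values."""
    if not isinstance(tree, dict):   # reached a leaf node
        return tree
    attribute = next(iter(tree))     # the attribute this node splits on
    return predict(tree[attribute][sample[attribute]], sample)

print(predict(toy_tree, {"outlook": "sunny", "humidity": "normal"}))  # -> "play"
```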

Before we move on to how a decision tree is constructed, there are some terms that come up frequently while constructing one. These are →

  • Gini Index → A measure of impurity used when building a decision tree with the Classification and Regression Tree (CART) algorithm.
  • Information Gain → The decrease (reduction) in entropy after a dataset is split on an attribute; it helps decide which attribute should be selected as the decision node. Constructing a decision tree is all about finding the attribute that returns the highest information gain.
  • Reduction in Variance → A criterion used for continuous target variables (regression problems). The split with the lower variance (how much the data varies) is selected as the criterion to split the population.
  • Chi-Square → An algorithm to find the statistical significance of the difference between sub-nodes and the parent node.
  • Entropy → It helps decide the best attribute with which to start making decisions, and it is used to find the attribute with the highest information gain. Entropy measures the impurity (degree of randomness) present in the data.

Entropy = -P(yes) * log2 P(yes) - P(no) * log2 P(no)

where ,

→ P(yes) and P(no) are the probabilities of yes and no responses out of the total responses.

→ log2 P(yes) and log2 P(no) are the base-2 logarithms of those probabilities.
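As a quick sketch of the two impurity measures just described (the entropy formula above, plus the Gini index from the list), here is how both can be computed in plain Python for a column of yes/no labels. The 9-to-5 split used below is purely an illustrative number, not data from the article.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy = -sum(p_k * log2(p_k)) over the class proportions in `labels`."""
    total = len(labels)
    probs = [count / total for count in Counter(labels).values()]
    return -sum(p * math.log2(p) for p in probs)

def gini_index(labels):
    """Gini index = 1 - sum(p_k^2); the impurity measure used by CART."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

# A node holding 9 'yes' and 5 'no' answers (illustrative only):
column = ["yes"] * 9 + ["no"] * 5
print(entropy(column))     # ~0.940 bits
print(gini_index(column))  # ~0.459
```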

CART Algorithm : →

Now, when we start constructing the decision tree, we will always calculate these quantities in a sequential manner (they are sketched in code right after this list) —

  1. Entropy of the target column,
  2. Information from a particular attribute of the dataset,
  3. Information gain from that attribute (e.g. the Outlook feature in the example below).
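Here is a minimal sketch of those three steps in plain Python, for a dataset stored as a list of dictionaries. The function and column names are my own illustration, not code from the article.

```python
import math
from collections import Counter

def entropy(labels):
    """Step 1: entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def attribute_information(rows, attribute, target):
    """Step 2: weighted average entropy of the target after splitting on `attribute`."""
    total = len(rows)
    info = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        info += (len(subset) / total) * entropy(subset)
    return info

def information_gain(rows, attribute, target):
    """Step 3: entropy of the whole target column minus the attribute's information."""
    return entropy([row[target] for row in rows]) - attribute_information(rows, attribute, target)
```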

Now we will construct a decision tree, taking this 'play' dataset as our example.

Thereby, we first calculate the entropy of the dependent column, i.e. 'play', using the entropy formula above.

Next, we calculate the information gain for each independent column of the dataset, starting with the Outlook column.

For Outlook, we compute the entropy of each of its values (sunny, overcast and rainy), then the information from this attribute, and finally the information gain it yields.

Similarly, we calculate the information gain for the other columns, and the column with the highest information gain is selected as the root node of the decision tree.
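Reusing the information_gain sketch above, and assuming a small toy 'play' dataset in the spirit of the article (only the Outlook values sunny/overcast/rainy and the 'play' target are confirmed by the article; the windy column and the row values here are illustrative assumptions), the root node can be picked as the column with the maximum information gain:

```python
# Illustrative rows only; the article's full table is not reproduced here.
rows = [
    {"outlook": "sunny",    "windy": False, "play": "no"},
    {"outlook": "sunny",    "windy": True,  "play": "no"},
    {"outlook": "overcast", "windy": False, "play": "yes"},
    {"outlook": "rainy",    "windy": False, "play": "yes"},
    {"outlook": "rainy",    "windy": True,  "play": "no"},
    {"outlook": "overcast", "windy": True,  "play": "yes"},
    {"outlook": "sunny",    "windy": False, "play": "yes"},
]

features = ["outlook", "windy"]
gains = {f: information_gain(rows, f, "play") for f in features}
root = max(gains, key=gains.get)   # column with the highest information gain
print(gains)
print("root node:", root)          # -> "outlook" for this toy data
```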

Then we start constructing our decision tree, with this column as the root node.

Then, for the next step, the Outlook column is treated as the root node and the remaining columns are considered as candidate child nodes for the purpose of calculating information gain. NOTE — the overcast branch has already reached its leaf node, so it need not be considered further. We keep repeating the same steps, at each iteration making the column with the highest information gain the parent node, until every branch ends in a leaf node. Once we reach the leaf nodes, our decision tree is complete.
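In practice, a library such as scikit-learn automates this whole recursive construction. As a hedged sketch (scikit-learn and pandas are not mentioned in the article, and the data below is the same illustrative toy data as above), setting criterion="entropy" makes the splits use information gain, as described in this post:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Same illustrative toy data as before; categorical columns are one-hot encoded
# because scikit-learn's trees expect numeric inputs.
df = pd.DataFrame({
    "outlook": ["sunny", "sunny", "overcast", "rainy", "rainy", "overcast", "sunny"],
    "windy":   [False, True, False, False, True, True, False],
    "play":    ["no", "no", "yes", "yes", "no", "yes", "yes"],
})

X = pd.get_dummies(df[["outlook", "windy"]])   # one-hot encode the features
y = df["play"]

# criterion="entropy" means splits are chosen by information gain
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

print(export_text(clf, feature_names=list(X.columns)))
```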

So, this is how a decision tree is constructed from scratch. One key thing to always keep in mind: when there is high non-linearity and a complex relationship between the dependent and independent variables, a decision tree is well suited to the problem.

I hope you now have some intuition about how decision trees are constructed in machine learning for various problems. If you have any doubts about this, please let me know in the comments section. Until then, LEARN, EAT, SLEEP, REPEAT.

Credits → Edureka's YouTube video on Decision Trees.
