Decision Trees Classification

Navjot Singh · Published in Analytics Vidhya · Jun 4, 2020

The decision tree is one of the most popular machine learning algorithms. It is a tree-like structure constructed on the basis of attributes/features, and it is a non-parametric supervised learning approach.

A decision tree builds classification or regression models in the form of a tree structure. It breaks a dataset down into smaller and smaller subsets while, at the same time, an associated decision tree is incrementally developed.

The final result is a tree with decision nodes and leaf nodes, where each leaf node represents a decision or classification. The topmost decision node in the tree, which corresponds to the best predictor, is called the root node.

Decision trees can handle both categorical and numerical data.

There are several algorithms for building a decision tree, such as:

  • ID3: Iterative Dichotomiser 3 uses the entropy function and information gain as metrics.
  • CART: Classification And Regression Trees uses the Gini index (for classification) as its metric.
  • CHAID: Chi-square Automatic Interaction Detection performs multi-level splits when computing classification trees.
  • MARS: Multivariate Adaptive Regression Splines.
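As an aside, scikit-learn's DecisionTreeClassifier is an optimised implementation of CART; passing criterion="entropy" makes it split on information gain in the spirit of ID3. A minimal sketch (the iris dataset is used purely as a stand-in):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small built-in dataset and hold out a test split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# criterion="entropy" gives ID3-style splits; the default is "gini" (CART).
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on the held-out data
```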

Decision Tree Algorithm

ID3 (Iterative Dichotomiser 3) uses the entropy function and information gain as metrics.

Entropy Function

The first step of the ID3 algorithm is to choose the right attribute for splitting the decision tree, i.e. which feature should be chosen as a node.

To select the feature or attribute for splitting the decision tree, we use the entropy function. Entropy helps us measure the purity of a split; to reach the leaf nodes quickly, we have to select the right feature.

If the entropy value comes out to be 1, it is the worst possible split, meaning the split is completely impure; an entropy of 0 means the split is completely pure.
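In formula form, entropy = −Σ pᵢ·log₂(pᵢ), summed over the classes. A minimal Python sketch (the function name is just for this illustration):

```python
import math

def entropy(class_counts):
    """Shannon entropy of a node: -sum(p_i * log2(p_i)) over classes."""
    total = sum(class_counts)
    result = 0.0
    for count in class_counts:
        if count:                      # 0 * log2(0) is treated as 0
            p = count / total
            result -= p * math.log2(p)
    return result

print(entropy([7, 7]))  # 1.0    -> 50/50 split, completely impure
print(entropy([4, 0]))  # 0.0    -> pure node
print(entropy([9, 5]))  # ~0.940 -> the Play Golf target used below
```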

Now let’s create a decision tree for one of the most famous datasets, the weather dataset (play the game, Yes or No, based on the weather conditions).

  • Outlook, temperature, humidity and windy are the predictors, or independent attributes.
  • Play Golf is the target, or dependent attribute.
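For reference, here is the dataset as a pandas DataFrame; the 14 rows are assumed from the standard version of this example:

```python
import pandas as pd

df = pd.DataFrame({
    "Outlook":     ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Rainy",
                    "Overcast", "Sunny", "Sunny", "Rainy", "Sunny", "Overcast",
                    "Overcast", "Rainy"],
    "Temperature": ["Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool",
                    "Mild", "Cool", "Mild", "Mild", "Mild", "Hot", "Mild"],
    "Humidity":    ["High", "High", "High", "High", "Normal", "Normal",
                    "Normal", "High", "Normal", "Normal", "Normal", "High",
                    "Normal", "High"],
    "Windy":       [False, True, False, False, False, True, True, False,
                    False, False, True, True, False, True],
    "Play Golf":   ["No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No",
                    "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
})

print(df["Play Golf"].value_counts())  # Yes: 9, No: 5
```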

Information Gain

The information gain is based on the decrease in entropy after a dataset is split on an attribute.

To create the tree, we first need a root node, and we know that nodes correspond to features/attributes (outlook, temperature, humidity and windy):

  • First, calculate the entropy of the target, Play Golf.
  • Then find the entropy of each of the features/attributes.
  • Subtract the resulting entropy from the entropy before the split. The result is the information gain, or decrease in entropy.
  • Choose the attribute with the largest information gain as the decision node, divide the dataset by its branches, and repeat the same process on every branch (see the worked sketch below).
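The following sketch works through those steps by hand, using the (Yes, No) counts per attribute value taken from the dataset above:

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

# (Yes, No) counts per attribute value in the 14-row play-golf dataset.
splits = {
    "Outlook":     {"Sunny": (2, 3), "Overcast": (4, 0), "Rainy": (3, 2)},
    "Temperature": {"Hot": (2, 2), "Mild": (4, 2), "Cool": (3, 1)},
    "Humidity":    {"High": (3, 4), "Normal": (6, 1)},
    "Windy":       {"True": (3, 3), "False": (6, 2)},
}

total = 14
parent = entropy((9, 5))  # entropy of Play Golf before any split, ~0.940

for attribute, branches in splits.items():
    # Weighted average entropy of the children after splitting.
    children = sum(sum(c) / total * entropy(c) for c in branches.values())
    print(f"{attribute}: gain = {parent - children:.3f}")
# Outlook: gain = 0.247   <- largest, so Outlook becomes the root
# Temperature: gain = 0.029
# Humidity: gain = 0.152
# Windy: gain = 0.048
```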

So, Outlook is the best predictor, as it has the largest information gain (0.247): it becomes the root node of the decision tree. And since the entropy of the Overcast branch is 0, the leaf node for Overcast is simply Yes.

Repeat the same process on each sub-tree until every branch ends in a leaf node.
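That repetition is exactly what makes ID3 recursive. A compact, self-contained sketch (the row tuples re-encode the table above; the helper names are illustrative):

```python
import math
from collections import Counter

COLS = ("Outlook", "Temperature", "Humidity", "Windy", "Play")
RAW = [
    ("Sunny", "Hot", "High", False, "No"),    ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"), ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"), ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"), ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"), ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),  ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"), ("Rainy", "Mild", "High", True, "No"),
]
data = [dict(zip(COLS, row)) for row in RAW]

def entropy(labels):
    return -sum(n / len(labels) * math.log2(n / len(labels))
                for n in Counter(labels).values())

def gain(rows, attr):
    labels = [r["Play"] for r in rows]
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r["Play"] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, attrs):
    labels = [r["Play"] for r in rows]
    if len(set(labels)) == 1:            # pure subset -> leaf node
        return labels[0]
    if not attrs:                        # nothing left to split on -> majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, a))
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest)
                   for v in {r[best] for r in rows}}}

print(id3(data, ["Outlook", "Temperature", "Humidity", "Windy"]))
# (branch order may vary)
# {'Outlook': {'Overcast': 'Yes',
#              'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}},
#              'Rainy': {'Windy': {True: 'No', False: 'Yes'}}}}
```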

Finally, we end up with the tree printed above: Outlook at the root; the Overcast branch is a Yes leaf; the Sunny branch splits on Humidity (High → No, Normal → Yes); and the Rainy branch splits on Windy (True → No, False → Yes).

Advantage

A significant advantage of a decision tree is that it forces the consideration of all possible outcomes of a decision and traces each path to a conclusion. It creates a comprehensive analysis of the consequences along each branch and identifies decision nodes that need further analysis.

That’s all for the decision tree machine learning algorithm. Stay tuned for further blogs.

Thank you!
