Machine Learning: Decision Tree Classification

Gaurav Parihar
Published in Analytics Vidhya · Jun 13, 2020 · 4 min read

Most of us feel Decision Trees are tough, but they are among the most powerful techniques in Machine Learning. They are easy to implement and fall under Supervised Learning techniques. As the name suggests, it is a tree, though not one with an actual root and green leaves. The tree is built by working through a series of conditions, starting from a root (the Root Node) and ending in leaves (Terminal Nodes), breaking the data down into small segments.

Decision Tree:

A Decision Tree builds classification or regression models in the form of a tree structure. It is a Supervised Learning technique. There are two types of Decision Tree:

Classification Tree.

Regression Tree.

Decision Trees are built using Recursive Partitioning, an approach also called Divide and Conquer. It breaks a dataset down into smaller and smaller subsets while, at the same time, an associated decision tree is incrementally developed, as the sketch below illustrates.
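To make the divide-and-conquer idea concrete, here is a minimal, self-contained Python sketch (my own illustration, not code from the original post or any library): it greedily picks the binary split with the lowest weighted Gini impurity and then recurses on each subset. The toy data and the max_depth setting are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Impurity of a node: 1 - sum(p_i^2) over the class proportions in the node."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Try every (feature, threshold) pair; return the binary split with the lowest weighted impurity."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left]) +
                     len(right) * gini([labels[i] for i in right])) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursive partitioning (divide and conquer): split the data, then grow each subset into a subtree."""
    if len(set(labels)) == 1 or depth == max_depth:
        return Counter(labels).most_common(1)[0][0]   # leaf node: majority class
    split = best_split(rows, labels)
    if split is None:                                 # no useful split left
        return Counter(labels).most_common(1)[0][0]
    _, f, t, left, right = split
    return {"feature": f, "threshold": t,
            "left": build_tree([rows[i] for i in left], [labels[i] for i in left], depth + 1, max_depth),
            "right": build_tree([rows[i] for i in right], [labels[i] for i in right], depth + 1, max_depth)}

# Hypothetical toy data: two numeric features, binary class label
rows = [(2.7, 1.0), (1.3, 3.0), (3.6, 1.5), (7.5, 3.1), (9.0, 2.2), (7.8, 0.8)]
labels = [0, 0, 0, 1, 1, 1]
print(build_tree(rows, labels))   # a nested dict of internal nodes, ending in class labels at the leaves
```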

A Decision Tree is a flowchart-like structure:

Description of the Decision Tree Classification structure:

i) It is a flowchart-like tree structure.

ii) Internal nodes represent features (attributes).

iii) Branches represent decision rules.

iv) The topmost node is called the Root Node.

v) Leaf nodes represent the final decision or result, expressed in binary (0, 1): 0 means the event does not happen and 1 means it does.

This final structure helps in decision making.

But the first question that comes to mind is: how does the tree formation actually take place, and what procedure does it follow?

So, let’s talk about the Decision Tree algorithm-

Decision Tree Model Preparation.

Now, let’s discuss each feature selection measure in detail:

Information Gain:-

Information gain is the decrease in entropy. It is computed as the difference between the entropy of the dataset before the split and the weighted average entropy after the dataset is split on a given attribute.

Mathematical Implementation-

Info(D) = − Σ pi · log2(pi)

Info_A(D) = Σ (|Dj| / |D|) · Info(Dj)

Gain(A) = Info(D) − Info_A(D)

Where pi is the probability that an arbitrary tuple in D belongs to class Ci, and D1, …, Dv are the partitions of D produced by splitting on attribute A.

The attribute A with the highest information gain, Gain(A), is chosen as the splitting attribute at node N.
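A short, self-contained Python sketch of these formulas (the label counts and partitions below are hypothetical, chosen only to illustrate the computation):

```python
import math
from collections import Counter

def entropy(labels):
    """Info(D) = -sum(pi * log2(pi)) over the classes present in D."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(labels, partitions):
    """Gain(A) = Info(D) - Info_A(D), the drop in entropy after splitting D on attribute A."""
    total = len(labels)
    info_a = sum(len(p) / total * entropy(p) for p in partitions)   # weighted average entropy after the split
    return entropy(labels) - info_a

# Hypothetical dataset: 9 positive and 5 negative tuples, split into three partitions by some attribute A
labels = ["yes"] * 9 + ["no"] * 5
partitions = [["yes"] * 2 + ["no"] * 3, ["yes"] * 4, ["yes"] * 3 + ["no"] * 2]
print(round(information_gain(labels, partitions), 3))   # ≈ 0.247 bits
```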

Gain Ratio-

Information gain is biased toward attributes with many outcomes, i.e., it prefers attributes with a large number of distinct values. Gain ratio corrects this bias by normalizing the information gain with the split information of the attribute (the entropy of the partition sizes themselves): GainRatio(A) = Gain(A) / SplitInfo_A(D).

The attribute with the highest gain ratio is chosen as the splitting attribute.
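Continuing the snippet above (it reuses information_gain, labels, and partitions from there), a sketch of the same idea in code:

```python
def split_info(partitions, total):
    """SplitInfo_A(D) = -sum(|Dj|/|D| * log2(|Dj|/|D|)) — large for attributes with many small partitions."""
    return -sum(len(p) / total * math.log2(len(p) / total) for p in partitions if p)

def gain_ratio(labels, partitions):
    """GainRatio(A) = Gain(A) / SplitInfo_A(D)."""
    return information_gain(labels, partitions) / split_info(partitions, len(labels))

print(round(gain_ratio(labels, partitions), 3))   # ≈ 0.156, the penalised version of the 0.247 gain above
```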

Gini Index-

The Gini index considers a binary split for each attribute, computed as a weighted sum of the impurity of the two partitions. If a binary split on attribute A partitions data D into D1 and D2, the Gini index of D is:

Gini_A(D) = (|D1| / |D|) · Gini(D1) + (|D2| / |D|) · Gini(D2), with Gini(D) = 1 − Σ pi²

Where pi is the probability that a tuple in D belongs to class Ci.

The attribute with minimum Gini index is chosen as the splitting attribute.
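A corresponding self-contained Python sketch (again with hypothetical partitions, just to show the arithmetic):

```python
from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum(pi^2) over the classes in D."""
    total = len(labels)
    return 1 - sum((c / total) ** 2 for c in Counter(labels).values())

def gini_index(left, right):
    """Gini_A(D) for a binary split of D into partitions D1 (left) and D2 (right)."""
    total = len(left) + len(right)
    return len(left) / total * gini(left) + len(right) / total * gini(right)

# Hypothetical binary split: D1 mostly positive, D2 mixed
d1 = ["yes"] * 6 + ["no"] * 1
d2 = ["yes"] * 3 + ["no"] * 4
print(round(gini_index(d1, d2), 3))   # the candidate split with the smallest value would be chosen
```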

Let’s look at one example-
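The worked example in the original post relied on an image; as a stand-in, here is a minimal scikit-learn sketch on the built-in Iris dataset (the dataset, parameters, and train/test split are my own illustration, not the post’s original example):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import accuracy_score

# Load the data and hold out a test set
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42)

# criterion="entropy" selects splits by information gain; criterion="gini" (the default) uses the Gini index
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=42)
clf.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))
print(export_text(clf, feature_names=list(data.feature_names)))   # a text rendering of the learned tree
```

Limiting max_depth (or min_samples_leaf) prunes the tree and helps with the overfitting discussed in the conclusion below.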

Conclusion-

Decision trees are easy to interpret and visualize, and they can easily capture non-linear patterns. But they are sensitive to noisy data and can overfit it. Decision trees are also biased by imbalanced datasets, so it is recommended to balance the dataset before building the tree. Small variations in the data can result in a completely different decision tree; this variance can be reduced by bagging and boosting algorithms.

Thank You for giving your valuable time to read this blog!!!
