Decision Trees

Shubham Saket · Published in Geek Culture · May 22, 2021

How does a tree decide?

What is a decision tree?

“A decision tree is a flowchart-like structure in which each internal node represents a “test” on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (decision taken after computing all attributes). The paths from root to leaf represent classification rules.” — Wikipedia

Entropy: It is the amount of information needed to accurately describe a sample. For a two-class problem its value lies between 0 (perfectly homogeneous) and 1 (a perfect 50/50 mix); with more classes the maximum grows to log₂ of the number of classes.

E = −Σᵢ pᵢ log₂(pᵢ)

Here pᵢ is the probability of category i.

Entropy calculation:

The entropy of a set R is computed from the proportion of each class present in it, as in the sketch below.
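A minimal sketch of this calculation in Python (the helper name entropy and the toy label lists are my own, not from the article):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a collection of class labels, in bits."""
    total = len(labels)
    counts = Counter(labels)
    return sum(
        -(count / total) * math.log2(count / total)
        for count in counts.values()
    )

# A perfectly homogeneous set has entropy 0;
# a 50/50 binary split has entropy 1.
print(entropy(["A", "A", "A", "A"]))  # 0.0
print(entropy(["A", "A", "B", "B"]))  # 1.0
```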

Our dataset:

The figure above shows an example dataset with 3 features and a label.
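Since the figure itself cannot be reproduced here, a hypothetical stand-in with 3 binary features and a binary label (the values are made up, purely to make the later snippets concrete) could be:

```python
import pandas as pd

# Hypothetical stand-in for the article's example dataset:
# 3 binary features and a binary label.
data = pd.DataFrame({
    "Feature 1": [1, 0, 1, 1, 0, 0, 1, 0],
    "Feature 2": [0, 0, 1, 1, 1, 0, 1, 0],
    "Feature 3": [1, 1, 0, 1, 0, 1, 0, 0],
    "Label":     ["Y", "N", "Y", "Y", "Y", "N", "Y", "N"],
})
```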

The aim is to build a tree of decision conditions that classifies the data into different classes.

We first apply the concept of entropy to our dataset to calculate its current entropy (using the labels).
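Reusing the entropy helper and the stand-in dataset from the sketches above:

```python
# Entropy of the full dataset, computed from the label column.
root_entropy = entropy(data["Label"])
print(root_entropy)  # ~0.954 for this 5/3 label split
```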

Let us split the data according to Feature 2:

Similarly, the information gain (IG) is calculated for each of the other features, and we choose the split with the highest IG. Once the split is finalized, the algorithm tries to split the resulting subsets E1 and E2 in the same way, and so on recursively. This drives the entropy toward its minimum at the leaves of the tree.
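Concretely, the information gain of a split is the parent's entropy minus the size-weighted entropy of the children: IG = E(parent) − Σᵢ (|Sᵢ|/|S|) · E(Sᵢ), where the Sᵢ are the child subsets (the E1 and E2 above). A sketch, again reusing the hypothetical helpers defined earlier:

```python
def information_gain(data, feature, label="Label"):
    """Entropy of the parent minus the weighted entropy of the
    children produced by splitting on `feature`."""
    parent_entropy = entropy(data[label])
    total = len(data)
    weighted_child_entropy = sum(
        (len(subset) / total) * entropy(subset[label])
        for _, subset in data.groupby(feature)
    )
    return parent_entropy - weighted_child_entropy

# Pick the split with the highest information gain.
features = ["Feature 1", "Feature 2", "Feature 3"]
best = max(features, key=lambda f: information_gain(data, f))
print(best, information_gain(data, best))
```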

Gini index:

As an alternative to entropy, the Gini index can also be used to build decision trees. It is defined as

Gini = 1 − Σᵢ pᵢ²

where pᵢ is the probability of category i.
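The same impurity measure as a small Python helper (the function name is my own):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    total = len(labels)
    counts = Counter(labels)
    return 1 - sum((count / total) ** 2 for count in counts.values())

print(gini(["A", "A", "B", "B"]))  # 0.5, the maximum for two classes
```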

Note: the decision tree algorithm can also be used as a regressor. When solving a regression problem, a criterion such as MSE (mean squared error), MAE (mean absolute error), or Poisson deviance is used instead of entropy.
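With scikit-learn, for instance, the regression criterion is chosen via the criterion parameter (the names below match recent scikit-learn releases; the data is made up):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up 1-D regression data.
X = np.arange(20).reshape(-1, 1)
y = np.sin(X.ravel())

# criterion can be "squared_error" (MSE), "absolute_error" (MAE),
# or "poisson" (Poisson deviance) in recent scikit-learn versions.
reg = DecisionTreeRegressor(criterion="squared_error", max_depth=3)
reg.fit(X, y)
print(reg.predict([[10]]))
```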

Decision trees are prone to overfitting. To avoid this, early stopping criteria such as a maximum depth, a minimum number of samples per leaf, or a minimum number of samples required to split a node can be used.
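In scikit-learn, for example, these limits are exposed as constructor parameters (the values below are illustrative, not tuned recommendations):

```python
from sklearn.tree import DecisionTreeClassifier

# Early-stopping ("pre-pruning") parameters that limit tree growth:
clf = DecisionTreeClassifier(
    max_depth=4,           # cap the depth of the tree
    min_samples_split=10,  # don't split nodes with fewer samples
    min_samples_leaf=5,    # every leaf must keep at least this many samples
)
```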

I hope this post helped you understand how the decision tree algorithm works.

That’s all.
