This post explains how entropy is used in a decision tree. Let’s get right into it…

A real-world example of high entropy: a messed-up room…

What is entropy?
In layman’s terms, entropy is just a measure of randomness. We always want to minimize randomness, because the more randomness there is, the more uncertainty we have.
Explanation with a funny example...

This is the data I collected about the girls my friend liked. There are four features:
Look: There are three levels (bad, okay, good). This describes how she looks to him, not to others.
Makeup: A boolean attribute, whether she uses makeup or not.
Simple dress: A boolean attribute, whether she wears a simple dress or not.
Height: How tall she is, in centimeters.
What should we do with the above data?
Our goal is to use one of the features to predict whether he will like a girl or not. Let’s use Simple dress as the feature.

The total entropy of a split is the weighted sum of the entropies of its left and right nodes, each weighted by the fraction of examples that falls into that node.
The formula for the entropy of a node:
S = -(p+)·log2(p+) - (p-)·log2(p-)

Formula description:
S = entropy, p+ = probability of liked, p- = probability of disliked.
The entropy of the left node = -(0/1)·log2(0/1) - (1/1)·log2(1/1), which is equal to zero (using the convention that 0·log2(0) = 0). This is a pure node, or leaf node. This condition is what we want.
The entropy of the right node = -(2/5)·log2(2/5) - (3/5)·log2(3/5), which is equal to 0.971, which is not good.
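The two calculations above can be sketched in a few lines of Python. The `entropy` function below is a minimal helper (not from any particular library) that takes the counts of liked and disliked examples in a node:

```python
import math

def entropy(liked, disliked):
    """Entropy of a node, given counts of liked/disliked examples."""
    total = liked + disliked
    s = 0.0
    for count in (liked, disliked):
        p = count / total
        if p > 0:  # convention: 0 * log2(0) = 0
            s -= p * math.log2(p)
    return s

# Left node: 0 liked, 1 disliked -> pure node
print(round(entropy(0, 1), 3))   # 0.0
# Right node: 2 liked, 3 disliked
print(round(entropy(2, 3), 3))   # 0.971
```

The `if p > 0` guard matters: `math.log2(0)` would raise an error, so we apply the usual convention that a class with zero probability contributes nothing.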
Important points
- We always want the entropy to be as low as possible.
- The attribute with the lowest entropy gets higher priority than the others. That attribute also has the highest information gain.
Write down in the comments which feature you would choose to predict whether your friend will like a girl or not.

