What is meant by entropy in a decision tree?

Deeraj
Nov 4 · 3 min read

This post explains how entropy is used in a decision tree. Let’s get right into it.

A real-world example of high entropy: a messy room.

What’s your room’s entropy?

What is entropy?

In layman’s terms, entropy is a measure of randomness. In a decision tree we want to minimize it, because the more mixed the outcomes in a node are, the more uncertain any prediction made from that node is.

Explanation with a funny example...

Data from my friend about girls he liked

This is the data I collected from my friend about the girls he liked. I used four features.

Look: One of three values (bad, okay, good). This describes how she looks to him, not to others.

Makeup: A boolean attribute indicating whether she wears makeup or not.

Simple dress: A boolean attribute indicating whether she wears a simple dress or not.

Height: How tall she is, in centimeters.

What should we do with the above data?

Our goal is to use any one of the features to predict whether he will like a girl or not. Let’s use Simple dress as the feature.

Data flow graph of the decision tree

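The total entropy of a split combines the entropy of its left node and its right node. In the standard formulation this is a weighted sum, where each node’s entropy is weighted by the fraction of samples that fall into it.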

The formula of entropy of a split

The formula of entropy:

S = -(p+)*log2(p+) - (p-)*log2(p-)

Formula description:

S = entropy, p+ = probability of “liked”, p- = probability of “disliked”.
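The two-class formula described above can be sketched in Python. This is a minimal illustration, not code from the original post: the function name `entropy` and its parameters `p_pos` and `p_neg` are made up for this example, and it assumes the usual convention that a term with probability 0 contributes 0.

```python
import math


def entropy(p_pos: float, p_neg: float) -> float:
    """Two-class entropy in bits for the given class probabilities.

    p_pos and p_neg are hypothetical names for p+ and p-.
    A probability of 0 contributes nothing (0 * log2(0) is taken as 0).
    """
    total = 0.0
    for p in (p_pos, p_neg):
        if p > 0:
            total -= p * math.log2(p)
    return total


# An evenly mixed node is as uncertain as a two-class node can be.
print(entropy(0.5, 0.5))  # 1.0
```

A node where both outcomes are equally likely gets the maximum value of 1, and the value falls toward 0 as one outcome starts to dominate.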

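The entropy of the left node = -(0)*log2(0) - (1/1)*log2(1), which is equal to zero (taking 0*log2(0) as 0 by convention). A node with zero entropy is a pure node, or leaf node: every sample in it has the same outcome. This is what we want.

The entropy of the right node = -(2/5)*log2(2/5) - (3/5)*log2(3/5), which is equal to about 0.971. That is close to the maximum of 1, so the node is still very mixed, which is not good.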
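The two node entropies can be checked with a short script. This is a sketch under assumptions: the counts (0 liked and 1 disliked on the left, 2 liked and 3 disliked on the right) are read off the fractions in the calculations above, the helper name `entropy_from_counts` is made up for this example, and the last step assumes the common convention of weighting each node by its share of the samples.

```python
import math


def entropy_from_counts(liked: int, disliked: int) -> float:
    """Two-class entropy in bits, computed from raw counts (hypothetical helper)."""
    total = liked + disliked
    result = 0.0
    for count in (liked, disliked):
        if count > 0:  # a class with zero samples contributes 0
            p = count / total
            result -= p * math.log2(p)
    return result


# Counts inferred from the fractions 0, 1/1, 2/5 and 3/5 used in the text.
left_entropy = entropy_from_counts(0, 1)
right_entropy = entropy_from_counts(2, 3)

# Weighted split entropy: each node counts in proportion to its size (1 and 5 of 6).
split_entropy = (1 / 6) * left_entropy + (5 / 6) * right_entropy

print(round(left_entropy, 3))   # 0.0
print(round(right_entropy, 3))  # 0.971
print(round(split_entropy, 3))  # 0.809
```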

Important points

  1. We always want entropy to be as low as possible.
  2. The attribute whose split has the lowest entropy gets priority over the others. It is also the one with the highest information gain.
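Information gain is usually defined as the entropy of the node before the split minus the weighted entropy after it. The sketch below applies that definition to the Simple dress split. It assumes the parent node holds the six samples implied by the fractions used earlier (2 liked, 4 disliked); the function names are hypothetical and the underlying table is not reproduced here.

```python
import math


def entropy_from_counts(liked: int, disliked: int) -> float:
    """Two-class entropy in bits, computed from raw counts (hypothetical helper)."""
    total = liked + disliked
    result = 0.0
    for count in (liked, disliked):
        if count > 0:  # a class with zero samples contributes 0
            p = count / total
            result -= p * math.log2(p)
    return result


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes.

    parent is a (liked, disliked) pair; children is a list of such pairs.
    """
    n = sum(parent)
    weighted = sum(
        (sum(child) / n) * entropy_from_counts(*child) for child in children
    )
    return entropy_from_counts(*parent) - weighted


# Assumed counts: left node (0 liked, 1 disliked), right node (2 liked, 3 disliked).
gain = information_gain(parent=(2, 4), children=[(0, 1), (2, 3)])
print(round(gain, 3))  # 0.109
```

Running the same function on each feature’s split and picking the largest gain is one way to compare the features.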

Write down in the comments which feature you would choose to determine whether my friend will like a girl or not.

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

Written by

Deeraj

Machine learning | full stack developer | tech-savvy.
