Decision Tree Entropy|Entropy Calculation

Aditya Kumar Pandey
Published in The Startup
Aug 13, 2020


A decision tree is an important supervised learning technique, most often used for classification problems. It is a tree-shaped diagram that represents a course of action. It consists of internal nodes and leaf nodes, and it uses them to reach a conclusion. Here we are going to talk about entropy in the decision tree. Let’s have a look at what we are going to learn about decision tree entropy.

  • What is Entropy?
  • Importance of entropy.
  • How to calculate entropy?

What is Entropy?

So let’s start with the definition of entropy. What is this entropy?

“The entropy of a decision tree measures the purity of the splits.”

Now let us understand the theory behind this one-line definition. Suppose we have some attributes, or features. Among these features, you have to decide which one to use as the main node, that is, the parent node, to start splitting your data. To decide which feature to split on, we use the concept called entropy: a low entropy means the node is mostly one class (a pure split), and a high entropy means the classes are mixed.

Importance of Entropy

  1. It measures the impurity, or disorder, of the data at a node.
  2. It helps the decision tree decide where to split.
  3. It helps to determine which node should be split first, on the basis of the entropy values.

How to calculate Entropy?

Let’s first look at the formula for calculating entropy.

$$\text{Entropy} = -p \log_2 p - q \log_2 q$$

Here, p is the probability of the positive class and q is the probability of the negative class, so p + q = 1.
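To make the formula concrete, here is a minimal Python sketch of it. The function name `binary_entropy` is my own choice for illustration, and I am assuming the logarithm is base 2, which is the usual convention when entropy is measured in bits.

```python
import math

def binary_entropy(p):
    """Entropy (in bits) of a node whose positive-class probability is p."""
    q = 1.0 - p  # probability of the negative class
    total = 0.0
    for prob in (p, q):
        # By convention 0 * log2(0) is treated as 0, so zero terms are skipped.
        if prob > 0:
            total -= prob * math.log2(prob)
    return total

# A node that is 90% positive is fairly pure, so its entropy is low.
print(f"{binary_entropy(0.9):.3f}")  # prints 0.469
```

Skipping the zero terms is a design choice to avoid calling `math.log2(0)`, which would raise an error for a perfectly pure node.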

Now let’s understand this formula with the help of an example. Consider some features, say E1, E2, and E3. We need to build a tree using one appropriate feature as the parent node. Let’s suppose that E2 is the parent node and E1 and E3 are the leaf nodes. When we construct a decision tree with E2 as the parent node, E2 sits at the top and splits into E1 and E3.

I have taken E2 as the parent node, which has 5 positive inputs and 2 negative inputs. E2 has been split into two leaf nodes (step 2). After the split, the data is divided so that E1 contains 2 positive and 1 negative, and E3 contains 3 positive and 1 negative. In the next step, the entropy is calculated for both leaf nodes, E1 and E3, to find out which one to consider for the next split. The node with the higher entropy value is considered for the next split. The dashed line shows the further splits, meaning that the tree can be split into more leaf nodes.
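The numbers in this example can be checked with a short Python sketch. The helper `entropy_from_counts` and the dictionary layout are hypothetical names of mine, not part of any library; the counts are the ones given in the example above.

```python
import math

def entropy_from_counts(positive, negative):
    """Entropy (in bits) of a node, given its positive and negative counts."""
    total = positive + negative
    result = 0.0
    for count in (positive, negative):
        if count > 0:  # a class with zero samples contributes nothing
            prob = count / total
            result -= prob * math.log2(prob)
    return result

# (positive, negative) counts taken from the example.
nodes = {"E2": (5, 2), "E1": (2, 1), "E3": (3, 1)}
entropies = {name: entropy_from_counts(pos, neg) for name, (pos, neg) in nodes.items()}

for name, value in entropies.items():
    print(f"{name}: {value:.3f}")
# E2: 0.863
# E1: 0.918
# E3: 0.811

# Following the rule in the text, the leaf with the higher entropy is split next.
next_split = max(("E1", "E3"), key=lambda name: entropies[name])
print(next_split)  # E1
```

So under the rule described here, E1 (2 positive, 1 negative) would be the leaf chosen for the next split, because it is the more mixed of the two.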

NOTE 1: The leaf node with the greater entropy value will be considered for further splitting.

NOTE 2: With two classes and a base-2 logarithm, the value of entropy is always between 0 and 1.

So far, all of this was with respect to one node only. You should also know that for further splitting we need some more attributes to reach the leaf nodes. For this, there is a new concept called information gain.
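The post does not define information gain, so the following is only a sketch under the standard textbook definition: the entropy of the parent minus the weighted average entropy of its children. The function names are my own, and the counts are reused from the E2, E1, E3 example.

```python
import math

def entropy_from_counts(positive, negative):
    """Entropy (in bits) of a node, given its positive and negative counts."""
    total = positive + negative
    result = 0.0
    for count in (positive, negative):
        if count > 0:
            prob = count / total
            result -= prob * math.log2(prob)
    return result

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes.

    parent is a (positive, negative) pair; children is a list of such pairs.
    This is the usual textbook definition, assumed here for illustration.
    """
    parent_total = sum(parent)
    weighted = 0.0
    for pos, neg in children:
        weight = (pos + neg) / parent_total  # share of the parent's samples
        weighted += weight * entropy_from_counts(pos, neg)
    return entropy_from_counts(*parent) - weighted

# E2 (5 positive, 2 negative) split into E1 (2, 1) and E3 (3, 1).
gain = information_gain((5, 2), [(2, 1), (3, 1)])
print(f"{gain:.4f}")
```

For this particular split the gain comes out to roughly 0.006, which is very small: the two children are almost as mixed as the parent, so this split tells us little.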

Worst case: If, after splitting, 50% of the data is positive and 50% is negative, the entropy value will be 1, and that is considered the worst case.
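A quick way to convince yourself of the worst case is to sweep the positive-class probability from 0 to 1 and see where the entropy peaks. This is a small sketch; `binary_entropy` is an illustrative helper of mine, and it assumes two classes and a base-2 logarithm.

```python
import math

def binary_entropy(p):
    """Entropy (in bits) for positive-class probability p and negative 1 - p."""
    total = 0.0
    for prob in (p, 1.0 - p):
        if prob > 0:  # treat 0 * log2(0) as 0
            total -= prob * math.log2(prob)
    return total

# Try p = 0.0, 0.1, ..., 1.0 and record the entropy for each.
probabilities = [i / 10 for i in range(11)]
values = [binary_entropy(p) for p in probabilities]

worst_p = probabilities[values.index(max(values))]
print(worst_p, max(values))       # 0.5 1.0
print(values[0], values[-1])      # 0.0 0.0
```

The entropy is 0 at both ends (a node that is all negative or all positive) and reaches its maximum of 1 exactly at the 50/50 split.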

If you like this post, please leave a comment and share it.