Gini Impurity and Entropy

Jason Wilson
1 min read · Oct 23, 2018


Gini impurity is used by the CART (classification and regression tree) algorithm for classification trees. Gini impurity measures how often a randomly chosen element from the set would be incorrectly labeled if it were labeled randomly according to the distribution of labels in the set. As the graph below shows, for a two-class problem the highest possible Gini impurity is 0.5, while the highest possible entropy is 1.0.
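The definition above translates directly into code. Here is a minimal sketch of a Gini impurity function (the function name and example labels are my own, not from the article):

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())

# A perfectly mixed two-class set hits the maximum Gini impurity of 0.5
print(gini_impurity([0, 0, 1, 1]))  # 0.5

# A pure set has zero impurity
print(gini_impurity([0, 0, 0, 0]))  # 0.0
```

With two classes in equal proportion, each squared proportion is 0.25, so the impurity is 1 − 0.5 = 0.5, matching the maximum mentioned above.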

Gini impurity measures how heterogeneous, or mixed, the values in a set are. In decision trees, we look for split criteria that divide a set into more homogeneous subsets. The quantity being maximized is often called information gain, though strictly that term pairs with entropy; entropy is the alternative criterion to Gini impurity, and an analogous reduction in impurity can be computed with either. When evaluating Gini impurity and entropy, remember that a higher score means higher impurity, and a more impure node leaves more room for information gain when a split occurs.
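To make the entropy-based criterion concrete, here is a sketch of information gain: the entropy of a parent node minus the size-weighted entropy of its two child subsets after a split (function names and sample data are illustrative, not from the article):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over class proportions."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(parent, left, right):
    """Parent entropy minus the weighted entropy of the child subsets."""
    n = len(parent)
    weighted_child = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted_child

parent = [0, 0, 1, 1]          # maximally impure: entropy = 1.0
print(entropy(parent))          # 1.0

# A perfect split yields two pure subsets, so the full entropy is recovered
print(information_gain(parent, [0, 0], [1, 1]))  # 1.0
```

This illustrates the point above: the parent starts at the maximum entropy of 1.0, so a clean split delivers the maximum possible information gain of 1.0.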
