Decision Trees

om pramod
6 min read · Jan 29, 2023


Part 3: Gain Ratio

Gain Ratio is an alternative to Information Gain for selecting the attribute to split on in a decision tree. It is used to overcome Information Gain's bias towards attributes with many outcomes. For example, suppose we have two features, "Color" and "Size", and we want to build a decision tree to predict the type of fruit based on these two features. The "Color" feature has three outcomes (red, green, yellow) and the "Size" feature has two outcomes (small, large). Using the information gain method, the "Color" feature could be chosen as the best feature to split on simply because its larger number of outcomes tends to produce a higher information gain. However, this can be a problem, because the "Size" feature, with fewer and less ambiguous outcomes, could actually be the better feature to split on.

To overcome this problem, we can use the gain ratio method. Gain Ratio is a measure that takes into account both the information gain and the number of outcomes of a feature to determine the best feature to split on: it normalizes the Information Gain by the Split Info.

Split Info(A) = - Sum over i ( D(i) * log2(D(i)) )

where D(i) is the proportion of instances that take the i-th value of attribute A.

Note: Split Info grows as an attribute has more outcomes and as the instances are spread more evenly across them. For the same Information Gain, a lower Split Info therefore gives a higher Gain Ratio, so the split is considered better.

Let's consider a toy dataset of 5 instances with 3 attributes: "Temperature", "Humidity" and "Wind". The target feature is "Play Tennis", which can have two outcomes: "Yes" or "No".

To calculate the Split Info for the attribute “Temperature”, we first find the total number of instances in the dataset, which is 5. Then we find the number of instances for each outcome of the “Temperature” attribute, which is:

  • Hot: 2 instances
  • Mild: 1 instance
  • Cool: 2 instances

Now we can use this information to calculate the Split Info using the formula:

Split Info(Temperature) = - (2/5) * log2(2/5) - (1/5) * log2(1/5) - (2/5) * log2(2/5)

which is approximately equal to 1.522
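
As a quick check, this calculation can be done directly in Python. A minimal sketch (the counts [2, 1, 2] are the Hot/Mild/Cool counts from above):

import math

def split_info(counts):
    # Split Info = -sum over outcomes of (proportion * log2(proportion))
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts)

# Temperature: Hot = 2, Mild = 1, Cool = 2 (out of 5 instances)
print(round(split_info([2, 1, 2]), 3))  # ~1.522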

Similarly, we can calculate the Split Info for the other attributes “Humidity” and “Wind”.

Now, the formula for Gain Ratio is:

Gain Ratio = Information Gain / Split Info

Note: In the decision tree algorithm, the feature with the highest gain ratio is considered the best feature for splitting.
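
As a small illustration of this selection rule, here is a sketch with made-up Information Gain and Split Info values for three hypothetical attributes (the numbers are illustrative only, not taken from any dataset in this article):

# attribute -> (information gain, split info), hypothetical values
stats = {
    'Outlook': (0.25, 1.58),
    'Humidity': (0.15, 1.00),
    'Wind': (0.05, 0.81),
}

gain_ratios = {attr: gain / split for attr, (gain, split) in stats.items()}
best = max(gain_ratios, key=gain_ratios.get)
print(best)  # 'Outlook' has the highest gain ratio (~0.158)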

Example:

Let’s consider a toy dataset that contains information about different types of fruits and their characteristics. The dataset has the following columns:

  • Fruit: The type of fruit (Apple, Banana, Orange)
  • Color: The color of the fruit (Red, Yellow, Orange)
  • Shape: The shape of the fruit (Round, Long, Square)
  • Weight: The weight of the fruit (in grams)
  • Bitten: Indicates whether the fruit is bitten or not (Yes, No)

To determine the best feature for splitting to build a decision tree using gain ratio, we can follow these steps:

1. Calculate the initial entropy of the target variable (Bitten) using the formula: Entropy = - p(Yes) * log2 p(Yes) - p(No) * log2 p(No)

where p(Yes) and p(No) are the probabilities of the Bitten variable being “Yes” or “No” respectively. From our dataset, we have:

p(Yes) = 4/10 = 0.4 (4 fruits are bitten),

p(No) = 6/10 = 0.6 (6 fruits are not bitten)

Entropy(Parent) = - (0.4) * log2(0.4) - (0.6) * log2(0.6) ≈ 0.971

This is the entropy of the parent node from which we will calculate the information gain for each feature.
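
A minimal Python sketch of this entropy calculation, using the 0.4 / 0.6 probabilities above:

import math

def binary_entropy(p_yes, p_no):
    # Entropy = -p(Yes) * log2 p(Yes) - p(No) * log2 p(No); treat 0 * log2(0) as 0
    return -sum(p * math.log2(p) for p in (p_yes, p_no) if p > 0)

print(round(binary_entropy(0.4, 0.6), 3))  # ~0.971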

2. The information gain can be calculated by comparing the entropy of the parent node to the weighted average of the entropies of the child nodes.

Information Gain (feature) = Entropy(Parent) - Weighted Average of Entropies(Children)

where the weighted average of entropies of the children nodes is calculated by summing the product of the probability of each child node and its corresponding entropy.

Weighted Average of Entropies(Children) = Sum (p(Child) * Entropy(Child))

Now, for each feature (Color, Shape, Weight), we calculate the information gain.

"Color" has 3 possible values (Red, Yellow, Orange).

Here is how we can calculate the entropy of the "Red" child node. First, we need the probabilities of the target variable "Bitten" within the "Red" child node.

i) Count the number of fruits that are “Red” and have been “Bitten” and divide it by the total number of “Red” fruits. Let’s call this probability p(Yes|Red).

p(Yes|Red) = number of “Red” fruits that have been “Bitten” / total number of “Red” fruits = 1/2 = 0.5

ii) Count the number of fruits that are “Red” and have not been “Bitten” and divide it by the total number of “Red” fruits. Let’s call this probability p(No|Red).

p(No|Red) = number of “Red” fruits that have not been “Bitten” / total number of “Red” fruits = 1/2 = 0.5

iii) Use the above probabilities to calculate the entropy of the “Red” child node using the formula:

Entropy(Red) = - p(Yes|Red) * log2 p(Yes|Red) - p(No|Red) * log2 p(No|Red)

Entropy(Red) = - (0.5) * log2(0.5) - (0.5) * log2(0.5) = 1

So, the entropy of the “Red” child node is 1.

Similarly, we can calculate the entropy of the other child nodes: for each child node, count the fruits in it that have and have not been "Bitten", divide by the total number of fruits in that node, and apply the same formula.

Entropy(Yellow) = - (2/4) * log2(2/4) - (2/4) * log2(2/4) = 1

Entropy(Orange) = - (1/4) * log2(1/4) - (3/4) * log2(3/4) = 0.811

Now, p(Red) = 2/10 = 0.2

p(Yellow) = 4/10 = 0.4

p(Orange) = 4/10 = 0.4

Weighted Average of Entropies(Children) = (0.2) * (1) + (0.4) * (1) + (0.4) * (0.811) ≈ 0.924

Information Gain (Color) = 0.971 - 0.924 = 0.047
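
The same computation as a short Python sketch, using the per-color bitten / not-bitten counts from the steps above (it reproduces the calculation up to rounding):

import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Color value -> [bitten, not bitten] counts from the example above
children = {'Red': [1, 1], 'Yellow': [2, 2], 'Orange': [1, 3]}
total = sum(sum(c) for c in children.values())  # 10 fruits in total

parent_entropy = entropy([4, 6])  # ~0.971
weighted_children = sum((sum(c) / total) * entropy(c) for c in children.values())  # ~0.924
info_gain_color = parent_entropy - weighted_children
print(round(info_gain_color, 3))  # information gain for Color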

Similarly, we can calculate the information gain for Shape and Weight

3. For each feature, calculate the split info as the negative sum, over the child nodes, of the probability of each child node multiplied by the log2 of that probability.

Split Info (Color) = - Sum (p(Child) * log2 (p(Child)))

Split Info (Color) = - (0.2) * log2(0.2) - (0.4) * log2(0.4) - (0.4) * log2(0.4) ≈ 1.522

Similarly, we can calculate the split info for Shape and Weight

4. Divide the information gain by the split info for each feature to calculate the gain ratio.

Gain Ratio (Color) = Information Gain (Color) / Split Info (Color) = 0.047 / 1.522 ≈ 0.03
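
The corresponding Split Info and Gain Ratio for "Color" as a short sketch (info_gain_color is the value computed above):

import math

# child-node sizes for Color: Red = 2, Yellow = 4, Orange = 4 (10 fruits in total)
sizes = [2, 4, 4]
total = sum(sizes)

split_info_color = -sum((s / total) * math.log2(s / total) for s in sizes)  # ~1.522
info_gain_color = 0.047  # from the calculation above

print(round(info_gain_color / split_info_color, 2))  # ~0.03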

Similarly, calculate Gain Ratio for Shape and Weight

5. Compare the gain ratios of the features; the feature with the highest gain ratio is considered the best feature for splitting.

Python Implementation:

Here is an example of estimating gain-ratio-like scores for each feature using the scikit-learn library in Python. Note that scikit-learn does not compute gain ratio directly, so this snippet uses the classifier's impurity-based feature importances and node impurities as rough stand-ins:

from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Create the decision tree classifier
clf = DecisionTreeClassifier(random_state=0, criterion='gini')
clf.fit(X, y)

# Get the impurity-based feature importances
# (used here as a stand-in for information gain; scikit-learn does not expose gain ratio)
importances = clf.feature_importances_

gain_ratios = {}
# Compute and print an approximate gain ratio for each feature
for i, feature in enumerate(iris.feature_names):
    gain = importances[i]
    # impurity of node i in the fitted tree, used here as a rough stand-in for split information
    split = clf.tree_.impurity[i]
    gain_ratio = gain / (split + 1e-7)
    print(f'Gain ratio for {feature}: {gain_ratio:.3f}')
    gain_ratios[feature] = gain_ratio

# Find the feature with the highest gain ratio
best_feature = max(gain_ratios, key=gain_ratios.get)

# Print the best feature for splitting
print(f'Best feature for splitting: {best_feature}')

By default, the criterion used in scikit-learn's DecisionTreeClassifier is "gini" (you can pass criterion='entropy' to use information gain instead). In this code, gain holds the feature's impurity-based importance (playing the role of information gain) and split holds a node impurity (playing the role of split information). The small value 1e-7 is added to the split information to avoid division by zero.

The output of the above code would look something like this:

Gain ratio for sepal length (cm): 0.020
Gain ratio for sepal width (cm): 0.000
Gain ratio for petal length (cm): 1.128
Gain ratio for petal width (cm): 2.515
Best feature for splitting: petal width (cm)

In this case, the gain ratio for sepal width (cm) is 0.000 because its importance (the stand-in for information gain) is zero: the feature is not providing any information to the decision tree. The gain ratio for petal width (cm) is 2.515, which is relatively high, so that feature is providing most of the information to the decision tree.
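
Since scikit-learn does not expose gain ratio directly, the snippet above is only a rough proxy. To compute the quantity defined in this article from the data itself, here is a minimal sketch, assuming the continuous Iris features are first discretized into three equal-frequency bins (the binning choice is arbitrary and only for illustration):

import numpy as np
from sklearn.datasets import load_iris

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(feature_values, labels):
    # Information Gain / Split Info for a categorical (discretized) feature
    values, counts = np.unique(feature_values, return_counts=True)
    p = counts / counts.sum()
    weighted_child_entropy = sum(
        w * entropy(labels[feature_values == v]) for v, w in zip(values, p)
    )
    info_gain = entropy(labels) - weighted_child_entropy
    split_info = -np.sum(p * np.log2(p))
    return info_gain / split_info

iris = load_iris()
X, y = iris.data, iris.target

for i, name in enumerate(iris.feature_names):
    # discretize the continuous feature at its 33rd and 66th percentiles
    bins = np.quantile(X[:, i], [1 / 3, 2 / 3])
    binned = np.digitize(X[:, i], bins)
    print(f'Gain ratio for {name}: {gain_ratio(binned, y):.3f}')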

Final Note: Thanks for reading! I hope you find this article informative.

Don’t forget to read: Decision Trees — Part 4
