Model Evaluation Metrics — Gini Coefficient

About the Gini coefficient, deriving it from the AUC score, and its advantages and disadvantages.

Tarun_KS
Jun 25, 2023 · 4 min read

The Gini coefficient, commonly known as Gini, is a metric widely used to evaluate classification models. Originally a measure of inequality, it is commonly used in the finance industry to evaluate the performance of credit risk models. It assesses a model’s discriminatory power (i.e., how well the model distinguishes between defaulting and non-defaulting borrowers) by quantifying the inequality in the predicted probabilities of default.

  • The Gini coefficient ranges from 0 to 1, with 0 representing perfect equality (no discrimination) and 1 representing perfect inequality (perfect discrimination).

In the context of credit risk modeling, a higher Gini coefficient indicates better model performance in terms of its ability to accurately rank borrowers based on their creditworthiness.

Ways to compute Gini coefficient:

  1. Constructing the Lorenz curve and extracting Corrado Gini’s measure of inequality from it.
  2. Constructing the ROC curve, extracting the AUC, and then computing the Gini coefficient from it.

The Gini coefficient derived from the Lorenz curve measures inequality in a distribution. It is based on the cumulative distribution of the variable and compares it to a hypothetical situation of perfect equality. The coefficient is the area between the Lorenz curve and the line of perfect equality, divided by the total area under the line of equality (equivalently, twice the area between the two curves).
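For intuition, here is a minimal sketch of that calculation (not from the original post; the function name gini_from_lorenz and the example values are hypothetical), approximating the area under the Lorenz curve with the trapezoidal rule:

import numpy as np

def gini_from_lorenz(values):
    """Gini coefficient of a non-negative 1-D array, computed via the Lorenz curve."""
    values = np.sort(np.asarray(values, dtype=float))            # sort ascending
    n = len(values)
    # Lorenz curve: cumulative share of the total, with a leading 0
    lorenz = np.concatenate(([0.0], np.cumsum(values) / values.sum()))
    population_share = np.linspace(0.0, 1.0, n + 1)
    # Area under the Lorenz curve (trapezoidal rule); the equality line has area 0.5
    area_under_lorenz = np.trapz(lorenz, population_share)
    return 1.0 - 2.0 * area_under_lorenz                          # Gini = 2 * (0.5 - area)

print(gini_from_lorenz([1, 1, 1, 1]))      # perfectly equal values -> 0.0
print(gini_from_lorenz([0, 0, 0, 100]))    # highly unequal values -> 0.75 here; approaches 1 as the sample grows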

On the other hand, the Gini coefficient derived from the AUC score measures the discriminatory power of a classification model. It is based on the model’s ability to distinguish between positive and negative instances, as represented by the ROC curve. Here the calculation involves computing the area under the ROC curve and converting it to the Gini coefficient with a simple formula.

It’s important to note that this derivation of the Gini coefficient from the AUC is specific to classification models and evaluates their discriminatory power. It should not be confused with the traditional derivation from the Lorenz curve, which measures inequality in a distribution.

In this article, we’ll focus on the second method of computation as we’re going to evaluate the performance of a classification model.

Computing the Gini Coefficient from AUC

The ROC (receiver operating characteristic) curve plots the false positive rate (FPR) on the x-axis against the true positive rate (TPR) on the y-axis across various classification thresholds. The AUC is the area under this ROC curve.

The Gini coefficient is derived from the AUC value using the formula below:

Gini = 2 * AUC - 1

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc, roc_auc_score


class ModelEvaluation:
    """
    A class to compute the AUC score and Gini coefficient of a model for the given predictions.

    Attributes
    ----------
    predictions : pd.DataFrame
        A DataFrame containing the predicted probability of default (PD) and the actual label (DV) columns.
    PD_column : str
        Name of the PD column in the DataFrame.
    label_column : str
        Name of the DV (label) column in the DataFrame.

    Methods
    -------
    plot_roc_curve(): Plots the ROC curve.
    compute_auc_gini(): Returns the AUC and Gini coefficient.
    """

    def __init__(self, predictions_df, PD_column, label_column):
        self.predictions = predictions_df
        self.PD_column = PD_column
        self.label_column = label_column

    def plot_roc_curve(self):
        # Compute FPR, TPR and thresholds across all cut-offs
        fpr, tpr, thresholds = roc_curve(
            y_true=self.predictions[self.label_column],
            y_score=self.predictions[self.PD_column],
        )
        # Compute the area under the ROC curve
        roc_auc = auc(fpr, tpr)
        # Plot the ROC curve
        plt.figure()
        plt.plot(fpr, tpr, label='ROC curve (AUC = %0.5f)' % roc_auc)
        plt.plot([0, 1], [0, 1], 'r--')  # Diagonal line (random classifier)
        plt.xlim([0.0, 1.0])
        plt.ylim([0.0, 1.05])
        plt.xlabel('False Positive Rate')
        plt.ylabel('True Positive Rate')
        plt.title('Receiver Operating Characteristic (ROC) Curve')
        plt.legend(loc="lower right")
        plt.show()

    def compute_auc_gini(self):
        # AUC from the actual labels and predicted probabilities, then Gini = 2 * AUC - 1
        auc_score = roc_auc_score(
            y_true=self.predictions[self.label_column],
            y_score=self.predictions[self.PD_column],
        )
        gini = 2 * auc_score - 1
        return auc_score, gini
(Figure: sample predictions dataset)
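The sample predictions dataset appears only as an image in the original post. A hypothetical stand-in results_df with the column names used below ('Prediction_Probability' and 'Bads'), built with made-up parameters, could look like this:

import numpy as np
import pandas as pd

# Hypothetical stand-in for the sample predictions dataset:
# 'Prediction_Probability' holds the predicted PDs, 'Bads' the actual labels.
rng = np.random.default_rng(42)
n = 1000
bads = rng.binomial(1, 0.2, size=n)
# Give defaulters somewhat higher scores on average so the model has some discriminatory power
probs = np.clip(rng.normal(0.3 + 0.2 * bads, 0.15), 0.01, 0.99)
results_df = pd.DataFrame({'Prediction_Probability': probs, 'Bads': bads})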
model_eval = ModelEvaluation(results_df, PD_column='Prediction_Probability', label_column='Bads')
auc, gini = model_eval.compute_auc_gini()
(Figure: ROC curve for the sample predictions)

Gini = 2 * 0.73043 - 1 = 0.46086

(Figure: AUC and Gini values obtained for the above sample dataset)

Summary:

  • The Gini coefficient has an intuitive interpretation. It represents the degree of separation between positive and negative classes, making it easier to understand and communicate.
  • The Gini coefficient provides a single summary measure of the model’s discriminative power, capturing the performance across all possible classification thresholds.
  • The Gini coefficient is insensitive to calibration: it does not consider the calibration of the PDs, focusing solely on the ranking. This means that models with poorly calibrated probabilities can still achieve high Gini scores (see the sketch after this list).
  • The Gini coefficient does not provide insights into the underlying factors driving the predictions.
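As a quick illustration of the calibration point, the sketch below (with made-up labels and probabilities) applies a monotonic transformation, squaring the probabilities, which changes the calibration but not the ranking, so the Gini is unchanged:

from sklearn.metrics import roc_auc_score

labels = [0, 0, 1, 0, 1, 1, 0, 1]                              # made-up actuals
probs = [0.10, 0.20, 0.35, 0.40, 0.55, 0.70, 0.25, 0.90]       # made-up PDs

# Same ordering, deliberately mis-calibrated scores (squared values)
miscalibrated = [p ** 2 for p in probs]

gini_raw = 2 * roc_auc_score(labels, probs) - 1
gini_squashed = 2 * roc_auc_score(labels, miscalibrated) - 1
print(gini_raw, gini_squashed)   # identical: the Gini depends only on the ranking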

The Gini coefficient is just one of several metrics used to evaluate credit risk models in the finance industry, and a comprehensive evaluation process should incorporate other relevant metrics to ensure a robust and accurate assessment of model performance.

Thanks for reading. If you have any questions or would like to share feedback, feel free to reach out to me on LinkedIn.
