
Classification Model Performance Evaluation using AUC-ROC and CAP Curves

Rushikanjaria · Published in Geek Culture · Jul 5, 2021 · 6 min read

Performance measurement is an essential task in any machine learning project: it is important to check how well or poorly our model performs. For regression models we use R squared (R²) and root mean squared error (RMSE). For classification models, we can rely on the AUC-ROC curve or the CAP curve to evaluate and illustrate performance.

What is AUC-ROC curve?
The AUC (Area Under the Curve) ROC (Receiver Operating Characteristic) curve, also known as AUROC (Area Under the Receiver Operating Characteristic curve), is a performance measurement for classification problems at various threshold levels. It is one of the most important evaluation measures for assessing the performance of binary classification models.
ROC is a probability curve that plots the TPR (True Positive Rate) against the FPR (False Positive Rate). AUC is a measure of separability: it shows how capable our model is of distinguishing between classes.
The greater the AUC, the better the model distinguishes between the positive and negative classes.

AUC-ROC Curve

What is CAP curve?
A CAP (Cumulative Accuracy Profile) curve is a performance measurement for classification problems. It evaluates a model by comparing its curve to both a ‘perfect’ (ideal) curve and a ‘random’ curve.
A good model will have a CAP that lies between the perfect and random curves. The closer a model's curve is to the perfect CAP, the better it is.

CAP Curve

There are two methods to analyze performance using the CAP curve:

  • Area Under Curve: We calculate the Accuracy Rate (AR) from two areas: aP, the area between the perfect model and the random model, and aR, the area between the prediction model and the random model.
    Accuracy Rate (AR) = aR / aP
    The closer AR is to 1, the better the model.
  • Plot: We draw a vertical line at 50% of the x-axis until it intersects the trained model's curve, and from that intersection point we draw a horizontal line to the y-axis. The point where this line cuts the y-axis is the percentage of positive outcomes you will identify by taking 50% of the population; call this value X%.
Plot method to assess the performance

Just by looking at this plot, you can assess the performance of the model based on the X% value:

  1. X < 60% → Rubbish
  2. 60% < X < 70% → Poor/Average
  3. 70% < X < 80% → Good
  4. 80% < X < 90% → Very Good
  5. 90% < X < 100% → Too Good (in this case you should be very careful about the chance of overfitting)
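
These bands are easy to encode. As a small convenience, here is a helper that restates the list above (the function name and wording are my own, not from any library):

```python
def rate_model(x_percent):
    """Map the CAP plot's X% value to the qualitative bands listed above."""
    if x_percent < 60:
        return "Rubbish"
    elif x_percent < 70:
        return "Poor/Average"
    elif x_percent < 80:
        return "Good"
    elif x_percent < 90:
        return "Very Good"
    else:
        return "Too Good (check for overfitting)"

print(rate_model(96.875))  # -> Too Good (check for overfitting)
```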

Now I will show you how to use the AUC-ROC curve and the CAP curve to evaluate a classification model using Python.

Dataset
I am using the Social_Network_Ads dataset, from which I use only three columns: ‘Age’, ‘EstimatedSalary’ and ‘Purchased’. The output labels are ‘0’ and ‘1’, representing whether a person purchased the product or not.
0 → not purchased
1 → purchased
Our goal is to predict if any person will buy the product or not.
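
A minimal sketch of loading the data, assuming the dataset sits in a local file named Social_Network_Ads.csv with the column names described above:

```python
import pandas as pd

# Load the dataset; the file name is an assumption.
data = pd.read_csv("Social_Network_Ads.csv")

# Keep only the three columns described above.
X = data[["Age", "EstimatedSalary"]].values
y = data["Purchased"].values  # 0 -> not purchased, 1 -> purchased
```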

Complete Dataset

The ‘Red’ points represent people who have not purchased the product and the ‘Green’ points represent people who have purchased it.

Classification
I split the dataset into two sets: 75% training data and 25% testing data. I used logistic regression to train and test the model. The model achieved an accuracy score of 89%.
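
A sketch of this step, reusing X and y from the loading snippet above. The random_state values and the feature scaling are my assumptions (the article does not show them); scaling simply helps logistic regression cope with raw salary values:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 75% training data, 25% testing data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Assumed preprocessing: standardize the two features.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, y_train)

print(accuracy_score(y_test, classifier.predict(X_test)))  # ~0.89
```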

Classification on test set

Performance Evaluation:

  • AUROC:
    Import roc_curve and auc from sklearn.metrics to create the ROC curve and to calculate the area under it.
    First, calculate the prediction probabilities using predict_proba. It returns a NumPy array with two columns: the first column contains the probabilities of class 0 and the second column contains the probabilities of class 1. Since we want to measure how well our model distinguishes between the positive and negative classes, I use the probability of class 1 to plot the ROC curve.

roc_curve generates the ROC curve and returns fpr, tpr and thresholds. We pass the fpr and tpr values to the auc function to calculate the area under the curve for this model.
Now we plot the ROC curve and analyze the performance of our model, as sketched below.
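
Putting this together, a sketch of the ROC computation and plot, reusing classifier, X_test and y_test from the snippets above:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# Probability of class 1 is the second column of predict_proba's output.
probs = classifier.predict_proba(X_test)[:, 1]

# roc_curve returns fpr, tpr and thresholds; auc integrates the curve.
fpr, tpr, thresholds = roc_curve(y_test, probs)
roc_auc = auc(fpr, tpr)

plt.plot(fpr, tpr, label="Logistic Regression (AUC = %.2f)" % roc_auc)
plt.plot([0, 1], [0, 1], linestyle="--", label="Random classifier")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend()
plt.show()
```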

AUC → 95%

The area under the curve is 0.95 → 95%, which is excellent and indicates that our model is doing well.

  • CAP:
    First, calculate the total data points in the test data (100). Then calculate the number of data points of class 1 in the test data (32) and also calculate the number of data points of class 0 in the test data (68).

Now we start plotting our CAP curve. To begin, we draw the random model, based on the assumption that correct detections of class 1 grow linearly with the number of observations.
Next we plot the perfect model: one that detects all class 1 data points in exactly as many trials as there are points in that class. Here, the perfect model needs exactly 32 trials to detect the 32 class 1 data points.
Finally, we plot the results of the logistic regression. As with the AUROC curve, we take the probability of class 1 and pair these values with y_test using the zip function, ordering by probability from highest to lowest.
To calculate the y-values we use np.cumsum(), which produces an array where each element is the sum of all the array's previous values plus the current value. For instance, np.cumsum() applied to [1, 1, 1, 1, 1] yields [1, 2, 3, 4, 5]. In addition, we must prepend 0 to the array so that the curve starts at the point (0, 0). The x-values run from 0 up to and including total.
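
A sketch of the full CAP plot under the same assumptions, with the zip-and-sort step and the cumulative sum as described above:

```python
import numpy as np
import matplotlib.pyplot as plt

total = len(y_test)                   # total test points (100 here)
class_1_count = int(np.sum(y_test))   # class 1 points (32 here)

# Random model: class 1 detections grow linearly with the observations.
plt.plot([0, total], [0, class_1_count], "r--", label="Random model")

# Perfect model: all 32 class 1 points found in the first 32 trials.
plt.plot([0, class_1_count, total], [0, class_1_count, class_1_count],
         c="grey", label="Perfect model")

# Trained model: order the true labels by predicted probability of
# class 1 (descending), then take the cumulative sum.
probs = classifier.predict_proba(X_test)[:, 1]
ordered = [y for _, y in sorted(zip(probs, y_test), reverse=True)]
y_values = np.append([0], np.cumsum(ordered))  # prepend 0 -> starts at (0, 0)
x_values = np.arange(0, total + 1)

plt.plot(x_values, y_values, c="b", label="Logistic Regression")
plt.xlabel("Total observations")
plt.ylabel("Class 1 observations")
plt.title("CAP Curve")
plt.legend()
plt.show()
```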

CAP curve

Now we apply the two methods for analyzing performance with the CAP curve:

  • Area under curve: First we calculate all the areas using the auc function and then use those values to compute the accuracy rate. The rate comes to 0.90, which is quite close to 1, indicating that our model is very effective.
  • Plot: First, find the index corresponding to 50% of the test data. Then draw a vertical line until it intersects the trained model's curve, and from that intersection draw a horizontal line to the y-axis. Calculate the X% value by dividing the class 1 observations found at that point by the total number of class 1 data points and multiplying by 100. We get 96.875%, as sketched below.
CAP curve using plot method
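
A sketch of both checks, reusing total, class_1_count, x_values and y_values from the CAP snippet above:

```python
from sklearn.metrics import auc

# Areas under the three curves.
a_random = auc([0, total], [0, class_1_count])
a_perfect = auc([0, class_1_count, total], [0, class_1_count, class_1_count])
a_model = auc(x_values, y_values)

# Accuracy rate: area between model and random (aR) over
# area between perfect and random (aP).
ar = (a_model - a_random) / (a_perfect - a_random)
print("Accuracy Rate:", ar)  # ~0.90

# Plot method: read the model's curve at 50% of the test observations.
index = int(total / 2)
x_percent = 100 * y_values[index] / class_1_count
print("Class 1 found at 50%% of the sample: %.3f%%" % x_percent)  # 96.875%
```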

The CAP curve gives 96.875%, which falls in the ‘Too Good’ range: the model separates the positive and negative classes almost perfectly, so we should be careful about overfitting.

Conclusion
This article gave a brief overview of performance measures and of the methods you can use to analyze the performance of a classification model. You can use any of the methods mentioned above to measure the performance of your model.

You can find the entire code, from implementing logistic regression to performance evaluation, in my GitHub repository:


Thank you for reading this article! I hope you all learned something new today.
