ROC Curve and AUC: the significance of thresholds and what they really tell you about model performance

Vasanthkumar Velayudham · Published in Analytics Vidhya · Mar 3, 2020 · 6 min read

Every data scientist or data science aspirant would have come across the concepts of the ROC (Receiver Operating Characteristic) curve and AUC (Area Under the Curve) and their applicability in evaluating model quality.

Image: “Understanding is ideal” (src: https://agurlblog.wordpress.com/2016/06/12/understanding-is-ideal/)

There are numerous blogs and tutorials that explain them in detail. But I always had questions about their relevance to model quality and how to use them to choose the right model. They are not as straightforward as accuracy or F1-score, and I always felt that I was missing their key purpose. In this article, I would like to address that concern and share with my fellow readers what the ROC curve and AUC values are and how they should be used to evaluate models.

ROC Curve — Receiver Operating Characteristics

- Definition and Formula

There are numerous blogs that describe the ROC curve, its formula and the theory behind it, so I am not going to spend time on that here. Please refer to the References section of this article for useful pointers.

Sample Data

Let's consider the binary classification results of a ‘Fraud Detection’ problem as our dataset; refer to this link for the data and Python notebook used in this article. We are not going to worry about which algorithm was used or how the prediction model was built. Assume the model has already been built and that you have tested it on the test data and obtained the results.

We will work with the data below to calculate the ROC and AUC metrics, and will explain the significance of the FPR (False Positive Rate), TPR (True Positive Rate) and related quantities.

The following is a summary of the data:

→ The results file contains around 1,900 records

→ The actual class is in the column ‘True_Class’

→ The predicted class of the binary classification is in the column ‘Predicted_Class’

→ The predicted probability from the classification function is in the column ‘prediction_probability’

We will use this data for the ROC and AUC calculations in the rest of this article.
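
As a minimal sketch, the results file can be loaded with pandas. The file name below is a placeholder (use the CSV from the linked notebook); the column names are the ones listed above.

```python
import pandas as pd

# Placeholder file name; use the CSV from the linked notebook.
results = pd.read_csv("fraud_prediction_results.csv")

print(results[["True_Class", "Predicted_Class", "prediction_probability"]].head())
print(f"Total records: {len(results)}")
```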

Significance of Predicted Probability:

Most classification algorithms decide whether the predicted value is true or false based on a defined probability threshold. Scikit-learn uses a probability threshold of 0.5 by default: it predicts true when the predicted probability is greater than 0.5 and false when it is lower.

But it is not always necessary to keep the prediction probability threshold at 0.5. The threshold can be set to any value that helps us classify the results better. Especially for problems involving highly imbalanced data or anomaly detection, tuning the threshold value can help you improve performance.
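
As a small sketch, assuming the results dataframe loaded earlier and classes encoded as 0/1, a different threshold can be applied directly to the probability column (0.6 here is just an example):

```python
# Re-derive the predicted class from the probability column using a custom
# threshold instead of the default 0.5.
threshold = 0.6
results["Predicted_Class_custom"] = (
    results["prediction_probability"] >= threshold
).astype(int)

print(results["Predicted_Class_custom"].value_counts())
```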

Confusion Matrix, FPR, TPR Calculations:

Let's calculate the FPR, TPR and confusion matrix for the provided results with a threshold value of 0.5. This is how the results look:

As you can observe, the accuracy of this prediction is around 79.2% with a probability threshold of 0.5 for the true class. The TP, FP, TN and FN values are 485, 286, 1043 and 115 respectively, as displayed in the confusion matrix.

Let's calculate the FPR and TPR for the above results (threshold value of 0.5):

TPR = TP/(TP+FN) = 485/(485+115) = 0.80

FPR = FP/(TN+FP) = 286/(1043+286) = 0.21
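The same numbers can be reproduced with scikit-learn. This is a sketch, assuming the results dataframe from earlier and classes encoded as 0/1:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = results["True_Class"]
y_pred = (results["prediction_probability"] >= 0.5).astype(int)

# For binary 0/1 labels, ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
tpr = tp / (tp + fn)  # True Positive Rate (recall)
fpr = fp / (fp + tn)  # False Positive Rate

print(f"Accuracy: {accuracy_score(y_true, y_pred):.3f}")
print(f"TP={tp}, FP={fp}, TN={tn}, FN={fn}")
print(f"TPR={tpr:.2f}, FPR={fpr:.2f}")
```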

Let's redo the same exercise with a different threshold value, say 0.6, and observe how the confusion matrix, TPR and FPR values change.

As you can observe, the accuracy of this prediction is around 79.2% for a probability threshold of 0.6 for the true class. The TP, FP, TN and FN values are 677, 94, 307 and 851 respectively, as displayed in the confusion matrix.

Let's calculate the FPR and TPR for the above results (threshold value of 0.6) and see how they change:

TPR = TP/(TP+FN) = 677/(677+307) = 0.68

FPR = FP/(TN+FP) = 307/(307+851) = 0.26

So the ROC curve is simply the curve we get by plotting the TPR against the FPR as the threshold value is varied.

The scikit-learn library does a wonderful job of choosing the candidate threshold values and simplifies the FPR and TPR calculations for us.
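
For instance, assuming the results dataframe from earlier, scikit-learn's roc_curve returns the FPR and TPR at each candidate threshold, which can then be plotted:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

# roc_curve sweeps the threshold over the predicted probabilities and
# returns the FPR and TPR observed at each threshold.
fpr, tpr, thresholds = roc_curve(
    results["True_Class"], results["prediction_probability"]
)

plt.plot(fpr, tpr, label="model")
plt.plot([0, 1], [0, 1], linestyle="--", label="random guess")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC curve")
plt.legend()
plt.show()
```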

For the dataset, we worked upon — ROC curve looks as below:

In general, the ideal point on a ROC plot is (0, 1), i.e. FPR = 0 and TPR = 1. From the plot, we need to identify the (FPR, TPR) point closest to (0, 1) and take the corresponding threshold as the probability threshold. In the above case, the black circled point is the one closest to the ideal point, and hence the threshold value corresponding to that point is the best probability threshold for this case.
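
One simple way to find that point programmatically (a sketch, reusing the fpr, tpr and thresholds arrays returned by roc_curve above) is to pick the threshold whose (FPR, TPR) point minimises the distance to (0, 1):

```python
import numpy as np

# Distance of every (FPR, TPR) point on the curve from the ideal corner (0, 1).
distances = np.sqrt(fpr ** 2 + (1 - tpr) ** 2)
best_idx = np.argmin(distances)

print(f"Best threshold ~ {thresholds[best_idx]:.2f} "
      f"(FPR={fpr[best_idx]:.2f}, TPR={tpr[best_idx]:.2f})")
```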

To summarize, the ROC curve helps you determine the right threshold value for your problem by showing how the FPR and TPR vary with it.

Threshold selection for a given problem is based on the trade-off between false positives and false negatives. For a smog prediction system, say, a system with fewer false negatives (predicting no smog when there is smog) is preferred, even at the cost of more false positives (predicting smog when there is none): a high false negative rate has a health impact on people, and it is usually better to be prepared for bad air quality that never materialises than to be unprepared when it does. So the data scientist has to determine the best threshold value in light of the nature of the problem.

AUC — Area Under Curve

No discussion of ROC is complete without a mention of AUC.

As you will have observed, the ROC curve is primarily about determining the best probability threshold for the results of a given model: it calculates the FPR and TPR over varying thresholds for the results produced by one single model.

Now consider that you have multiple models producing different results and you need to identify the best performing model among them. This is where you would use AUC.

AUC summarizes how good your model is as a whole: it is a single quality score describing overall performance across all thresholds. The higher the AUC value, the better the model.

If you have the AUC of multiple models, you can determine which one is best by comparing their AUC values.

A single AUC value by itself is of limited use; its real value is in comparison. Say you have the prediction results from two models, one with an AUC of 0.96 and the other with 0.88: you can then determine that the model with the higher AUC is the better fit for your data.
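
As a sketch, scikit-learn's roc_auc_score computes the AUC directly from the true labels and the predicted probabilities. The two probability columns below are hypothetical stand-ins for the outputs of two different models scored on the same test set:

```python
from sklearn.metrics import roc_auc_score

# Hypothetical columns holding each model's predicted probability of fraud.
auc_a = roc_auc_score(results["True_Class"], results["proba_model_a"])
auc_b = roc_auc_score(results["True_Class"], results["proba_model_b"])

print(f"Model A AUC: {auc_a:.3f}")
print(f"Model B AUC: {auc_b:.3f}")
# Whichever model has the higher AUC separates fraud from non-fraud
# more consistently across all thresholds.
```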

References:
