Evaluation Metrics Part 3

ROC Curve and AUC score Explained and Implemented!!!

Siladittya Manna
The Owl
5 min read · Jun 21, 2020


In Part 1 and Part 2 of the Evaluation Metrics series, we came across several metrics. One remains: the AUC score, which is calculated by taking the Area Under the ROC curve.

Without further delay, let us dive into the conceptual aspects of the ROC curve and the AUC score.

ROC Curve

The Receiver Operating Characteristic (ROC) curve is a graph showing the performance of a classification model at all classification thresholds. The curve is obtained by plotting two parameters: the True Positive Rate (TPR) on the y-axis against the False Positive Rate (FPR) on the x-axis.
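In terms of the confusion-matrix counts from the earlier parts of this series, these two parameters are

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}$$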

[Figure: ROC curves, with the red dashed diagonal marking a random classifier. Source]

The red dashed line represents a random classifier, which assigns 1 and 0 to the samples at random. As you can see in the image above, the classifier becomes better as the ROC curve moves away from the red dashed line towards the top-left corner.

Lowering the threshold increases the number of True Positives and False Positives, and raising it does the opposite. Lowering the threshold means that samples which originally belong to the positive class but have a low predicted probability get re-classified as positive (True Positives increase), and samples which originally belong to the negative class but have a predicted probability above the new threshold also get re-classified as positive (False Positives increase). As we lower the threshold further, both the True Positive Rate and the False Positive Rate approach 1, so we move up and to the right along the curve. As the threshold is increased, the True Negatives and False Negatives increase, and we move down and to the left along the curve.
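To make this concrete, here is a minimal sketch (the labels and predicted probabilities below are made up purely for illustration) that sweeps the threshold by hand and prints TPR and FPR at each step:

```python
import numpy as np

# Illustrative ground-truth labels and predicted probabilities
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.6, 0.7, 0.2, 0.9])

for T in [0.8, 0.5, 0.2]:  # decreasing thresholds
    y_pred = (y_prob > T).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    print(f"T={T}: TPR={tp / (tp + fn):.2f}, FPR={fp / (fp + tn):.2f}")
```

As T drops from 0.8 to 0.2, both rates increase, moving the operating point up and to the right along the ROC curve.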

AUC score

The Area Under the Curve (AUC) score represents a degree or measure of separability: a model with a higher AUC is better at predicting True Positives and True Negatives. The AUC score measures the total area underneath the ROC curve, and it is both scale-invariant and threshold-invariant. In probabilistic terms, the AUC score equals the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one.
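This ranking interpretation is easy to verify numerically. A minimal sketch (the labels and scores below are made up for illustration) comparing scikit-learn's AUC against the fraction of correctly ordered positive-negative pairs:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.6, 0.7, 0.2, 0.9])

# Fraction of (positive, negative) pairs in which the positive instance
# gets the higher score, counting ties as half a correct ordering
pos, neg = y_prob[y_true == 1], y_prob[y_true == 0]
pair_frac = ((pos[:, None] > neg[None, :]).mean()
             + 0.5 * (pos[:, None] == neg[None, :]).mean())

print(roc_auc_score(y_true, y_prob), pair_frac)  # both print 0.875
```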

Let X be the score assigned to a sample and T be the threshold. If X > T, the sample is predicted positive; otherwise, negative. Now, if the sample originally belongs to the positive class, X follows the distribution p(x); otherwise it follows n(x).

[Figure: the score distributions p(x) and n(x), with the threshold T separating predicted negatives from predicted positives. Source]

From the above diagram, it is hopefully clear that
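$$\Pr(X > T \mid \text{positive}) = \int_T^{+\infty} p(x)\,dx, \qquad \Pr(X > T \mid \text{negative}) = \int_T^{+\infty} n(x)\,dx$$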

Now, to get the ROC curve we have to plot TPR(T) vs. FPR(T).

FPR(T) maps the threshold value T to a value on the x-axis and TPR(T) maps the threshold value T to a value on the y-axis. These two mappings can be represented as,
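$$x = \mathrm{FPR}(T) = \int_T^{+\infty} n(s)\,ds, \qquad y = \mathrm{TPR}(T) = \int_T^{+\infty} p(s)\,ds$$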

Now, the AUC score can be obtained by integrating TPR(T) over FPR(T),
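$$\begin{aligned}
\mathrm{AUC} &= \int_0^1 \mathrm{TPR}\; d(\mathrm{FPR}) \\
&= \int_{T=+\infty}^{T=-\infty} \mathrm{TPR}(T)\; d\big(\mathrm{FPR}(T)\big) \\
&= \int_{+\infty}^{-\infty} \mathrm{TPR}(T)\,\mathrm{FPR}'(T)\, dT \\
&= -\int_{-\infty}^{+\infty} \mathrm{TPR}(T)\,\mathrm{FPR}'(T)\, dT \\
&= \int_{-\infty}^{+\infty} \mathrm{TPR}(T)\, n(T)\, dT \\
&= \int_{-\infty}^{+\infty} \left( \int_T^{+\infty} p(x)\, dx \right) n(T)\, dT \\
&= \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \mathbb{1}[x > T]\; p(x)\, n(T)\, dx\, dT
\end{aligned}$$

(note that as FPR runs from 0 to 1 along the x-axis, the threshold T runs from $+\infty$ down to $-\infty$).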

The third and fourth steps are obtained by
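$$d\big(\mathrm{FPR}(T)\big) = \mathrm{FPR}'(T)\,dT,$$

after which reversing the limits of integration introduces the minus sign.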

The fifth step follows from the Leibniz integral rule,
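$$\mathrm{FPR}'(T) = \frac{d}{dT}\int_T^{+\infty} n(x)\,dx = -\,n(T)$$

which cancels the minus sign; the sixth step then substitutes $\mathrm{TPR}(T) = \int_T^{+\infty} p(x)\,dx$.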

How did we get the last term?
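The inner integral over p(x) can be rewritten with an indicator function,

$$\int_T^{+\infty} p(x)\,dx = \int_{-\infty}^{+\infty} \mathbb{1}[x > T]\;p(x)\,dx,$$

which turns the nested integrals into a single double integral over the whole plane.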

The double integration evaluates as
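$$\mathrm{AUC} = \int_{-\infty}^{+\infty} p(x) \left( \int_{-\infty}^{x} n(T)\, dT \right) dx$$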

Now, the term within the brackets is the cumulative sum of the probabilities, that is, the CDF of n(x), whose maximum value is 1.0.

[Figure: a PDF and its corresponding CDF; the values shown are just for illustration. Source]

Now, imagine the CDF on the right of the above image, superimposed on the PDF of p(x).

CDF superimposed on the PDF p(x)

Now, from the above image it is evident that the integral that remains to be calculated will be greater in magnitude if the CDF curve shifts further towards the left of the PDF p(x), because at any point x,
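$$F_n^{\,\mathrm{green}}(x) \;\ge\; F_n^{\,\mathrm{red}}(x),$$

where $F_n(x) = \int_{-\infty}^{x} n(T)\,dT$ denotes the CDF of n(x).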

Now, the CDF of n(x) will look more like the green curve if the overlap between p(x) and n(x) is small, that is, if n(x) lies further to the left of p(x). This ultimately means fewer False Positives and False Negatives, so for a low value of FPR we get a high value of TPR, and the ROC curve approaches its ideal form. If the overlap between p(x) and n(x) is large, the ROC curve approaches the curve for random guessing.

Going back to the integration,
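$$\mathrm{AUC} = \int_{-\infty}^{+\infty} p(x)\,F_n(x)\,dx$$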

So the final step is just like calculating a weighted cumulative sum over p(x), where the probability of a positive instance being detected as positive is scaled down by the cumulative probability F_n(x) of a negative instance scoring below it.

The above statement can be interpreted in another way: the more the two distributions p(x) and n(x) overlap, the less confident the model is in predicting a positive instance as positive.

If the PDF n(x) overlaps less with p(x), then the CDF of n(x) will have a high value (close to 1) over the domain of p(x) (the green CDF curve); hence the integrand will be larger, which makes the probability of a positive instance being predicted positive higher. On the other hand, if the overlap between p(x) and n(x) is larger, the CDF of n(x) approaches the value 1.0 much later than the green CDF curve does (more like the red CDF curve, say).

Thus, at any instance T’,
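$$p(T')\,F_n^{\,\mathrm{green}}(T') \;\ge\; p(T')\,F_n^{\,\mathrm{red}}(T')$$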

Finally,
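$$\mathrm{AUC}_{\mathrm{green}} = \int_{-\infty}^{+\infty} p(x)\,F_n^{\,\mathrm{green}}(x)\,dx \;\ge\; \int_{-\infty}^{+\infty} p(x)\,F_n^{\,\mathrm{red}}(x)\,dx = \mathrm{AUC}_{\mathrm{red}}$$

so the smaller the overlap between p(x) and n(x), the higher the AUC.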

For better understanding of ROC curve and AUC score, refer to this post.

For multiple classes, a ROC curve can be obtained for each class in a one-vs-rest fashion, as in the following piece of code.
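A minimal sketch of the standard one-vs-rest recipe with scikit-learn follows; the synthetic dataset, the LogisticRegression model, and n_classes = 3 are stand-ins for your own data and classifier:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_curve, auc

# Illustrative 3-class problem; swap in your own data and model
n_classes = 3
X, y = make_classification(n_samples=500, n_informative=6,
                           n_classes=n_classes, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Per-class scores from any probabilistic classifier
y_score = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)

# Binarize the labels for one-vs-rest ROC curves
y_test_bin = label_binarize(y_test, classes=list(range(n_classes)))

fpr, tpr, roc_auc = {}, {}, {}
for i in range(n_classes):
    # ROC curve for class i versus the rest
    fpr[i], tpr[i], _ = roc_curve(y_test_bin[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

# Micro-average: pool all class decisions into one binary problem
fpr["micro"], tpr["micro"], _ = roc_curve(y_test_bin.ravel(), y_score.ravel())
roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])
print(roc_auc)
```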

Clap if you find it helpful and encourage us to post more articles like this!!!

Check out Part 4 of this series on measuring uncertainty in metrics and on combining all these metrics into one place for multi-class classification problems.


Siladittya Manna
The Owl

Senior Research Fellow @ CVPR Unit, Indian Statistical Institute, Kolkata || Research Interest : Computer Vision, SSL, MIA. || https://sadimanna.github.io