Why You Might NOT Want to Use Area Under PR Curve

Quinn Wang
3 min read · Sep 1, 2019


Last time I introduced Mr. Potato and his approach to his tiny donut problem. Area Under the PR Curve looked like a better evaluation metric for his imbalanced class problem, because a scalar comparison is insensitive to changes in the decision threshold.

However, this metric still has intrinsic problems. Consider two models: model A with an area under the PR curve of 0.8, and model B with an area under the PR curve of 0.75. From a scalar-comparison point of view, model A is clearly and unambiguously superior. But let's look at the possible precision-recall curves (PR curves) of models A and B that are consistent with these two numbers:

Figure 1: a pair of PR curves where model A dominates model B at every threshold.

Figure 2: a pair of PR curves that cross.

Figure 3: another pair of PR curves that cross.

In figure 1, model A is clearly superior in both precision and recall at any threshold, so given the existing features, model B can be discarded. Figures 2 and 3, on the other hand, do not offer such a clear-cut conclusion. Perhaps recall matters more than precision for Mr. Potato, but he would still need to avoid terrible precision: a model operating at 10% precision is obviously useless.
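To make this concrete, here is a minimal sketch (assuming scikit-learn and matplotlib, with a synthetic imbalanced dataset and two off-the-shelf classifiers standing in for models A and B) that plots each model's PR curve next to its scalar average-precision score, so you can see whether the curves cross:

```python
# A minimal sketch: compare the PR curves of two models on an imbalanced
# synthetic dataset, rather than relying on the scalar AUC-PR alone.
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic data with a roughly 9:1 class imbalance
X, y = make_classification(n_samples=5000, weights=[0.9], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "model A": RandomForestClassifier(random_state=0),
    "model B": LogisticRegression(max_iter=1000),
}

for name, model in models.items():
    scores = model.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    precision, recall, _ = precision_recall_curve(y_test, scores)
    ap = average_precision_score(y_test, scores)  # scalar area under PR curve
    plt.plot(recall, precision, label=f"{name} (AP={ap:.2f})")

plt.xlabel("Recall")
plt.ylabel("Precision")
plt.legend()
plt.show()  # the curves may cross even when one AP is clearly higher
```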

Or consider the more extreme situation shown in figure 4:

Figure 4: model A's PR curve with a sharp precision drop.

In this case, model A behaves strangely. This can occur when clusters of true positives fall very close to false positives in score space: the model assigns, say, 61% confidence that the true positives are 1's, but also 60% confidence that the false positives should be predicted as 1's, so lowering the threshold only slightly lets in a flood of false positives. A PR curve with this shape is also likely to come with poor generalization to the test set: a little variation in the negative examples might cause the model to falsely categorize them as positive.
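A toy sketch of how such a cliff arises, using entirely hypothetical scores: 60 false positives scored at 0.60 sit just below 20 true positives scored at 0.61, so moving the threshold past 0.61 collapses precision almost instantly:

```python
# Toy illustration of the figure-4 pathology with hypothetical scores:
# a cluster of false positives at ~0.60 sits just below a cluster of
# true positives at ~0.61, so precision falls off a cliff between them.
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([1] * 20 + [0] * 60 + [1] * 20)
scores = np.concatenate([
    np.full(20, 0.61),  # true positives the model barely prefers
    np.full(60, 0.60),  # false positives right behind them
    np.full(20, 0.95),  # a handful of easy, confident positives
])

precision, recall, thresholds = precision_recall_curve(y_true, scores)
for p, r, t in zip(precision, recall, thresholds):
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
# precision jumps from 0.40 to 1.00 between thresholds 0.60 and 0.61
```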

Furthermore, if your application must meet a minimum precision or recall for business or safety reasons (for example, a spam filter might need to guarantee 95% precision), a scalar comparison of areas under the PR curve may not be the right basis for a decision: you care about one operating region of the curve, not its average behaviour.
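One alternative is to compare models at the operating point you actually care about. A minimal sketch (the 95% floor, the sample data, and the helper name recall_at_precision are my own for illustration, not from the article): report the best recall each model achieves while honouring the precision floor:

```python
# Sketch: compare recall at a required minimum precision instead of AUC-PR.
import numpy as np
from sklearn.metrics import precision_recall_curve

def recall_at_precision(y_true, scores, min_precision=0.95):
    """Best recall achievable while keeping precision >= min_precision."""
    precision, recall, _ = precision_recall_curve(y_true, scores)
    feasible = precision >= min_precision
    return recall[feasible].max() if feasible.any() else 0.0

# Hypothetical held-out labels and scores, for illustration only
y_true = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.70, 0.90])
print(recall_at_precision(y_true, scores))  # recall at >= 95% precision
```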


Quinn Wang

Data analyst with an interest in machine learning. Passionate about understanding the theoretical backings of ML algorithms.