Precision/Recall Tradeoff

Small potato


In machine learning, we usually use a confusion matrix to measure the performance of a classifier.

The confusion matrix gives us the counts of true positives, true negatives, false positives, and false negatives, but keeping track of all four numbers at once can itself be confusing.

So why not summarize it with a couple of more concise metrics?

1. The accuracy of the positive predictions

precision = TP/(TP + FP)

2. The sensitivity or true positive rate

recall = TP/(TP + FN)
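As a minimal sketch, the two formulas translate directly into code (the counts below are made up for illustration):

```python
def precision(tp, fp):
    # Accuracy of the positive predictions.
    return tp / (tp + fp)

def recall(tp, fn):
    # Sensitivity, or true positive rate.
    return tp / (tp + fn)

# Hypothetical counts: 4 cats correctly detected as "cat",
# 1 dog wrongly detected as "cat", 2 cats missed.
print(precision(tp=4, fp=1))  # 0.8
print(recall(tp=4, fn=2))     # 0.666...
```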

An illustrated confusion matrix can help us understand this better.

Suppose the classifier’s task is to detect cats among dogs.

So,

TP (true positive) means that a cat is detected as “cat” by the classifier, i.e. the instance is correctly classified.

FN (false negative) means that a cat is not detected as “cat” by the classifier, i.e. the instance is wrongly classified.

FP (false positive) means that a dog is detected as “cat” by the classifier, i.e. the instance is wrongly classified.

TN (true negative) means that a dog is detected as “non-cat” by the classifier, i.e. the instance is correctly classified.

An illustrated confusion matrix
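For concreteness, here is a minimal sketch that counts the four cells for the cat-vs-dog setup; the labels and predictions are made up for illustration:

```python
# Made-up ground-truth labels and classifier predictions.
y_true = ["cat", "cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "cat", "dog", "cat", "dog", "dog"]

pairs = list(zip(y_true, y_pred))
tp = sum(t == "cat" and p == "cat" for t, p in pairs)  # cats detected as "cat"
fn = sum(t == "cat" and p == "dog" for t, p in pairs)  # cats missed
fp = sum(t == "dog" and p == "cat" for t, p in pairs)  # dogs detected as "cat"
tn = sum(t == "dog" and p == "dog" for t, p in pairs)  # dogs correctly rejected

print(tp, fn, fp, tn)  # 2 1 1 2
```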

Unfortunately, you will face a dilemma: increasing precision reduces recall, and vice versa. This is called the precision/recall tradeoff.

1. Suppose the decision threshold is positioned at the right arrow labeled “1”. The three instances above the threshold are all cats and are detected as “cat”, i.e. every positive prediction is correct. So the precision is 3/3 = 100%.

However, 3 cats are not detected as “cat” by the classifier, i.e. those 3 instances are wrongly classified. So the recall is 3/6 = 50%.

2. Now if we lower the threshold (move the arrow to the left), one dog instance that was originally detected as “non-cat” is now detected as “cat”, i.e. a TN becomes an FP. Since 4 cats and 1 dog are now predicted as “cat”, the precision drops to 4/5 = 80%.

Meanwhile, one more cat instance is detected as “cat” by the classifier, so the recall rises to 4/6 ≈ 67%.

3. Now if we lower the threshold even further (move the arrow farther to the left), 6 cats and 2 dogs are detected as “cat”, so the precision drops again to 6/8 = 75%.

At the same time, all 6 cat instances are now detected as “cat”, so the recall reaches 6/6 = 100%.
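The three threshold positions above can be reproduced with a short sketch; the scores below are invented so that lowering the threshold admits instances into the “cat” prediction in the same order as in the illustration:

```python
# (true label, classifier score) pairs -- hypothetical scores.
data = [
    ("cat", 0.95), ("cat", 0.90), ("cat", 0.85),
    ("dog", 0.70), ("cat", 0.65), ("cat", 0.55),
    ("dog", 0.45), ("cat", 0.40), ("dog", 0.30), ("dog", 0.20),
]

def precision_recall_at(threshold, data):
    # Everything with a score at or above the threshold is predicted "cat".
    pred_pos = [label for label, score in data if score >= threshold]
    tp = pred_pos.count("cat")
    fp = pred_pos.count("dog")
    fn = sum(label == "cat" for label, _ in data) - tp
    return tp / (tp + fp), tp / (tp + fn)

for threshold in (0.80, 0.60, 0.35):
    p, r = precision_recall_at(threshold, data)
    print(f"threshold={threshold:.2f}  precision={p:.0%}  recall={r:.0%}")
# threshold=0.80  precision=100%  recall=50%
# threshold=0.60  precision=80%  recall=67%
# threshold=0.35  precision=75%  recall=100%
```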

We can plot precision and recall as functions of the threshold value, as in the graph below.

Precision and recall versus the decision threshold
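A plot like this can be produced with scikit-learn’s precision_recall_curve and matplotlib; the labels and scores below are synthetic stand-ins for a real classifier’s outputs:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

y_true   = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0]  # 1 = cat, 0 = dog (synthetic)
y_scores = [0.95, 0.90, 0.85, 0.70, 0.65, 0.55, 0.45, 0.40, 0.30, 0.20]

precisions, recalls, thresholds = precision_recall_curve(y_true, y_scores)

# precision_recall_curve returns one more precision/recall value than
# thresholds, so drop the last point when plotting against the thresholds.
plt.plot(thresholds, precisions[:-1], label="precision")
plt.plot(thresholds, recalls[:-1], label="recall")
plt.xlabel("decision threshold")
plt.legend()
plt.show()
```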

Another way to select a good precision/recall tradeoff is to plot precision directly against recall, as shown below.
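A minimal sketch of that precision-versus-recall plot, reusing the same synthetic scores as above:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

y_true   = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0]  # 1 = cat, 0 = dog (synthetic)
y_scores = [0.95, 0.90, 0.85, 0.70, 0.65, 0.55, 0.45, 0.40, 0.30, 0.20]

precisions, recalls, _ = precision_recall_curve(y_true, y_scores)

# Precision plotted directly against recall.
plt.plot(recalls, precisions)
plt.xlabel("recall")
plt.ylabel("precision")
plt.show()
```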

To conclude, a high-precision classifier may not be very useful when its recall is too low!
