Problems with Precision and Recall.

Abhigyan
Published in Analytics Vidhya · 5 min read · Nov 15, 2020

If you’ve been learning data science or have been in the field for some time, you’ve probably built tons of classification models and checked their performance with different metrics.

The big 4 metrics :
* Precision
* Recall
* Accuracy
* F1-Score

We know that when we have imbalanced data, accuracy is not the metric we should be looking at, as it may be misleading; instead, we should rely on the F1-Score when our dataset is imbalanced.

But is F1 really going to help?

ROC curves are one of the best methods for comparing how good models are.

But before we get there, let’s revise our basics.

Also, check out my article on Calculating Accuracy of an ML Model.

  • The 4 popular metrics are calculated based on the confusion matrix.

A confusion matrix (a.k.a. error matrix) is a way of summarizing the performance of a classification model: it shows the number of correct and incorrect predictions for each class.

Precision = TP/(TP+FP)

Sensitivity (Recall) = TP/(TP+FN)

Accuracy = (TP+TN)/(TP+TN+FP+FN)

where,

  • True Positive (TP): Observation is positive, and is predicted to be positive.
  • False Negative (FN): Observation is positive, but is predicted negative.
  • True Negative (TN): Observation is negative, and is predicted to be negative.
  • False Positive (FP): Observation is negative, but is predicted positive.
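
To make these formulas concrete, here is a minimal Python sketch that computes all three metrics directly from the four confusion-matrix counts. The counts used here are made-up placeholders for illustration only, not from any real model.

```python
# Minimal sketch: compute the three metrics directly from the four counts.
# The counts below are made-up placeholders, not from any real model.
tp, fp, fn, tn = 50, 10, 5, 35

precision = tp / (tp + fp)                  # TP / (TP + FP)
recall = tp / (tp + fn)                     # TP / (TP + FN), a.k.a. sensitivity
accuracy = (tp + tn) / (tp + tn + fp + fn)  # (TP + TN) / all predictions

print(f"Precision: {precision:.2%}")
print(f"Recall:    {recall:.2%}")
print(f"Accuracy:  {accuracy:.2%}")
```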

Suppose:

  • We built an image classifier that distinguishes between ‘CAT’ and ‘DOG’,
  • where ‘CAT’ is the positive class and ‘DOG’ is the negative class.
    The confusion matrix we got was something like the one below (TP = 1984, FN = 447, FP = 336, TN = 107).
  • From the confusion matrix, we can see that our data is imbalanced, with 2431 ‘CAT’ images and 443 ‘DOG’ images.
  • However, our model predicts 2320 ‘CAT’ (TP+FP) images and 554 ‘DOG’ (TN+FN) images.

Accuracy = (1984 + 107)/(1984 + 107 + 336 + 447) = 72.76%
Precision = (1984)/(1984 + 336) = 85.52%
Recall = (1984)/(1984 + 447) = 81.61%

The metrics show that the model is pretty good, but is it?


What if we take DOG as the positive class and CAT as the negative class?

Accuracy remains the same, however:

Accuracy = ( 107 + 1984 ) /( 107 + 1984 + 447 + 336 ) = 72.76%
Precision = (107)/(107 + 447) = 19.31%
Recall = (107)/(107 + 336) = 24.15%
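
To reproduce both sets of numbers, here is a short scikit-learn sketch (an assumption on my part; the original post does not show code) that rebuilds the CAT/DOG confusion matrix as label lists and scores it with each class in turn treated as the positive label via the pos_label parameter.

```python
# Sketch: rebuild the CAT/DOG example as label lists and score it both ways.
# The counts (1984, 447, 336, 107) are the ones from the confusion matrix above.
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = ["CAT"] * 1984 + ["CAT"] * 447 + ["DOG"] * 336 + ["DOG"] * 107
y_pred = ["CAT"] * 1984 + ["DOG"] * 447 + ["CAT"] * 336 + ["DOG"] * 107

print(f"Accuracy: {accuracy_score(y_true, y_pred):.2%}")  # ~72.76% either way

for positive in ("CAT", "DOG"):
    p = precision_score(y_true, y_pred, pos_label=positive)
    r = recall_score(y_true, y_pred, pos_label=positive)
    print(f"pos_label={positive}: precision={p:.2%}, recall={r:.2%}")
```

Accuracy stays put, while precision and recall swing wildly depending on which class we happen to call positive.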

The same model’s scores have decreased drastically.

Problems with Precision and Recall.

If we look at the formulas for Precision and Recall, we can see that both metrics are calculated in terms of the positive class.

We know that:

  • Precision is the ratio of correct positive predictions to the total number of positive predictions.
  • We use precision to check, out of everything the model predicted as positive, how many are actually positive.
    In simple words, of all the images our model labelled ‘CAT’, how many are actually ‘CAT’.
  • Recall is the ratio of correct positive predictions to the total number of actual positive samples.
  • Recall is used to check, out of all the actual positive samples, how many the model labelled as positive.

We can see the problem here! The metrics are biased towards the positive class.

We want our model to classify between the classes correctly, but how do we check whether it’s doing that or not?

Advanced Metrics to the rescue!

Matthews Correlation Coefficient (MCC)

  • The Matthews correlation coefficient (MCC) or phi coefficient is used in machine learning as a measure of the quality of the model in binary classifications.
  • The MCC is defined identically to Pearson’s phi coefficient.
  • The coefficient takes into account true and false positives and negatives and is generally regarded as a balanced measure that can be used even if the classes are of very different sizes.
  • It ranges from -1 to 1.
  • A coefficient of +1 represents a perfect prediction,
    0 means no better than a random prediction, and
    −1 indicates total disagreement between predictions and actual observations.
  • MCC gives importance to all four cells of the confusion matrix, irrespective of whether the data is class-imbalanced.

Taking our example of ‘CAT’ and ‘DOG’ :

MCC = (TP×TN − FP×FN) / √[(TP+FP)×(TP+FN)×(TN+FP)×(TN+FN)]
    = (1984×107 − 336×447) / √[(1984+336)×(1984+447)×(107+336)×(107+447)]
    ≈ 0.05, i.e. close to 0

This shows that our model’s predictions are essentially random.

Even when we take our other example:

MCC = (107×1984 − 447×336) / √[(107+447)×(107+336)×(1984+336)×(1984+447)]
    ≈ 0.05, still close to 0

In both cases, MCC gave the same result, showing that our model is not good enough to distinguish between the classes properly.
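
As a sanity check (again just a sketch, not code from the original post), scikit-learn’s matthews_corrcoef gives roughly 0.05 for this example, and relabelling which class we call positive does not change the value:

```python
# Sketch: MCC for the same CAT/DOG example, computed with scikit-learn.
from sklearn.metrics import matthews_corrcoef

y_true = ["CAT"] * 1984 + ["CAT"] * 447 + ["DOG"] * 336 + ["DOG"] * 107
y_pred = ["CAT"] * 1984 + ["DOG"] * 447 + ["CAT"] * 336 + ["DOG"] * 107

print(matthews_corrcoef(y_true, y_pred))  # ~0.05, close to random

# Relabelling the classes (CAT <-> DOG) leaves MCC unchanged:
swap = {"CAT": "DOG", "DOG": "CAT"}
print(matthews_corrcoef([swap[label] for label in y_true],
                        [swap[label] for label in y_pred]))  # same ~0.05
```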

Conclusion:

  • The metrics most commonly used in practice can be misleading. However, depending on the use case, you might still want to use them.
  • To judge the performance of a classification model, the Matthews Correlation Coefficient (MCC) is the best metric to go with.

Happy Learning!!!

Like my article? Do give me a clap and share it, as that will boost my confidence.
Also, check out my other posts and stay connected for future articles in this series on the basics of data science and machine learning.

Also, do connect with me on LinkedIn.

