Evaluation Metrics-I: Precision, Recall and F1 Score

Raghavi Adoni

Now that we have discussed the confusion matrix, it’s time to learn about precision and recall.

Precision tells us how many of the results our model classified as positive were actually positive, i.e. TP / (TP + FP).

Recall tells us how many of the actual positives (points labelled as positive in the ground truth) were recalled, or found, by our model, i.e. TP / (TP + FN).
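A minimal Python sketch of both definitions, assuming binary labels encoded as 1 (positive) and 0 (negative). The helper names `confusion_counts`, `precision` and `recall` are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here rather than part of the definitions.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, fn


def precision(tp, fp):
    """Of everything predicted positive, the fraction that is actually positive: TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0  # assumed convention: 0.0 if nothing was predicted positive


def recall(tp, fn):
    """Of everything actually positive, the fraction the model found: TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0  # assumed convention: 0.0 if there are no actual positives


# Hypothetical ground truth and predictions for eight samples
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
tp, fp, fn = confusion_counts(y_true, y_pred)
print(tp, fp, fn)         # 2 1 2
print(precision(tp, fp))  # 2 / 3, about 0.667
print(recall(tp, fn))     # 2 / 4 = 0.5
```

In this made-up example the model finds only half of the actual positives (recall 0.5), but two of its three positive predictions are correct (precision about 0.667).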

Some models need to be high-recall models, while others need to be high-precision models.

Take the example of a medical model that classifies whether a person is sick or not. What matters most is that the model should not predict a sick person as healthy, i.e. false negatives should be low. This calls for a high-recall model (recall matters more to us than precision), because we need to know, out of all the sick patients, how many were recalled, that is, correctly diagnosed as sick.

Now take the example of a spam detection model that predicts whether a mail is spam or not. We cannot afford to have a legitimate mail labelled as spam, because a very important mail could then be missed. Hence, our model should have few false positives, and we need to know, out of all the mails classified as spam, how many were actually spam. This calls for a high-precision model (precision matters more to us than recall).


F1-score is a single metric that takes both precision and recall into account, since we can’t always evaluate the two separately and then pick the better one for our model. It is the harmonic mean of precision and recall, F1 = 2 · P · R / (P + R), and it tells us how well precision and recall are balanced.
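A minimal Python sketch of the F1 formula, assuming precision and recall have already been computed. The function name `f1` is hypothetical, and returning 0.0 when both inputs are zero is an assumed convention to avoid dividing by zero.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall: 2 * P * R / (P + R)."""
    if precision + recall == 0:
        return 0.0  # assumed convention when both precision and recall are zero
    return 2 * precision * recall / (precision + recall)


# Hypothetical values: precision 0.8, recall 0.6
print(f1(0.8, 0.6))  # about 0.686
```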

But F1-score gives equal weight to precision and recall, and as a result, if either of them is low, the F1-score will tend to be low too (it is the harmonic mean, after all). As discussed above, however, some models need higher precision or higher recall. For such models, F1-score isn’t really a good evaluation metric, and we should consider the F-beta score instead.
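To see why a single low value drags F1 down, here is a small Python comparison of the arithmetic and harmonic means for a hypothetical model with very high precision and very low recall. It uses the standard library's `statistics.mean` and `statistics.harmonic_mean`; for two values the harmonic mean equals 2 · P · R / (P + R), i.e. the F1-score.

```python
from statistics import harmonic_mean, mean

# Hypothetical model with very high precision but very low recall
precision, recall = 0.9, 0.1

print(mean([precision, recall]))           # arithmetic mean: 0.5
print(harmonic_mean([precision, recall]))  # harmonic mean, i.e. F1: about 0.18
```

The arithmetic mean would make this model look mediocre, while the harmonic mean stays close to the weaker of the two values.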

F-beta score is used as an evaluation metric when you don’t want to assign equal weights to precision and recall. It is defined as F-beta = (1 + β²) · P · R / (β² · P + R), which reduces to the F1-score when β = 1. For high-recall models, F-beta score with β > 1 is used, as it assigns more weight to recall than to precision; a model that achieves high recall then gets a high F-beta score, and we can aim to maximize it. Similarly, for high-precision models, we use F-beta score with β < 1.
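A minimal Python sketch of the F-beta formula, applied to a hypothetical high-recall model (precision 0.5, recall 0.9). The function name `f_beta` is hypothetical, and the 0.0 return for a zero denominator is an assumed convention.

```python
def f_beta(precision, recall, beta):
    """F-beta = (1 + beta^2) * P * R / (beta^2 * P + R).

    beta > 1 weights recall more, beta < 1 weights precision more, beta = 1 gives F1.
    """
    b2 = beta ** 2
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0  # assumed convention when precision and recall are both zero
    return (1 + b2) * precision * recall / denom


# Hypothetical high-recall model: precision 0.5, recall 0.9
p, r = 0.5, 0.9
print(f_beta(p, r, beta=2.0))  # about 0.776, recall-weighted, so this model scores well
print(f_beta(p, r, beta=1.0))  # about 0.643, identical to F1
print(f_beta(p, r, beta=0.5))  # about 0.549, precision-weighted, so the score drops
```

If you work with label arrays rather than precomputed precision and recall, scikit-learn’s `sklearn.metrics.fbeta_score(y_true, y_pred, beta=...)` computes the same quantity.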

That’s it for this article. Stay tuned for more evaluation metrics!
