Precision or Recall — Which one do I use?

3 min read · Feb 3, 2023

3 simple questions can help you make this decision

Too Long Didn’t Read:

My aim is to make sure that whatever I predict as ‘one’ really is a ‘one’, even if that means some ‘ones’ are missed: Use Precision
My aim is to catch as many of the actual ‘ones’ as possible, even if that means some ‘zeros’ are misclassified as ‘ones’: Use Recall
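
To make the two aims concrete, here is a minimal sketch that computes both metrics by plain counting. Precision is the share of predicted ‘ones’ that are actually ‘ones’; recall is the share of actual ‘ones’ that were predicted as ‘ones’. The function name and the toy labels are made up for illustration.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 is the class of interest."""
    predicted_ones = sum(1 for p in y_pred if p == 1)
    actual_ones = sum(1 for t in y_true if t == 1)
    correct_ones = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)

    # Of everything predicted as 'one', how much really is a 'one'?
    precision = correct_ones / predicted_ones if predicted_ones else 0.0
    # Of everything that really is a 'one', how much did we catch?
    recall = correct_ones / actual_ones if actual_ones else 0.0
    return precision, recall


# Toy data (illustrative only): three actual 'ones', five actual 'zeros'.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
cautious = [1, 0, 0, 0, 0, 0, 0, 0]  # predicts 'one' only when very sure
eager = [1, 1, 1, 1, 1, 0, 0, 0]     # predicts 'one' generously

cautious_scores = precision_recall(y_true, cautious)
eager_scores = precision_recall(y_true, eager)
print("cautious:", cautious_scores)
print("eager:", eager_scores)
```

The cautious model never mislabels a ‘zero’, so its precision is perfect, but it misses two of the three ‘ones’. The eager model catches every ‘one’ at the cost of two false alarms, so its recall is perfect while its precision drops.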

Introduction

Precision and recall are essential evaluation metrics in information retrieval and machine learning. Although they measure different things, deciding which of the two is the more appropriate metric for a given problem is often confusing.

Let us try to understand the terms ‘Precision’ and ‘Recall’ without using the words ‘True Positive’, ‘False Positive’, ‘True Negative’, and ‘False Negative’.

Evaluation Metric: Recall

I will start off with a very popular example: the cancer diagnosis problem. You are building a machine learning model that predicts whether a patient has cancer or not, using medical imaging and other clinical data. Based on the model’s predictions, the doctors run an additional test before treating the patient.

Now ask yourself, which of the following is more important:

Identifying as many of the patients who have cancer as possible, even if that means some healthy patients are sent for additional tests
Flagging only the patients who almost certainly have cancer, even if that means some patients who have cancer are missed

Since I care more about identifying people who might have cancer, even if that means we run additional tests on people who do not have cancer, the evaluation metric here should be Recall.
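
One way to see this trade-off is to lower the score at which the model flags a patient. The sketch below assumes the model outputs a score between 0 and 1 per patient; the scores, labels, and thresholds are invented for illustration and do not come from any real model.

```python
# Hypothetical model scores and true diagnoses (1 = has cancer, 0 = healthy).
scores = [0.95, 0.80, 0.55, 0.40, 0.30, 0.20, 0.10, 0.05]
has_cancer = [1, 1, 0, 1, 0, 0, 0, 0]


def screening_outcome(threshold):
    """Flag every patient whose score is at or above the threshold."""
    flagged = [1 if s >= threshold else 0 for s in scores]
    caught = sum(1 for y, f in zip(has_cancer, flagged) if y == 1 and f == 1)
    missed = sum(1 for y, f in zip(has_cancer, flagged) if y == 1 and f == 0)
    extra_tests = sum(1 for y, f in zip(has_cancer, flagged) if y == 0 and f == 1)
    recall = caught / (caught + missed)
    return {"recall": recall, "missed": missed, "extra_tests": extra_tests}


strict = screening_outcome(0.50)   # flags fewer patients
lenient = screening_outcome(0.25)  # flags more patients
print("threshold 0.50:", strict)
print("threshold 0.25:", lenient)
```

With the stricter threshold one patient with cancer is missed; with the lenient one nobody is missed, but an extra healthy patient is sent for the follow-up test. When a missed case is far more costly than an unnecessary test, that is the direction recall rewards.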

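Evaluation Metric: Precision

Let’s take another example and talk about an image classification problem. You are building a machine learning model that predicts whether an image should be classified as Nature (forest, mountain, ocean), Food (pizza, burger, pasta), and so on. I will restrict the example to these two classes for simplicity.

Let us again compare the two possible objectives:

Placing every Nature image in the Nature category, even if that means some Food images end up there too
Keeping the Nature category clean, so that every image placed in it really is a Nature image, even if that means some Nature images are left out

Since I want to focus on the cleanliness of my categories, the evaluation metric here should be Precision.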
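
How clean a category is can be measured as the precision of that category: of all the images the model placed in it, what share truly belongs there. The following is a rough sketch with made-up labels; `category_precision` is a hypothetical helper, not a library function.

```python
def category_precision(true_labels, predicted_labels, category):
    """Share of images placed in `category` that truly belong to it."""
    placed = [t for t, p in zip(true_labels, predicted_labels) if p == category]
    if not placed:
        return 0.0  # nothing was placed in this category
    return sum(1 for t in placed if t == category) / len(placed)


# Toy example: one Nature image is wrongly filed under Food.
true_labels = ["Nature", "Nature", "Nature", "Food", "Food", "Food"]
predicted_labels = ["Nature", "Nature", "Food", "Food", "Food", "Food"]

nature_precision = category_precision(true_labels, predicted_labels, "Nature")
food_precision = category_precision(true_labels, predicted_labels, "Food")
print("Nature:", nature_precision)
print("Food:", food_precision)
```

Here the Nature category is perfectly clean, even though one Nature image is missing from it, while the Food category contains one image that does not belong.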

I believe that understanding which evaluation metric to use when is more important than remembering the formulas for them. Hopefully, the above examples helped you with selecting the right evaluation metric for your problem. Good Luck!

Examples for Practice

Here are some common examples that you may have come across before. Let’s see if we can all agree on which evaluation metric should be used in each.

Fraud detection: The goal is to build a machine learning model that correctly identifies all instances of fraudulent transactions, even if it means that some non-fraudulent transactions will be misclassified as fraudulent. Would you recommend using precision or recall?

Ad targeting: Receiving irrelevant ads can be annoying, so the focus here should be on showing users only the most relevant ads. Should a machine learning model that predicts which ad to show a customer be evaluated using precision or recall?

Email spam filtering: While it is efficient to filter out spam mail, missing out on an important email could be problematic. So the goal is to build an ML model to classify emails into spam and non-spam categories. Which evaluation metric would be appropriate to ensure you don’t miss out on important emails?

Feel free to give your answers, or share more problem statements in the comment section below!