Precision or Recall: Which one to choose for an advanced retail analytics model?

Rajat Munjal
Capillary Data Science
5 min read · Sep 29, 2021


Hi there! We all (by "all" I mean those of us working in the analytics domain :P) have come across these two terms: Precision and Recall. But it's sometimes (for me, almost every time) challenging to decide which one is the better metric for judging a model's success before making it live. In this article, I have tried to explain these two terms in as plain language as I could, and have also put down some business use cases from the retail industry.

So before jumping into the use cases, let us try to understand what these two terms mean.

Precision and Recall are two key metrics that we come across whenever we build a classification model (statistical or machine learning). Both can easily be calculated by hand using the confusion matrix (Wait! What's that now?). No worries, I will explain that as well.

So, the confusion matrix is a 2-dimensional (2×2) matrix in the case of binary classification, and can be of higher order if we are solving a multi-class problem. Below is an example of a 2×2 confusion matrix:

                 Predicted 0 (C1)   Predicted 1 (C2)
Actual 0 (R1)              96,000              2,000
Actual 1 (R2)                 500              1,500

The rows, R1 and R2, represent the actual labels in our data set, and the columns, C1 and C2, represent the predicted labels for the same data set. In the standard terminology, the four cells are the true negatives (TN = 96,000), the false positives (FP = 2,000), the false negatives (FN = 500), and the true positives (TP = 1,500).
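
For reference, here is a minimal sketch of how the same matrix can be produced with scikit-learn. The arrays below are hypothetical and simply hard-code labels that reproduce the cell counts from the table above:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels reproducing the cell counts above:
# 98,000 records are actually 0 and 2,000 are actually 1.
y_true = [0] * 98000 + [1] * 2000
# Of the actual 0s, 96,000 are predicted 0 (TN) and 2,000 predicted 1 (FP);
# of the actual 1s, 500 are predicted 0 (FN) and 1,500 predicted 1 (TP).
y_pred = [0] * 96000 + [1] * 2000 + [0] * 500 + [1] * 1500

print(confusion_matrix(y_true, y_pred))
# [[96000  2000]
#  [  500  1500]]
```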

Precision is the proportion of records that the model has correctly classified as 1 out of the total records that have been predicted as 1. From the table above:

Precision = TP / (TP + FP) = 1500 / (1500 + 2000) ≈ 0.43

This means that, out of all the records which our model has predicted to be 1, about 43% of them are actually 1.

Recall is the proportion of records that the model has correctly classified as 1 out of the total records that are actually 1.

Recall = TP / (TP + FN) = 1500 / (1500 + 500) = 0.75

This means that, out of all the actual 1's, 75% have been correctly predicted by our model. Both metrics change when the probability cutoff for labelling a record as 1 is changed.
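
To make the arithmetic concrete, here is a minimal sketch that recomputes both metrics directly from the four cell counts:

```python
# Cell counts taken from the confusion matrix above
tn, fp, fn, tp = 96000, 2000, 500, 1500

precision = tp / (tp + fp)   # 1500 / 3500
recall = tp / (tp + fn)      # 1500 / 2000

print(f"Precision: {precision:.2f}")   # 0.43
print(f"Recall:    {recall:.2f}")      # 0.75
```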

So, that was a brief look at what precision and recall are and how we calculate them using the confusion matrix. Now let us see a business scenario in the retail space where these metrics play a key role in determining a model's success.

Campaign Response Prediction:

A campaign response prediction model helps retail brands identify the potential customers who are likely to visit their stores and make a purchase when sent an offer via a marketing campaign. We will not go into the details of how this type of model is built.

Let's say we have already trained a campaign response prediction model, and now we are trying to figure out the probability cutoff beyond which we label customers as 1 (meaning these customers have a very high likelihood of responding when contacted with a campaign).

Let us assume that we get the following confusion matrix on our validation data set when we set the probability cutoff at 0.5, i.e. all customers with a probability of >= 0.5 are labelled 1 and the rest are labelled 0:

Confusion matrix with 0.5 as the probability cutoff (same cell counts as the example above):

                 Predicted 0   Predicted 1
Actual 0              96,000         2,000
Actual 1                 500         1,500
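
In code, applying the cutoff is a one-liner over the predicted probabilities. The sketch below is illustrative only: the data set, model, and names (X_val, y_val, model, proba) are hypothetical stand-ins, with a synthetic imbalanced data set taking the place of real campaign data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic, heavily imbalanced stand-in for a campaign-response data set.
X, y = make_classification(n_samples=20000, weights=[0.98], random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

proba = model.predict_proba(X_val)[:, 1]   # probability of responding, per customer
cutoff = 0.5
y_pred = (proba >= cutoff).astype(int)     # label as 1 if probability >= cutoff

print(confusion_matrix(y_val, y_pred))
```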

In the table above, it can clearly be seen that with 0.5 as our probability cutoff, we are able to reach 75% of the potential customers who have a high likelihood of making a purchase when targeted via a marketing campaign. But at the same time, we will also be targeting 2,000 extra customers who have been incorrectly labelled as 1 by our model. Now, what is the final impact this model will have on the brand's business? Let's say the brand decides to target these 3,500 customers with a campaign. Then:

1. From the confusion matrix, we can see that out of all the targeted customers, about 43% (which is our precision) actually responded to the communication. Hence, if the campaign's objective is a decent conversion rate (the proportion of responders out of the contacted base), this model is good to go ahead with.

2. Only 75% (which is our recall) of our potential responders were targeted with the communication. This means we missed out on 25% of the customers who were likely to bring in business for the brand. If the campaign's objective is higher sales, even if some extra communications are sent out, then we will have to tweak this model to increase recall by reducing the probability cutoff so that more customers are labelled as 1 (see the sketch below).
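
One practical way to pick that lower cutoff is to sweep thresholds and watch precision and recall trade off against each other. A minimal sketch, reusing the hypothetical y_val and proba from the previous snippet:

```python
from sklearn.metrics import precision_recall_curve

# precisions[i] and recalls[i] correspond to labelling a record as 1
# when its predicted probability is >= thresholds[i].
precisions, recalls, thresholds = precision_recall_curve(y_val, proba)

# Print every 20th candidate cutoff: lowering the cutoff raises recall
# at the cost of precision.
for t, p, r in zip(thresholds[::20], precisions[::20], recalls[::20]):
    print(f"cutoff={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```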

Final Takeaway:

The decision to favour recall or precision for the above model is somewhat subjective and depends heavily on the campaign's objective. A good conversion rate and ROI can be achieved with a model that has high precision, but if we want more responders for our campaign, even at a lower conversion rate, then a model with a high recall score is the ideal one to go ahead with.
