Understanding Lift curve

A brief introduction to lift curve usage in marketing and machine learning

Alexandre Serre
Analytics Vidhya
Feb 16, 2020


Photo by Annie Spratt on Unsplash

Maybe a job interviewer asked you about the lift curve and you nearly had a heart attack, or maybe you are just curious about that curve with a catchy name you heard about. There could be many other reasons, I guess, but don’t worry, I think you have found the article you need!

Introduction

This article keeps the math light. I will try to explain how to build a lift curve, how it applies to a digital marketing use case, and how it can be used to evaluate a classification model.

To briefly introduce the concept, the lift curve plots a performance coefficient (the lift) against the cumulative proportion of a population. Don’t worry, that sentence will make more sense after the next part!

Lift curve in a mailing campaign

Suppose you have a database containing 10,000 customer contacts and you send them an email with a link to your website.

10% of the customers clicked the link, which is not so bad (or maybe it is, but we don’t care at the moment). Now, if you send the email to 1,000 customers instead of 10,000, you might expect to get approximately the same figure… Not so sure! Indeed, if you divide your 10,000 customers into 10 groups of 1,000, you may end up with one group with a 4% click rate and another with 16%.

That means you could have a better click rate with a certain part of the customers!

Let’s see what it looks like on a chart.

Groups 7 and 8 are doing quite well: they contain more clickers than average (16% and 12% respectively, compared to an average of 10%).

It is now interesting to plot the cumulative percentage of clickers (y axis) against the cumulative percentage of customers contacted (x axis). Because we deal with 10 groups of customers, the x axis advances in steps of 10 percent, and we take the groups from the chart above in order of decreasing click rate.

We see that we can reach almost 70% of the clickers with 50% of our contact base.
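The construction of that cumulative gain chart can be sketched in a few lines of Python. The click counts below are made-up numbers for ten groups of 1,000 customers (they are not the data behind the charts in this article), and the function name `cumulative_gain` is only an illustration.

```python
def cumulative_gain(clicks_per_group, group_size):
    """Return (pct_contacted, pct_clickers), best groups first.

    Both lists hold cumulative percentages, one entry per group.
    """
    # Groups all have the same size, so sorting by clicks sorts by click rate
    ordered = sorted(clicks_per_group, reverse=True)
    total_clicks = sum(ordered)
    total_customers = group_size * len(ordered)

    pct_contacted, pct_clickers = [], []
    contacted = clicks = 0
    for group_clicks in ordered:
        contacted += group_size
        clicks += group_clicks
        pct_contacted.append(100 * contacted / total_customers)
        pct_clickers.append(100 * clicks / total_clicks)
    return pct_contacted, pct_clickers


# Hypothetical clicks for 10 groups of 1,000 customers (1,000 clicks in total)
clicks_per_group = [90, 40, 150, 60, 130, 80, 160, 120, 50, 120]
x, y = cumulative_gain(clicks_per_group, group_size=1000)
for contacted, clickers in zip(x, y):
    print(f"{contacted:5.1f}% contacted -> {clickers:5.1f}% of clickers")
```

With these assumed numbers, the five best groups (50% of the contacts) hold 68% of the clickers, and the last point is always 100% of contacts for 100% of clickers.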

The lift curve uses the ratio of the cumulative percentage of clickers to the cumulative percentage of customers contacted. That means each point of the “Cumulative gain of clickers” chart gives one lift value.

Indeed, the lift is given by this formula:

lift = (cumulative % of clickers) / (cumulative % of customers contacted)

More concretely, the lift gives us important information about our mailing campaign. It tells us by how much the customer conversion (a click in our case) is multiplied for a given percentage of customers contacted.

Let’s plot it!

Here, the highest lift, equal to 2.1, means that by contacting the first 10% of your “best” customers you will reach 2.1 times as many clickers as you would by contacting the same number of random customers.
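As a rough illustration, the lift at each point is simply the cumulative percentage of clickers divided by the cumulative percentage of customers contacted. The gain points below are hypothetical (so the values differ from the chart discussed above), and `lift_curve` is just an illustrative name.

```python
def lift_curve(pct_contacted, pct_clickers):
    """Lift at each point: cumulative % of clickers / cumulative % contacted."""
    return [clickers / contacted
            for contacted, clickers in zip(pct_contacted, pct_clickers)]


# Hypothetical cumulative gain points, best customers first
pct_contacted = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
pct_clickers = [16, 31, 44, 56, 68, 77, 85, 91, 96, 100]

lifts = lift_curve(pct_contacted, pct_clickers)
print([round(value, 2) for value in lifts])
```

With these assumed numbers the curve starts at 1.6 and falls to exactly 1 at the last point: once everyone is contacted, you do no better than random.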

Comparing classification models with the lift curve

Another use case for the lift curve is in machine learning, for classification problems. For example, suppose your classification model outputs, for each patient, a probability of having cancer. We can then draw a parallel with the mailing campaign example: sort the patients by decreasing predicted probability, replace the percentage of customers contacted with the percentage of patients examined, and replace the percentage of clickers with the percentage of actual cancer cases found among them. In that case, you don’t have to split the patients into groups because the model’s probabilities already give you a ranking. Then, you can easily generate your lift curve.
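Here is a minimal sketch of that idea, assuming you have the true labels and the model’s predicted probabilities as two plain Python lists. The data and the function name `model_lift` are hypothetical, chosen only to show the mechanics.

```python
def model_lift(y_true, scores, fraction):
    """Lift of a classifier over the top `fraction` of cases ranked by score."""
    # Rank cases from highest to lowest predicted probability
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(fraction * len(ranked)))
    positives_in_top = sum(label for _, label in ranked[:n_top])
    total_positives = sum(y_true)
    # Share of all positives captured, divided by share of cases examined
    return (positives_in_top / total_positives) / (n_top / len(ranked))


# Hypothetical labels (1 = positive case) and model probabilities
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

for fraction in (0.2, 0.5, 1.0):
    print(fraction, round(model_lift(y_true, scores, fraction), 2))
```

Evaluating the function over a range of fractions and plotting the results gives the lift curve of the model.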

To compare two classification models with the lift curve, you can use the maximum lift value as a metric. Also, the longer the flat zone at the beginning of the curve, the more reliable the model is.
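The comparison by maximum lift could look like the sketch below. Both sets of scores are invented for two hypothetical models evaluated on the same labels, and `lift_points` is an illustrative helper, not a library function.

```python
def lift_points(y_true, scores):
    """Lift after each ranked case, highest predicted score first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_positives = sum(y_true)
    lifts, found = [], 0
    for rank, i in enumerate(order, start=1):
        found += y_true[i]
        # Share of positives found so far / share of cases examined so far
        lifts.append((found / total_positives) / (rank / len(order)))
    return lifts


# Hypothetical labels and scores from two hypothetical models
y_true = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.3, 0.7, 0.6, 0.2, 0.5, 0.1, 0.4, 0.05]
scores_b = [0.6, 0.8, 0.9, 0.3, 0.7, 0.2, 0.5, 0.1, 0.4, 0.05]

max_lift_a = max(lift_points(y_true, scores_a))
max_lift_b = max(lift_points(y_true, scores_b))
print(f"max lift A = {max_lift_a:.2f}, max lift B = {max_lift_b:.2f}")
```

In this made-up case, model A ranks all three positives first, so its lift stays at its maximum over the first three points before dropping, while model B puts a negative case on top and never reaches the same level.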

Going further

Thanks to the lift curve, we found that a certain proportion of our customers is more likely to click on the link. However, the groups were made randomly, so we can’t really explain why they respond better to the mailing campaign. To explain this, clustering methods can be used to form the groups and find the shared characteristics that lead customers to click.

Thank you for reading!
