Uplift Modeling — An Explanation of the Unknown Challenger in Marketing Campaigns
The goal of this article is to provide a fundamental understanding of what uplift modeling is, and how it can boost your marketing campaigns.
Coincidentally, when I started writing this article, I received a promotion from one of my former banks, which offered me an exclusive recreational vehicle loan. Have you ever (and I’m sure you have) received such an offer? Discount coupons, limited offerings and personalized advertisements are all examples of a company running marketing campaigns. The goal of these campaigns is to persuade customers to spend more money, stay with the company or change their behavior in any other way that leads to higher revenue.
The question for each of these campaigns is: How do companies identify the right individuals for a mailing? And what is “right”? First, randomly offering discount coupons might not be the best approach, as the cost of sending an offer can easily exceed the expected return from customers responding to it. Second, companies usually cannot send offers to every individual because of limited marketing budget and time.
Fortunately, we are in the era of Artificial Intelligence. Thus, we could use a Machine Learning algorithm, let it train on some historical data, and ‘voilà’ we know whom to target. While such an approach might be superior to randomly targeting individuals, a more sophisticated approach (hint: it’s called uplift modeling) might be more suitable. To explain the latter, I will first introduce the traditional approach. By explaining what it is and what disadvantages it brings along, I can motivate the use of uplift modeling.
Let’s get started!
The “Champion Model” — Traditional Response Modeling
The current standard for targeting the right customer is called Response Modeling or Outcome Prediction. The cornerstone of this approach is a previous campaign which contains all kinds of information about each targeted individual, such as demographic, geographical or product-related information. Further, it contains the information whether an individual stayed with or left the company after the campaign. See figure 1 for an example.
Note: From here, I will use customer churn as a target variable, but other variables are possible as well, for example whether an individual became a customer or whether a customer spent more money.
In an ex-post analysis, a supervised learning algorithm such as Random Forest, Gradient Boosting or Logistic Regression is used to train a model which can separate customers who stayed with the company from customers who left the company. After training the model on the old campaign, the model is applied to the population on which the new campaign should run. The model then returns a score for each customer, representing the probability of that customer leaving the company. Ultimately, we can use this information to target those customers who are most likely to leave, send them an exclusive offer, cross our fingers and hope that they stay.
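To make the workflow concrete, here is a minimal sketch of the train, score and target steps. It is not a real classifier: as a deliberately tiny stand-in for Random Forest and friends, it simply estimates the churn rate per customer segment. The data, the segment names and the function names are all made up for illustration.

```python
from collections import defaultdict

def fit_churn_rates(records):
    """Estimate the churn probability per segment from a past campaign.

    records: iterable of (segment, churned) pairs, with churned being 0 or 1.
    This is a stand-in for training a real supervised model.
    """
    counts = defaultdict(lambda: [0, 0])  # segment -> [n_churned, n_total]
    for segment, churned in records:
        counts[segment][0] += churned
        counts[segment][1] += 1
    return {seg: c / n for seg, (c, n) in counts.items()}

def rank_by_churn_risk(rates, customers, default=0.0):
    """Score new customers and sort them from highest to lowest churn risk."""
    scored = [(cid, rates.get(seg, default)) for cid, seg in customers]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical past campaign: (segment, churned)
old_campaign = [
    ("young", 1), ("young", 1), ("young", 0), ("young", 1),
    ("senior", 0), ("senior", 0), ("senior", 1), ("senior", 0),
]
# Hypothetical population for the new campaign: (customer_id, segment)
new_customers = [("a", "senior"), ("b", "young"), ("c", "senior")]

rates = fit_churn_rates(old_campaign)
ranking = rank_by_churn_risk(rates, new_customers)
targets = [cid for cid, _ in ranking[:1]]  # contact only the riskiest customer
print(rates, targets)
```

Note that the score only says who is likely to leave. It says nothing about whether an offer would change that.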
Although response modeling usually outperforms random targeting, we might leave some money on the table. By definition, response modeling models the relationship between the independent variables (the information ‘describing’ the customer) and the target variable (e.g. customer churn), such that the model is able to distinguish between customers leaving the company and customers staying with the company. But that is not what we want! In order to run a truly successful campaign, we need to target customers who will stay with the company because of the campaign, i.e. we need to target those customers who are likely to stay if targeted, but unlikely to stay otherwise. This is the drawback of response modeling: it cannot differentiate between the following four groups:
1. “Sure Things”: Customers who stay with the company anyway. Even without the treatment these customers are loyal. Thus, we would waste money not only by giving them an exclusive discount, but also by investing time and money to send them the discount.
2. “Sleeping Dogs”: Customers who stay with the company if not contacted, but who leave the company if contacted. Sending exclusive discounts to such customers can even have a negative impact on revenue because they will be less likely to stay with the company.
3. “Lost Causes”: Customers who do not stay with the company whether they are contacted or not. Mailing these customers would be a waste of money and time.
4. “Persuadables”: Customers who stay with the company only if they receive a treatment. Thus, such individuals have a positive reaction to the marketing campaign. The goal is to identify these customers!
To circumvent this limitation, we need a model which models the change in behavior that results from the treatment. Such an approach is called Uplift Modeling.
Note: I introduced the term ‘treatment’, which is a synonym for ‘offer’. The word ‘treatment’ is the usual term in the uplift modeling literature, and thus, I’m going to use it here.
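The four groups described above boil down to two yes/no questions per customer: would they stay if treated, and would they stay if not treated? The following sketch spells out that mapping. It assumes we could know both answers for the same customer, which is a thought experiment only, and the function names are my own.

```python
def customer_type(stays_if_treated, stays_if_not_treated):
    """Map the two hypothetical outcomes of a customer to one of the four groups."""
    if stays_if_treated and stays_if_not_treated:
        return "Sure Thing"
    if stays_if_treated and not stays_if_not_treated:
        return "Persuadable"
    if not stays_if_treated and stays_if_not_treated:
        return "Sleeping Dog"
    return "Lost Cause"

def individual_uplift(stays_if_treated, stays_if_not_treated):
    """Change in the outcome caused by the treatment: +1, 0 or -1."""
    return int(stays_if_treated) - int(stays_if_not_treated)

for treated_outcome in (True, False):
    for untreated_outcome in (True, False):
        print(
            customer_type(treated_outcome, untreated_outcome),
            individual_uplift(treated_outcome, untreated_outcome),
        )
```

Only the Persuadables have a positive uplift, the Sleeping Dogs have a negative one, and the other two groups sit at zero.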
The Unknown Challenger — Uplift Modeling
In a nutshell:
UPLIFT MODELING MODELS THE CAUSAL EFFECT OF A TREATMENT ON A CUSTOMER OUTCOME
The difference from building a response model is small, but very important: We need two different groups in the population, a treatment group and a control group. Individuals in the treatment group have been subject to some action such as a marketing campaign. Individuals in the control group have not been subject to any action and serve as a reference group. Thanks to the control group, the effect of a treatment can be measured by comparing the performance of the treatment group and the control group. Thus, in addition to the information describing individuals and the target variable, we need the information whether an individual received a treatment or not.
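As a minimal sketch of what “comparing the performance” of the two groups can mean, the snippet below computes the retention rate in each group and takes the difference. The campaign log is invented, and I assume customers were assigned to the two groups at random.

```python
def retention_rate(rows, treated_flag):
    """Share of customers who stayed within the treatment (1) or control (0) group."""
    group = [stayed for treated, stayed in rows if treated == treated_flag]
    return sum(group) / len(group)

# Hypothetical campaign log: (treated, stayed), both coded as 0 or 1
campaign = [
    (1, 1), (1, 1), (1, 0), (1, 1),  # treatment group
    (0, 1), (0, 0), (0, 0), (0, 1),  # control group
]

treatment_rate = retention_rate(campaign, 1)
control_rate = retention_rate(campaign, 0)
average_uplift = treatment_rate - control_rate
print(treatment_rate, control_rate, average_uplift)
```

This gives the average effect over the whole population. Uplift modeling tries to estimate the same kind of difference per customer, based on the customer's characteristics.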
The goal of uplift modeling is to model the difference between the probability of staying with the company in the treatment group and the probability of staying with the company in the control group.
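Written as a formula, this could look as follows. The notation is my own and not taken from the cited papers: X stands for the customer's characteristics, T = 1 for the treatment group, T = 0 for the control group, and Y = 1 for staying with the company.

```latex
\text{uplift}(x) = P(Y = 1 \mid X = x,\ T = 1) - P(Y = 1 \mid X = x,\ T = 0)
```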
Note: Such information can either be collected by running a small marketing campaign in advance, using the same treatment as in the main marketing campaign, or it can be taken from a previous campaign in which we used the same treatment and a control group.
The Weakness of the Challenger — Fundamental Problem of Causal Inference
After reading what uplift modeling is and why it can be superior to response modeling, you might ask yourself why it is not the gold standard for identifying the right individuals in a marketing campaign. Unfortunately, we cannot apply an arbitrary machine learning algorithm “off the shelf” when it comes to uplift modeling because of the Fundamental Problem of Causal Inference.
“For every individual, only one of the outcomes is observed, after the individual has been subject to an action, or when the individual has not been subject to the action, never both” (Sołtys, Jaroszewicz, & Rzepakowski, 2015)
That means we do not know the “ground truth” of the causal effect for any individual. We observe the outcome for an individual when treated, or when not treated, but never both. For example, take a look at figure 2: For the first individual we know that the outcome is zero when we do not treat them, but we do not know what the outcome would have been if we had treated them.
However, in supervised learning scenarios we need the “ground truth” to help the algorithm learn. That leaves us with the following question: How can we model the causal effect of a treatment on a customer outcome if we can’t use off-the-shelf machine learning algorithms?
Fortunately, other machine learning enthusiasts had the same question and came up with different solutions. In the following, I’ll briefly describe each of these approaches but leave a more detailed explanation to another article.
The Ace Up Uplift’s Sleeve — Uplift Modeling Approaches
1. Two-Model Approach:
Develop one response model for the treatment group and one response model for the control group. Subsequently, use the first model to calculate the probability of the outcome for an individual if that individual is treated (P_t), and use the second model to calculate the probability of the outcome for the same individual if that individual is not treated (P_nt). In the last step, we calculate the difference between the two probabilities (P_t - P_nt), which (at least in theory) gives us the effect of the treatment on the outcome probability for this individual.
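Here is a minimal sketch of the two-model idea. To keep it dependency-free, each “response model” is just the retention rate per customer segment. In practice you would plug in a real classifier for each group. The data and all names are hypothetical.

```python
from collections import defaultdict

def fit_stay_rates(rows):
    """Tiny stand-in for a response model: P(stay) per segment.

    rows: iterable of (segment, stayed) pairs, with stayed being 0 or 1.
    """
    counts = defaultdict(lambda: [0, 0])  # segment -> [n_stayed, n_total]
    for segment, stayed in rows:
        counts[segment][0] += stayed
        counts[segment][1] += 1
    return {seg: s / n for seg, (s, n) in counts.items()}

def two_model_uplift(data):
    """data: list of (segment, treated, stayed). Returns P_t - P_nt per segment."""
    model_t = fit_stay_rates((seg, y) for seg, t, y in data if t == 1)
    model_nt = fit_stay_rates((seg, y) for seg, t, y in data if t == 0)
    # Only segments seen in both groups get an uplift estimate.
    return {seg: model_t[seg] - model_nt[seg] for seg in model_t if seg in model_nt}

# Hypothetical campaign with a control group: (segment, treated, stayed)
data = [
    ("young", 1, 1), ("young", 1, 1), ("young", 1, 1), ("young", 1, 0),
    ("young", 0, 1), ("young", 0, 0), ("young", 0, 0), ("young", 0, 0),
    ("senior", 1, 1), ("senior", 1, 1), ("senior", 1, 0), ("senior", 1, 0),
    ("senior", 0, 1), ("senior", 0, 1), ("senior", 0, 1), ("senior", 0, 0),
]

uplift = two_model_uplift(data)
print(uplift)
```

In this toy data the offer helps in the “young” segment and hurts in the “senior” segment, which a plain response model could not tell us.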
2. Transformed Outcome Approach:
Instead of working with two variables (treatment and outcome), we create one new variable, as a mathematical combination of both variables, which is then used as a target variable. Subsequently, off-the-shelf algorithms can be used with the new target variable.
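The article does not commit to a specific transformation, so the sketch below uses one common choice from the literature as an assumption: Z = Y*T/p - Y*(1-T)/(1-p), where Y is the outcome, T the treatment flag and p the probability of being treated. With random treatment assignment, the average of Z is an estimate of the uplift. The data and names are hypothetical.

```python
def transformed_outcome(stayed, treated, p_treat=0.5):
    """Combine outcome and treatment flag into a single target variable.

    Assumes the treatment was assigned at random with probability p_treat.
    """
    return stayed * treated / p_treat - stayed * (1 - treated) / (1 - p_treat)

# Hypothetical campaign log: (treated, stayed), half of the customers treated
campaign = [
    (1, 1), (1, 1), (1, 0), (1, 1),  # treatment group
    (0, 1), (0, 0), (0, 0), (0, 1),  # control group
]

z = [transformed_outcome(stayed, treated) for treated, stayed in campaign]
mean_z = sum(z) / len(z)
print(z, mean_z)
```

Here the mean of the new target equals the difference between the retention rates of the two groups (0.75 - 0.5). A regression model trained on Z with the customer characteristics as inputs would then predict the uplift directly.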
3. Direct Uplift Modeling:
The last approach is to modify existing supervised learning algorithms to directly infer a causal effect. According to the current literature, decision trees and different ensembles of decision trees are the most popular adapted algorithms. Usually, the tree-building algorithm and the splitting criterion are modified such that they maximize the difference in uplift between the resulting subgroups.
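To give a feeling for such a modified splitting criterion, here is a sketch of a single split search on one numeric feature. It uses one simple criterion as an assumption, the absolute difference in uplift between the two child nodes. Published uplift trees use various criteria, and this is not meant to reproduce any particular one. Data and names are made up.

```python
def uplift(rows):
    """Retention rate of treated rows minus retention rate of control rows.

    rows: list of (feature_value, treated, stayed).
    Returns None if one of the two groups is empty.
    """
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    if not treated or not control:
        return None
    return sum(treated) / len(treated) - sum(control) / len(control)

def best_split(rows):
    """Find the threshold whose two child nodes differ most in uplift."""
    best = (None, 0.0)  # (threshold, uplift gap)
    for threshold in sorted({x for x, _, _ in rows}):
        left = [r for r in rows if r[0] <= threshold]
        right = [r for r in rows if r[0] > threshold]
        u_left, u_right = uplift(left), uplift(right)
        if u_left is None or u_right is None:
            continue  # cannot estimate uplift on one side
        gap = abs(u_left - u_right)
        if gap > best[1]:
            best = (threshold, gap)
    return best

# Hypothetical campaign: (age, treated, stayed)
rows = [
    (25, 1, 1), (25, 0, 0),
    (30, 1, 1), (30, 0, 0),
    (50, 1, 1), (50, 0, 1),
    (60, 1, 1), (60, 0, 1),
]

split = best_split(rows)
print(split)
```

In this toy data the split at age 30 separates customers who react to the offer from customers who stay anyway. A standard decision tree, which only looks at the outcome, has no reason to prefer that split.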
Who is the Winner?
It is hard to push the champion off the throne. However, sending uplift modeling into the ring can be well worth it, as the literature suggests. Thus, putting extra work into collecting the right data, running small campaigns in advance and working with more difficult approaches to model the causal effect can be profitable for marketers.
Let me end my article with a quote:
Every champion was once a contender that refused to give up — Rocky Balboa
Literature
Radcliffe, N. J. (2007). Using control groups to target on predicted lift: Building and assessing uplift models. Direct Marketing Analytics Journal, 1, 14–21.
Sołtys, M., Jaroszewicz, S., & Rzepakowski, P. (2015). Ensemble methods for uplift modeling. Data Mining and Knowledge Discovery, 29(6), 1531–1559.