Causal Machine Learning in Marketing

Heinrich Kögel
10 min read · Jul 31, 2023

This article provides a case study that demonstrates how we can leverage causal machine learning for better decision-making in marketing.

Understanding cause and effect is crucial in the business world. When it comes to pricing, for example, it is important to know how customers change their buying behavior when prices are adjusted. And to figure out if a marketing campaign is worth continuing, we need to understand if it actually has an impact on the KPIs we care about.

It may seem obvious that understanding cause and effect is crucial when addressing these kinds of questions. But in reality, people often confuse causation with mere correlation, which can lead to costly mistakes in decision-making. Let’s look at an example that illustrates this point.

The costly mistake of confusing correlation and causation

Imagine an ice cream parlor that decides to advertise in the local newspaper during the summer. After placing the ads, the owner notices an increase in sales and concludes that the ad campaign was a huge success. After all, sales were much higher during the campaign compared to the rest of the year. Thus, the owner plans to spend more money on newspaper ads. However, what the owner does not realize is that ice cream sales are always higher in the summer, regardless of any ads. Besides, hardly anyone reads the local newspaper anymore, so the ads did not really make a difference. In statistical terms, the owner confused correlation and causation. The ad campaign was correlated with higher sales because it took place in summer. But it did not actually cause the increase. This confusion leads the owner to make the expensive decision to keep running the ad campaign. Although this is only a stylized example, it highlights just how crucial it is to understand causal relationships and not confuse them with plain correlation when making business decisions.

Figure: xkcd comic “Correlation”. Source: https://xkcd.com/552/

A/B Testing is not always feasible

In data science and business, a commonly used approach to estimate causal effects is conducting A/B tests. Controlled experiments of this kind have a long tradition, with origins in medicine. Other terms for A/B tests include experiments, randomized controlled trials (RCTs), and split tests. The fundamental concept behind an A/B test is the random assignment of units of observation, such as customers, into two distinct groups. One group, referred to as the treatment group, receives a specific intervention or treatment, such as a marketing email. The other group, known as the control group, does not receive any treatment. Because assignment is random, the two groups are comparable on average in everything except the treatment. Comparing the treatment group with the control group therefore reveals the causal effect of the intervention on the KPIs of interest.
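To make the logic of random assignment concrete, here is a minimal simulation sketch. All numbers (sample size, baseline KPI, and a true lift of 2.0) are hypothetical values chosen for illustration and are not taken from any real campaign.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
true_effect = 2.0  # hypothetical lift caused by the treatment

# Random assignment: every customer has a 50% chance of being treated,
# independent of any customer characteristic.
treated = rng.integers(0, 2, size=n).astype(bool)

# Baseline KPI varies across customers; the treatment adds the true effect.
baseline = rng.normal(loc=10.0, scale=3.0, size=n)
kpi = baseline + true_effect * treated


def difference_in_means(outcome, is_treated):
    """Mean outcome of the treatment group minus mean outcome of the control group."""
    return outcome[is_treated].mean() - outcome[~is_treated].mean()


estimate = difference_in_means(kpi, treated)
print(f"Estimated effect: {estimate:.2f} (true effect: {true_effect})")
```

Because the coin flip that decides treatment is unrelated to the baseline KPI, the simple difference in means lands close to the true effect. This property is lost once treatment depends on customer characteristics.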

However, while A/B testing can be a valuable tool in many settings, it is not always feasible in real-world scenarios. For example, it is probably not advisable for a company to randomly allocate its marketing budget across markets to test the effectiveness of its marketing strategy. In addition to situations where A/B testing may be too costly or risky, there are also scenarios in which A/B testing can be too time-consuming to obtain actionable insights or even prohibited by law due to ethical considerations.

Causal machine learning to learn from naturally-collected data

In such circumstances, causal machine learning comes to the rescue as a valuable solution. This relatively new field combines rigorous statistical and econometric theory, developed over decades, with flexible, novel machine learning methods. Causal machine learning techniques enable the estimation of causal effects without relying on A/B tests. Instead, they leverage “naturally collected” data, such as information about customers and prices, to estimate causal effects.

One powerful approach in the toolkit of causal machine learning methods is the double machine learning approach by Chernozhukov et al. (2018). Since its development, the approach has been extended in numerous research papers. It allows us to estimate causal effects reliably and has desirable statistical properties. The user guide section of the DoubleML package provides a great introduction to the method.

In the following, we present a case study that leverages the double machine learning approach to make informed decisions in marketing. The case study is inspired by real-world consulting projects that we conducted jointly with multiple medium-sized and large companies. For confidentiality reasons, the data used in the case study are simulated.

Case study: Estimating the effect of a marketing campaign

In this case study, we examine TechGear, a company that sells computer equipment to other businesses. In the past year, TechGear ran a marketing campaign offering discounts to certain customer firms. Now, the marketing managers are thinking about whether they should continue providing these discounts. To make an informed decision, they want to assess the effectiveness of offering discounts and determine if it actually boosted sales. Ideally, they would like to conduct an A/B test to measure the impact. Unfortunately, conducting such a test is not feasible due to time and budget constraints.

Thus, to evaluate the campaign’s effectiveness, the team initially considers comparing average sales from the customer firms that received the discount and those that did not. However, during discussions on how to evaluate the campaign, one of the managers mentions that larger firms, as measured by the number of employees, were more likely to receive the discount. Additionally, TechGear has higher sales with larger firms. This raises concerns among the data scientists. They explain that simply comparing sales between the firms that received the discount and those that did not could lead to incorrect conclusions.

Confounding variables lead to false conclusions in naive estimations

To understand why this comparison would be misleading, let’s look at the following figure that illustrates the relationship between receiving the discount, firm size, and sales. We are interested in understanding the effect of receiving the discount on sales (arrow 1). However, we also know that firm size influences the likelihood of receiving the discount (arrow 2), and that larger firms tend to generate higher sales (arrow 3). In causal terminology, we refer to firm size as a “confounding variable” because it influences both the discount and sales. If we do not account for the influence of this confounding variable when estimating the effect of the discount, we will obtain an incorrect estimate.

We want to estimate the effect of discount on sales. Firm size is a confounding variable.

To illustrate this, consider an extreme example. Let’s say TechGear’s average sales to large firms were $100, while sales to small firms were $50. Furthermore, let’s assume that the discount had no effect on sales, and all large firms received the discount while small firms did not. If we were trying to determine the effect of the discount on sales by comparing the average sales between the firms that received the discount and those that did not, we would obtain an incorrect result. This is because we would effectively be comparing large and small firms. Despite the assumption that the discount has no effect, we would mistakenly conclude that the discount increased sales by $100-$50 = $50 per firm. To isolate the causal effect, we need to “control for” or “hold constant” the confounding variables in our estimation.
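The arithmetic of this extreme example can be reproduced in a few lines. The sketch below uses eight made-up firms. The second scenario, in which one large firm goes without the discount and one small firm receives it, is an added assumption so that “holding firm size constant” becomes possible at all.

```python
import numpy as np

# Eight hypothetical firms: four large (sales of $100), four small (sales of $50).
large = np.array([True, True, True, True, False, False, False, False])
sales = np.where(large, 100.0, 50.0)  # the discount has no effect on sales


def naive_difference(sales, discount):
    """Average sales with discount minus average sales without discount."""
    return sales[discount].mean() - sales[~discount].mean()


def size_adjusted_difference(sales, discount, large):
    """Compare firms only within the same size group, then average over groups."""
    total = 0.0
    for group in (large, ~large):
        diff = sales[group & discount].mean() - sales[group & ~discount].mean()
        total += group.mean() * diff  # weight by the group's share of firms
    return total


# Scenario 1 (the extreme case): exactly the large firms received the discount.
discount_extreme = large.copy()
naive_extreme = naive_difference(sales, discount_extreme)  # 50.0

# Scenario 2 (assumed): one large firm without, one small firm with discount.
discount_relaxed = np.array([True, True, True, False, True, False, False, False])
naive_relaxed = naive_difference(sales, discount_relaxed)  # 25.0
adjusted_relaxed = size_adjusted_difference(sales, discount_relaxed, large)  # 0.0

print(naive_extreme, naive_relaxed, adjusted_relaxed)
```

In the extreme case there is no large firm without a discount to compare against, so the adjustment cannot even be computed. This is a reminder that controlling for a confounder requires some overlap between the groups.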

Back at the brainstorming session of our team at TechGear, things become even more complicated when another marketing manager points out that the discount was also more frequently given in the first quarter of the year when customer firms typically made more purchases from TechGear. And this manager is also uncertain about whether there may be other confounding variables, which could have simultaneously influenced receiving the discount and sales.

Causal machine learning to estimate the campaign’s effectiveness

Fortunately, TechGear has a comprehensive customer database that contains various customer characteristics, including sales, firm size, and whether a firm received the discount. This enables the data scientists to employ the double machine learning approach to estimate the effect of the discount on sales.

Specifically, they decide to estimate a partially linear regression model, which consists of two main equations. The model can be expressed as follows:

sales = θ · discount + g(X) + ɛ    (1)
discount = m(X) + η    (2)

Equation (1) represents the relationship of interest, connecting sales with the receipt of the discount. Coefficient θ represents the effect of receiving the discount on sales. Vector X includes any variables suspected to be confounding variables, meaning variables that influence both sales and the receipt of the discount. Function g is a flexible function that models the relationship between the variables in X and sales. The second equation serves as a “helper equation” that is necessary for the double machine learning approach to be able to leverage modern machine learning techniques in estimating the causal effect of the discount on sales. This equation captures the relationship between the receipt of the discount and our set of potential confounders X through a second flexible function m. The variables ɛ and η are random noise terms. For more detailed information on the theory behind the double machine learning approach, please see Chernozhukov et al. (2018) and the user guide of the DoubleML package.
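To build intuition for how the two equations work together, the following sketch implements the “partialling-out” idea by hand on simulated data. It is a simplified illustration and not the DoubleML implementation. It assumes a single confounder, uses a polynomial least-squares fit as a stand-in for the machine learning learners, and all simulation parameters (including a true effect of 3.0) are hypothetical.

```python
import numpy as np


def fit_predict_poly(x_train, y_train, x_test, degree=4):
    """Stand-in nuisance learner: polynomial least squares in one confounder."""
    coefs = np.polyfit(x_train, y_train, deg=degree)
    return np.polyval(coefs, x_test)


def partialling_out_estimate(y, d, x, n_folds=5, seed=0):
    """Estimate theta by regressing outcome residuals on treatment residuals.

    Cross-fitting: predictions for each fold come from models that were
    fitted on the remaining folds only.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    y_res = np.empty(len(y))
    d_res = np.empty(len(y))
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        # Remove the part of the outcome and the treatment explained by x.
        y_res[test_idx] = y[test_idx] - fit_predict_poly(x[train], y[train], x[test_idx])
        d_res[test_idx] = d[test_idx] - fit_predict_poly(x[train], d[train], x[test_idx])
    return float(d_res @ y_res / (d_res @ d_res))


# Simulated data that follows equations (1) and (2) with one confounder x.
rng = np.random.default_rng(42)
n = 5000
theta_true = 3.0
x = rng.uniform(-2, 2, size=n)
propensity = 1 / (1 + np.exp(-1.5 * x))             # m(x): treatment depends on x
d = (rng.uniform(size=n) < propensity).astype(float)
y = theta_true * d + 4 * x + x**2 + rng.normal(size=n)  # g(x) = 4x + x^2

theta_hat = partialling_out_estimate(y, d, x)
naive = y[d == 1].mean() - y[d == 0].mean()
print(f"partialling-out: {theta_hat:.2f}, naive: {naive:.2f}, truth: {theta_true}")
```

The residual-on-residual estimate should land near the true effect, whereas the naive mean comparison is inflated because treated units tend to have larger values of x. In practice, the DoubleML package handles the cross-fitting, the learners, and the standard errors.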

After examining the customer database, the data scientists find approximately 3,000 customers in it. They decide to include five variables as potential confounding variables in the model: firm size (measured by the number of employees), the quarter of the year when a company made a purchase from TechGear, sales prior to the year of the discount campaign, whether a firm’s headquarters is located in the US, and the number of markets in which a firm operates.

Once they have determined which variables to incorporate into the model, they use the DoubleML package to estimate the effect of the discount. The estimation process follows a few simple steps. First, they create the data object for the double machine learning estimation. In it, they specify the main variables for the analysis and the data, which they have already loaded into a pandas dataframe:

import numpy as np
import pandas as pd
import doubleml as dml
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Fix the random seed for reproducibility
np.random.seed(123)

# Specify the potential confounders
potential_confounders = [
    'size', 'past_sales', 'quarter',
    'number_markets_active', 'headquarters_us'
]

# Instantiate the doubleML data object
# (df is the pandas dataframe with the customer data, loaded beforehand)
data_dml = dml.DoubleMLData(
    data=df,
    y_col='sales',
    d_cols='discount',
    x_cols=potential_confounders
)
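The customer data itself is not part of this article. To try the steps in this section without a customer database, one could build a stand-in dataframe `df` with the same column names before creating the data object. The sketch below is one hypothetical way to do so. Apart from the number of customers and the true effect of $3,000 mentioned in this article, every coefficient is an assumption, so it will not reproduce the exact numbers reported here.

```python
import numpy as np
import pandas as pd


def simulate_customers(n=3000, true_effect=3000.0, seed=123):
    """Hypothetical data-generating process with confounded discount assignment."""
    rng = np.random.default_rng(seed)
    size = rng.integers(10, 5000, size=n)                # number of employees
    quarter = rng.integers(1, 5, size=n)                 # quarter of purchase, 1 to 4
    past_sales = 20_000 + 15 * size + rng.normal(0, 5_000, size=n)
    headquarters_us = rng.integers(0, 2, size=n)         # 1 if headquartered in the US
    number_markets_active = rng.integers(1, 21, size=n)

    # Larger firms and first-quarter buyers are more likely to get the discount.
    logit = -1.0 + 0.0006 * (size - 2500) + 1.0 * (quarter == 1)
    discount = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

    # Sales depend on the confounders and, additively, on the discount.
    sales = (
        30_000 + 20 * size + 8_000 * (quarter == 1) + 0.3 * past_sales
        + true_effect * discount + rng.normal(0, 5_000, size=n)
    )
    return pd.DataFrame({
        'sales': sales,
        'discount': discount,
        'size': size,
        'past_sales': past_sales,
        'quarter': quarter,
        'number_markets_active': number_markets_active,
        'headquarters_us': headquarters_us,
    })


df = simulate_customers()
naive_gap = (
    df.loc[df['discount'] == 1, 'sales'].mean()
    - df.loc[df['discount'] == 0, 'sales'].mean()
)
print(df.head())
print(f"Naive mean difference: {naive_gap:,.0f}")
```

Because firm size and the first quarter drive both the discount and sales in this simulation, the naive mean difference is far larger than the built-in effect, mirroring the confounding problem described above.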

Next, they specify the two learners to estimate the relationship between sales and the potential confounding variables X, as well as the relationship between the discount and X. To model the relationship between sales and X in equation (1), they utilize a random forest regressor. Similarly, they use a random forest classifier to capture the relationship between the discount and X in equation (2). By using random forest learners, the data scientists can estimate these relationships in a highly flexible way, without imposing strong assumptions on the functional forms of g and m. This flexibility allows them to capture complex patterns and interactions in the data, making the estimation process more reliable.

# Instantiate random forest learners
rf_reg = RandomForestRegressor(n_estimators=500)
rf_class = RandomForestClassifier(n_estimators=500)

# Instantiate the doubleML partially linear regression model
# ml_l learns the relationship between sales and X,
# ml_m learns the relationship between the discount and X
dml_plr = dml.DoubleMLPLR(
    obj_dml_data=data_dml,
    ml_l=rf_reg,
    ml_m=rf_class
)

Finally, they apply the fit method and print the results for the estimation.

# Run estimation
dml_plr.fit()

# Print estimation results
print(dml_plr.summary)

The data scientists discover that providing the discount increases sales by approximately $3,200 per firm, as indicated by the value in the column “coef”. This estimate is highly significant, with a p-value below 1%. It is also quite close to the true effect of $3,000 that was specified when simulating the data for this case study (although the data scientists are unaware of this, of course). Equipped with this result, the data scientists schedule the next meeting with the marketing managers to present their findings.

The campaign has little effect — find alternative marketing measures!

After thoughtful consideration, the marketing managers express satisfaction that the discount had a positive impact on sales. However, given that TechGear’s average sales to firms amount to around $96,000, they decide that they need to explore other marketing measures that can have a more substantial effect on sales. As the meeting progresses, one of the marketing managers remains curious and wants to know what the estimated effect would have been if the data scientists had applied a simple, naïve mean comparison between the sales from firms that received the discount and those that did not, without controlling for any confounding variables. The data scientists respond that, in that case, the estimated effect would have been around $18,300. The marketing managers feel relieved that the data scientists employed causal machine learning to estimate the effect of the discount. Without this knowledge, they would have continued offering the discount without realizing its minimal impact on sales.

Causality matters for sound decision-making

Thank you for taking the time to read the article! This article emphasizes the importance of understanding the difference between correlation and causation. To make the right decisions when it comes to cause-and-effect relationships, we need to use the appropriate methods. Causal machine learning provides a helpful framework for teasing out causality from data and gaining deeper insights.

The case study presented serves as an illustration of the power of causal machine learning in marketing. However, it is important to note that the potential of causal machine learning extends beyond this specific scenario. In practice, causal machine learning, including double machine learning, can be applied in various areas and contexts. In addition to learning from naturally collected data, causal machine learning can also be utilized in A/B testing. For example, it can assist in developing personalized decision rules for targeted marketing strategies.

If you have any questions or thoughts you would like to share, please feel free to comment below or reach out to me.

Acknowledgments

I thank Martin Spindler and Sven Klaassen for providing valuable input for this article.

About the author
Heinrich is a Data Science Manager at Economic AI. He is passionate about leveraging data to create value in areas such as dynamic pricing, marketing, and financial forecasting and planning. Heinrich holds a PhD in Quantitative Economics and lives in Munich, Germany. When he is not busy harnessing the power of data, he enjoys exploring the nearby Alps by bike or on foot.
