Causal Inference in Mass Media Marketing: A Deep Dive into Campaign Effectiveness

Comprehensive Tools for Evaluating Marketing Success — Part 1

Davi Lirio
Blog Técnico QuintoAndar
9 min read · Jul 21, 2023


A well-structured marketing strategy has played a major role in showing the general public QuintoAndar's advantages over more traditional real estate agencies, generating immense value for the company over the last 10 years.

These strategies are the product of a highly competent and creative marketing team backed by a robust, technically adept data team that helps define what works and what doesn't by means of causal analysis, providing clear and comprehensive insights for our creatives to leverage.

Finding clear impacts from offline campaigns is not easy, and it has been one of the main missions of our Data Branding and Media Optimization teams for the last 3 years. After a lot of work, we are now able to infer causality in the relationship between our main business KPIs and the campaigns being aired.

In this post we will talk about how we measure campaign effectiveness using Causal Methods to understand how a Mass Media campaign altered the behaviour of our metrics.

Causal Inference Overview

Causal inference aims to relate variables in such a way that we can understand what B caused in A, going a step further than traditional statistics: rather than only describing data and inferring distributional parameters from a sample, it tries to capture the cause-and-effect relation itself (Pearl, 2009).

These methods are especially important in impact evaluation studies, where the main goal is to understand how an intervention impacted the behaviour of a metric in a system.

An example could be: how the implementation of an income tax (intervention) impacted real wages (metric) in the Brazilian economy (system).

But in our case we want to understand: How a Mass Media Campaign altered the behaviour of KPIs inside our Business.

To achieve this goal, we need robust methods to develop counterfactual scenarios. Simply put, in the context of our main goal, a counterfactual scenario could be described as:

Given that we have aired a Mass Media Campaign, what would have happened to our KPIs if we hadn't aired it?

For a deeper dive into counterfactuals and their importance for causal analysis, you can refer to Judea Pearl's lesson on the topic.

Causal Methods for Campaign Effectiveness

The methodological possibilities for developing robust counterfactuals are vast, with some methods fitting certain scenarios better than others. Within our team, we use many different models and methodologies to measure impact across projects. Nevertheless, when it comes to measuring Campaign Increment, we primarily rely on two methodologies:

Synthetic Controls: Employed predominantly for regional campaigns. Better suited for scenarios where few cities participate in the campaign and many cities do not.

Causal Impact: Our preferred method for national campaigns. This method stands out when designating control and treatment units becomes a challenging endeavor due to the campaign’s vast reach (see the sketch below).
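For a flavor of the Causal Impact route, here is a minimal sketch using an open-source Python port of Google's CausalImpact (the causalimpact package); the package choice and all numbers below are illustrative, not our production setup:

import numpy as np
import pandas as pd
from causalimpact import CausalImpact  # e.g., the tfcausalimpact port

# Synthetic, illustrative data: 'y' is a national KPI, 'x1' a covariate
# series assumed to be unaffected by the campaign.
rng = np.random.default_rng(42)
x1 = 100 + np.cumsum(rng.normal(0, 1, 120))
y = 1.2 * x1 + rng.normal(0, 1, 120)
y[90:] += 10  # a made-up lift after the (hypothetical) campaign start
data = pd.DataFrame({'y': y, 'x1': x1})

# Fit on the pre-period and let the model forecast the counterfactual
pre_period, post_period = [0, 89], [90, 119]
ci = CausalImpact(data, pre_period, post_period)
print(ci.summary())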

Given this foundation, we can visualize the typical path through which our campaigns are expected to influence our KPIs:

Steps from Campaign Launch to KPI Increment. By the author

With this in mind, the rest of this post takes a deeper look at how we employ one of these causal methods to better inform our marketing team about how our campaigns are impacting our KPIs.

Synthetic Control (SC)

Blending together control units to define a counterfactual

Said to be the “most important development in program evaluation in the last decade” (Athey and Imbens, 2016), the Synthetic Control (Abadie and Gardeazabal, 2003) method is a powerful approach in causal inference, particularly in policy evaluation and observational studies where traditional experimental design is not feasible, as in Mass Media Marketing campaigns.
The method provides a way to develop robust counterfactual scenarios for when traditional control groups become hard to define. For instance, assessing the impact of a Mass Media Marketing Campaign in a city when there’s no identical “control” city available for comparison.

We can define this method as follows:

(1) Identify the Treated Unit and the Control Pool: The first step is identifying the unit that received the treatment (in this case, the city or cities that are participating in the campaign) and a set of potential control units that did not receive the treatment.

(2) Pre-intervention Period and Post-intervention Period: Next, we split the data into two periods — before the intervention (pre-campaign) and after the intervention (post-campaign).

KPI value for Treated Unit vs. Control Cities. By the author

(3) Scale the KPI data: In this step we scale the data for the treated city and the control group. This helps capture behavioural similarities even when the volumes of the cities differ significantly. Note that we save the Scaler so we can later invert the transformation and estimate the overall increment in the KPI's original units.

Having the Scaler, fitted on the pre-intervention period, defined as:

$$z = \frac{x - \mu_{\text{pre}}}{\sigma_{\text{pre}}}$$

where $\mu_{\text{pre}}$ and $\sigma_{\text{pre}}$ are the pre-campaign mean and standard deviation of each city's KPI, we have the scaled features as:

$$\tilde{y}_{jt} = \frac{y_{jt} - \mu_{j,\text{pre}}}{\sigma_{j,\text{pre}}}$$

Therefore, we can define our scaled Treated Unit as $\tilde{y}_{1t}$ and our Scaled Control Pool as $\{\tilde{y}_{jt}\}_{j=2}^{J+1}$, where $J$ is the number of control cities.
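In code, this is simply a StandardScaler fitted on the pre-campaign window only; a minimal sketch, with illustrative names and synthetic numbers:

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Illustrative data: daily KPI for one city (all numbers are synthetic)
idx = pd.date_range('2023-01-01', periods=120, freq='D')
kpi = pd.DataFrame({'sao_paulo': np.random.default_rng(1).normal(1000, 50, 120)},
                   index=idx)
CPGN_START = '2023-03-01'

# Fit on the pre-campaign window only, then scale the full series
scaler = StandardScaler().fit(kpi.loc[kpi.index <= CPGN_START, ['sao_paulo']])
kpi['sao_paulo_scaled'] = scaler.transform(kpi[['sao_paulo']])

# Keep the scaler: scaler.scale_ lets us map scaled effects
# back into the KPI's original units later on.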

(4) Construct the Synthetic Control: The SC for the treated unit is a weighted average of the units in the control pool,

$$\hat{y}_{1t} = \sum_{j=2}^{J+1} w_j \, \tilde{y}_{jt}$$

with the weights chosen to make the pre-intervention outcomes and other covariates of the synthetic control as close as possible to those of the treated unit, where the weights satisfy:

$$w_j \geq 0 \quad \text{for } j = 2, \ldots, J+1, \qquad \sum_{j=2}^{J+1} w_j = 1$$

Synthetic Control (synthetic) of the Treated Unit (observed). By the author
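Our internal optimize_selection and get_w helpers (referenced in the function shown further below) also tune the weight bounds and the total weight sum; as a simplified, illustrative stand-in, the classical version of this optimization under the constraints above could look like this:

import numpy as np
from scipy.optimize import minimize

def get_w_classic(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Simplified stand-in for our internal get_w: finds non-negative
    weights summing to one that minimize the pre-intervention gap
    between the treated unit and the weighted control pool."""
    n_controls = X.shape[1]
    loss = lambda w: np.mean((y - X @ w) ** 2)
    result = minimize(
        loss,
        x0=np.full(n_controls, 1 / n_controls),  # start from uniform weights
        bounds=[(0.0, 1.0)] * n_controls,        # w_j >= 0
        constraints={'type': 'eq', 'fun': lambda w: w.sum() - 1.0},
        method='SLSQP',
    )
    return result.x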

(5) Comparison and Estimation of the Causal Effect: After constructing the SC, we compare the post-intervention outcomes of the treated unit and the synthetic control to estimate the causal effect of the intervention, hopefully getting a clear view of how the campaign affected the city.

Treatment Effect of the Campaign. By the author

For a better understanding, here is a dummy Python function giving an overall view of how this could work:


import pandas as pd
from sklearn.preprocessing import StandardScaler

def create_synthetic_control(treated_city: str, df: pd.DataFrame, CPGN_start: str, TREATED_CITIES: list) -> pd.DataFrame:
    """
    Creates a synthetic control for a given treated city.
    - treated_city: the city for which we want to create a synthetic control.
    - df: DataFrame with a DatetimeIndex and one KPI column per city.
    - CPGN_start: the start date of the campaign.
    - TREATED_CITIES: list of cities that were treated.
    """
    cities = [c for c in df.columns if c != 'after_cpgn']

    # Assigning a binary flag for the post-campaign period
    df = df.assign(after_cpgn=lambda x: (x.index > CPGN_start).astype(int))

    for city in cities:
        # (2) fitting on pre-intervention data only and
        # (3) scaling each city's full series with those statistics
        scaler = StandardScaler().fit(
            df[city].loc[df.after_cpgn < 1].values.reshape(-1, 1))
        df[city] = scaler.transform(df[city].values.reshape(-1, 1))

    # Control pool: every city that was not treated (note that the
    # 'after_cpgn' flag must be left out of the feature matrix too)
    control_cols = [c for c in cities if c not in TREATED_CITIES]
    X = df.loc[df['after_cpgn'] < 1, control_cols].values
    y = df.loc[df['after_cpgn'] < 1, treated_city]

    # (4.0) Algorithmically finding optimal bounds and weight_sum for the SC
    # (optimize_selection and get_w are internal helpers, not shown here)
    weight_sum, bounds = optimize_selection(X, y, treated_city)

    # (4.1) Getting weights for each control unit and creating the SC
    weights = get_w(X, y, weight_sum, bounds)
    synthetic = df[control_cols].values.dot(weights)

    # (4.2) Returning a dataframe with the observed series and its SC
    return (df[[treated_city] + [c for c in df.columns if c != treated_city]]
            .rename(columns={treated_city: 'observed'})
            .assign(synthetic=synthetic))
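A hypothetical usage of this function, together with the de-scaling of the estimated effect (all names below, like kpi_by_city, are illustrative and reuse the scaling sketch above):

# 'kpi_by_city' has a DatetimeIndex and one KPI column per city
CPGN_START = '2023-03-01'
TREATED = ['sao_paulo', 'campinas']

sc_df = create_synthetic_control('sao_paulo', kpi_by_city, CPGN_START, TREATED)

# (5) Pointwise effect, in scaled units, over the post-campaign period
post = sc_df.loc[sc_df['after_cpgn'] == 1]
scaled_effect = post['observed'] - post['synthetic']

# De-scaling: a difference of standardized values only depends on the
# standard deviation, so the pre-campaign std maps it back to KPI units
# (ddof=0 matches StandardScaler's population std)
pre = kpi_by_city.loc[kpi_by_city.index <= CPGN_START, 'sao_paulo']
estimated_increment = (scaled_effect * pre.std(ddof=0)).sum()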

So, in simplified terms, the underlying mechanics of the model seek to generate a synthetic version of the treated unit by "blending" together information from cities that were not treated.

We can then average the effects over all treated units to obtain the average impact of the campaign, as sketched after the figure below.

Average Treatment Effect over units. By the author
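As a rough sketch of that averaging step, assuming a hypothetical estimate_increment wrapper around steps (1) to (5) above:

# 'estimate_increment' is a hypothetical wrapper that runs the steps above
# for one treated city and returns its total increment in KPI units
increments = [estimate_increment(city, kpi_by_city, CPGN_START, TREATED)
              for city in TREATED]
average_treatment_effect = sum(increments) / len(increments)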

This methodology allows for effective campaign measurement, with interpretations and plots simple enough to hand to the marketing team responsible for the campaign without much further processing.

How do the marketing teams consume it?

Understanding the intricate causal models and their implications can be a challenge, but our goal is to transform complex analyses into actionable insights.

(1) Visual Interpretation

  • The visualizations distill the impact of campaigns on specific regions, making the results more digestible for non-technical team members and democratizing them across all interested areas of the company.

(2) Collaborative Exploration

  • In collaboration with our Media & Marketing teams, we dissect the increment results, co-creating hypotheses about campaign performance.
  • By understanding what did or didn't resonate with specific regions, we gain insights on how to drive our strategies more precisely.

(3) Feedback Loop

  • Our campaign increment models aren't final; they're the beginning of a feedback loop. They prompt questions, drive discussions, and lead to experiments that fine-tune our Mass Media Campaigns.

(4) Clear Presentation

  • To maximize clarity, our final presentations distill the essence of our findings. While the underlying mechanics of our models are intricate, the takeaways for our marketing teams are simplified, actionable, and direct.

Our final message may be displayed in a format like this:

Example Final Presentation slide. By the author

This format simplifies the results and delivers findings and insights efficiently, creating space for our marketing teams to discuss and make campaign adjustments based on the results from our Causal Models.

Conclusion

Causal Models, as illustrated, provide us with the toolbox necessary to reveal how our interventions impact the systems around us. These tools empower us to delve deep into the intricate cause-and-effect relationships, drawing clearer lines between our actions and their outcomes. As we navigate an increasingly data-driven world, the ability to understand and predict the cascading effects of our decisions becomes ever more necessary.

Offline campaigns, by their very nature, carry complexities that can make measurement and analysis a challenging endeavor where traditional metrics often fall short in capturing the real essence of campaign impacts. The Synthetic Control model offers a lens through which we can better estimate effects by creating robust counterfactuals, enabling us to gauge the impact of our regional campaigns in a reliable, coherent and interpretable way.

Ultimately, the insights derived are only as powerful as the actions they drive. And in our collaboration with the marketing teams, we’ve created a system where data-driven insights seamlessly lead to actionable strategies, ensuring that QuintoAndar not only stays ahead in the market but also remains deeply attuned to its audience.

References

Abadie, Alberto, and Javier Gardeazabal. “The Economic Costs of Conflict: A Case Study of the Basque Country.” The American Economic Review, vol. 93, no. 1, 2003, pp. 113–32. JSTOR, http://www.jstor.org/stable/3132164. Accessed 21 July 2023.

Athey, Susan, and Guido W. Imbens. “The State of Applied Econometrics: Causality and Policy Evaluation.” arXiv:1607.00699 [stat.AP], 2016. https://doi.org/10.48550/arXiv.1607.00699.

Bouttell, J., P. Craig, J. Lewsey, et al. “Synthetic Control Methodology as a Tool for Evaluating Population-Level Health Interventions.” Journal of Epidemiology and Community Health, vol. 72, 2018, pp. 673–678.

Chernozhukov, Victor, Kaspar Wuthrich, and Yinchu Zhu. “A t-test for Synthetic Controls.” arXiv:1812.10820 [econ.EM]. https://doi.org/10.48550/arXiv.1812.10820.

Facure, Matheus, and Michell Germano. matheusfacure/python-causality-handbook: First Edition (v1.0). Zenodo, 2021. https://doi.org/10.5281/zenodo.4445778.

Halpern, Joseph Y., and Judea Pearl. “Causes and Explanations: A Structural-Model Approach. Part I: Causes.” The British Journal for the Philosophy of Science, vol. 56, 2005, pp. 843–887.

Pearl, Judea. “Causal Inference in Statistics: An Overview.” Statistics Surveys, vol. 3, 2009, pp. 96–146.

Pearl, Judea, and Elias Bareinboim. “Tutorial Session B — Causes and Counterfactuals: Concepts, Principles and Tools.” Microsoft Research, 6 Jan. 2014, https://www.microsoft.com/en-us/research/video/tutorial-session-b-causes-and-counterfactuals-concepts-principles-and-tools/.
