Expedia Group Technology — Data

Measuring Marketing Success: The Power of Incrementality and Geo Testing

A guide to comparable evaluation of marketing and its application at Expedia Group

Nivedita Sharma

Published in Expedia Group Technology

9 min read · May 14, 2024

Hikers in an embrace, looking out at snow-topped mountains. — Photo by S&B Vonlanthen on Unsplash

Ever wondered why we really need marketing? Imagine running a business with a clear goal — attracting more customers. The straightforward solution? Advertise to boost brand visibility and drive sales. But here’s the catch: You’re working with a limited budget, and there are so many places to advertise, from social media and search engines, to billboards and affiliate partnerships.

So, how do you decide where to spend your marketing dollars?

It’s a simple question with a complex answer. All these advertising channels promise their unique and often incomparable advantages, so the key lies in understanding your target audience, grasping the intricacies of each platform, and aligning your goals with the strengths of the chosen avenues. It’s not just about spending; it’s about investing wisely to deliver maximum returns on that spend.

At Expedia Group™, we have various measurement techniques at our disposal to measure the effectiveness of advertising and assess returns. However, most of them rely on correlation. It may be enticing to simply credit ads for purchases, but correlation doesn’t necessarily imply causation. In simple terms, suppose we show a TV ad, and the viewer then makes a purchase. Should that TV ad claim full credit for the sale? Just because someone saw the ad doesn’t mean that it was the primary reason the customer hit the ‘buy’ button.

What we’re after is sales truly driven by marketing. This is where Incrementality Testing steps in, unravelling the cause-and-effect relationship between advertising and customer action. Unlike correlation-based approaches, incrementality testing answers the crucial question:

Did our ad truly influence the customer’s purchase decision, or were they going to make that purchase anyway?

Now that we understand why incrementality testing is worth the effort, let’s delve into the methodology and walk through a step-by-step guide to running your first incrementality test.

Incrementality testing — Under the hood

Incrementality testing operates as a precise and controlled experiment, widely acknowledged as the gold standard in marketing measurement. It’s the trusted method for unveiling the true cause-and-effect dynamics.

Potential customers are split into two homogeneous groups: those who see an ad (treatment) and those who don’t (control). Because the groups are comparable in every respect apart from the marketing intervention, any difference in their behaviour can be attributed to that intervention. If the ad-exposed group shows an increase in the desired response, such as making purchases, we can conclude, with a quantified level of confidence, that the marketing created an impact.

A diagram showing potential users divided into 2 groups; “control” on the left, and “treatment” on the right. — Visualising incrementality testing
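To make the comparison concrete, here is a minimal sketch of how lift could be computed from the two groups. The function name and all figures are hypothetical and purely illustrative; they are not Expedia Group data.

```python
def incremental_lift(treat_conversions, treat_users, control_conversions, control_users):
    """Compare conversion rates between the treatment and control groups."""
    treat_rate = treat_conversions / treat_users
    control_rate = control_conversions / control_users
    absolute_lift = treat_rate - control_rate     # difference in conversion rate
    relative_lift = absolute_lift / control_rate  # lift relative to the control baseline
    incremental = absolute_lift * treat_users     # purchases attributable to the ad
    return absolute_lift, relative_lift, incremental


# Hypothetical example: 50,000 users in each group
absolute_lift, relative_lift, incremental = incremental_lift(1200, 50_000, 1000, 50_000)
print(f"relative lift: {relative_lift:.1%}, incremental purchases: {incremental:.0f}")
```

The control group's conversion rate stands in for what the treatment group would have done without the ad, which is why the two groups need to be comparable.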

Challenges in implementation

Conducting a user-level test for offline advertising like billboards or TV poses a challenge because there is no sensible way to partition individual users into test and control groups. For digital advertising, user-level testing is also no longer sustainable, owing to platform privacy changes such as the iOS 14 updates and regulations such as GDPR. Shifting online consumer behaviours, such as using multiple devices and private browsing modes, add further complexity.

Unlocking the solution with geo testing

Geo testing has emerged to overcome these challenges. It involves breaking down a market, like a country, into subregions or ‘geos.’ Geos act as our experimental units, replacing individual user data with aggregated insights from each geographic region. This approach is resilient to privacy changes and allows offline and digital media to share a single measurement approach.

Building blocks

Pre-test

Step 1: Define geos

The first step is defining the geographical areas, or ‘geos.’ These areas can range from cities and states to entire countries. We can adopt common industry standards like Designated Market Areas (DMAs) or Functional Urban Areas (FUAs). In the U.S., there are 210 DMAs covering the entire country, determined by the Nielsen Company.

Successful experimentation requires consideration of the following geo factors, particularly where industry conventions like DMA and FUA are not available.

Grain size: Striking the right balance in the size of these geos is crucial. If too small, there’s a risk of users moving between geos, potentially contaminating test results. Conversely, if geo grain is too large, the experiment might become costly, and the insight less precise.
Accurate geo deployment: Ensure ads reach the right audience by displaying them in the appropriate locations. Marketing partners need the capability to target their ads in only the required geo areas.
Data availability: It’s essential to measure the response metric at the defined geo grain. This requires having sufficient data granularity in internal datasets. Additionally, the geo definition for our marketing partners needs to align with our internal data sources.
Business strategy: Geo choices should align with the business intuition of the company. There is minimal value to gain from testing in a region that would never otherwise be chosen to advertise in.

A map of the US showing all of the DMAs (Designated Market Areas)

Step 2: Design test

This step involves carefully picking a subset of available geos and strategically assigning them to treatment and control groups. Methods such as randomisation, stratification, systematic sampling, or a combination of these approaches can be used. The choice depends on specific needs and the preferred analysis method, with a primary focus on ensuring the assignment is free from bias.
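As one illustration of combining stratification with randomisation, the sketch below ranks geos by a baseline KPI, groups neighbours into strata, and randomly splits each stratum between treatment and control. The geo names, KPI values, and function name are hypothetical; this is not a description of how EGGX performs the assignment.

```python
import random


def stratified_assignment(geo_kpi, stratum_size=2, seed=42):
    """Randomly split each stratum of similarly sized geos between treatment and control."""
    rng = random.Random(seed)  # fixed seed so the design is reproducible
    ranked = sorted(geo_kpi, key=geo_kpi.get, reverse=True)
    assignment = {}
    for start in range(0, len(ranked), stratum_size):
        stratum = ranked[start:start + stratum_size]
        rng.shuffle(stratum)
        half = len(stratum) // 2
        for geo in stratum[:half]:
            assignment[geo] = "treatment"
        for geo in stratum[half:]:  # any odd geo left over goes to control
            assignment[geo] = "control"
    return assignment


# Hypothetical baseline weekly sales per geo
baseline_kpi = {"geo_a": 900, "geo_b": 850, "geo_c": 400,
                "geo_d": 380, "geo_e": 120, "geo_f": 100}
assignment = stratified_assignment(baseline_kpi)
print(assignment)
```

Stratifying first keeps large and small geos balanced across the two groups, which plain randomisation cannot guarantee when there are only a handful of units.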

While random assignment is a common practice, geo tests present unique challenges. The limited number of experimental units and the natural diversity of individual geos may complicate direct comparisons. In response, analysts may benefit from a slightly more sophisticated approach: instead of comparing the treatment group directly with the control group, they build a synthetic control using statistical modelling techniques in which the control geos serve as the primary covariates. This synthetic control forms the basis of a robust test design.
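A heavily simplified sketch of the idea follows. It assumes a single covariate (the aggregated control series) and ordinary least squares, in the spirit of a time-based regression; a real synthetic control would typically use many control geos and a richer model. All names and numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_counterfactual(pre_control, pre_treatment, test_control):
    """Learn the treatment/control relationship pre-test, then project it onto the test period."""
    slope, intercept = fit_line(pre_control, pre_treatment)
    return [intercept + slope * c for c in test_control]


# Hypothetical daily sales: pre-test period, then control sales during the test
pre_control = [100, 110, 120, 130]
pre_treatment = [205, 225, 245, 265]
test_control = [140, 150]
synthetic = predict_counterfactual(pre_control, pre_treatment, test_control)
print(synthetic)
```

The projected series estimates what the treatment geos would have sold had no campaign run, so it can be compared against what they actually sold.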

Step 3: Validate the test design

Before beginning the test, robust validation of the test design is crucial. This involves comprehensive statistical checks to confirm accuracy and to ensure that potential biases are identified and either corrected or removed. Key assessments include, but are not limited to: similarity of the test and control geos across demographic, behavioural, and other relevant characteristics; and model validation, such as the correlation between test and synthetic control, KPI bias evaluation, detailed analysis of residual charts for performance insights, and the Durbin-Watson test for autocorrelation of errors.

This multifaceted approach establishes a solid foundation for a reliable and unbiased test design.
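Two of the checks named above can be sketched in a few lines, assuming we already have the pre-test actuals for the treatment geos and the synthetic control's predictions for the same days. The data and function names are hypothetical.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation between the actual and the synthetic control series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 suggests no first-order autocorrelation."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    return numerator / sum(e ** 2 for e in residuals)


# Hypothetical pre-test data
actual = [10, 12, 11, 13]
predicted = [9, 13, 10, 14]
residuals = [a - p for a, p in zip(actual, predicted)]

correlation = pearson_correlation(actual, predicted)
dw = durbin_watson(residuals)
print(f"correlation={correlation:.3f}, Durbin-Watson={dw:.2f}")
```

The statistic ranges from 0 to 4: values well below 2 point to positive autocorrelation in the errors, and values well above 2 to negative autocorrelation.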

Other considerations for designing geo tests:

Statistical power analysis: It’s essential to predict, before the test starts, the smallest effect the experiment can reliably measure. This is governed by statistical power: the probability of detecting an effect of a given size if it truly exists. Power analysis is crucial for planning test budgets and managing expectations about the likelihood of a statistically significant test result.

Testing period: Geos are prone to seasonal variations which may impact results. It is important to factor in the test run-time and the seasonality within each region, e.g., school holidays for a travel firm, Black Friday for retail businesses.

Representativeness: The test geos need to represent the total market they sit in so that the result can be applied broadly afterwards. It may help to disperse geos across a country, but other criteria should be bespoke to how a business segments its customer base, e.g., mobile vs desktop, new vs existing customers, age demographics.
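For the statistical power consideration above, a rough back-of-the-envelope sketch is shown below. It assumes daily model residuals are independent and normally distributed and ignores uncertainty in the model parameters, so it will tend to be optimistic; simulation on historical data is a common alternative. All inputs are hypothetical.

```python
import math
from statistics import NormalDist


def minimum_detectable_lift(residual_sd, baseline_daily_kpi, test_days,
                            alpha=0.05, power=0.8):
    """Smallest relative lift detectable with the given power (two-sided test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = z.inv_cdf(power)          # quantile for the desired power
    # Standard error of the cumulative effect, assuming independent daily errors
    se_cumulative = residual_sd * math.sqrt(test_days)
    mde_absolute = (z_alpha + z_power) * se_cumulative
    return mde_absolute / (baseline_daily_kpi * test_days)


# Hypothetical: daily residual sd of 50 on a baseline of 1,000 sales per day
mde_4_weeks = minimum_detectable_lift(50, 1000, test_days=28)
mde_8_weeks = minimum_detectable_lift(50, 1000, test_days=56)
print(f"4 weeks: {mde_4_weeks:.2%}, 8 weeks: {mde_8_weeks:.2%}")
```

Under these assumptions, a longer test shrinks the minimum detectable lift, which is the trade-off to weigh against the extra cost and the seasonality risk of a longer run.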

Test

Apply the marketing intervention

After finalising the design, we apply the marketing and activate ad campaigns in treatment areas. While we expect an increase in ad spend, it is not yet known whether this change will significantly impact the response metric, such as sales.

While the test is running, we continuously monitor business performance, collect relevant data, and promptly handle any arising issues. It’s prudent to regularly confirm that the control areas are not compromised by unintentional ad spend.

An example of a geo test in the US

Post-test

Reconvergence

As the geo test concludes, our attention turns to understanding the aftermath. Even though the advertising spend reverts to zero, the associated impact doesn’t always vanish instantly, and sales may continue to accumulate. Analysing this cooldown period is vital to capture these delayed effects.

Upon completing the test, we confirm that the model reconverges, i.e., test and control track each other once again. This additional check indicates that the model hasn’t decayed and that the relationship between the test and control groups has held throughout, which supports attributing any lift measured during the test to the marketing activity. We calculate a lift estimate and confidence interval as the test result.
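One simple way to express a reconvergence check, offered only as an assumption-laden sketch: compare the average gap between actual and synthetic control after the cooldown with the noise level seen before the test. The threshold, data, and function name are hypothetical.

```python
import math
from statistics import mean, stdev


def has_reconverged(pre_residuals, post_residuals, z=1.96):
    """True if the post-test gap is within the noise band implied by pre-test residuals."""
    # Standard error of the post-test mean gap, using pre-test residual spread
    standard_error = stdev(pre_residuals) / math.sqrt(len(post_residuals))
    return abs(mean(post_residuals)) <= z * standard_error


# Hypothetical residuals (actual minus synthetic control)
pre_residuals = [-2, 1, 3, -1, 0, -1]
settled = has_reconverged(pre_residuals, [1, -1, 0, 2])      # gap back to noise level
still_lifted = has_reconverged(pre_residuals, [5, 6, 4, 5])  # gap persists after the test
print(settled, still_lifted)
```

A persistent gap after the cooldown could mean either a long-lived marketing effect or a model that has drifted, so it warrants investigation before the lift is reported.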

Calculating the lift estimate

To estimate the lift, we first estimate the counterfactual time series, which represents what would have happened without the intervention. We then take the differences between the observed response metric volumes in the treatment group and these potential outcomes. Since the potential outcomes are themselves estimates, our lift estimates also carry a degree of uncertainty.

The cumulative causal effect at any given time during the test period is the sum of these differences, starting from the first day of the intervention. However, understanding the efficiency of a marketing change also involves considering the cost associated with generating this change. Incremental Return on Advertising Spend (iROAS) is the ratio of the cumulative causal effect on a response metric to the cumulative causal effect on marketing cost. This helps us gauge the overall impact.
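The arithmetic described above can be sketched as follows, using point estimates only (a real analysis would also carry the uncertainty through). The series and function names are hypothetical.

```python
from itertools import accumulate


def cumulative_effect(observed, counterfactual):
    """Running sum of daily differences between observed and counterfactual values."""
    return list(accumulate(o - c for o, c in zip(observed, counterfactual)))


def iroas(observed_response, counterfactual_response, observed_cost, counterfactual_cost):
    """Incremental response divided by incremental marketing cost over the test period."""
    incremental_response = cumulative_effect(observed_response, counterfactual_response)[-1]
    incremental_cost = cumulative_effect(observed_cost, counterfactual_cost)[-1]
    return incremental_response / incremental_cost


# Hypothetical three-day test: revenue and ad cost in treatment geos
observed_revenue = [120, 130, 125]
counterfactual_revenue = [100, 105, 100]
observed_cost = [10, 10, 15]
counterfactual_cost = [0, 0, 0]  # no spend would have occurred without the test

effect = cumulative_effect(observed_revenue, counterfactual_revenue)
ratio = iroas(observed_revenue, counterfactual_revenue, observed_cost, counterfactual_cost)
print(effect, ratio)
```

An iROAS above 1 in this sketch would mean each incremental dollar of spend returned more than a dollar of incremental revenue.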

Sensitivity analysis

Sensitivity analysis is crucial in testing because it shows how much the test results depend on the choice of modelling technique and on geographic factors, such as location or demographics. In practice, this includes, but is not limited to, re-estimating the result with different statistical modelling techniques to check that the findings remain consistent, which makes the resulting decisions more robust.
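One form this can take is a leave-one-geo-out check: re-estimate the lift with each control geo removed and see how much the answer moves. The sketch below uses a deliberately simple ratio-based estimator as a stand-in for the real model; the estimator, geo names, and figures are all hypothetical.

```python
def ratio_lift(pre_treat, test_treat, pre_controls, test_controls):
    """Lift estimate using a pre-period treatment/control ratio as the counterfactual."""
    scale = pre_treat / sum(pre_controls.values())
    return test_treat - scale * sum(test_controls.values())


def leave_one_out(pre_treat, test_treat, pre_controls, test_controls):
    """Re-estimate the lift once per control geo, dropping that geo each time."""
    results = {}
    for dropped in pre_controls:
        pre = {g: v for g, v in pre_controls.items() if g != dropped}
        test = {g: v for g, v in test_controls.items() if g != dropped}
        results[dropped] = ratio_lift(pre_treat, test_treat, pre, test)
    return results


# Hypothetical total sales per period
pre_controls = {"geo_a": 500, "geo_b": 300, "geo_c": 200}
test_controls = {"geo_a": 520, "geo_b": 310, "geo_c": 210}
full_estimate = ratio_lift(1000, 1150, pre_controls, test_controls)
loo_estimates = leave_one_out(1000, 1150, pre_controls, test_controls)
print(full_estimate, loo_estimates)
```

If dropping a single geo swings the estimate sharply, the result leans heavily on that geo and deserves a closer look before it informs a budget decision.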

Multiple KPIs

Geo tests are costly both in dollars spent and in data science resources used. To maximise learnings, we extend beyond a single Key Performance Indicator (KPI) where possible. Apart from transactional KPIs (e.g., revenue, profit), the same test may yield insight into customer volumes, mailing list enrolments, app downloads, etc. This allows us to glean insights into the holistic impact of the marketing action.

Geo testing tool: EGGX

At Expedia Group, we leverage a custom in-house R package, ‘EGGX’ (Expedia Group’s Geographic eXperimentation). This package streamlines geo selection, treatment and control group allocation, and the synthetic control build. EGGX also performs statistical checks for bias and calculates the resultant KPI lift.

Leveraging our internal package for geo tests offers a range of benefits, including speed, alignment, standardised results format, replicability, and efficient data handling, providing a robust and user-friendly solution.

A spiky line graph showing the trend within test geos: how geo testing looks in practice

Strategic implementation of test outcomes

At Expedia Group, with multiple brands and a global presence, we have an extensive array of testing combinations. However, due to the resource-heavy nature of geo testing, our test results are strategically integrated with other signals before making marketing capital allocation decisions.

Incrementality testing is a vital component of marketing measurement, working alongside other measurement methods. Each method has its strengths and limitations, and they validate, calibrate, and enhance one another. This integrated approach ensures comparable and scalable measurement across all marketing channels, facilitating informed decisions in allocating capital.

Conclusion

How do you decide where to spend your marketing dollars?

How do you determine the sales that were truly driven by marketing action?

Did your ad truly influence the purchase decision, or was the purchase inevitable?

After exploring the intricacies of incrementality testing throughout this guide, we hope you have gained insight into how to address the questions above and how to maximise returns on your marketing investment.

We encourage readers to share their experiences, insights, and questions in the comments below. Let us collaboratively elevate the art and science of marketing measurement.

Acknowledgements

Shout out to the Incrementality Allocation Analytics team at Expedia Group, in particular Jasmine Coll, Fred Fisher, and Aiste Luksyte, for their valuable input and support.

References

Estimating Ad Effectiveness using Geo Experiments in a Time-Based Regression Framework, by Google
Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program, by Alberto Abadie, Alexis Diamond, and Jens Hainmueller
Inferring Causal Impact using Bayesian Structural Time-Series Models, by Google

To learn more about geo testing, see the Medium blog post titled “Market Segmentation for Geo-Testing at Scale”.