EXPEDIA GROUP TECHNOLOGY — DATA

Market Segmentation for Geo-Testing at Scale

Increasing experiment volumes and efficiency with a little preparation

Jasmine Holdsworth

Published in Expedia Group Technology

10 min read · Feb 28, 2023

The revival of geo-testing

Geo-testing, the process of using geographical locations to help businesses understand the impact of their marketing activity, is enjoying a revival. Accurately measuring the effect of digital campaigns has become harder because of privacy changes initiated by Apple™️, the decline of third-party cookie data, increased use of incognito browsing, information lost to cross-device usage, and multiple touchpoints along the customer journey. Meta™️ considers geo-experiments “one of the best ways to quantify ad effectiveness through lift” and recently released its open-source solution, GeoLift.

Geo-testing at Expedia Group

At Expedia Group™️, we run up to 40 geo-experiments per year globally. These experiments span three brands and multiple marketing channels, and they are essential for quantifying the success of our marketing campaigns.

We have our own in-house geo-testing tool, EGGX (EG Geo Experimentation), which was created in 2017 and has been continuously improved since. It builds a synthetic control from a group of geographical areas whose activity best reflects that of the test regions. This synthetic control is then used to measure the lift attributable to our marketing efforts.
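EGGX is internal, so the snippet below is not its implementation; it is a minimal sketch of the general synthetic-control idea under stated assumptions. It assumes weekly KPI data with the test regions aggregated into one series and the candidate control regions as columns, fits non-negative weights on the pre-test weeks with SciPy's non-negative least squares (`scipy.optimize.nnls`), and reads the lift as the gap between observed and synthetic KPI afterwards. The helper name `estimate_lift` and the data layout are hypothetical, and many synthetic-control variants also constrain the weights to sum to one.

```python
import numpy as np
from scipy.optimize import nnls


def estimate_lift(test_kpi, control_kpi, n_pre):
    """Sketch of a synthetic-control lift estimate (hypothetical helper, not EGGX).

    test_kpi:    shape (T,), weekly KPI of the aggregated test regions
    control_kpi: shape (T, R), weekly KPI of R candidate control regions
    n_pre:       number of leading weeks that belong to the pre-test period
    """
    test_kpi = np.asarray(test_kpi, dtype=float)
    control_kpi = np.asarray(control_kpi, dtype=float)
    # Fit non-negative weights so the weighted controls track the test regions pre-test
    weights, _ = nnls(control_kpi[:n_pre], test_kpi[:n_pre])
    synthetic = control_kpi @ weights
    # Lift after the pre-test period is the gap between observed and counterfactual KPI
    lift = test_kpi[n_pre:] - synthetic[n_pre:]
    return weights, synthetic, lift
```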

Here is a simple overview of how we perform geo-tests:

*The stages of a geo-test within Expedia Group: the test regions and the synthetic control track closely in the pre-test and pre-test holdout periods, diverge during the test period and re-converge afterwards (illustrative).*

Pre-test period: establish a geo-match.
Pre-test holdout period: confirm that the synthetic control continues to represent the behaviour in the test regions.
Test period: treatment occurs in the test regions.
Cooldown period: observe re-convergence between the test regions and the synthetic control.

Common geo-testing issues

Several pitfalls can occur while trying to set up a geo-test:

Representativeness: If you apply the learnings from your test nationally, your test and control regions must represent the entire market in terms of customer type, device usage and other customer behaviours.
Geographical spread: Your experiment needs to be robust enough to withstand unforeseen, region-specific events such as natural disasters or geo-tagging issues. If you are running a test in the United States and an event causes the East Coast to behave anomalously, having the rest of your test or control regions spread across unaffected areas of the country could save your experiment. Geographic spread is also essential for national representativeness.
Determining the right geo-grain: Choosing the right size of geographical region to experiment on (the geo-grain) can be tricky. Too small (e.g., city or zip code) and you risk contamination from commuters or inaccurate geo-tagging. Too large (e.g., country-wide) and you risk not being able to find representative controls.
Time-efficiency: Geo-testing is time-consuming, because you must design and launch the experiment and then wait for re-convergence before you can even begin analysing results. Additionally, geographic regions used in one experiment mustn’t be used in another until any campaign effect has “cooled down”. Together, these constraints limit the number of experiments that can be run in a year.

With such a high volume of experimentation across so many combinations of brand, channel and market, we must solve the above issues in a standardised manner, so that multiple Data Scientists can launch multiple robust experiments that yield accurate and reliable results.

How we use market segmentation to solve these issues

Within Expedia Group, we segment our markets before rolling out geo-tests to minimise the impact of the above issues. These homogeneous segments are subsets of the market that are geographically spread, representative of the entire market and built at a predetermined geo-grain.

By dividing a market into segments before the first test, you can plan your experiments anywhere from a quarter to a year in advance, rotating segments between Test, Control and Business-As-Usual (BAU).

*Moving market segments into different stages throughout the year (illustrative).*

This way, geographical regions in Segment 1 can be used as test regions while the control regions are drawn from Segment 2. When the experiments are over, Segment 1 moves into BAU so that normal activity in its geographical regions resumes and the treatment effects can dissipate. Once they have, Segment 1’s geographical regions move into Control, where they can serve as the basis for the counterfactual in other experiments.
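The rotation can be planned mechanically. A minimal sketch, assuming three segments and one stage change per planning period (both assumptions; real schedules depend on test length and cooldown time); `rotation_schedule` is a hypothetical helper:

```python
# Stage order for a single segment: tested, then left in BAU to cool down, then used as control
STAGES = ("Test", "BAU", "Control")


def rotation_schedule(segments, periods):
    """Hypothetical planner returning one {segment: stage} mapping per planning period.

    Each segment cycles Test -> BAU -> Control -> Test, and segments are offset so
    that, with three segments, every period has exactly one segment in each stage.
    """
    schedule = []
    for period in range(periods):
        schedule.append({
            # Python's % is non-negative here, so (period - i) % 3 is always 0, 1 or 2
            segment: STAGES[(period - i) % len(STAGES)]
            for i, segment in enumerate(segments)
        })
    return schedule
```

With three segments, Segment 1 starts in Test with Segment 2 as Control, moves to BAU in the next period while its treatment effect dissipates, and serves as Control in the period after that.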

Using this methodology, you have ready-made, representative test regions of the correct geo-grain, control regions that are free from contamination by other tests, and no need to wait for a cooldown period to end before starting your next experiment (solving the time-efficiency problem).

How to segment a market

The process of market segmentation for geo-testing is broken into four stages: Preparation (exploratory data analysis on the market and the regions within it), Segmentation (finding the optimal segments within the market), Validation (ensuring that the segments are representative and fit for purpose) and Proof-of-concept (running dummy experiments to surface any unforeseen issues before the market’s first experiment launches). The process is iterative rather than linear: we often revisit stages and make small changes until the results are optimal.

Step 1 — Preparation

The first task is to determine the geo-grain: the right size of geographical location to test. This depends on multiple factors, such as the granularity available for tracking and the size of the marketing campaign. Whether you experiment at the city, region, state or country level, the geographical locations should have robust geo-tagging and be resistant to commuter contamination.

Next, plot a time series of your primary experiment KPI split by geographic region. The time series, whether daily or weekly, should match the time grain of your experiments. Note any seasonal or anomalous behaviour that appears in some geographic regions but not others, and consider excluding those regions from the process. A full year of data reveals seasonality across the year, while a year-over-year view helps separate recurring seasonal behaviour from one-off anomalies. This is particularly important in a post-COVID world.
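One way to make the year-over-year check systematic is to compare each region’s year-over-year ratio with the national ratio, week by week. The sketch below is illustrative only: the helper name, the mean-absolute-gap statistic and the 0.2 threshold are assumptions, and flagged regions still deserve a manual look before being excluded.

```python
import numpy as np


def flag_anomalous_regions(this_year, last_year, regions, threshold=0.2):
    """Flag regions whose year-over-year (YoY) pattern departs from the national one.

    this_year, last_year: arrays of shape (weeks, regions) with the weekly KPI for
                          the same weeks one year apart (last_year must be non-zero)
    regions:              region names, in column order
    threshold:            hypothetical cut-off on the mean absolute YoY gap
    """
    this_year = np.asarray(this_year, dtype=float)
    last_year = np.asarray(last_year, dtype=float)
    # Weekly YoY ratio for every region, and for the market as a whole
    regional_yoy = this_year / last_year
    national_yoy = this_year.sum(axis=1) / last_year.sum(axis=1)
    # Mean weekly distance between each region's YoY ratio and the national ratio
    gap = np.abs(regional_yoy - national_yoy[:, None]).mean(axis=0)
    return [region for region, g in zip(regions, gap) if g > threshold]
```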

Finally, create a correlation matrix of the KPI across geographic regions. Are there geographic regions that correlate very well with others, and some that barely correlate with any? Keep these in mind for the next step.
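A sketch of the correlation step with NumPy, assuming a weeks-by-regions KPI matrix (the layout and helper name are hypothetical). It also computes each region’s average correlation with the other regions, excluding each region’s correlation with itself, which is one of the clustering features described in Step 2:

```python
import numpy as np


def average_correlations(kpi):
    """Correlation matrix of regional KPI series and each region's mean correlation.

    kpi: array of shape (weeks, regions); each column is one region's KPI series
    """
    # rowvar=False treats columns (regions) as the variables
    corr = np.corrcoef(np.asarray(kpi, dtype=float), rowvar=False)
    n = corr.shape[0]
    # Drop the diagonal of ones (each region correlated with itself) before averaging
    avg = (corr.sum(axis=1) - 1.0) / (n - 1)
    return corr, avg
```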

Step 2 — Segmentation

Once you have your geographic regions, you need to split them into homogeneous segments, with the KPI divided as equally between them as possible.

When there are many geographic regions to split into segments, we use k-means clustering to group similar geographic regions together. After experimentation, the two most effective features we found were “the average correlation with other geographic regions” and “the sum of KPI” (a proxy for geographic region size); however, other features may work better for your specific business case. This leaves you with clusters whose members are similar in size and average correlation, grouping all geographic regions of a certain ‘type’ together.
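A minimal, self-contained sketch of the clustering step, assuming one row per region holding the two features above. It implements plain k-means (Lloyd’s algorithm) in NumPy with deterministic farthest-point seeding; in practice a library implementation such as scikit-learn’s `KMeans` would be the usual choice. Standardising the features first is our assumption, so that the KPI sum, which is orders of magnitude larger than a correlation, does not dominate the distances:

```python
import numpy as np


def kmeans(features, k, n_iter=100):
    """Minimal k-means (Lloyd's algorithm) with deterministic farthest-point seeding.

    features: array of shape (regions, n_features), e.g. [average correlation, KPI sum]
    Returns one cluster label per region.
    """
    X = np.asarray(features, dtype=float)
    # Standardise each feature; leave constant features unscaled to avoid dividing by zero
    std = X.std(axis=0)
    std[std == 0] = 1.0
    X = (X - X.mean(axis=0)) / std
    # Seed with the first region, then repeatedly add the region farthest from all seeds
    centroids = [X[0]]
    for _ in range(1, k):
        dists = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[dists.argmax()])
    centroids = np.array(centroids)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign each region to its nearest centroid
        labels = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
        # Move each centroid to the mean of its regions (keep it if its cluster is empty)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels
```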

*Cluster your geographic regions into ‘types’: 12 geographical regions split into three clusters of varying size (illustrative).*

Once you have your clusters, each one needs to be split into n segments. Choosing the number of segments is more art than science and requires some intuition. Are the cluster sizes broadly multiples of two or three? Do you know that your business needs at least three segments? At this stage, we try various values of n and choose the one that produces the best results at the validation stage (step 3).

To split the clusters into segments, we use a proprietary Expedia Group function that takes the regions within a cluster, and the desired number of segments (n), as input. The function tests every possible assignment of regions to segments and selects the one with the smallest difference in KPI between segments. Its output is the regions within the cluster and their assigned segment. We work through every cluster systematically until each one is split into segment 1, segment 2, …, segment n. Finally, all regions with the same segment assignment are grouped together, so that each cluster (i.e., every ‘type’ of geographic region) is represented in each segment.
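The internal function is proprietary, so the following is only a hypothetical stand-in that follows the description above: it enumerates every assignment of a cluster’s regions to n segments and keeps the one with the smallest gap between the largest and smallest segment KPI totals. The gap metric and names are assumptions. Because there are n^m assignments for m regions, exhaustive search is only practical for small clusters:

```python
from itertools import product


def split_cluster(region_kpi, n_segments):
    """Exhaustively assign the regions of one cluster to n segments (hypothetical stand-in).

    region_kpi: dict mapping region name -> KPI total
    Returns (assignment, gap): assignment maps region -> segment number (1..n) and
    minimises gap, the difference between the largest and smallest segment KPI totals.
    """
    regions = list(region_kpi)
    best, best_gap = None, float("inf")
    # Every tuple assigns one segment number to each region, in order
    for assignment in product(range(1, n_segments + 1), repeat=len(regions)):
        totals = [0.0] * n_segments
        for region, seg in zip(regions, assignment):
            totals[seg - 1] += region_kpi[region]
        gap = max(totals) - min(totals)
        if gap < best_gap:
            best, best_gap = dict(zip(regions, assignment)), gap
    return best, best_gap
```

Running this once per cluster and then grouping regions by segment number gives segments that each contain every ‘type’ of region.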

While the above process works well for many geographical regions, segmenting a small number of regions is a more manual process; for example, a market with seven geographical regions doesn’t lend itself well to clustering. In these situations, we treat all the regions as one cluster and put them straight into the splitting function. We then switch geographical regions between segments until all validation checks (step 3) pass.

This process results in n homogeneous segments of your predetermined geo-grain, ideally of equal size and with an equal split of every ‘type’ of geographical region (and therefore, ideally, representative).

*Each cluster is then split into segments, such that each ‘type’ of geographical region is represented in each segment: the 12 regions from the three clusters end up in three final segments (illustrative).*

Step 3 — Validation

At this stage, check the representativeness and geographical spread of your segments and make any corrections required.

To check national representativeness, an indexed time series of each segment against the entire country is essential. Indexing lets you compare the segments with the entire country without the skew of magnitude, and the time series reveals any seasonal behaviours that are more prevalent in some segments than in others. If you find such behaviours, we recommend manually swapping the geographic regions that drive the differences between segments.
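A sketch of the indexing step, assuming each series is indexed to its own mean (base 100); indexing to the first week or to a fixed base period are equally common choices, and which one is used here is not specified. The helper names are hypothetical:

```python
import numpy as np


def index_series(series, base=100.0):
    """Index a KPI series to its own mean so series of different magnitudes are comparable."""
    series = np.asarray(series, dtype=float)
    return base * series / series.mean()


def max_index_gap(segment_kpi, national_kpi):
    """Largest weekly gap between a segment's indexed series and the national indexed series."""
    return float(np.max(np.abs(index_series(segment_kpi) - index_series(national_kpi))))
```

A segment that is simply a scaled-down copy of the country scores a gap of zero, while a segment-specific spike shows up as a large gap in the week it occurs.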

Once you are satisfied with this, check that each segment represents the market in every other dimension considered important; two examples of such dimensions are new customers and app users. At Expedia Group, we use the following process:

Take a year of primary, secondary and tertiary KPI data per segment, aggregated up to a weekly level.
Split the above by the dimension of interest (new customers or app users in this example).
Choose a KPI, then perform a Student’s t-test comparing two segments, using the weekly absolute values of that KPI for the chosen dimension as input. For example, filter Segment 1 and Segment 2 to new customers only, aggregate your KPI by week, and test the two weekly series against each other. Check that the resulting p-value is non-significant (p ≥ 0.1).
Repeat the process such that every segment is tested against every other for every dimension and every KPI.
For any statistically significant differences, some manual swapping of the geographical regions between segments will be required, and all validation checks repeated.
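The loop over segment pairs, dimensions and KPIs can be sketched with SciPy’s `ttest_ind`, where `equal_var=True` gives the classic Student’s t-test. The nested-dictionary input layout and the function name are assumptions; the 0.1 threshold follows the process above:

```python
from itertools import combinations

from scipy.stats import ttest_ind


def representativeness_checks(weekly, alpha=0.1):
    """Student's t-test for every segment pair, dimension and KPI.

    weekly: hypothetical nested dict, weekly[segment][dimension][kpi] -> list of
            weekly KPI values for that segment and dimension (e.g. "new customers")
    Returns (segment_a, segment_b, dimension, kpi, p_value, passed) tuples, where
    passed is True when the difference is non-significant (p >= alpha).
    """
    rows = []
    for seg_a, seg_b in combinations(sorted(weekly), 2):
        for dimension, kpis in weekly[seg_a].items():
            for kpi, values_a in kpis.items():
                values_b = weekly[seg_b][dimension][kpi]
                # equal_var=True selects the pooled-variance (Student's) t-test
                p_value = float(ttest_ind(values_a, values_b, equal_var=True).pvalue)
                rows.append((seg_a, seg_b, dimension, kpi, p_value, p_value >= alpha))
    return rows
```

With three segments, two dimensions and three KPIs, this produces 18 rows, the same number of tests as in the example output that follows.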

*An example of the output of 18 Student’s t-tests with non-significant p-values, indicating that app users and new customers are not statistically significantly different across segments (illustrative).*

We use a proprietary function that takes the data as input and outputs a data frame similar to the table above. Wrapping the process in a function means you can run many t-tests in seconds and reuse the function to re-check representativeness over shorter timeframes before every test launch.

There have been occasions, especially with smaller markets, where the magnitude of the segments varies considerably. In these cases, using absolute numbers as input to the t-tests will not work: a t-test checks for a difference in means, so segments of very different sizes will differ significantly even if their customer mix is identical. Instead, the weekly proportion of the KPI that came from, for example, new customers can be used as input. This way, the results are unaffected by the magnitude of the segments.
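A sketch of the proportion-based variant, assuming weekly totals and weekly values for the dimension of interest are available per segment (the helper names are hypothetical; SciPy provides the t-test). If two segments share a customer mix but differ greatly in volume, a test on absolute values will usually flag a difference that a test on weekly shares will not:

```python
import numpy as np
from scipy.stats import ttest_ind


def weekly_share(dimension_kpi, total_kpi):
    """Share of each week's KPI that comes from one dimension (e.g. new customers)."""
    return np.asarray(dimension_kpi, dtype=float) / np.asarray(total_kpi, dtype=float)


def compare_absolute_and_share(dim_a, total_a, dim_b, total_b):
    """Student's t-test p-values on absolute weekly values and on weekly shares."""
    # ttest_ind defaults to equal_var=True, i.e. Student's t-test
    p_absolute = ttest_ind(dim_a, dim_b).pvalue
    p_share = ttest_ind(weekly_share(dim_a, total_a), weekly_share(dim_b, total_b)).pvalue
    return float(p_absolute), float(p_share)
```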

Once all validation checks are passed, check the geographic spread of the segments. While the spread will not always be perfect, ensuring that no segment is made up solely of one area of the country, or only of larger geographic regions, safeguards against region-specific issues.
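A simple automated check for the first of these risks, assuming each region can be mapped to a broad area of the country (for example, a census region). The mapping, helper name and 50% limit are hypothetical, and a map remains the better final check:

```python
from collections import Counter


def spread_warnings(region_area, region_segment, max_share=0.5):
    """Flag segments where one broad area of the country holds too many of the regions.

    region_area:    dict region -> broad area of the country (hypothetical input)
    region_segment: dict region -> segment
    max_share:      hypothetical limit on the share of a segment's regions in one area
    Returns (segment, dominant_area, share) for each segment over the limit.
    """
    warnings = []
    for segment in sorted(set(region_segment.values())):
        members = [r for r, s in region_segment.items() if s == segment]
        # Most common area among this segment's regions, and how many regions it holds
        area, count = Counter(region_area[r] for r in members).most_common(1)[0]
        share = count / len(members)
        if share > max_share:
            warnings.append((segment, area, share))
    return warnings
```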

*An example of the United States split into three segments of designated market areas, each shown in a different colour (illustrative).*

Step 4 — Proof-of-concept

The final stage is to run dummy experiments. To produce them, we use our internal geo-testing package, EGGX. One of its functions builds a modelled synthetic control from a specified set of control regions, based on the behaviour of the test regions. As a proof-of-concept, we build synthetic controls for every combination of segments, exactly as we would when launching an experiment. Sometimes a market’s segmentation passes all validation checks but still produces poor-quality synthetic control models. With such tight testing schedules, it is imperative to catch and correct this well before your first test launch.
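EGGX’s modelling is internal, so the following is only a sketch of the proof-of-concept loop under stated assumptions: for every ordered (test, control) pair of segments, it fits non-negative weights on the pre-test weeks with `scipy.optimize.nnls`, then scores the fit on the pre-test holdout weeks with mean absolute percentage error (MAPE). The helper names, the MAPE metric and the 5% quality bar are hypothetical:

```python
from itertools import permutations

import numpy as np
from scipy.optimize import nnls


def holdout_mape(test_kpi, control_kpi, n_pre, n_holdout):
    """Fit control weights on the pre-test weeks, then score them on the holdout weeks."""
    weights, _ = nnls(control_kpi[:n_pre], test_kpi[:n_pre])
    window = slice(n_pre, n_pre + n_holdout)
    predicted = control_kpi[window] @ weights
    return float(np.mean(np.abs(test_kpi[window] - predicted) / test_kpi[window]))


def proof_of_concept(segment_kpi, n_pre, n_holdout, max_mape=0.05):
    """Score a dummy synthetic control for every (test segment, control segment) pair.

    segment_kpi: dict segment -> array (weeks, regions) of weekly regional KPI
    max_mape:    hypothetical quality bar for the holdout fit
    Returns {(test_segment, control_segment): (mape, passed)}.
    """
    results = {}
    for test_seg, control_seg in permutations(segment_kpi, 2):
        # Aggregate the test segment's regions into the single series to be matched
        test_series = np.asarray(segment_kpi[test_seg], dtype=float).sum(axis=1)
        controls = np.asarray(segment_kpi[control_seg], dtype=float)
        mape = holdout_mape(test_series, controls, n_pre, n_holdout)
        results[(test_seg, control_seg)] = (mape, mape <= max_mape)
    return results
```

Pairs that fail the quality bar point to segments, or individual regions within them, worth revisiting in the earlier stages.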

Conclusion

This market segmentation methodology solves many common geo-testing issues while also allowing the early identification of problematic geographical regions before your first experiment even launches. While every market is different, this framework helps to increase experiment volume and efficiency while reducing contamination, experiment overlap and some of the more repetitive elements of experiment design.

Acknowledgements

I would like to thank the Incrementality Testing and Methodology teams, whose work contributed to this post, and particularly Simona Cadoni, who conceptualised this methodology.