Experimentation & Measurement for Search Engine Optimization

Leveraging a market-level approach to measure landing page effectiveness on Airbnb.

Brian de Luna
Sep 25, 2018
[Photo: Our San Francisco headquarters has it all, from comfy nooks to sunlit views. It’s the perfect place for a data scientist to brainstorm about experimentation!]
[Image: Our SEO team focuses on making Airbnb the top result on search engines like Google.]
[Image: Old Search Results Page (left) vs. the New “Magic Carpet” Landing Page (right)]

Limitations of A/B Testing

Our Growth team leans heavily on iterative experimentation for nearly every product change, both to measure effectiveness and to learn as we build. Most data scientists can rely on a traditional A/B test at the device or user level for all of their experimentation needs: users that enter the experiment are randomly bucketed into treatment groups, and we can directly compare the outcome of the treatment group with that of the control group. SEO experiments don’t fit this mold, however. The traffic we want to move comes from search engines, which crawl and rank each page as a single entity, so we can’t randomize individual visitors into seeing different versions of a page without changing what the search engine itself sees.


Leveraging a Market-Level Approach

A key realization is that our search results page isn’t just a single page; in fact, there are many different versions for different cities, towns, and regions. Each of these has what is called a unique “canonical URL”, and we actually have over 100,000 of them that are surfaced on search engines! Therefore, instead of assigning a single visitor to treatment or control, we can set the unit of randomization of our experiment to be a specific canonical URL. We’ll then measure the effect using an approach commonly used in market- or cluster-level experiments.
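To make the idea concrete, here is a minimal sketch (not our production code) of deterministic, hash-based assignment of canonical URLs to buckets; the salt is a hypothetical experiment key. Hashing the URL rather than sampling keeps each page’s bucket stable across pipeline runs:

```python
import hashlib

def assign_bucket(canonical_url: str, salt: str = "magic_carpet_v1") -> str:
    """Deterministically assign a canonical URL to treatment or control.

    The salt is a hypothetical per-experiment key, so the same URL can
    land in different buckets across different experiments.
    """
    digest = hashlib.md5(f"{salt}:{canonical_url}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 == 0 else "control"

print(assign_bucket("https://www.airbnb.com/s/San-Francisco--CA"))
```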


Developing a Model: Difference-in-Differences

A difference-in-differences framework is one technique that uses pre-experiment data to control for baseline differences between the treatment and control groups in the absence of any intervention. We can use this method to measure the treatment effect and its statistical significance with an estimator from a linear model, where for each page i and day t:

yᵢₜ = b₀ + b₁·treatmentᵢ + b₂·(treatmentᵢ × post_t) + b₃·post_t + b₄·t + Σⱼ cⱼ·dowⱼₜ + εᵢₜ

where:
  • yᵢₜ = the outcome metric (e.g., daily search-engine traffic) for page i on day t
  • treatmentᵢ = treatment group indicator (equal to 1 if in the treatment group, 0 otherwise)
  • post_t = pre/post-period indicator (equal to 1 if in the post-period, 0 otherwise)
  • t = time index, to account for overall time trends
  • dowⱼ = weekday indicators, to account for weekly seasonality

The coefficient b₂ on the interaction term is the difference-in-differences estimate of the treatment effect.
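This regression is straightforward to fit with off-the-shelf tooling. Below is a minimal sketch using statsmodels on synthetic page-day data; the column names and the choice to cluster standard errors by page (since daily observations from the same page are correlated) are our assumptions for illustration, not details of the original analysis:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic example data: one row per (page, day).
rng = np.random.default_rng(42)
pages, days = 200, 28
df = pd.DataFrame(
    {
        "page_id": np.repeat(np.arange(pages), days),
        "t": np.tile(np.arange(days), pages),
    }
)
df["treatment"] = (df["page_id"] % 2).astype(int)   # half the pages treated
df["post"] = (df["t"] >= days // 2).astype(int)     # second half = post-period
df["dow"] = df["t"] % 7                             # day-of-week index
df["y"] = (
    100
    + 0.5 * df["t"]                                 # overall time trend
    + 3 * (df["dow"] == 5)                          # weekly seasonality
    + 10 * df["treatment"] * df["post"]             # true treatment effect
    + rng.normal(0, 5, len(df))
)

# Fit the difference-in-differences regression; the 'treatment:post'
# interaction term is the b2 estimate of the treatment effect.
model = smf.ols("y ~ treatment * post + t + C(dow)", data=df)
result = model.fit(cov_type="cluster", cov_kwds={"groups": df["page_id"]})
print(result.params["treatment:post"], result.pvalues["treatment:post"])
```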

Measuring Power

Before we launch an experiment, it’s important that we understand our statistical power. Since we mainly just care about the b₂ estimator, we are essentially carrying out the hypothesis test

H₀: b₂ = 0  vs.  H₁: b₂ ≠ 0

In other words, we are testing whether the treatment had any effect at all. Power is the probability that we reject the null hypothesis when the treatment truly moves the metric by a given amount, so before launching we want to confirm that the number of pages and the planned duration of the test give us a reasonable chance of detecting the lifts we care about.
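One common way to answer this question for a design like this is simulation on historical data: repeatedly inject a synthetic lift, re-fit the model, and count how often the effect is detected. Here is a hedged sketch along those lines, reusing the hypothetical page-day schema from the previous snippet:

```python
import numpy as np
import statsmodels.formula.api as smf

def simulated_power(df, lift=0.05, n_sims=200, alpha=0.05, seed=0):
    """Estimate power by simulation on historical (pre-experiment) data.

    df is a hypothetical page-day DataFrame with columns page_id, t, dow,
    and y, like the synthetic one above. Each simulation randomly splits
    pages in half, injects a multiplicative `lift` into the treated pages'
    post-period traffic, fits the difference-in-differences regression,
    and checks whether b2 comes out statistically significant.
    """
    rng = np.random.default_rng(seed)
    pages = df["page_id"].unique()
    cutoff = df["t"].median()
    rejections = 0
    for _ in range(n_sims):
        sim = df.copy()
        treated = rng.choice(pages, size=len(pages) // 2, replace=False)
        sim["treatment"] = sim["page_id"].isin(treated).astype(int)
        sim["post"] = (sim["t"] >= cutoff).astype(int)
        mask = (sim["treatment"] == 1) & (sim["post"] == 1)
        sim.loc[mask, "y"] *= 1 + lift
        result = smf.ols("y ~ treatment * post + t + C(dow)", data=sim).fit(
            cov_type="cluster", cov_kwds={"groups": sim["page_id"]}
        )
        rejections += result.pvalues["treatment:post"] < alpha
    return rejections / n_sims

# e.g. simulated_power(df, lift=0.05) -- the share of simulations in which
# a 5% lift is detected at the 5% significance level.
```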

Launching the Experiment

Once we had set up our model with the appropriate assumptions and confirmed that we had sufficient power to run a test, we launched the Magic Carpet experiment and randomly released the new design to half of our landing pages. The test lasted three weeks, during which we saw a visible lift in traffic.


Final Thoughts

Approaching our SEO landing page experiments through a market-level framework has proven very useful for measuring the effectiveness of product changes in terms of search engine rankings. In fact, we were able to scale this framework using our open-sourced Airflow scheduler to automate the analysis of over 20 experiments, ranging from sweeping design changes to small HTML tweaks.
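As a rough illustration, a scheduled job for this kind of automated analysis might look like the following Airflow DAG. The experiment keys and the analysis helper are hypothetical stand-ins, not our actual pipeline:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical experiment registry; in practice this might come from a
# metadata table rather than a hard-coded list.
EXPERIMENTS = ["magic_carpet", "title_tag_tweak"]

def analyze_experiment(experiment_id: str) -> None:
    """Pull page-day traffic for one experiment and fit the
    difference-in-differences regression from the earlier sketch."""
    ...  # hypothetical: load data, fit model, report b2 and its p-value

with DAG(
    dag_id="seo_experiment_analysis",
    start_date=datetime(2018, 9, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # One analysis task per running experiment, refreshed daily.
    for experiment_id in EXPERIMENTS:
        PythonOperator(
            task_id=f"analyze_{experiment_id}",
            python_callable=analyze_experiment,
            op_kwargs={"experiment_id": experiment_id},
        )
```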


