Leveraging Switchbacks: Enhancing Market-Based Testing at Thumbtack

Dhananjay Sathe

Published in Thumbtack Engineering · 7 min read · Jul 18, 2023

Author(s): Dhananjay Sathe & Wade Fuller

Introduction

Thumbtack is a technology leader building the modern home management platform. As we continue to innovate and improve our platform, we rely heavily on experimentation to make data-driven decisions for improving the experience for our consumers and professionals.

Among the various experimentation techniques, market-based tests (MBTs) have become indispensable for gathering insights, especially when a shared resource affects both experiment buckets (baseline and variant) and visitor-randomized A/B tests cannot be used. However, MBTs come with limitations of their own. In this blog post, we explore the concept of switchback MBTs and how Thumbtack has leveraged them to overcome these challenges, enabling more accurate measurements and effective experiments.

Understanding the limitations of visitor-randomized A/B tests

A/B tests are regarded as the gold standard for experimentation, and the most common flavor among them is a visitor-randomized test. In a visitor-randomized A/B test, we split the visitor population into two groups: a control group (baseline) and a treatment group (variant). The two groups are then exposed to different conditions, allowing for a comparison of resulting metrics based on the conditions. However, visitor-randomized tests have certain limitations that make them unsuitable for every scenario.

Consider a situation where we want to test a new feature that is designed to more efficiently utilize the budgets of service professionals over a week. A visitor-randomized A/B test may not provide accurate results here, because each professional’s budget is drawn down by visitors in both the variant and baseline groups simultaneously. This effect is known as cross-bucket interference or spillover. When the impact of the treatment is not isolated to one group, the results of an A/B test may be misleading or inconclusive.

Market-based tests as an alternative

To overcome the limitations of visitor-randomized A/B tests in certain scenarios, market-based tests (MBTs) can be used as an alternative approach. MBTs involve selecting specific markets (at Thumbtack markets are defined as clusters of category-zip code pairs, e.g. plumbers in the Atlanta area), and consistently designating them as either variant or baseline throughout the experiment.

Figure: Sample market-based test bucket assignments across the USA.
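As a minimal sketch of what consistently designating a market could look like, the snippet below hashes a market identifier together with an experiment name so that the same market always lands in the same bucket for the life of the experiment. The function name, the identifier format, and the hashing scheme are illustrative assumptions, not a description of Thumbtack's actual framework.

```python
import hashlib


def assign_market_bucket(market_id: str, experiment: str) -> str:
    """Deterministically map a market to 'baseline' or 'variant'.

    Hypothetical helper: the same (experiment, market_id) pair always
    yields the same bucket, so a market never changes groups mid-test.
    """
    key = f"{experiment}:{market_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return "variant" if int(digest, 16) % 2 else "baseline"


# Hypothetical market identifiers (category plus area).
for market in ["plumbers:atlanta", "house-cleaners:denver"]:
    print(market, assign_market_bucket(market, "budget-pacing-test"))
```

A hash-based split like this is stable but does nothing to balance the two groups on market size or seasonality, which is one reason comparability between baseline and variant markets needs separate attention.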

MBTs provide measurable outcomes within a defined market, which makes market-level effects easier to detect than under designs that split a single market across buckets.

However, MBTs also have some limitations:

1. Longer experimentation time: The need to observe treatment effects in markets over an extended period means that MBTs typically require more time to yield results compared to visitor-randomized A/B tests.

2. Potential confounding factors: Ensuring comparability between baseline and variant markets can be challenging, as external factors or market-specific dynamics may influence the results.

In summary, market-based tests are an alternative approach to evaluate the impact of changes or interventions when visitor-randomized A/B testing is not suitable. However, careful experiment design and consideration of confounding factors are essential to ensure valid and interpretable results.

Introducing Switchbacks as a solution

Switchbacks refer to a technique used in market-based tests where markets are switched back and forth between the baseline and variant groups multiple times during the experiment [1]. The purpose of switchbacks is to overcome the limitations of traditional MBTs by providing a more robust and accurate measurement of the treatment’s impact.

A. How switchbacks help overcome MBT limitations:

Switchbacks help overcome the limitations of traditional MBTs in several ways. Firstly, by allowing markets to experience both the baseline and variant conditions multiple times, switchbacks let each market act as its own control. This reduces confounding from market-specific differences while still avoiding concurrent access to shared budgets or resources, and it enables a more precise measurement of the treatment’s impact by isolating it from other variables.

Additionally, switchbacks help mitigate the influence of temporal effects. By repeatedly exposing markets to both conditions, any temporary effects or learning effects are averaged out, providing a clearer picture of the treatment’s long-term impact.

B. Advantages of using switchbacks, including better minimum detectable effects and reduced time requirements:

Using switchbacks in MBTs offers several advantages. Firstly, switchbacks lower the minimum detectable effect (MDE), which is the smallest effect size that a test can reliably detect. By reducing confounding factors and averaging out temporal effects, switchbacks increase the sensitivity of the test, allowing for the detection of smaller effect sizes that might have been missed in traditional MBTs.
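To make the link between noise and MDE concrete, here is a small sketch using the textbook normal approximation for a two-sided comparison of two equally sized groups. The formula, the default significance level and power, and the function name are standard-statistics assumptions chosen for illustration; they are not the calculation used in Thumbtack's framework.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(sigma, n_per_bucket, alpha=0.05, power=0.8):
    """Approximate MDE for a two-sample comparison of means.

    sigma: standard deviation of the metric per observation unit
    n_per_bucket: number of observation units in each bucket
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * sigma * sqrt(2 / n_per_bucket)


# Illustrative numbers only: less noise per unit gives a smaller MDE.
noisy = minimum_detectable_effect(sigma=1.0, n_per_bucket=200)
quieter = minimum_detectable_effect(sigma=0.8, n_per_bucket=200)
print(f"MDE with sigma=1.0: {noisy:.3f}")
print(f"MDE with sigma=0.8: {quieter:.3f}")
```

Under this approximation the MDE scales linearly with the metric's standard deviation and with the inverse square root of the sample size, so any design that removes variance lowers the MDE without needing more markets.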

A lower MDE also shortens the time required to conduct a market-based test, because fewer weeks of data are needed to detect an effect of a given size. Since markets are exposed to both conditions multiple times, switchbacks eliminate the need for lengthy pre-test and post-test phases associated with traditional MBTs. This results in a 50% reduction in experiment time.

C. Overcoming balancing challenges and mitigating regional shocks:

Balancing challenges arise when trying to allocate markets evenly between the baseline and variant groups in a traditional MBT. Switchbacks help overcome these challenges because every market spends time in both groups. By continuously switching markets back and forth, the impact of initial imbalances or variations in market characteristics is minimized.

Switchbacks also help mitigate regional shocks or variations in user behavior across different regions. By exposing markets to both conditions multiple times, switchbacks account for regional differences and provide a more accurate assessment of the treatment’s impact across diverse user segments.
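One way to see why exposing each market to both conditions helps is a within-market comparison, sketched below. Each market's variant weeks are compared only against its own baseline weeks, so a market that is structurally larger or smaller than the others does not bias the estimate. The data layout and function name are hypothetical, and this simple difference of means is an illustration rather than the analysis Thumbtack runs.

```python
from statistics import mean


def within_market_effect(observations):
    """Average of per-market (variant mean - baseline mean).

    observations: iterable of (market, bucket, metric_value) tuples,
    one per market-week. Markets that never saw both buckets are skipped.
    Raises statistics.StatisticsError if no market saw both buckets.
    """
    by_market = {}
    for market, bucket, value in observations:
        buckets = by_market.setdefault(market, {"baseline": [], "variant": []})
        buckets[bucket].append(value)

    diffs = [
        mean(b["variant"]) - mean(b["baseline"])
        for b in by_market.values()
        if b["variant"] and b["baseline"]
    ]
    return mean(diffs)


# Two hypothetical markets with very different baseline levels.
weeks = [
    ("house-cleaners:denver", "baseline", 100),
    ("house-cleaners:denver", "variant", 101),
    ("plumbers:atlanta", "variant", 11),
    ("plumbers:atlanta", "baseline", 10),
]
print(within_market_effect(weeks))
```

A naive comparison of all variant weeks against all baseline weeks would mix the level difference between the two markets into the estimate; the within-market version cancels it.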

Implementation of Switchbacks at Thumbtack

At Thumbtack we have enabled configuring weekly switchbacks in the experimentation framework, streamlining the process and enhancing the reliability of experiments. The implementation process involved evaluating different infrastructure options and switchback designs. The two switchback designs considered at Thumbtack were:

A. Randomized Switchback MBT

Randomize the assignments for each market at the beginning of each week of the experiment, e.g., flip a coin each week to determine whether each market is in baseline or variant. This design allows the practitioner to avoid bucket pre-balancing but comes at the cost of experiment power by introducing variance via randomized assignment. In the context of Thumbtack’s specific market clusters, this design lowered our overall key metrics MDE by about 10%.
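A minimal sketch of this weekly coin flip is shown below. The function name, the market identifiers, and the use of a seeded generator for reproducibility are assumptions made for illustration.

```python
import random


def randomized_switchback(markets, num_weeks, seed=0):
    """Return one {market: bucket} mapping per week.

    Each market's bucket is drawn independently every week, so no
    pre-balancing is needed, but a market may by chance spend most
    weeks in the same bucket.
    """
    rng = random.Random(seed)  # seeded so the schedule can be reproduced
    return [
        {market: rng.choice(["baseline", "variant"]) for market in markets}
        for _ in range(num_weeks)
    ]


schedule = randomized_switchback(
    ["plumbers:atlanta", "house-cleaners:denver"], num_weeks=4
)
for week, assignment in enumerate(schedule, start=1):
    print(week, assignment)
```

Because every draw is independent, the number of baseline and variant weeks per market varies from run to run, which is the extra variance this design trades for simplicity.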

B. Checkerboard Switchback MBT

Predetermine which markets will be assigned to baseline & variant at the start of the experiment and then implement a switchback by reversing (literally switching back and forth) the assignment for baseline and variant. This design requires bucket pre-balancing but produces a more detectable treatment effect, lowering our MDE by about 20%.
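The reversal described above can be sketched as follows. The initial assignment is assumed to come from a separate pre-balancing step that is not shown, and the function and market names are hypothetical.

```python
def checkerboard_switchback(initial_assignment, num_weeks):
    """Return one {market: bucket} mapping per week, flipping every week.

    initial_assignment: pre-balanced {market: 'baseline' | 'variant'}
    mapping used for the first week of the experiment.
    """
    flip = {"baseline": "variant", "variant": "baseline"}
    schedule = []
    current = dict(initial_assignment)
    for _ in range(num_weeks):
        schedule.append(current)
        # Build a new dict so earlier weeks are not mutated.
        current = {market: flip[bucket] for market, bucket in current.items()}
    return schedule


week_one = {"plumbers:atlanta": "baseline", "house-cleaners:denver": "variant"}
for week, assignment in enumerate(checkerboard_switchback(week_one, 4), start=1):
    print(week, assignment)
```

Over an even number of weeks every market spends exactly half its time in each bucket, which removes the run-to-run variation in exposure that a weekly coin flip introduces.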

Success stories at Thumbtack

The Marketplace Matching team recently leveraged this experiment design to test a feature focused on optimizing the weekly budget spent across a given market (e.g., House Cleaners in Denver). The team was most interested in measuring outcomes related to the market itself, in addition to demand metrics (e.g., search to project conversion) and supply metrics (e.g., hires per pro).

For reasons mentioned in the previous sections on the limitations of different testing methodologies, various experiment designs introduce tradeoffs. A visitor-randomized approach to this experiment would have given us the ability to measure demand outcomes like conversion but blocked us from making inferences about the aggregate budget utilization in a given market (since pros would be exposed to both visitor buckets). Similarly, a pro-randomized approach would have allowed us to observe the effects of the treatment on individual pros at the cost of learning about the feature’s effect on factors like demand metrics in the overall market.

The switchback experiment design provided the right balance of inference quality and experiment speed, ultimately giving us the confidence to ship this market-optimizing feature.

Future possibilities and extensions

Looking ahead, we see two promising ways to extend our use of switchbacks at Thumbtack.

a. Hourly Switchbacks: We are exploring the viability and advantages of applying this switchback methodology at smaller time grains, such as hourly or daily. Our current switchbacks run on a weekly time grain to align with our weekly budget cycles, but other features may not share this constraint.

b. Adaptive Switchbacks: We are researching the integration of adaptive switchbacks, where we automate the initial assignment of markets based on market characteristics and performance. By utilizing machine learning algorithms, we can aim to automatically identify market clusters and assign buckets accordingly, optimizing experiment outcomes in a more data-driven manner.
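For the shorter time grains mentioned in item (a), one possible approach is to derive a market's bucket from the time window an event falls into. The sketch below assumes a checkerboard-style alternation and a configurable window length; the function name and parameters are hypothetical, and nothing here reflects a design Thumbtack has committed to.

```python
from datetime import datetime, timedelta, timezone


def bucket_at(initial_bucket, ts, experiment_start, window):
    """Bucket for a market at time ts, alternating every `window`.

    initial_bucket: the market's bucket during the first window
    window: switchback period, e.g. timedelta(hours=1) or timedelta(weeks=1)
    """
    if ts < experiment_start:
        raise ValueError("timestamp precedes the experiment start")
    index = (ts - experiment_start) // window  # whole windows elapsed
    if index % 2 == 0:
        return initial_bucket
    return "variant" if initial_bucket == "baseline" else "baseline"


start = datetime(2023, 7, 3, tzinfo=timezone.utc)
hourly = timedelta(hours=1)
print(bucket_at("baseline", start + timedelta(minutes=30), start, hourly))
print(bucket_at("baseline", start + timedelta(minutes=90), start, hourly))
```

Keeping the window length as a parameter means the same logic covers weekly, daily, and hourly switchbacks; whether a shorter window is appropriate depends on how quickly a treatment's effects settle after each switch.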

By pursuing these future possibilities and extensions, we aim to further enhance our experimentation capabilities, improve consumer experience, and drive continuous innovation on our platform.

Conclusion

Switchbacks have emerged as a vital tool for conducting MBTs at Thumbtack, enabling more accurate measurements and effective experiments. By using switchbacks, Thumbtack overcomes the limitations of traditional MBTs and achieves better experimental outcomes. The implementation of switchbacks has streamlined the process, providing a reliable and scalable framework for conducting market-based tests. The adoption of switchbacks has laid a solid foundation for Thumbtack’s data-driven decision-making and experimentation culture, and these advancements promise to take us to new heights.

Acknowledgment

We would like to thank Michel Anthony for his contributions that helped us adapt switchbacks at Thumbtack. Also a special thanks to Navneet Rao for his leadership around marketplace experimentation and his feedback on this post. We would also like to thank the entire marketplace matching team at Thumbtack for their support throughout the project.

References

[1] David Kastelman and Raghav Ramesh. “Switchback Tests and Randomized Experimentation Under Network Effects at DoorDash.” DoorDash Engineering Blog, 2018.