My earliest exposure to the notion of using experimentation to guide product development was in 2005, when Ray Ozzie, Microsoft’s Chief Technical Officer at the time, wrote the now-famous memo “The Internet Services Disruption”. In the memo, Ray talks about how “web is fundamentally a self-service environment, and it is critical to design websites and product ‘landing pages’ with sophisticated closed-loop measurement and feedback systems… This ensures that the most effective website designs will be selected…”. I didn’t realize it at the time, but from then on I connected the ability to run quick experiments with watching Darwinian product evolution in a time-lapse video. As a development lead working in Microsoft’s Developer Division, I found this a powerful concept: compared to packaged software, we could dramatically increase the speed at which we test and reject weaker hypotheses and evolve towards the “fittest” product.
Fast-forward 15 years, and it’s amazing to see how data-driven decision making through experimentation has fundamentally changed the software engineering landscape. This is a rapidly evolving area, where new advances are bringing us closer to understanding our customers every single day. This is especially relevant at a company like Thumbtack, which seeks to revolutionize a trillion-dollar industry that so far has largely been dependent on word-of-mouth. Unlocking this massive industry is operationally complex. The local services industry is highly fragmented — local service professionals offer a huge range of services that vary greatly in terms of price, availability, and expertise. Similarly, on the demand side, there are a number of factors that determine which professional a customer would like to hire. As a result, data analysis and experimentation are some of the most important ways we can continually improve the product experience for customers and professionals, and strengthen marketplace dynamics. The better we get at it, the better we can serve both sides of the marketplace, and the faster we grow.
In this post, I will focus on a couple of aspects of experimentation at Thumbtack. First, I’ll share a couple of examples that demonstrate how we use experiments to guide product evolution, and the complexity inherent in running these experiments. I will then discuss how we go about building the culture that’s crucial to running effective, successful experiments.
A textbook example
One of the first examples I’d like to talk about is a great introduction to how we innovate. The basic idea is to give our customers more choices in their process of selecting a professional that’s right for them. I love this example, because it’s such a classic way to understand how a great experimentation framework guides product evolution. In fact, the idea for this initiative came from a Make Week project. Make Week at Thumbtack is a week-long period where individuals across all functions get to pitch ideas, form groups and try to implement prototypes of their favorite initiatives.
In this case, one of our engineers wanted to give customers the option to search for pros by filtering based on specific keywords within reviews. The hypothesis here is that we could increase engagement by enabling customers to refine their search based on specific criteria. Imagine you wanted to hire a birthday party caterer that specializes in preparing food with dietary restrictions, or an appliance installer that specializes in particular brands. Keyword filtering could help customers quickly identify professionals who’ve done similar jobs elsewhere.
Having identified the customer problem and a potential solution, our awesome team quickly put together a prototype and made a pitch that earned them recognition at a company-wide all-hands. They then created an experiment design that would help us detect improvements in metrics such as engagement, conversion rate, and revenue. This is an example of a simple A/B test, and we used our experiment assignment service to bucket users into control and variant buckets. You can read more about the details of how we do it here. Part of this process is also defining “ship” criteria. An example of a ship criterion could be, say, an improvement of at least 0.4% in a metric that measures the rate at which visitors contact professionals. A predefined set of ship criteria helps keep us honest when we analyze the result of an experiment and determine whether an initiative should be released to the entire marketplace.
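To make the bucketing and ship-criterion ideas concrete, here is a minimal sketch in Python. It is not Thumbtack's assignment service: the function names, the hash-based split, and the reading of "0.4% improvement" as a relative lift are all assumptions made for illustration. A real decision would also require a statistical significance test, which this sketch leaves out.

```python
import hashlib


def assign_bucket(user_id: str, experiment: str, variant_share: float = 0.5) -> str:
    """Deterministically map a user to 'control' or 'variant' (hypothetical helper).

    Hashing the experiment name together with the user id keeps a user in the
    same bucket across visits, and makes buckets independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Interpret the first 8 hex digits as a number in [0, 1).
    position = int(digest[:8], 16) / 0x100000000
    return "variant" if position < variant_share else "control"


def relative_lift(control_rate: float, variant_rate: float) -> float:
    """Relative change of the variant rate over the control rate."""
    return (variant_rate - control_rate) / control_rate


def meets_ship_criterion(
    control_contacts: int,
    control_visitors: int,
    variant_contacts: int,
    variant_visitors: int,
    min_lift: float = 0.004,  # assumption: 0.4% is read as a relative lift
) -> bool:
    """Check the observed contact-rate lift against a predefined threshold."""
    control_rate = control_contacts / control_visitors
    variant_rate = variant_contacts / variant_visitors
    return relative_lift(control_rate, variant_rate) >= min_lift
```

Fixing `min_lift` before the experiment starts is the point of the exercise: the threshold is written down up front, so the analysis afterwards cannot quietly move the goalposts.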
In this particular instance, this feature met or exceeded our ship criteria, and we eventually decided to make this initiative more broadly available to the marketplace. So the upshot here is that a good experimentation framework allows us to quickly and cheaply validate product-market fit, and it helps filter the option space, fast-fail on sub-optimal choices, and make data-driven decisions.
The second example I’d like to touch on is how we use experiments to inform dynamic pricing. We’ve always believed that how much a professional pays to get connected with a customer for a new project should be determined by market forces. This implies that we use data-driven strategies that lead to a fair pricing system. We believe that the best pricing system is one that optimizes for a healthy marketplace, where professionals are high quality, and customers are high intent. However, coming up with prices that meet this ideal is a complex process.
There are a few factors that make this a unique challenge. The first is that it takes a while for us to detect how professionals will react to price changes. The data maturity time is usually on the order of several months, because professionals sometimes take a while to change their behavior in response to new prices. Secondly, communicating pricing changes is not straightforward. We certainly don’t want to overwhelm our professionals with constant price changes, which puts a natural limit on our experimentation velocity. Finally, given our breadth across 500+ categories, there are many verticals where we just don’t have enough data to reach statistically significant results. What all this means is that to move faster with greater confidence, we have to get creative.
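A rough sample-size calculation shows why low-volume categories struggle to reach significance. The sketch below uses the standard normal-approximation formula for comparing two proportions; the baseline rate, lift, significance level, and power are hypothetical values chosen for illustration, not Thumbtack figures.

```python
import math
from statistics import NormalDist


def visitors_per_arm(
    baseline_rate: float,
    relative_lift: float,
    alpha: float = 0.05,  # two-sided significance level (assumed)
    power: float = 0.8,   # probability of detecting a true effect (assumed)
) -> int:
    """Approximate visitors needed in each arm of a two-proportion A/B test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Sum of the Bernoulli variances of the two arms.
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: a 10% baseline rate and three candidate lifts.
    for lift in (0.05, 0.01, 0.004):
        print(f"relative lift {lift:.1%}: {visitors_per_arm(0.10, lift):,} visitors per arm")
```

Because the required sample grows with the inverse square of the effect size, halving the lift you want to detect roughly quadruples the traffic you need, which a small vertical may take many months to accumulate.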
There are several approaches we have taken to address these kinds of concerns. They include finding good proxies that give us approximately the same information but faster, investing in tools for data analysis, maximizing what we can learn from offline simulations, and getting really good at user research. The key takeaway from this example is that we often run into situations where experiment design and hypothesis validation are very expensive or complicated because of market characteristics. In those cases, we have invariably chosen either to invest the time to run the experiment or to come up with reasonable alternatives.
Building a Culture of Experimentation
To quote an oft-repeated cliché: “Culture eats strategy for breakfast”. So for this next part, I’d like to focus on a broader aspect of our culture of experimentation. At Thumbtack, at any given moment, there are dozens of different experiments and accompanying variants running. Consequently, some of our end users might be exposed to a different set of experiences than others. Our default for shipping new features is to first build a hypothesis, design the experiment, validate the result, and ship only when we are confident in the result.
While this is great for us to get data, it comes with a lot of complexity. First, as a marketplace, there’s rarely a feature that doesn’t have secondary or tertiary impact on all the other features, and it becomes critical for us to factor those in when evaluating the end result. Of course, as far as possible, we are careful about how we sequence and prioritize different experiments to minimize the interference effects.
The second aspect that we tend to be very careful about is recognizing the impact of experiments, especially on the pro side of the marketplace. Pros rely on Thumbtack to be a neutral platform that’s a large source of customers for their business. They work hard to compete, and we avoid experiments that could hurt their business. Finally, all of this requires a lot of collaboration and communication across different organizations, especially when communication lines span Product, Engineering, Go to Market, Marketing and Sales teams. We have to plan how we communicate with our end users so that they are aware of the changes.
Tools to the rescue
Next, I would like to talk about some of the tools we’ve developed internally to help bound some of the complexity. First, we’ve built a tool called Prospect, our go-to place for seeing all past, current and future experiments. It shows when each experiment is expected to start and end, the core hypothesis, links to internal docs, and most importantly, the current status. It’s incredibly helpful for getting a quick snapshot of everything that’s going on across the company. Next, we have an offline simulator that replays requests from past data and lets us synthetically change some of the requests and data to do what-if analysis. We rely on it heavily to validate some of our hypotheses and to test the robustness of outcomes. Finally, to inspect the results of our ranking experiments, we have built a side-by-side tool that shows the results of searching for a pro in a given market for the baseline and the variant next to each other.
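The side-by-side idea can be illustrated with a toy sketch: rank the same set of pros with a baseline scorer and a variant scorer, then pair the two orderings position by position. Everything here is hypothetical, including the `Pro` fields and both scoring functions; it does not reflect how Thumbtack's actual ranking or tooling works.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Pro:
    name: str
    rating: float
    review_count: int


def baseline_score(pro: Pro) -> float:
    """Hypothetical baseline: rank purely by average rating."""
    return pro.rating


def variant_score(pro: Pro) -> float:
    """Hypothetical variant: discount ratings backed by fewer than 20 reviews."""
    return pro.rating * min(1.0, pro.review_count / 20)


def rank(pros: List[Pro], score: Callable[[Pro], float]) -> List[str]:
    """Return pro names ordered from highest to lowest score."""
    return [p.name for p in sorted(pros, key=score, reverse=True)]


def side_by_side(
    pros: List[Pro],
    baseline: Callable[[Pro], float],
    variant: Callable[[Pro], float],
) -> List[Tuple[str, str]]:
    """Pair the baseline and variant rankings position by position."""
    return list(zip(rank(pros, baseline), rank(pros, variant)))


if __name__ == "__main__":
    market = [Pro("A", 5.0, 2), Pro("B", 4.8, 40), Pro("C", 4.5, 10)]
    for position, (base, var) in enumerate(side_by_side(market, baseline_score, variant_score), 1):
        print(f"{position}. baseline={base}  variant={var}")
```

Seeing both orderings for the same query makes it easy to spot which pros move, and by how much, before any live traffic is exposed to the variant.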
Learning from failure
Finally, I want to touch on an aspect that I absolutely love about Thumbtack: it favors an environment where successes are rewarded, but failures are seen as bets that we can learn from. This is so critical that we have spent a lot of time making sure it becomes part of our DNA. From how we hire new employees to how we write Requests for Comments (RFCs are documents where we gather feedback and questions from across the organization), create training videos, and post wikis, the experimentation philosophy is ingrained in our day-to-day work. At the end of the day, experimentation allows us to super-charge our growth, but it comes at a cost, and we would not be able to do it without a strong culture to complement it.
If building software using data-driven insights through experiments sounds interesting to you, come join us! We would love to have you aboard.
Thumbtack (www.thumbtack.com) is a local services marketplace where customers find and hire skilled professionals. Our app intelligently matches customers to electricians, landscapers, photographers and more with the right expertise, availability, and pricing. Headquartered in San Francisco, Thumbtack has raised more than $400 million from Baillie Gifford, CapitalG, Javelin Venture Partners, Sequoia Capital, and Tiger Global Management, among others.