Building multi-category search results

Richard Demsyn-Jones
Thumbtack Engineering
10 min read · Aug 5, 2020
Photo by Daniel McCullough on Unsplash

Thumbtack helps customers find skilled local professionals who can help get their jobs done. Whether you need to find a house cleaner, interior designer, electrician, personal trainer, or math tutor, Thumbtack has professionals ready to help!

Search at Thumbtack has recently undergone an evolution. Previously visitors had to select a specific category of job, like Fuse Box Repair or TV Mounting, before being shown professionals from that category only. This can be limiting when you want a broader list of professionals or when you don’t quite know what you should search for. You might want to see all plumbers instead of picking between Sink or Faucet Repair and Plumbing Drain Repair first. We recently improved our product to let visitors search in a much broader vocabulary, resulting in search results that include professionals from a variety of applicable categories. This was a massive undertaking, but it paid off with a demonstrably better customer experience. At the end of the day we want more customers and professionals to get jobs done together. By showing customers more applicable and better-ordered professionals in search results, multi-category search led to more jobs done.

How does Thumbtack categorize jobs?

To explain the usefulness of multi-category searches we first need to understand what categories are at Thumbtack and how those relate to customer searches.

On Thumbtack we have thousands of skilled professionals, all of whom are a little bit different. Thumbtack defines categories to classify the services provided by our professionals. Categories include popular services like House Cleaning, Handyman, Dog Walking, and Accounting, and specific abilities like Air Quality and Environmental Testing, Fitness Equipment Assembly, and Radon Mitigation. I don’t know what that last one means, but if you have some radon to mitigate then we have professionals to help.

A small subset of our many home categories

We actually have hundreds of categories, with a full list on our services page. If you use our customer app or browse our website, we recommend categories to you while searching. Our professionals initially sign up for categories, and then specify their skills within each category. Categories are the main unit of organization for our professionals, following in the tradition of directories like the Yellow Pages. When visitors use our app or website, the search bar and category icons take them to lists of professionals in their area. Traditionally, those lists only contain professionals from a single category.

The need for better search

Our path to multi-category searches started in the search bar. Our search bar contains suggestions that update as visitors type. Those suggestions used to come exclusively from the list of categories on our platform, which is a relatively small list. Visitors benefit when we expand the vocabulary of our search bar.

A motivating example is electrical work. We don’t have an “electrician” category. We have Circuit Breaker Panel or Fuse Box Installation, Circuit Breaker Panel or Fuse Box Repair, Electrical and Wiring Repair, Fan Installation, and at least six other related categories. This makes sense for our professionals, because individual electricians vary in which of these tasks they perform.

This does not always make as much sense for our customers. If I have an electrical problem, I might just know I need an electrician, and that is likely what I’ll type into the Thumbtack search bar. If I tried that a couple of years ago I would end up on the Electrical and Wiring Repair list. If I then contacted a professional in that list I would have to answer questions tailored for Electrical and Wiring Repair. A similar example is plumbing. A customer couldn’t just see a complete list of plumbers. Instead they had to choose whether they wanted to look at Sink or Faucet Repair, Plumbing Drain Repair, Plumbing Pipe Repair, Emergency Plumbing, or 15 other categories.

We also have search experiences where users know what they want but would like a broader list. Maybe you’re throwing a party (🎉🎉🎉) and want to see what entertainers are available in your area. Previously you would have to search separately for Face Painting, Magician, Balloon Twisting, and others. It would be easier to see everyone on the same search, especially if you live in a region that only has a few entertainers in each specific category.

Deciding to build multi-category search

We improved the search bar by expanding the search space. Instead of hundreds of categories, we expanded the list to include thousands of relevant terms.

The search bar now has a wider vocabulary

We quickly hit a stumbling block: what customer experience should these new search terms lead to? Our search results were based entirely on categories, under the model that every search corresponds to exactly one category. We had to figure out how to support searches that didn’t fit into our category taxonomy.

It starts with the search terms themselves. We used a mapping of terms to relevant categories. We call terms that fit multiple categories “ambiguous keywords”. On average they include around 4 categories, but can sometimes extend to 15 or 20 categories.
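As a rough illustration, the term-to-category mapping can be pictured as a simple lookup. The sketch below is not our production data model: the dictionary contents, the function names, and the normalization step are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical mapping from search terms to relevant categories.
# The entries are illustrative, not the real production mapping.
TERM_TO_CATEGORIES: Dict[str, List[str]] = {
    "electrician": [
        "Electrical and Wiring Repair",
        "Circuit Breaker Panel or Fuse Box Repair",
        "Circuit Breaker Panel or Fuse Box Installation",
        "Fan Installation",
    ],
    "tv mounting": ["TV Mounting"],
}


def categories_for(term: str) -> List[str]:
    """Return the categories mapped to a term, or an empty list if unknown."""
    return TERM_TO_CATEGORIES.get(term.strip().lower(), [])


def is_ambiguous(term: str) -> bool:
    """A term is an "ambiguous keyword" when it fits more than one category."""
    return len(categories_for(term)) > 1
```

Under these assumptions, "electrician" is an ambiguous keyword that fans out to four categories, while "tv mounting" still resolves to a single category.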

At first we tried sending users to the most relevant category for each term. We found engagement with the search bar was high, but the resulting categories were not always the best fit. We followed with an experience that first took users to a page that asked them to narrow down to one category. This substantially improved visitor conversion, but it wasn’t ideal. We allowed customers to express themselves in their own terms, using a bigger vocabulary than our category list, but we still slowed them down with an intermediate step. We had no support for showing professionals from different categories on one list.

That’s when we decided to break out of the category convention for visitor search and instead serve professionals from multiple categories in a single ranking on one page. In hindsight this seems obvious, but at the time it felt radical. Categories were the core unit at Thumbtack. How would we generate a candidate set across different categories? How would we create a ranked list of professionals? How would we show the different capabilities to our customers?

Technical challenges with multiple categories

We knew we had immense technical problems ahead of us, with every piece of code in our service and in our dependencies written under the assumption of a single category.

Our search system starts by fetching a wide candidate set of professionals before creating a ranked list. We updated that logic to search over a list of categories. We then gather more information about each professional from a selection of other services, including information about their budget, job preferences, and so forth. For multi-category search results we typically chose to make a request for each of the categories in the search, or to batch the requests together when the interfaces of our dependencies permit it. Finally, we rank and truncate the list.
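The flow above can be sketched in a few lines. This is a simplified picture rather than our actual service code: `fetch_one` and `score` stand in for real dependencies, and using a thread pool for the per-category requests is an assumption about how the fan-out might be done.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fetch_candidates(
    categories: List[str],
    fetch_one: Callable[[str], List[str]],  # hypothetical per-category lookup
) -> Dict[str, List[str]]:
    """Fetch professionals for each category in parallel.

    Returns a dict of professional id -> categories they were found in,
    so a professional serving several categories appears only once.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(categories))) as pool:
        # pool.map preserves the order of the input categories.
        results = list(pool.map(fetch_one, categories))

    pros: Dict[str, List[str]] = {}
    for category, pro_ids in zip(categories, results):
        for pro_id in pro_ids:
            pros.setdefault(pro_id, []).append(category)
    return pros


def rank_and_truncate(
    pros: Dict[str, List[str]],
    score: Callable[[str, List[str]], float],  # hypothetical scoring function
    limit: int,
) -> List[str]:
    """Order professionals by descending score and keep the top `limit`."""
    ranked = sorted(pros, key=lambda pro_id: score(pro_id, pros[pro_id]), reverse=True)
    return ranked[:limit]
```

Deduplicating by professional at this stage matters later, because ranking needs to know every relevant category a professional serves.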

During development this increased latency for list generation. We expected list generation to get slower for logic that we do once per category, because of additional categories. A search term with four categories (for example) might take four times longer, scaling linearly. We quickly noticed that lists also became slower in a second way, for logic that scales with respect to the number of professionals. For example, when we calculate features for professionals, we have to do it for every professional, and a search with four categories will have more professionals than a search with a single category. We sort professionals to create a final ordering, and sorting is O(n log n).

For the range of categories and professionals we support, we found that the once-per-category tasks were typically sublinear in the number of categories, because we could parallelize them or optimize the logic so that it did not depend on the number of categories. To understand how the number of professionals affected performance, we performed artificial tests of identical searches with different bounds on the number of professionals. The results showed that latency increased linearly with the number of included professionals. Thus, our system became slightly slower not because of the number of categories itself, but because more categories meant we had to evaluate more professionals for each search.

Processing time increases linearly with the number of professionals

While we originally planned to reduce the number of categories to manage latency, these results changed our focus. We pursued speed-ups that applied across the board to all searches, such as reducing the amount of data we pass around for each professional. Additionally, we have plans for optimizations to consider fewer but better matched professionals.

Ranking multi-category search results

As described in a previous blog post, our search results are optimized to create the best match between customers and professionals. Match quality depends on how likely the customer is to contact each professional (based on ratings, reviews, and other features) and how likely the professional is to match that job and respond to the customer (based on preference match, distance, historical response time, and so forth). This breaks down a little bit when we introduce professionals from different categories.

  • Different categories have different average contact rates. Customers are a lot pickier about who they hire for Outdoor Landscaping and Design than they are for Lawn Mowing and Trimming.
  • Not all categories are equally relevant for each search. While some visitors who type “accountant” in the search bar are well-served by Personal Financial Planning professionals, most are better served by professionals enrolled in Accounting.

Put another way, we already had a probability of contact and a probability of response, but those probabilities assume consistent categories. We considered switching to an entirely new framework, with models that would predict at a search level rather than a category level, but wanted to avoid overhauling our models and our implementation any more than necessary.

Our formula needed to become more complex, so we decided to break it down:

  1. How relevant is each category to the search?
  2. How good a customer-professional match is each professional for each category?
  3. How do we combine scores for professionals who are in multiple categories?

We decided to use a variation of TF-IDF, a basic relevance algorithm, to connect search terms with a corpus of customer reviews. Reviews are tied to individual categories, so the corpus of terms that customers use reflects the vocabulary of each category. The search term “wedding food caterers” is likely to have all three words represented often in reviews for Wedding and Event Catering, while one or two of those words will be much less common in Food Truck or Cart Services.
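To make the idea concrete, here is a minimal sketch of scoring categories against a search term with a textbook, smoothed TF-IDF. It is not the variation we run in production: treating all reviews in a category as one document, the whitespace tokenizer, and the smoothing are assumptions, and the reviews in the usage note are made up.

```python
import math
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Naive tokenizer (an assumption): lowercase and split on whitespace."""
    return text.lower().split()


def category_relevance(
    query: str, reviews_by_category: Dict[str, List[str]]
) -> Dict[str, float]:
    """Score each category by summing TF-IDF over the query terms.

    Each category's reviews are concatenated into a single document.
    """
    docs = {
        category: Counter(tokenize(" ".join(reviews)))
        for category, reviews in reviews_by_category.items()
    }
    n_docs = len(docs)
    scores: Dict[str, float] = {}
    for category, counts in docs.items():
        total_terms = sum(counts.values()) or 1
        score = 0.0
        for term in tokenize(query):
            # Document frequency: number of categories whose reviews use the term.
            df = sum(1 for doc in docs.values() if term in doc)
            idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF
            score += (counts[term] / total_terms) * idf
        scores[category] = score
    return scores
```

With two toy categories whose made-up reviews are "great wedding food" and "the caterers made our wedding special" for Wedding and Event Catering, and "tasty food truck" and "fast truck service" for Food Truck or Cart Services, the query "wedding food caterers" scores higher for Wedding and Event Catering, because only "food" appears in the food truck reviews.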

We already had models that predict how suitable a customer-professional pairing is within a given category, so we oriented our solution around reusing those models rather than creating new ones. We chose to do something incredibly simple: we weight the pro-category probability by the category-search relevance score. Professionals get more credit if they are more contactable or more responsive, but also if their categories better align with the customer search.

Ranking professionals who are in different numbers of categories posed another challenge. We evaluated many simple algorithms, such as taking their average score or their maximum score. We evaluated these algorithms on sample data, and we also evaluated them conceptually: we thought about the incentive structure of different scores, the effects on market concentration, and our sensitivity to model residuals. For example, if we take the average score then a professional may be incentivized to stop serving their lowest performing category in order to drive up their ranking in multi-category searches.
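The incentive problem is easy to see with a toy comparison. The sketch below uses made-up per-category scores and only the simple aggregators named here plus a plain sum; the helper name is hypothetical.

```python
from statistics import mean
from typing import Callable, Dict

Aggregator = Callable[[Dict[str, float]], float]

# Simple ways to collapse per-category scores into one number.
AGGREGATORS: Dict[str, Aggregator] = {
    "average": lambda scores: mean(list(scores.values())),
    "maximum": lambda scores: max(scores.values()),
    "sum": lambda scores: sum(scores.values()),
}


def effect_of_dropping_weakest(scores: Dict[str, float], aggregate: Aggregator) -> float:
    """Change in the aggregate score if the professional stops serving
    their lowest-scoring category. Expects at least two categories."""
    weakest = min(scores, key=scores.get)
    remaining = {c: s for c, s in scores.items() if c != weakest}
    return aggregate(remaining) - aggregate(scores)


# Made-up scores for one professional across three plumbing categories.
example_scores = {
    "Sink or Faucet Repair": 0.30,
    "Plumbing Drain Repair": 0.20,
    "Emergency Plumbing": 0.05,
}
```

For these numbers, dropping Emergency Plumbing raises the average, leaves the maximum unchanged, and lowers the sum. Only the average rewards a professional for offering fewer services.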

We ultimately decided that our weighted scores could be additive: each professional gets a match probability in each category, all of those probabilities are weighted by relevance between the search and the category, and all of them are then added up. This gives professionals credit for every category they support, but only to the extent the category is relevant to the customer search.
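In code, the additive score is only a few lines. This sketch assumes the contact and response models have already been combined into a single match probability per professional and category; how they are combined is not shown here, and the function names and example numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def blended_score(
    match_prob_by_category: Dict[str, float],
    relevance_by_category: Dict[str, float],
) -> float:
    """Sum of (category relevance to the search) x (match probability in
    that category). Categories irrelevant to the search contribute zero."""
    return sum(
        relevance_by_category.get(category, 0.0) * prob
        for category, prob in match_prob_by_category.items()
    )


def rank(
    pros: Dict[str, Dict[str, float]],
    relevance_by_category: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Return (professional id, score) pairs ordered by descending score."""
    scored = [
        (pro_id, blended_score(probs, relevance_by_category))
        for pro_id, probs in pros.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, suppose a search for "accountant" gives Accounting a relevance of 0.8 and Personal Financial Planning a relevance of 0.2. A professional with match probabilities of 0.08 and 0.10 in those two categories scores 0.8 × 0.08 + 0.2 × 0.10 = 0.084, which edges out an Accounting-only professional at 0.10 (0.8 × 0.10 = 0.08) and comfortably beats a planning-only professional at 0.15 (0.2 × 0.15 = 0.03).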

Launching the product

We needed visual changes to accompany the ranking changes. We changed the user interface to list out categories each professional serves from the subset of relevant categories, and we have options on the list to let customers narrow down their categories. We showcase details from the top category of every professional, and now customers narrow down to a specific category once they start a contact. Collectively this gave us a very reasonable blended search algorithm shown in an experience that is clear to our visitors.

We A/B tested several iterations, as we improved the ranking algorithm and added support on more platforms. The latest algorithm led to large and significant improvements to key customer engagement metrics, and we shipped the change in early 2020. We demonstrated that with multi-category search results, we can serve much more varied experiences to visitors, tailored to their search queries. This takes us another big step closer to a truly fluid and intelligent search experience.

Projects of this scale come with multidimensional challenges. We had a conceptual challenge, requiring us to break out of our category-oriented mentality and invent another paradigm. We were challenged organizationally, requiring tight collaboration between teams. We had statistical challenges, requiring us to think creatively about how to rank intelligently without throwing away all of our existing models. We had significant technical challenges to adapt code and interfaces to serve more complicated search results with high throughput and low latency. More than anything, this project required the willingness and persistence to do something new and improve Thumbtack for our customers and professionals.

If any of these initiatives or challenges excite you, come join us!
