Evolution of Search Ranking at Thumbtack

Fig 1. Customer searching for house cleaners in their neighborhood

Problem Context

Fig 2. Search ranking phases in 2020
  1. Candidate Selector: Chooses the initial set of professionals in a category
  2. Candidate Enricher: Enriches the set with information from other microservices, e.g., available budget from the Budgets service
  3. Candidate Filterer: Filters the set based on filtering policies, e.g., filtering out professionals who aren't targeting customer-specified preferences
  4. Featurizer: Computes or retrieves the features needed by the ML models
  5. Simple Ranker: Predicts the relevance of each professional to the search query using a simple ensemble of logistic regression models, and orders the set by relevance
  6. Policy-based Re-ranker: Re-ranks the list of professionals based on certain marketplace policies (a minimal sketch of this flow follows below)
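
To make the flow concrete, here is a minimal Python sketch of the six phases. All of the names (Candidate, select, enrich, and so on) are illustrative stand-ins rather than Thumbtack's actual service interfaces; each phase is injected as a callable so it can be backed by its own microservice or model.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Candidate:
    pro_id: str
    attributes: dict = field(default_factory=dict)  # enriched data, e.g., budget
    features: dict = field(default_factory=dict)    # model inputs
    relevance: float = 0.0                          # model score

def rank_pipeline(
    request: dict,
    select: Callable,     # phase 1: candidate selection for the category
    enrich: Callable,     # phase 2: enrichment from other microservices
    keep: Callable,       # phase 3: filtering-policy predicate
    featurize: Callable,  # phase 4: feature computation/retrieval
    score: Callable,      # phase 5: relevance model (an LR ensemble in 2020)
    rerank: Callable,     # phase 6: policy-based re-ranking
) -> list[Candidate]:
    candidates = select(request)                              # 1. Candidate Selector
    candidates = [enrich(c) for c in candidates]              # 2. Candidate Enricher
    candidates = [c for c in candidates if keep(c, request)]  # 3. Candidate Filterer
    for c in candidates:
        c.features = featurize(c, request)                    # 4. Featurizer
        c.relevance = score(c.features)                       # 5. Simple Ranker
    candidates.sort(key=lambda c: c.relevance, reverse=True)
    return rerank(candidates)                                 # 6. Policy-based Re-ranker
```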

Challenges

A. Experimentation Process

B. Model Complexity

C. Position Bias

Evolving Experimentation Processes

Fig 3. Features for a customer's search request, grouped into Customer Features, Pro Features, and Request Features

  1. We coupled this with evangelizing the uncertainty of ranking experimentation and embracing the fact that only 1 in 3 or 1 in 4 experiments may succeed. In practice, that meant systematically exploring our hypothesis space as we rapidly tried new ideas.
  2. We created goals around experimentation velocity for the team as a whole, e.g., running 6+ ranking experiments every 6 months and expecting 1 or 2 to succeed, rather than pinning our hopes on specific experiment ideas.

Evolving Modeling

Fig 4. Search ranking phases in 2022
  1. We needed to understand the latency bounds our models must operate within. To find them, we ran simulated latency tests, artificially injecting latency to measure its effect on customer experience, which taught us the latency threshold we had to stay under.
  2. We needed a more scalable and efficient modeling framework that could handle more complex models. We ran performance tests with XGBoost, immediately saw significant gains in inference latency, and productized XGBoost right away, creating headroom for further model complexity (a sample timing harness follows this list).
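
As an illustration of this kind of performance test (a synthetic benchmark, not our actual harness), the sketch below times batch inference for two candidate models, here a scikit-learn logistic regression and an XGBoost classifier, over 200 synthetic features:

```python
import time
import numpy as np
from sklearn.linear_model import LogisticRegression
import xgboost as xgb

# Synthetic stand-in for featurized candidates: 200 features, as in the post
rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 200))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=100_000) > 0).astype(int)

lr = LogisticRegression(max_iter=200).fit(X[:10_000], y[:10_000])
bst = xgb.XGBClassifier(n_estimators=100, max_depth=6).fit(X[:10_000], y[:10_000])

def median_latency(model, batch, n_runs=20):
    """Median wall-clock latency for scoring one batch of candidates."""
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        model.predict_proba(batch)
        timings.append(time.perf_counter() - start)
    return np.median(timings)

batch = X[:500]  # e.g., one search request's candidate set
print(f"LogisticRegression: {median_latency(lr, batch) * 1e3:.2f} ms per batch")
print(f"XGBoost:            {median_latency(bst, batch) * 1e3:.2f} ms per batch")
```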
  1. As we used model explainability techniques like Partial Dependence Plots and SHAP, we noticed that the model's output wasn't monotonically related to certain numeric features in ways that aligned with the expected product experience. We addressed this by imposing monotonic constraints on those features (see the sketch after this list) and quickly saw performance gains.
  2. Feature selection over 200 features wasn't going to be easy, so we started with Recursive Feature Elimination. But Boruta SHAP, a robust feature selection strategy that combines the Boruta algorithm with Shapley values for computing feature importance, generally outperformed the other methods we tried, and it has since become our go-to feature selection strategy.
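
Here is a minimal sketch of the monotonic-constraint fix in XGBoost, on synthetic data; the feature names, directions, and labels are invented for illustration. XGBoost's monotone_constraints parameter takes one of +1 / -1 / 0 per feature, forcing the learned function to be non-decreasing, non-increasing, or unconstrained in that feature:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
n = 20_000
review_rating = rng.uniform(1, 5, n)   # expectation: higher rating -> higher relevance
response_time = rng.uniform(0, 48, n)  # expectation: slower response -> lower relevance
noise_feature = rng.normal(size=n)     # no expected direction
X = np.column_stack([review_rating, response_time, noise_feature])
# Synthetic relevance labels consistent with the expected directions
y = (0.8 * review_rating - 0.05 * response_time
     + rng.normal(scale=0.5, size=n) > 2.0).astype(int)

model = xgb.XGBClassifier(
    n_estimators=200,
    max_depth=4,
    # +1: non-decreasing in rating; -1: non-increasing in response time; 0: free
    monotone_constraints=(1, -1, 0),
)
model.fit(X, y)
```

A Partial Dependence Plot over the constrained features (e.g., via sklearn.inspection.PartialDependenceDisplay.from_estimator) is a quick way to confirm the model now behaves the way the product experience expects.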
  1. Measure: In late 2020, we created an adapted version of the RandPair algorithm [2], randomly swapping a few search results so we could estimate the effect of position bias within our platform.
  2. Mitigate: In 2021, using those position bias estimates, we created a set of features calibrated for position bias (e.g., click rate), which led to small performance gains for our model. We also investigated and modified the loss function for XGBoost using Inverse Propensity Scoring [3], which led to a pronounced improvement in performance (a simplified sketch follows this list).
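
Our production change modified XGBoost's loss function directly; as a simplified illustration of the same Inverse Propensity Scoring idea [3], the sketch below applies the correction through XGBoost's sample_weight instead. Clicks are up-weighted by the inverse of their position's estimated examination propensity, so the model learns relevance rather than "was shown near the top". All numbers are synthetic.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
n = 50_000
X = rng.normal(size=(n, 10))                 # candidate features
true_relevance = 1 / (1 + np.exp(-X[:, 0]))  # hidden relevance, driven by feature 0
position = rng.integers(0, 5, size=n)        # rank position at which the pro was shown
# Examination propensity per position, e.g., estimated via RandPair-style swaps [2]
propensity = np.array([1.0, 0.7, 0.5, 0.35, 0.25])[position]
# Observed clicks are confounded by position: top slots get examined more often
clicked = (rng.random(n) < propensity * true_relevance).astype(int)

# IPS: weight each click by 1/propensity so a click at a rarely examined
# position counts for more, correcting the examination bias in the labels
weights = np.where(clicked == 1, 1.0 / propensity, 1.0)

model = xgb.XGBClassifier(n_estimators=100, max_depth=4)
model.fit(X, clicked, sample_weight=weights)
```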
  1. Offline evaluation: Surprisingly, although DCN V2 beat our XGBoost-based baseline on log loss by 0.7% in offline evaluation, it didn't improve other key metrics like NDCG or MRR. Though this would normally have given us pause, we decided to productize the model for online testing, since doing so would also build a foundation for serving neural models in production.
  2. Online evaluation: A/B testing DCN V2, we demonstrated that it could lift global conversion by 0.5% when added to our ranking ensemble. With that, we successfully productized our first near-state-of-the-art ML model for search ranking at Thumbtack! (A sketch of the model's core cross layer follows this list.)
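
For reference, DCN V2's distinguishing component is its cross layer, which models explicit feature interactions as x_{l+1} = x_0 * (W_l x_l + b_l) + x_l, where * is the element-wise product. Below is a minimal PyTorch sketch of that layer stacked alongside a deep tower; the dimensions and the way the two towers are combined are illustrative choices, not a description of our production model:

```python
import torch
import torch.nn as nn

class CrossLayer(nn.Module):
    """One DCN V2 cross layer: x_{l+1} = x0 * (W @ xl + b) + xl."""
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)  # full-rank W, per DCN V2

    def forward(self, x0: torch.Tensor, xl: torch.Tensor) -> torch.Tensor:
        return x0 * self.linear(xl) + xl

class DCNv2(nn.Module):
    """Cross network in parallel with a deep MLP, combined into one score."""
    def __init__(self, dim: int, n_cross: int = 3, hidden: int = 128):
        super().__init__()
        self.crosses = nn.ModuleList(CrossLayer(dim) for _ in range(n_cross))
        self.deep = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.head = nn.Linear(dim + hidden, 1)  # relevance logit

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        xl = x
        for cross in self.crosses:
            xl = cross(x, xl)  # explicit feature interactions
        return self.head(torch.cat([xl, self.deep(x)], dim=-1)).squeeze(-1)

# Example: score a batch of 32 candidates with 200 features each
scores = DCNv2(dim=200)(torch.randn(32, 200))
```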

Conclusion & Future Direction

Acknowledgement

References
