Search serving and ranking at Pinterest

Pinterest Engineering

Pinterest Search handles billions of queries every month. Every day, we help millions of Pinners discover useful ideas by delivering results from among billions of Pins saved by people with overlapping tastes. In the early days, we built our first search system on top of Solr and Lucene. Over the past few years, we’ve evolved our search stack by adding new layers, designing services and experimenting with ranking functions. These advancements have driven product changes across platforms, ranging from search guides that have become an industry standard, to improved results based on signals like interests and location, to visual search that uses the latest in computer vision. In this post, we’ll provide an overview of our search serving and ranking stack, and look ahead to future improvements.

Life of a search query

When a Pinner searches on Pinterest, the query goes from our API layer to our search backend. In the search backend, an Anticlimax service understands the query, Obelix machines find the most relevant Pins for the query and an Asterix service coordinates the returned results.

Asterix

Inside Asterix, there are three major components: the cluster client, rerankers and blenders. The cluster client is a scatter/gather service that distributes search requests to Obelix nodes, waits for results and then merges them. (The cluster client also retries on outliers and handles partial results and other failures.) The merged search result is then reranked based on different business logic. For example, a machine-learned reranker generates a new ranking score for each Pin based on context features, while a local reranker boosts Pins in a Pinner’s language.
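To make the scatter/gather step concrete, here is a minimal sketch in Python. It is not the Asterix implementation: `query_node`, the node dictionaries and the timeout value are assumptions made for illustration, and the retry logic mentioned above is left out. It fans a query out to several nodes, keeps whatever arrives before a deadline, and merges the partial results by score.

```python
import asyncio
import heapq


async def query_node(node, query):
    """Stand-in for an RPC to one leaf node; `node` is a hypothetical dict."""
    await asyncio.sleep(node["delay"])  # simulated network and search time
    return node["hits"]                 # list of (pin_id, score) tuples


async def scatter_gather(nodes, query, timeout, top_k):
    """Send `query` to every node, wait up to `timeout` seconds, merge hits."""
    tasks = [asyncio.create_task(query_node(n, query)) for n in nodes]
    done, pending = await asyncio.wait(tasks, timeout=timeout)

    # Nodes that missed the deadline are cancelled; the caller is told the
    # result is partial instead of waiting on the slowest node.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    merged = []
    for task in done:
        if task.exception() is None:    # skip nodes that failed outright
            merged.extend(task.result())

    top = heapq.nlargest(top_k, merged, key=lambda hit: hit[1])
    return top, bool(pending)


# Toy cluster: two fast nodes and one outlier that misses the deadline.
NODES = [
    {"delay": 0.01, "hits": [("a", 0.9), ("b", 0.5)]},
    {"delay": 0.01, "hits": [("c", 0.7)]},
    {"delay": 1.00, "hits": [("d", 1.0)]},
]
```

Calling `asyncio.run(scatter_gather(NODES, "cake", timeout=0.2, top_k=2))` returns the two best hits from the fast nodes together with a flag saying the result is partial.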

The search results from different clusters are blended using both simple and complex blending logic. For instance, we can use proportional blending to insert 10 percent fresh Pins into results. More complex blending logic allows us to, for example, surface Buyable Pins when the query intent calls for them, as in “men’s black sneakers”.
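Proportional blending can be sketched in a few lines. The function below is an illustration only; the name `proportional_blend`, its parameters and the choice of spreading the secondary items evenly through the list are assumptions, not the production blender.

```python
def proportional_blend(primary, secondary, percent, size):
    """Blend two ranked lists so about `percent`% of slots come from `secondary`.

    Secondary items are spread evenly: with percent=10, every 10th slot is
    taken from `secondary` while it still has items.
    """
    blended = []
    pi = si = 0  # read positions in primary and secondary
    while len(blended) < size and (pi < len(primary) or si < len(secondary)):
        slot = len(blended) + 1
        # Integer arithmetic avoids floating-point surprises at slot boundaries.
        want_secondary = (slot * percent) // 100 > si
        if (want_secondary and si < len(secondary)) or pi >= len(primary):
            blended.append(secondary[si])
            si += 1
        else:
            blended.append(primary[pi])
            pi += 1
    return blended


regular = [f"pin{i}" for i in range(30)]
fresh = ["fresh0", "fresh1", "fresh2"]
page = proportional_blend(regular, fresh, percent=10, size=20)
```

With these toy inputs, `page` holds 18 regular Pins and 2 fresh ones, placed in the 10th and 20th slots.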

Anticlimax

Inside Anticlimax, all query rewriters are chained in a sequence, and we execute them one by one. Spell correction and query segmentation must be executed before other rewriters. Query expansion, query category prediction and other rewriters can switch order or execute in parallel.
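The ordering constraint can be illustrated with a toy rewriter chain. Everything here is hypothetical: the `Query` fields, the tiny in-memory dictionaries and the rewriter functions are placeholders for the real models and data sources. The point is only that the first two steps run in a fixed order, while the remaining ones read what those steps produced and do not depend on each other.

```python
from dataclasses import dataclass, field

# Toy data standing in for real models and dictionaries (assumptions).
SPELLING = {"sneekers": "sneakers"}
PHRASES = {"black sneakers"}
SYNONYMS = {"black sneakers": ["black trainers"]}
CATEGORIES = {"black sneakers": "fashion"}


@dataclass
class Query:
    raw: str
    text: str = ""
    segments: list = field(default_factory=list)
    expansions: list = field(default_factory=list)
    categories: list = field(default_factory=list)


def spell_correct(q):
    q.text = " ".join(SPELLING.get(w, w) for w in q.raw.lower().split())
    return q


def segment(q):
    """Greedy two-word phrase matching over the corrected text."""
    words, i = q.text.split(), 0
    while i < len(words):
        pair = " ".join(words[i:i + 2])
        if pair in PHRASES:
            q.segments.append(pair)
            i += 2
        else:
            q.segments.append(words[i])
            i += 1
    return q


def expand(q):
    for seg in q.segments:
        q.expansions.extend(SYNONYMS.get(seg, []))
    return q


def predict_category(q):
    q.categories = sorted({CATEGORIES[s] for s in q.segments if s in CATEGORIES})
    return q


ORDERED = [spell_correct, segment]      # must run first, in this order
INDEPENDENT = [expand, predict_category]  # only read q.segments


def rewrite(raw, independent=INDEPENDENT):
    q = Query(raw=raw)
    for step in ORDERED + list(independent):
        q = step(q)
    return q
```

Because the independent rewriters only read `segments` and each writes its own field, `rewrite("mens black sneekers")` gives the same result whichever order they run in.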

We support different data sources for each query rewriter. For example, the spell correction model is stored in memory, larger dictionaries are stored as HFiles on disk and query category prediction data is stored in a different service (which we query for every search).

Obelix

An Obelix server may have multiple index segments. The Pins inside each index segment are ordered by a query-independent score, the static rank. This score measures Pin quality, which is an important factor in the final ranking. With the static rank, we only need to score the first few retrieved Pins for most search queries and can still guarantee that the best Pins are scored. Static rank also lets us afford more complex scoring functions, since fewer Pins need to be scored.
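A minimal sketch of this idea, assuming a segment is simply a list of documents already sorted by static rank (the document layout, `max_scored` and the scoring function are all hypothetical): the scan stops after scoring a fixed number of matches, so the expensive function only runs on the highest-quality candidates.

```python
import heapq


def search_segment(segment, term, score_fn, max_scored, top_k):
    """Scan a segment sorted by static rank (best first) with early termination.

    Only the first `max_scored` matching documents are passed to `score_fn`,
    which may therefore be expensive.
    """
    scored = []
    for doc in segment:
        if term in doc["terms"]:
            scored.append((score_fn(doc, term), doc["id"]))
            if len(scored) >= max_scored:
                break  # remaining docs have lower static rank; skip them
    return heapq.nlargest(top_k, scored)


def toy_score(doc, term):
    """Placeholder for a complex scoring function."""
    return doc["static"] + (1.0 if "vegan" in doc["terms"] else 0.0)


# Documents are stored in descending static-rank order.
SEGMENT = [
    {"id": 1, "static": 0.9, "terms": {"cake"}},
    {"id": 2, "static": 0.8, "terms": {"cake", "vegan"}},
    {"id": 3, "static": 0.7, "terms": {"shoes"}},
    {"id": 4, "static": 0.6, "terms": {"cake"}},
    {"id": 5, "static": 0.5, "terms": {"cake"}},
]

results = search_segment(SEGMENT, "cake", toy_score, max_scored=3, top_k=2)
```

Here document 5 matches the query but is never scored, because three higher-ranked matches were found first.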

The searcher in Obelix scans and scores multiple index segments in parallel. This maximizes our CPU usage during non-peak hours and improves latency.
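The shape of a parallel segment scan can be sketched as follows. The names and data layout are assumptions, and Python threads are used only to show the structure: in CPython they will not give real CPU parallelism for this kind of work, unlike a native searcher.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def scan_segment(segment, term, top_k):
    """Return the top_k (score, pin_id) hits for `term` within one segment."""
    hits = [(doc["score"], doc["id"]) for doc in segment if term in doc["terms"]]
    return heapq.nlargest(top_k, hits)


def search_segments(segments, term, top_k, workers=4):
    """Scan every segment concurrently, then merge the per-segment top lists."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda seg: scan_segment(seg, term, top_k), segments))
    return heapq.nlargest(top_k, (hit for part in partials for hit in part))


SEGMENTS = [
    [{"id": "a", "score": 0.9, "terms": {"cake"}},
     {"id": "b", "score": 0.2, "terms": {"cake"}}],
    [{"id": "c", "score": 0.7, "terms": {"cake"}},
     {"id": "d", "score": 0.8, "terms": {"shoes"}}],
]

top_hits = search_segments(SEGMENTS, "cake", top_k=2)
```

Each segment only needs to return its own top `top_k` hits, because no hit outside a segment's local top list can appear in the global one.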

Search Ranking

  1. Query. Like other search engines, Pinterest’s query rewriting performs spell correction, query segmentation, category prediction and other rewrites. What is unique about Pinterest search is what Pinners are searching for: people issue exploratory queries for ideas rather than asking objective questions. To provide diverse results, we developed a context-based query expansion. By analyzing query logs and engagement data, we extract query pairs that have similar keyword context and similar engaged results, and use them to construct term expansions. For example, “relief” can expand to “remedies stress” under the context “anxiety”. After expanding the results, we provide guides to help users drill down into specific interests.
  2. Content. Pinterest has a unique, human-curated dataset made up of Pins, boards and Pinners. We explore different signals from our content, some human-readable (e.g. board titles) and some not (e.g. embedding vectors for Pins, boards and Pinners).
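The context-based query expansion described under item 1 can be illustrated with a small sketch. This is one plausible reading of the description, not the actual method: the input format (a mapping from query to the set of engaged Pin ids), the Jaccard overlap threshold and the rule of expanding only single leftover terms are all assumptions.

```python
from collections import defaultdict


def mine_expansions(engagements, min_overlap=0.5):
    """Derive (term, context) -> expansions from queries with shared context.

    engagements: dict mapping a query string to the set of Pin ids that
    were engaged with for that query (hypothetical log summary).
    """
    expansions = defaultdict(set)
    queries = list(engagements)
    for i, qa in enumerate(queries):
        for qb in queries[i + 1:]:
            context = set(qa.split()) & set(qb.split())
            if not context:
                continue  # no shared keyword context

            pins_a, pins_b = engagements[qa], engagements[qb]
            union = pins_a | pins_b
            if not union or len(pins_a & pins_b) / len(union) < min_overlap:
                continue  # engaged results are not similar enough

            rest_a = [w for w in qa.split() if w not in context]
            rest_b = [w for w in qb.split() if w not in context]
            if not rest_a or not rest_b:
                continue

            ctx = " ".join(sorted(context))
            for src, dst in ((rest_a, rest_b), (rest_b, rest_a)):
                if len(src) == 1:  # expand single terms only
                    expansions[(src[0], ctx)].add(" ".join(dst))
    return dict(expansions)


LOGS = {
    "anxiety relief": {1, 2, 3},
    "anxiety remedies stress": {2, 3, 4},
    "anxiety art": {9},
}
mined = mine_expansions(LOGS)
```

On this toy log, the only expansion mined is “relief” to “remedies stress” under the context “anxiety”, matching the example in the text; “anxiety art” shares the context but not the engaged results, so it contributes nothing.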

Let’s look at an example of work we’ve done to better understand board titles. Let’s say a Pinner saves a Pin about “Daisy Ridley at the 2016 Academy Awards” to a board called “Oscar gowns.” We extract dozens of signals, such as board title frequency, Pin topic distribution and board quality, and can then understand this Pin is not only about “Daisy Ridley” but also related to “oscar gowns” and, more specifically, “2016 oscar gowns.”

Personalization

There’s a lot more we can do to personalize the search experience for Pinners, and we’re exploring strategies and building plans for the future of personalized search.

Ranking

We’re also experimenting with different models, including neural networks and gradient-boosted decision trees. There’s a lot of ongoing work, on both search quality and infrastructure, so that we can better understand content and apply complicated scoring functions online.

Conclusion

We would like to acknowledge the tremendous contributions of the search team: Keita Fujii, Kevin Ma, Laksh Bhasin, Maesen Churchill, Matthew Fong, Roger Wang, Rui Jiang, Timothy Koh, Ying Huang, Yuliang Yin, Zhao Zheng, Zheng Liu, and Zhongxian Chen.
