Hybrid Search: Building a textual and visual discovery experience at Pinterest
Matthew Fong | Pinterest tech lead, Search Features
As a visual discovery engine, Pinterest helps you discover ideas even when you don’t know what you’re looking for or don’t have the words to describe it. Using a mix of technology (including visual search) and community input (thanks to the 175B+ Pins people have saved), we’re able to show results tailored to your taste based on the overlapping interests of other Pinners. We’re constantly asking ourselves how we can take people from inspiration to action.
Imagine you’ve found an image of a set of art prints that you really enjoy. If you wanted to design a room inspired by this art, where would you start? It’s hard to formulate this intent in a traditional keyword search query. Finding images similar to the one you’ve found also doesn’t address the intent to find room design ideas.
Here I’ll share our approach to delivering visual inspiration on Pinterest. Given a keyword search query and a Pin image, the system can produce relevant, diverse, and inspiring results. For example, with the query “room ideas” and the art image above, we’re able to return ideas for rooms and nurseries with similar art on their walls.
We have a name for this new feature: hybrid search, as it’s a hybrid between traditional text search and image search. We’re rolling out this technology across multiple surfaces in our product and are continuing to look for new ways to apply it. This post covers the technical details and product explorations of hybrid search.
Many things have changed since our latest update on search ranking. This post will focus on the major differences between our normal text search and hybrid search.
This is a high-level view of our scoring architecture for ordinary Pin search with just a text query. There’s a three-step process — retrieval, lightweight scoring, and relevance scoring. The purpose of the lightweight scoring phase is to cull the result set to select the approximate “best” candidates to enter the more computationally expensive relevance scoring phase. In the final phase, we build a global model that optimizes for relevance to the query.
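The three-step funnel above can be sketched in a few lines. This is a hypothetical simplification, not Pinterest's implementation: the stage sizes (`k_light`, `k_final`) and the scoring functions are stand-ins for the real retrieval index, lightweight scorer, and relevance model.

```python
def search_funnel(query_matches, candidates, light_score, relevance_score,
                  k_light=1000, k_final=50):
    """Three-stage scoring funnel: retrieval -> lightweight -> relevance.

    query_matches, light_score, and relevance_score are placeholders for the
    real text-matching, cheap-scoring, and expensive-ranking components.
    """
    # Stage 1: text-based retrieval selects candidate Pins for the query.
    retrieved = [pin for pin in candidates if query_matches(pin)]
    # Stage 2: a cheap score culls the set to the approximate "best" candidates.
    shortlist = sorted(retrieved, key=light_score, reverse=True)[:k_light]
    # Stage 3: the computationally expensive relevance model ranks the shortlist.
    return sorted(shortlist, key=relevance_score, reverse=True)[:k_final]
```

The point of the cascade is cost control: each stage is more expensive per candidate than the last, so each sees far fewer candidates.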
In hybrid search, we still leverage text-based retrieval, but we see significant differences in lightweight scoring and relevance scoring. We use embeddings to represent the input Pin(s) and result Pins — more specifically, the PinSage embeddings described here. In lightweight scoring, we send the retrieved Pins closest in hamming distance on to the relevance scoring round, where we take a linear combination of the GBDT relevance score referenced earlier (weighted by α) and the hamming distance (weighted by β). This gives us a result set that still very clearly has high textual relevance to the query, but also incorporates visual and thematic elements of the input Pin(s).
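A minimal sketch of this lightweight scoring stage, assuming the PinSage embeddings are binarized and packed into integers (the embedding format here is an assumption for illustration):

```python
def hamming_distance(a: int, b: int) -> int:
    """Hamming distance between two binarized embeddings packed into ints:
    XOR the bit vectors, then count the differing bits."""
    return bin(a ^ b).count("1")

def lightweight_score(input_pin_emb: int, candidate_embs: list, k: int) -> list:
    """Keep the k retrieved candidates closest to the input Pin
    by hamming distance, to pass on to relevance scoring."""
    return sorted(candidate_embs,
                  key=lambda c: hamming_distance(input_pin_emb, c))[:k]
```

Hamming distance on binary codes is attractive here precisely because the lightweight stage must be cheap: it reduces to an XOR and a popcount per candidate.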
We went through several iterations of lightweight and relevance scoring before we achieved the best results. Initially, we tried to keep much of the original lightweight scoring formula intact, but we found that we were filtering out the Pins most similar to the input Pin. We decided that our text-only retrieval process would be a sufficient filter at the text level, so we shifted the first-round scoring to look only at input Pin + candidate Pin similarity.
For the final scoring phase, we vary α and β depending on the application of hybrid search. If we’re more confident in our selection of the input Pin, we use a higher β, whereas if we’re doing implicit personalization that is hidden from the user, we use a higher α instead.
The story of a colorful chair
Suppose the Pinner wants to find home decor that can pair with this delightful example of a chair, without any knowledge that it even has a specific name (it’s called a Mondrian chair).
If she looks at the visually similar Pins, the feed of results is mostly chairs, but none really captures the intent of the original, and nothing could help fill a room around this item.
However, with hybrid search, she could find that there’s all sorts of decor that matches well with this chair. Using the input Pin above and the queries “bookshelves,” “walls,” and even “clocks,” we’re greeted with a plethora of results. We’ve run some experiments on augmenting the original query with text signals from the input Pin(s), and this is just one example in which it performs exceedingly well.
Product application: Personalized search suggestions
The initial use case for hybrid search was our search recommendations that we insert in the home feed (the user’s personalized feed of Pins). Once we synthesize all of this information and narrow it down to the top suggestions, we pick a cover image for each recommendation. This cover image is actually the image of one of the Pins with which the Pinner has interacted.
Upon clicking one of those tiles, the Pinner is brought to a feed of Pins that utilizes hybrid search. There’s a stark contrast between searching for the text query alone and searching with the text query plus the context of all of those Pins. For instance, look at the example of ‘audio room’. I’ve recently been looking at Pins that show different examples of vinyl record storage and display.
On the left, we see the normal search results, which focus more on large speaker set-ups. This is likely the most popular global intent when searching the query. On the right, we see the personalized search results — we still see content about audio rooms, but it’s clearly skewed towards vinyl and record players. For this particular user and query, we’re able to show results that are much more in line with their past actions on Pinterest.
As this use case shows, utilizing hybrid search to personalize search results improves the user experience. We’ve experimented with feeds of just the normal search results as well as results from Related Pins, and hybrid search results significantly outperform both. To put it into context, we have a measurement of search success that involves a variety of actions on the results page, as well as downstream of the results page. For the average search, that rate is typically around 30% (based on internal data), while for these personalized search suggestions, it comes in at around 40%.
We’re currently running experiments to better understand which other use cases this new technology is best suited for. As we start to ramp up our search personalization efforts, this will be a critical part of those efforts. And of course, we’re also continually working on improvements to the quality of these results, and unifying the infrastructure with our ranking architecture for normal text search. If these problems and challenges interest you, take a look at our open positions here!
Acknowledgements: Lulu Cheng, Ying Huang, Randall Keller, Yixue Li, Zheng Liu, Haibo Lu, Jimmy Luo, Yanis Markin, Rajat Raina