Zhongxian Chen | Pinterest Tech Lead, Search
On Pinterest, the best answer goes beyond text, and can even go beyond an image. When you’re looking for eye makeup ideas, for example, a video personalized to your skin tone range teaching you how to create a certain look is likely more useful than a Pin showing what the end result looks like. If you’re sprucing up your backyard and looking for lighting ideas, an expert on backyard lighting to follow can be a one-stop shop for information, instead of having to browse multiple sites.
As the types of content on Pinterest grow, our search results must adapt as well. Pinners can use filters to narrow down to the content type they want, but we can make the experience even better by predicting the most relevant answers for them. As a solution, we’re building a system to deliver content from various verticals within one integrated search results page. Today when you search you’ll already see these different types of formats (Video, Shopping, Pinners to follow). Over time these results will become even more personalized and relevant through advancements in machine learning ranking.
There are three major components in this system:
- The query understanding component is responsible for detecting vertical intent based on query tokens as well as historical engagement. It also populates pre-retrieval features that are used in the blending component.
- The query planning component decides which verticals to send requests to based on detected intent and composes the requests. It also decides whether to fetch results from random verticals to generate unbiased training data for the blending model.
- The blending component is responsible for blending vertical results into the main Pin search results using a model, based on query intent and vertical result quality.
The query understanding component lives inside Anticlimax, a service developed to do query understanding and rewriting. Both the query planning and blending components live inside Asterix, which serves as the root of our search system and talks to all verticals.
The query understanding and planning components are relatively straightforward, so we’ll spend more time on the blending component.
Query Understanding Component
This component detects which verticals are relevant to a query. Since we have user info and the query as inputs from Asterix, we can check which verticals the user has seen recently for similar queries and what they engaged with for those queries. If they did not engage with certain verticals, we will not show those verticals again in the near future. For each vertical that has not been fatigued after this step, we then run the query through that vertical’s intent detector. Each intent detector returns the intent (if any) along with pre-retrieval features that are used in blending. Pre-retrieval features represent how strong the intent is and are used later to blend results from different verticals.
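The fatigue check and per-vertical intent detection described above can be sketched as follows. This is an illustrative simplification: names like `EngagementHistory` and `detect_vertical_intents` are assumptions, not Pinterest’s actual APIs, and real fatigue tracking would use similar-query matching and a time window rather than exact keys.

```python
from dataclasses import dataclass, field

@dataclass
class Intent:
    vertical: str
    strength: float                       # pre-retrieval feature: how strong the intent is
    features: dict = field(default_factory=dict)

class EngagementHistory:
    """Tracks verticals a user saw for similar queries without engaging."""
    def __init__(self):
        self._fatigued = set()            # (query, vertical) pairs shown but not engaged

    def mark_unengaged(self, query, vertical):
        self._fatigued.add((query, vertical))

    def is_fatigued(self, query, vertical):
        return (query, vertical) in self._fatigued

def detect_vertical_intents(query, history, detectors):
    """Run each vertical's intent detector, skipping fatigued verticals."""
    intents = []
    for vertical, detector in detectors.items():
        if history.is_fatigued(query, vertical):
            continue                      # don't show this vertical again soon
        intent = detector(query)          # each detector returns Intent or None
        if intent is not None:
            intents.append(intent)
    return intents
```

In this sketch, each detector is any callable from query to an optional `Intent`; in practice these would be learned classifiers built from query tokens and historical engagement.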
Query Planning Component
This component composes and sends fanned-out requests to the corresponding verticals based on detected vertical intent(s). One goal of this component is to trigger a vertical results retrieval if and only if the vertical results will be shown to the user after blending, so we don’t waste resources retrieving verticals that are not useful. For a small percentage of traffic, we retrieve random verticals instead of the one(s) with predicted intent. The logging from this random traffic is used to produce unbiased training data for the vertical triggering and blending models. If we trained only on production traffic, the resulting model would be biased towards the existing model, and verticals without previously detected intent (for example, those that previously contained low-quality content) would never have a chance to be shown.
We could alternatively pick a small percentage of random users for whom to show randomized vertical results. However, we don’t want any users to always have randomized results since that’s not a good experience, so we decided to use the random traffic approach instead.
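The per-request exploration described above can be sketched as a simple coin flip. The exploration probability and function name here are illustrative assumptions, not production values:

```python
import random

def plan_vertical_requests(detected_verticals, all_verticals,
                           explore_prob=0.01, rng=random):
    """Decide which verticals to fan requests out to.

    On a small fraction of traffic (explore_prob, an assumed illustrative
    value) we ignore the detected intent and fetch one random vertical
    instead; logs from this traffic give unbiased training data for the
    triggering and blending models. Returns (verticals, is_exploration)
    so exploration requests can be flagged for the feature logger.
    """
    if rng.random() < explore_prob:
        return [rng.choice(list(all_verticals))], True   # exploration request
    return list(detected_verticals), False               # intent-driven request
```

Because the flip happens per request rather than per user, no single user is stuck with randomized results, matching the tradeoff described above.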
Blending Component
The diagram above shows the components used in blending and how they connect to each other:
- Post-retrieval features extractor: This component extracts features from all retrieved verticals and main Pin results. We want to use the most important features representing the quality and relevance of the results. For main Pin results, we use relevance score, navboost features, and popularity features. For verticals, the features differ depending on the type of content. Since most vertical and main Pin result sets contain more than one result, we take the minimum, maximum, and average of each result feature to represent the whole vertical. We only use the top N main Pins in feature extraction, which is explained in the model training section below.
- Blender: This component uses both pre-retrieval and post-retrieval features with a machine-learned model to score each vertical and decide if and where a vertical should be inserted. There are some fixed slots among main Pin results where we can insert a vertical. We have a score threshold at each slot, and only verticals with a score higher than the threshold are eligible for insertion. When more than one vertical passes the threshold, we pick the one with the highest score and leave the others for the next slot. How we decide the threshold for each slot will be explained in the model training section.
- Model loader: This component loads the models from S3 to be used in blending. We support loading and using different models based on search request parameters for the convenience of experimentation.
- Feature logger: We want to log the features of each vertical and main Pin results for doing machine learning. We only log them for a small percentage of traffic due to storage constraints.
- Whole page cache: We revamped our cache infrastructure to make it capable of storing multiple types of content from the whole search results page.
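The min/max/avg feature aggregation and the slot-based insertion logic described above might look like this in simplified form. Function names, feature names, and threshold values are hypothetical:

```python
def aggregate_result_features(results, feature_names):
    """Collapse per-result features into min/max/avg per feature, so one
    feature vector represents a whole vertical (or the top-N main Pins)."""
    agg = {}
    for name in feature_names:
        values = [r[name] for r in results]
        agg[f"{name}_min"] = min(values)
        agg[f"{name}_max"] = max(values)
        agg[f"{name}_avg"] = sum(values) / len(values)
    return agg

def blend(scored_verticals, slots, thresholds):
    """Assign verticals to fixed slots among main Pin results.

    scored_verticals: {vertical: model score}; slots: ordered slot positions;
    thresholds: {slot: minimum score}. At each slot, among the verticals whose
    score passes that slot's threshold, insert the highest-scoring one and
    leave the rest as candidates for later slots.
    """
    placements = {}
    remaining = dict(scored_verticals)
    for slot in slots:
        eligible = {v: s for v, s in remaining.items() if s >= thresholds[slot]}
        if not eligible:
            continue
        best = max(eligible, key=eligible.get)
        placements[slot] = best
        del remaining[best]
    return placements
```

For example, with video scored 0.9 and shopping 0.7 against slot thresholds of 0.8 and 0.6, video takes the first slot and shopping falls through to the second.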
Blending Model Training
Features are logged in Asterix, and we extract labels from user engagement logs. We use online engagement data to generate labels because it is more scalable compared to offline data. In addition, it can be hard for human evaluators to compare the relevance of main Pin results to vertical results. Features and labels are then joined to create training data. One model is trained for each vertical. Here are some details about the model training:
- Feature vector: (Q, Pq, Vq), where Q is pre-retrieval features, Pq is post-retrieval features of top main Pins, and Vq is post-retrieval features of the vertical.
- Labels: We create pairwise labels between a list of top N main Pins (TP) and each vertical seen by the user. We want to use top main Pins as the pivot and train a model for each vertical that predicts how relevant the vertical is compared to the main Pin results. Denote the vertical inserted at slot S0, S1, S2 as V0, V1, V2 respectively:
- If there is some engagement with V0 but no engagement with TP: label = 1, weight = 1
- If there is no engagement with V0 but some engagement with TP: label = 0, weight = 1
- If there is some engagement with V1 or V2 but no engagement with TP: label = 1, weight = 2
- All other samples are discarded.
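The labeling rules above can be expressed as a small function. The signature is illustrative: `slot` is the position (0, 1, or 2) where the vertical was inserted, and the engagement flags would come from joining impression and engagement logs.

```python
def label_example(engaged_vertical, engaged_top_pins, slot):
    """Return a (label, weight) pairwise training sample for one vertical
    impression, pivoting on engagement with the top-N main Pins (TP).
    Returns None for samples that are discarded."""
    if slot == 0:
        if engaged_vertical and not engaged_top_pins:
            return 1, 1          # V0 engaged, TP not
        if not engaged_vertical and engaged_top_pins:
            return 0, 1          # TP engaged, V0 not
    elif slot in (1, 2):
        if engaged_vertical and not engaged_top_pins:
            return 1, 2          # V1/V2 engaged despite lower slot: upweight
    return None                  # all other combinations are dropped
```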
- Model training: We use GBDT (gradient-boosted decision trees) and calibrate the resulting models so scores are comparable across different verticals.
- Slot threshold: We want to enforce X% of traffic to have a specific vertical at a specific slot, so we pick a threshold for each vertical. We do this by using a set of calibration samples and picking the threshold based on the vertical’s score distribution in the calibration set.
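Picking a threshold from the calibration set’s score distribution amounts to taking a quantile. A minimal sketch, with an assumed helper name and no interpolation between scores:

```python
def threshold_for_trigger_rate(calibration_scores, target_rate):
    """Pick the score threshold so that roughly target_rate of calibration
    traffic would show the vertical at this slot, i.e. the (1 - target_rate)
    quantile of the vertical's score distribution on the calibration set."""
    scores = sorted(calibration_scores)
    cutoff_index = int(len(scores) * (1.0 - target_rate))
    cutoff_index = min(cutoff_index, len(scores) - 1)   # guard for rate ~ 0
    return scores[cutoff_index]
```

Because the models are calibrated, the same procedure applied per vertical and per slot yields thresholds that enforce the desired X% trigger rate for each.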
We’re currently working on iterations to build and improve our universal search system. At the same time, we’re also working on improving the quality of results from each vertical. As we start to show more and more types of content to Pinners, our eventual goal is to make this system into a platform where other teams can easily plug new types of content into our search results.
Acknowledgments: Boris Lin, Jason Lin, Jiawei Kuang, Junwei Shi, Long Cheng, Lulu Cheng, Rajat Raina, Randall Keller, Yanis Markin, Yixue Li