Jenny Liu | Software Engineer, Discovery
If you’ve ever heard someone mention “the Pinterest Rabbit Hole,” they’re likely referring to Related Pins, the ideas you see below a Pin you’ve tapped. Related Pins is a feed of content relevant to the closeup Pin, and it accounts for 40 percent of engagement on Pinterest. With such high usage, it’s critical that these recommendations are as personal, relevant and useful as possible. When we dug further, we found an opportunity to make Related Pins fresher by ensuring that newer Pins (seven days old or less) are shown more prominently when they’re relevant.
The cold start problem
In the recommendations space, surfacing high-quality, engaging fresh content runs into the well-known “cold start” problem: new content has little to no engagement signal for the recommendation engine to work with, so the usual methods of candidate generation and ranking may not be as effective.
The cold start problem is just as daunting for Related Pins. Most Related Pins candidates are generated by traversing Pixie, our graph-based system for making personalized recommendations, and are then passed through a ranking stage. In the Pixie stage, we start from the node of the closeup Pin and perform random walks across the graph. The Pin nodes visited during this process are recorded and sorted by visit count. Because older content has had time to be discovered in the system and saved to more boards, older Pins are better connected in the graph. As a result, the random walk is more likely to visit stale (older) Pins.
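To make the bias toward older Pins concrete, here is a toy sketch of a random walk with restarts that counts Pin visits. It is not Pixie’s actual implementation: it assumes a small in-memory bipartite graph of Pins and boards (the hypothetical `pin_to_boards` and `board_to_pins` dicts), plus an illustrative step budget and restart probability.

```python
import random
from collections import Counter


def random_walk_pin_counts(
    query_pin: str,
    pin_to_boards: dict[str, list[str]],  # hypothetical: Pin -> boards it is saved to
    board_to_pins: dict[str, list[str]],  # hypothetical: board -> Pins saved on it
    total_steps: int = 10_000,
    restart_prob: float = 0.5,
    seed: int = 0,
) -> list[tuple[str, int]]:
    """Toy random walk with restarts on a bipartite Pin-board graph.

    Returns visited Pins (excluding the query Pin) sorted by visit count.
    """
    rng = random.Random(seed)
    visits: Counter[str] = Counter()
    pin = query_pin
    for _ in range(total_steps):
        boards = pin_to_boards.get(pin)
        if not boards:
            pin = query_pin  # dead end: restart from the query Pin
            continue
        board = rng.choice(boards)  # hop Pin -> board
        pins = board_to_pins.get(board)
        if not pins:
            pin = query_pin  # empty board: restart from the query Pin
            continue
        pin = rng.choice(pins)  # hop board -> Pin
        if pin != query_pin:
            visits[pin] += 1
        if rng.random() < restart_prob:
            pin = query_pin  # restarts keep the walk near the query Pin
    return visits.most_common()
```

In a toy graph where an older Pin is saved to three boards and a new Pin to only one, the older Pin collects far more visits, which is the staleness effect described above.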
In the ranking stage, we train our model using engagement data as positive signals. This creates a feedback loop: older, evergreen Pins (which make up most of the feed) keep gathering the most positive signal, while newer Pins gather little engagement data. As a result, during training our model had too few examples to learn what high-quality fresh Pins look like. Over the past year, we’ve worked on addressing each of these issues.
Generating relevant fresh Pins
The first challenge we tackled was generating fresh Pins that were relevant to the closeup Pin (also known as the query Pin). We did this by using Pixie to fetch related boards instead of Pins. On the random walk, we kept track of which boards were visited and sorted them by visit count.
Then, we gathered the new Pins on each of those boards and used them as a new candidate set.
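The board-based approach can be sketched as follows, under stated assumptions: board visit counts are already available from a walk like Pixie’s, and each Pin’s creation time is known. The names `board_visits`, `pin_created_at`, `top_boards` and `max_candidates` are hypothetical; the seven-day cutoff follows the definition of a fresh Pin above.

```python
from collections import Counter
from datetime import datetime, timedelta


def fresh_pins_from_boards(
    board_visits: Counter,  # hypothetical: board id -> visit count from the walk
    board_to_pins: dict[str, list[str]],
    pin_created_at: dict[str, datetime],
    now: datetime,
    max_age: timedelta = timedelta(days=7),
    top_boards: int = 100,
    max_candidates: int = 50,
) -> list[str]:
    """Collect new Pins from the most-visited boards, in board visit order."""
    candidates: list[str] = []
    seen: set[str] = set()
    for board, _count in board_visits.most_common(top_boards):
        for pin in board_to_pins.get(board, []):
            created = pin_created_at.get(pin)
            is_fresh = created is not None and now - created <= max_age
            if is_fresh and pin not in seen:
                seen.add(pin)
                candidates.append(pin)
                if len(candidates) >= max_candidates:
                    return candidates
    return candidates
```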
While this approach gave us some fresh Pins to work with, we wanted to improve coverage further.
The next method we tried was augmenting the existing Pixie graph with new nearest neighbors. We built a daily job that first fetches all Pins added to the system over the last seven days. It then uses visual and text search to find relevant existing Pins for each new Pin. This new-Pin-to-neighbors mapping is then inverted, so that each existing Pin points to the new Pins related to it, and uploaded to our Pixie server.
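A minimal sketch of the daily job, assuming a hypothetical `search_fn` that stands in for visual and text search and returns related existing Pins for a given new Pin. The key step is inverting the mapping so it can be looked up by existing Pin at serving time.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Callable


def invert_neighbor_map(new_pin_neighbors: dict[str, list[str]]) -> dict[str, list[str]]:
    """Turn {new Pin: [related existing Pins]} into {existing Pin: [new Pins]}."""
    inverted: defaultdict[str, list[str]] = defaultdict(list)
    for new_pin, neighbors in new_pin_neighbors.items():
        for existing_pin in neighbors:
            inverted[existing_pin].append(new_pin)
    return dict(inverted)


def build_fresh_neighbor_map(
    pin_created_at: dict[str, datetime],
    now: datetime,
    search_fn: Callable[[str], list[str]],  # hypothetical stand-in for visual/text search
    max_age: timedelta = timedelta(days=7),
) -> dict[str, list[str]]:
    """Daily job: select recent Pins, find their neighbors, invert the mapping."""
    new_pins = [pin for pin, created in pin_created_at.items() if now - created <= max_age]
    return invert_neighbor_map({pin: search_fn(pin) for pin in new_pins})
```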
For each Pin returned by Pixie, we look up its new neighbors in this inverted mapping. We keep a list of these fresh neighbors and sort them by how many times they surface. The list is then trimmed and returned as a second set of fresh candidates.
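At serving time, the lookup and counting step might look like the sketch below. It assumes the inverted mapping is available as a plain dict (`fresh_neighbor_map`, a hypothetical name) and that a hypothetical `max_candidates` parameter controls the trimming.

```python
from collections import Counter


def fresh_candidates_from_pixie(
    pixie_results: list[str],
    fresh_neighbor_map: dict[str, list[str]],  # hypothetical: existing Pin -> new Pins
    max_candidates: int = 50,
) -> list[str]:
    """Count how often each fresh neighbor surfaces across Pixie results; keep the top ones."""
    counts: Counter[str] = Counter()
    for pin in pixie_results:
        # Pins with no new neighbors simply contribute nothing.
        counts.update(fresh_neighbor_map.get(pin, []))
    return [pin for pin, _ in counts.most_common(max_candidates)]
```

A fresh Pin that neighbors several Pixie results ranks above one that neighbors only a single result.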
Blending fresh Pins into the ecosystem
Now that we have fresh candidate sets, we have to combine them with the broader results. Our model wasn’t trained on the newest Pins (which lack the metadata needed for ranking), so we started with naive ratio blending: we inserted a new Pin every N slots to guarantee some freshness in our results. This spreads fresh Pins evenly through the result set, so Pinners are more likely to see and engage with them. It was an easy initial solution that helped us gather more signal on new content.
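Naive ratio blending can be expressed as a simple merge in which every Nth slot goes to a fresh candidate. This sketch assumes both input lists are already ordered and don’t overlap; the function name and the choice to stop once the ranked results run out are illustrative assumptions.

```python
def ratio_blend(ranked: list[str], fresh: list[str], n: int) -> list[str]:
    """Fill slots n, 2n, 3n, ... (1-indexed) with fresh Pins and the rest with ranked Pins."""
    if n < 1:
        raise ValueError("n must be at least 1")
    blended: list[str] = []
    r = f = 0
    while r < len(ranked) or f < len(fresh):
        slot = len(blended) + 1
        if slot % n == 0 and f < len(fresh):
            blended.append(fresh[f])  # reserved fresh slot
            f += 1
        elif r < len(ranked):
            blended.append(ranked[r])  # regular slot, or fresh Pins exhausted
            r += 1
        else:
            break  # ranked results exhausted: don't stack leftover fresh Pins
    return blended
```

For example, with six ranked Pins, three fresh Pins and `n = 3`, every third slot holds a fresh Pin.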
The next step was to enable our model to rank fresh Pins. We did this by training a new model with data gathered from users who saw our fresh candidates. We also created a simplified form of metadata for fresh Pins that we could generate within seconds (rather than days).
With these changes, we were able to migrate from naive ratio blending to floor blending. With floor blending, we first ran the fresh candidate Pins through our ranker and then merged them with the rest of the candidates by score. To guarantee at least some recency, we set a freshness threshold: when fresh candidates ranked low in a result set, we moved them up to satisfy the threshold (if fresh candidates already ranked well, no changes were needed).
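The post doesn’t specify the exact form of the freshness threshold, so this sketch assumes one plausible version: at least `min_fresh` fresh Pins must appear in the top `top_k` slots. Both parameter names, and the `Candidate` type, are hypothetical. Candidates are merged by ranker score; if too few fresh Pins reach the top slots, the best-scoring fresh Pins are promoted and the lowest-scoring remaining candidates move down.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    pin_id: str     # assumed unique across both candidate lists
    score: float    # ranker score; higher is better
    is_fresh: bool  # True for Pins created within the last seven days


def floor_blend(
    regular: list[Candidate], fresh: list[Candidate], top_k: int, min_fresh: int
) -> list[str]:
    """Merge by score, then guarantee at least `min_fresh` fresh Pins in the top `top_k`."""
    merged = sorted(regular + fresh, key=lambda c: c.score, reverse=True)
    fresh_ranked = [c for c in merged if c.is_fresh]
    needed = min(min_fresh, len(fresh_ranked), top_k)
    # The best-scoring fresh Pins are guaranteed a spot in the head.
    guaranteed = fresh_ranked[:needed]
    guaranteed_ids = {c.pin_id for c in guaranteed}
    others = [c for c in merged if c.pin_id not in guaranteed_ids]
    # Fill the rest of the head by score, keeping score order within it.
    head = sorted(guaranteed + others[: top_k - needed], key=lambda c: c.score, reverse=True)
    tail = others[top_k - needed :]
    return [c.pin_id for c in head + tail]
```

When fresh candidates already score well enough to meet the floor, the output is identical to a plain merge by score, matching the behavior described above.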
By generating relevant fresh candidates, enabling our model to rank them, and intelligently blending them into result sets, we increased freshness in Related Pins by 1,400 percent while keeping other engagement metrics neutral. This work will help our recommendations keep improving over time, showing Pinners the most relevant and engaging content in the areas they love most, like fashion, cooking, beauty and more.
Acknowledgements: This technology was built in collaboration with Stephanie Rogers, David Liu, Pong Eksombatchai, Connell Donaghy, Chao Ren, and Crystal Lee.