Wenke Zhang, Sunny Chang, Qinglong Zeng, Andrey Gusev
Pinterest Engineering, Content Quality
At Pinterest, we’re building a visual discovery engine where ideas become actionable with links to more information. Once a Pin is saved, its linked content may be updated or expire over time, so it’s critical to know the quality of the linked page behind the Pin to improve the click-through experience. In this post, we’ll discuss how we built Spark- and TensorFlow-based ML pipelines to compute Pin and link page relatedness signals.
Pinterest hosts billions of Pins (images that serve as visual bookmarks) from Pinners. To keep onsite and offsite content consistent, we incrementally reindex web documents daily and derive signals that measure the relevance between a Pin and its linked page. We rely on embeddings of image and text content to measure visual and semantic similarity.
The system is decomposed into four components:
- a text cohesion signal that compares a Pin’s salient keywords with the text content of its linked web page
- an image cohesion signal that detects image similarity between onsite and offsite images
- an image-text cohesion signal that compares image and text with a visual classifier
- a blending classifier that merges these signals into a single measure of proximity between a Pin and its landing page content
Text relatedness is an important signal of page cohesiveness. Intuitively, we’d like to see a Pin’s web page match its semantics. For example, if we see a Pin under a board named “modern furniture” with the title “white sofa”, we might expect to see it linked to a retail site with home decor products. To measure this similarity, we extract both onsite and offsite text signals and compare them in the space of textual embeddings.
We then compare the onsite and offsite text signals in an embedding space, where each piece of text is represented as a vector and semantically similar phrases lie close to each other. By mapping the extracted text signals into this continuous vector space, we can infer the text relatedness between a Pin’s information and its link by computing the cosine similarity of the two vectors.
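As a minimal sketch of this comparison (the example vectors below are placeholders, not Pinterest’s actual text embeddings):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors; 0.0 for zero vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0
    return float(np.dot(a, b) / denom)

# Hypothetical embeddings of a Pin's onsite text and its link page's text.
pin_embedding = np.array([0.2, 0.7, 0.1])
page_embedding = np.array([0.25, 0.65, 0.05])
score = cosine_similarity(pin_embedding, page_embedding)  # close to 1.0 = related
```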
Under our current model, image relatedness signal consists of two sub-components: image visual similarity and image semantic similarity.
Image visual similarity is a raw signal designed to answer whether we can find an image on a web page that looks similar to the image saved on Pinterest. We tackle this by computing the visual similarity between the Pin image and each image from its web page, and then taking the maximum of the scores. To achieve high precision and recall, we utilized a well-trained image near-duplicate detection model to predict the visual similarity score of image pairs. The model is a TensorFlow feed-forward neural network that takes advantage of transfer learning over visual embeddings. More details can be found in this blog.
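The max-over-page-images step can be sketched as follows, with a plain dot product standing in for the near-duplicate model’s pairwise score:

```python
import numpy as np

def image_visual_similarity(pin_emb, page_embs, similarity_fn):
    """Max pairwise score between the Pin image and every image found on
    its web page; 0.0 if the page has no images."""
    if not page_embs:
        return 0.0
    return max(similarity_fn(pin_emb, emb) for emb in page_embs)

# Toy stand-in for the near-duplicate detection model's score.
def dot_similarity(a, b):
    return float(np.dot(a, b))

pin = np.array([1.0, 0.0])
page = [np.array([0.9, 0.1]), np.array([0.0, 1.0])]
best = image_visual_similarity(pin, page, dot_similarity)  # 0.9
```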
However, a failure of visual similarity detection does not necessarily mean images are not cohesive. For example, the two images in figure 3 look quite different from each other, but both show the same product (a white 3-tier metal cart) and should be considered a cohesive pair. Therefore, we built another raw signal to capture image semantic similarity by transforming images into text annotations.

This image-to-text model is an image annotation classifier: a vision model that classifies an image into the top 10K search queries with the most traffic in the past year, as shown in figure 4. The image features are visual embeddings trained using metric learning that optimizes various vision tasks at Pinterest. The search queries are first filtered to remove misspelled and overly trending queries, and are binary encoded for training. For each class, the most engaged Pin images are deduplicated and treated as positive samples. The classifier contains two fully-connected layers and is trained to minimize a sigmoid cross-entropy loss. With images transformed into predicted text annotations, we can judge image semantic similarity just like text relatedness, comparing onsite and offsite text signals in a textual embedding space.
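A framework-agnostic sketch of that classifier head, in NumPy rather than TensorFlow: the 10K-class output, the two fully-connected layers, and the sigmoid cross-entropy loss come from the description above, while the embedding and hidden dimensions (and the random weights) are illustrative assumptions.

```python
import numpy as np

NUM_CLASSES = 10_000   # top engaged search queries used as labels
EMB_DIM = 64           # assumed visual-embedding size, for illustration only
HIDDEN = 128           # assumed hidden-layer width, for illustration only

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.01, size=(EMB_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.01, size=(HIDDEN, NUM_CLASSES))
b2 = np.zeros(NUM_CLASSES)

def forward(x):
    """Two fully-connected layers over a visual embedding; per-class
    sigmoid gives multi-label annotation probabilities."""
    h = np.maximum(0.0, x @ W1 + b1)      # ReLU hidden layer
    logits = h @ W2 + b2
    return 1.0 / (1.0 + np.exp(-logits))  # independent sigmoid per class

def sigmoid_cross_entropy(probs, labels, eps=1e-7):
    """Multi-label sigmoid cross-entropy loss, averaged over classes."""
    p = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p)))
```

The sigmoid-per-class output (rather than a softmax) is what allows a single image to carry several text annotations at once.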
Image and Text Relatedness
With Pinterest visual search, the image is typically the focal point when users explore content, while text is more informative and actionable when a user wants to dig deeper into specific content of interest. It is therefore also important to understand how well image and text are aligned between a Pin and its linked page.
We currently use two methods to compare image and text: image classification and optical character recognition (OCR). First, we reuse the classifier discussed earlier to transform an image into text, and compare the output text annotations with the link page’s text. Second, many Pin images contain text that highlights the linked web page, especially high-quality native creator Pins designed for Pinterest; we leverage OCR techniques to extract this text from the image and obtain descriptive information about the Pin. Once we have mapped the image to text, we compare image and text similarity using textual embeddings, as described in the previous section.
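The flow above can be sketched end to end; every helper here (`annotate`, `ocr`, `embed`, and the toy bag-of-words vocabulary) is a hypothetical stand-in for the production classifier, OCR system, and textual embeddings.

```python
import numpy as np

def image_text_relatedness(image, page_text, annotate, ocr, embed):
    """Map the Pin image to text (classifier annotations + OCR tokens),
    then compare against the link page's text via cosine similarity in a
    textual embedding space. All helper functions are hypothetical."""
    image_text = " ".join(annotate(image) + ocr(image))
    a, b = embed(image_text), embed(page_text)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

# Toy embedding: bag-of-words counts over a tiny illustrative vocabulary.
VOCAB = ["sofa", "white", "modern", "cart"]

def toy_embed(text):
    words = text.lower().split()
    return np.array([float(words.count(w)) for w in VOCAB])

score = image_text_relatedness(
    image=None,  # placeholder; a real pipeline would pass image bytes
    page_text="white sofa modern living room",
    annotate=lambda img: ["white", "sofa"],  # pretend classifier output
    ocr=lambda img: ["modern"],              # pretend OCR output
    embed=toy_embed,
)
```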
The final set of raw signals consists of:
- Text relatedness score to measure if the human-curated keywords of a Pin are relevant to a link’s text content
- Image relatedness score to measure image visual and semantic similarity between Pin and link page images to understand if the web page contains images of related themes and styles
- Image-text relatedness score that measures if a link’s text content is complementary to the Pin image topic
- Text-image relatedness score to measure if a link’s images are relevant to its salient keywords extracted from onsite text content
With all signals in place, we build a binary classifier on a human labeled gold dataset to decide the overall Pin cohesion.
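A minimal sketch of such a blender, assuming a logistic-regression form fit by gradient descent; the toy feature rows and labels below merely stand in for the human-labeled gold dataset, and the production classifier may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_blender(X, y, lr=0.1, steps=2000):
    """Logistic-regression blender over the four raw relatedness scores,
    fit by batch gradient descent on the log loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        grad = p - y                      # dLoss/dLogit per example
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

# Toy gold-labeled examples: each row holds the four raw signals
# [text, image, image-text, text-image]; label 1.0 means cohesive.
X = np.array([[0.9, 0.8, 0.7, 0.9],
              [0.1, 0.2, 0.3, 0.1],
              [0.8, 0.9, 0.6, 0.7],
              [0.2, 0.1, 0.2, 0.3]])
y = np.array([1.0, 0.0, 1.0, 0.0])

w, b = train_blender(X, y)
cohesion = sigmoid(X @ w + b)  # overall Pin cohesion score per example
```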
We built the Pin cohesion signal to measure the similarity between a Pin and its link page. Since its launch in November 2018, we’ve seen metric gains in search and engagement. The signal powers search, recommendations, and home feed surfaces as a ranking signal to improve content quality on Pinterest and drive offsite traffic.
Pin cohesion is a collaborative project in Pinterest. Special thanks to our interns and the following members of the team: Peter John Daoud, Renju Liu, Nick DeChant, Yan Sun, Omkar Panhalkar, Grace Chin, Jun Liu, Yang Xiao, Heath Vinicombe, Vincent Bannister, Jacob Hanger, and Zhuoyuan Li for all their contributions on this project.