PinnerSage: Multi-Modal User Embedding Framework for Recommendations at Pinterest

Pinterest Engineering
Pinterest Engineering Blog
Aug 10, 2020


Aditya Pal | Applied Science, Chantat Eksombatchai | Applied Science, Yitong Zhou | User Understanding, Bo Zhao | User Understanding, Charles Rosenberg | Applied Science, Jure Leskovec | Applied Science

As we build a visual discovery engine that powers 2B+ Pins, it’s crucial to understand user interests and preferences in order to serve relevant content. One standard approach to encoding user preferences is an embedding-based representation in a high-dimensional space. Most prior methods tried at Pinterest infer a single high-dimensional embedding for each user that is compatible with the content embeddings. This is a good starting point, but it falls short of delivering a full understanding of the user.

In this work, we postulate that a single embedding is not sufficient for encoding the multiple facets of a user’s interests, which might have no obvious linkage between them. These interests can evolve, with some persisting long term while others span a short time period. Recommended items are also represented in the same embedding space. A good user representation must encode a user’s multiple tastes, interests, styles, etc., whereas a recommended item (a video, an image, a news article, a house listing, a Pin, etc.) typically has only a single focus. Hence it becomes important to represent a user with multiple embeddings, with each embedding capturing a specific aspect of their interests.

PinnerSage Model

In order to better understand our users’ preferences, we developed PinnerSage, a highly scalable, flexible and extensible recommender system that internally represents each user with multiple embeddings. Figure 1 provides an end-to-end overview of the PinnerSage recommendation model. The starting point for our model is to organize the repins and clicks of a user into multiple interest clusters by running the Ward clustering algorithm, and then to summarize each of those clusters using a medoid, an embedding, and a cluster importance score. Next, an online cluster selection step picks a subset of these clusters and uses a nearest-neighbor index to generate recommendations for the user. Users’ actions are processed in real time to update the interest clusters. In order for PinnerSage to provide relevant recommendations to our 400M+ monthly active users and adapt in real time, we made several model design choices that we describe next.

Figure 1: Overview of PinnerSage model

Design Choice 1: Pin Embeddings are Fixed

The interest clusters in Figure 1 are generated by clustering the embeddings of a user’s repins and clicks. These embeddings are trained via the PinSage model, a graph convolutional model that optimizes for contextual and visual similarity between Pins. Since our goal is to project users into the same space as the Pin embeddings, we consider the Pin embeddings to be fixed. This design choice simplifies our models considerably and allows us to run inference pipelines in parallel for each user.

Joint embedding inference models, where both user and Pin embeddings are inferred together, can be too complex and hard to scale. Moreover, we posit that in practice they compromise recommendation relevance, as some spurious connections between pins can be established via the users. To see this point, consider the example in Figure 2.

Figure 2: Three interests of a given user.

In the example above, a user is interested in painting, shoes, and sci-fi. Jointly learned user and Pin embeddings would bring Pin embeddings on these disparate topics closer together, which can compromise the relevance of a nearest-neighbor-based recommender. Pin embeddings should operate only on the underlying principle of bringing similar Pins closer while keeping the rest of the Pins as far apart as possible. For this reason, we use PinSage, which achieves precisely this objective without any dilution.

Design Choice 2: Unlimited User Embeddings

Prior work either fixes the number of embeddings to a small number or puts an upper bound on them. At best, such restrictions hinder developing a full understanding of the users and, at worst, they merge different concepts together, leading to bad recommendations. For example, merging embeddings could yield an embedding that lies in a very different region of the space. Figure 2 shows that merging three disparate Pin embeddings results in an embedding that is best represented by the concept “energy boosting breakfast.” Needless to say, recommendations based on such a merger can be problematic.
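The merging problem can be illustrated with a toy calculation. The sketch below uses hand-made 3-dimensional vectors (not real Pin embeddings) and a hypothetical four-item catalogue; the only point is that the average of three unrelated interests can land closest to a fourth, unrelated item.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_vector(vectors):
    """Element-wise average, i.e. a naive 'merge' of several embeddings."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def nearest_label(query, catalogue):
    """Label of the catalogue item most similar to the query vector."""
    return max(catalogue, key=lambda label: cosine(query, catalogue[label]))

# Toy vectors chosen by hand for illustration only (assumption, not real data).
CATALOGUE = {
    "painting": [1.0, 0.0, 0.0],
    "shoes": [0.0, 1.0, 0.0],
    "sci-fi": [0.0, 0.0, 1.0],
    "breakfast": [0.6, 0.6, 0.6],
}

user_interests = ["painting", "shoes", "sci-fi"]
merged = mean_vector([CATALOGUE[name] for name in user_interests])

# The merged embedding is closest to an item the user never engaged with,
# while each individual interest still retrieves itself.
print(nearest_label(merged, CATALOGUE))
print([nearest_label(CATALOGUE[name], CATALOGUE) for name in user_interests])
```

Keeping one embedding per interest avoids this effect, because each query vector stays inside the region of a topic the user actually engaged with.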

PinnerSage generates as many interest clusters as the underlying data supports. This is achieved by clustering users’ actions into conceptually coherent clusters via a hierarchical agglomerative clustering algorithm (Ward). A light user might be represented by 3–5 clusters, whereas a heavy user might be represented by 75–100 clusters.
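As a rough illustration of how the number of clusters can be left to the data, here is a minimal pure-Python sketch of agglomerative clustering with Ward's merge criterion. It is not the production implementation: the function names, the `max_merge_cost` stopping threshold, and the 2-dimensional toy points are all assumptions made for this example.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ward_cost(size_a, centroid_a, size_b, centroid_b):
    """Increase in within-cluster sum of squares if clusters A and B merge."""
    scale = (size_a * size_b) / (size_a + size_b)
    return scale * squared_distance(centroid_a, centroid_b)

def ward_clusters(points, max_merge_cost):
    """Greedy Ward clustering that stops once the cheapest merge is too costly.

    Returns a list of clusters, each a sorted list of indices into `points`.
    The number of clusters is not fixed in advance; it depends on the data
    and on the (hypothetical) max_merge_cost threshold.
    """
    # Each cluster is (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(len(clusters[i][0]), clusters[i][1],
                                 len(clusters[j][0]), clusters[j][1])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        if cost > max_merge_cost:
            break  # remaining clusters are too dissimilar to merge
        members_i, centroid_i = clusters[i]
        members_j, centroid_j = clusters[j]
        n_i, n_j = len(members_i), len(members_j)
        merged_centroid = [(n_i * a + n_j * b) / (n_i + n_j)
                           for a, b in zip(centroid_i, centroid_j)]
        merged = (members_i + members_j, merged_centroid)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(members) for members, _ in clusters]

# Toy 2-d "embeddings": two tight groups plus one isolated action.
actions = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [9.0, 0.0]]
print(sorted(ward_clusters(actions, max_merge_cost=1.0)))
```

This naive version rescans all cluster pairs on every merge, so it is only suitable for small inputs. For real workloads, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="ward"` is a more practical starting point.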

Design Choice 3: Medoid-based Cluster Representation

Typically, a cluster is represented by its centroid, which requires storing an embedding. Additionally, a centroid can be sensitive to outliers in the cluster. To represent a cluster compactly, we instead pick one of the cluster’s member Pins, called the medoid. The medoid, by definition, is a member of the set of Pins the user originally interacted with. Hence it avoids the pitfall of topic drift and is robust to outliers. From a systems perspective, the medoid is a concise way of representing a cluster, as it requires storing only the medoid’s Pin id, and it enables cross-user and even cross-application cache sharing. It also makes our system compatible with non-embedding-based recommendation systems such as Pixie.
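To make the difference between a centroid and a medoid concrete, the following sketch picks the medoid as the cluster member with the smallest total squared distance to all other members. The selection rule, the Pin ids, and the 2-dimensional vectors are assumptions for illustration.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(cluster):
    """Average embedding of the cluster; usually not an actual Pin."""
    vectors = list(cluster.values())
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def medoid_pin_id(cluster):
    """Id of the member Pin with the smallest total squared distance to the rest."""
    def total_distance(pin_id):
        return sum(squared_distance(cluster[pin_id], other)
                   for other in cluster.values())
    return min(cluster, key=total_distance)

# Hypothetical cluster: three similar Pins and one outlier.
cluster = {
    "pin_a": [1.0, 1.0],
    "pin_b": [1.2, 0.9],
    "pin_c": [0.9, 1.1],
    "pin_outlier": [8.0, 8.0],
}

print(centroid(cluster))       # pulled toward the outlier, matches no stored Pin
print(medoid_pin_id(cluster))  # always an id the user actually interacted with
```

Only the returned Pin id needs to be stored per cluster. The embedding can be looked up from the shared Pin embedding store, which is what makes caching across users possible.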

Design Choice 4: Medoid Sampling for Candidate Retrieval

PinnerSage provides a rich representation of a user via cluster medoids. However, in practice we cannot use all the medoids simultaneously for candidate retrieval due to cost concerns. Additionally, the user would be bombarded with too many different items. To address these concerns, we sample three medoids with probability proportional to their importance scores and recommend their nearest neighboring Pins. The importance scores of medoids are updated daily, so they can adapt to the user’s changing tastes.
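The sampling and retrieval step can be sketched as follows. This is an illustrative version only: it assumes sampling without replacement, strictly positive importance scores, and a brute-force scan standing in for the nearest-neighbor index. All names and values are hypothetical.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sample_medoids(importance_by_medoid, k=3, rng=None):
    """Draw up to k distinct medoids, each draw proportional to importance.

    Assumes every importance score is strictly positive.
    """
    rng = rng or random.Random()
    remaining = dict(importance_by_medoid)
    chosen = []
    while remaining and len(chosen) < k:
        ids = list(remaining)
        weights = [remaining[medoid_id] for medoid_id in ids]
        pick = rng.choices(ids, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # without replacement (an assumption of this sketch)
    return chosen

def nearest_pins(query, pin_embeddings, top_n, exclude=()):
    """Brute-force nearest neighbors; a real system would query an index."""
    candidates = [pid for pid in pin_embeddings if pid not in exclude]
    candidates.sort(key=lambda pid: squared_distance(query, pin_embeddings[pid]))
    return candidates[:top_n]

# Hypothetical medoid importance scores for one user.
importance = {"m1": 5.0, "m2": 3.0, "m3": 1.0, "m4": 0.5, "m5": 0.5}
picked = sample_medoids(importance, k=3, rng=random.Random(0))

# Hypothetical Pin embeddings, including one medoid Pin.
pins = {"m1": [0.0, 0.0], "p1": [0.1, 0.0], "p2": [0.3, 0.0], "p3": [4.0, 4.0]}
print(picked)
print(nearest_pins(pins["m1"], pins, top_n=2, exclude={"m1"}))
```

Because the draw is random, low-importance interests still surface occasionally, while high-importance interests dominate over many requests.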

Design Choice 5: Two-Pronged Approach for Handling Real-Time Updates

It is important for a recommender system to adapt to the current needs of its users. At the same time, an accurate representation of users requires looking at their past 60–90 days of activity. The sheer size of the data and the speed at which it grows make it hard to consider both aspects together. We address this issue by combining two methods: (a) a daily batch inference job that infers multiple medoids per user based on their long-term interaction history, and (b) an online version of the same model that infers medoids based on the user’s interactions on the current day. As new activity comes in, only the online version is updated. At the end of the day, the batch version consumes the current day’s activities and resolves any inconsistencies. This approach ensures that our system adapts quickly to users’ current needs without compromising their long-term interests.
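One way to picture the two-pronged design is a small per-user state object with a batch view and an online view. The sketch below is an assumption-laden illustration, not the production design: the class name, the pluggable `infer_medoids` callable, the way the two medoid lists are combined, and the `topic:id` Pin naming are all hypothetical.

```python
from collections import deque

class TwoProngedInterests:
    """Per-user state combining a daily batch view with a same-day online view."""

    def __init__(self, infer_medoids, window_days=90):
        self.infer_medoids = infer_medoids        # callable: list of Pin ids -> medoid ids
        self.history = deque(maxlen=window_days)  # one list of Pin ids per past day
        self.today = []
        self.batch_medoids = []
        self.online_medoids = []

    def on_action(self, pin_id):
        """Real-time path: only the online medoids are recomputed."""
        self.today.append(pin_id)
        self.online_medoids = self.infer_medoids(self.today)

    def end_of_day(self):
        """Batch path: fold today's actions into history and recompute everything."""
        self.history.append(self.today)
        all_actions = [pin for day in self.history for pin in day]
        self.batch_medoids = self.infer_medoids(all_actions)
        self.today = []
        self.online_medoids = []

    def current_medoids(self):
        """Online medoids first, then batch medoids, without duplicates."""
        merged = []
        for medoid in self.online_medoids + self.batch_medoids:
            if medoid not in merged:
                merged.append(medoid)
        return merged

def first_pin_per_topic(pin_ids):
    """Trivial stand-in for clustering: one 'medoid' per topic prefix."""
    medoids = {}
    for pin_id in pin_ids:
        medoids.setdefault(pin_id.split(":")[0], pin_id)
    return list(medoids.values())

store = TwoProngedInterests(first_pin_per_topic, window_days=2)
for pin in ["shoes:1", "shoes:2", "scifi:7"]:
    store.on_action(pin)
store.end_of_day()
store.on_action("painting:3")
print(store.current_medoids())
```

The bounded `deque` drops the oldest day once the window is full, which mirrors the idea of looking only at a fixed span of recent history.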

A/B Tests

PinnerSage is currently deployed in production and used by many products within Pinterest, including Homefeed, Related Pins, Ads, Shopping, and Creators, in both their retrieval and ranking ML models. Our wins in the initial A/B tests on two surfaces are highlighted in Table 1.

Table 1 shows that PinnerSage provides significant gains in both overall engagement volume (repins and clicks) and engagement propensity (repins and clicks per user). These gains can be directly attributed to the increased quality and diversity of PinnerSage recommendations.

Table 1: A/B test of PinnerSage vs current production, which includes a single embedding model.


We proposed an end-to-end system, called PinnerSage, that powers personalized recommendations at Pinterest. In contrast to prior production systems that are based on a single-embedding user representation, PinnerSage uses a multi-embedding user representation. Our clustering scheme gives us fuller insight into the needs of a user and helps us understand them better. To make this happen, we adopted several design choices that allow our system to run efficiently and effectively. Our large A/B tests show that PinnerSage provides significant gains in user engagement. Much of the improvement delivered by our model can be attributed to its better understanding of user interests and its quick response to their needs.


The PinnerSage paper is to appear in KDD 2020. Read more details about the paper here:


We would like to extend our appreciation to the Homefeed and Shopping teams for helping to set up online A/B experiments. Our special thanks go to the embedding infrastructure team for powering embedding nearest-neighbor search.