Dana Yakoobinsky | Web engineer, curation product, and Dafang He | Machine learning engineer, home feed
While planning one’s life might look different these days, our insights tell us people are saving ideas for the future more than ever. Last month, for instance, Pinterest had a 60% increase in the number of boards created compared to last year, and engagement with boards is up nearly 75% year-over-year. To help Pinners better organize and plan their lives, today we’re launching new board organization tools including suggested board sections and automatic grouping of Pins into those sections. These recommendations can help any Pinner get started planning for a project and take the guesswork out of how to organize ideas into sections.
Consider our new and improved boards a personal assistant of sorts. We use our corpus of data (billions of ideas saved and labeled by millions of Pinners) and machine learning to determine how Pins should be grouped within relevant themes. For example, maybe a Pinner is new to cooking but has been saving hundreds of recipe Pins. With this new tool, Pinterest may suggest board sections like “veggie meals” and “appetizers” to help the Pinner organize their board into a more actionable meal plan.
This is a challenging user problem to solve since we must accurately cluster together similar Pins and predict how a Pinner might want to organize their board (by color, theme, category?). Leveraging machine learning and our own PinSage technology, we built a Pin-clustering solution to determine where and how Pins should be organized, allowing us to group together similar Pins and suggest new board sections with appropriate names.
The goal is to make planning on Pinterest near-automatic for those who need help moving along in a project, making it more likely Pinners will come back to their boards and take action on what they’ve been saving.
When a Pinner navigates to a board with eligible section suggestions, we show them a banner with the suggested section name and a preview of Pins for the new section. An eligible suggestion is any group of Pins our clustering algorithm has determined would make a coherent section.
If they choose to create the suggested section, we take them to a modal with all the Pins our algorithm has selected as appropriate for the new section. From there, they can either save all the Pins or deselect any they don’t want added to the section.
Pinners can then name their section or use the suggested name.
When the Pinner lands back on their board, their brand new section has already been created, and their board is now more organized.
The real magic, of course, is how we find these suggestions.
The biggest challenge in this work is grouping similar Pins to create useful new board sections. Achieving this involves two sub-tasks: (1) featurizing Pins to better capture their information, and (2) selecting the clustering algorithm to use for grouping Pins within each board.
Featurizing Pins with PinSage embeddings
To featurize a Pin, we need to consider several important aspects:
- The text information associated with the Pin
- The visual features extracted from the image
- The graph structure
Our PinSage embedding is widely used across Pinterest for user modeling, retrieval, and recommendations. It maps each Pin into a dense vector which captures the aforementioned three aspects, making it a powerful feature for our clustering application.
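PinSage itself is beyond the scope of this post; the property downstream clustering relies on is simply that each Pin maps to a dense vector, and similar Pins land close together in that vector space. A minimal sketch of comparing two Pins by embedding (the vectors and Pin names here are made up for illustration, and real embeddings are far higher-dimensional):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two dense embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 4-dimensional "embeddings" -- purely illustrative values.
pasta_pin = [0.9, 0.1, 0.4, 0.2]
lasagna_pin = [0.8, 0.2, 0.5, 0.1]
sofa_pin = [0.1, 0.9, 0.0, 0.7]

print(cosine_similarity(pasta_pin, lasagna_pin))  # high: related ideas
print(cosine_similarity(pasta_pin, sofa_pin))     # lower: unrelated ideas
```

Any distance defined on these vectors (cosine, Euclidean) can then drive the clustering step described next.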
Selecting Ward clustering to group Pins
Ward clustering is a widely used agglomerative clustering algorithm that optimizes a minimum-variance objective when choosing which clusters to merge. Specifically, at each merging stage, let Ci, Cj, and Ck denote three clusters with sizes ni, nj, and nk. The distance between the merged cluster Ci ∪ Cj and Ck can be computed with the Lance-Williams update:

d(Ci ∪ Cj, Ck) = √[ ((ni + nk) d(Ci, Ck)² + (nj + nk) d(Cj, Ck)² − nk d(Ci, Cj)²) / (ni + nj + nk) ]

Here d(Ci, Cj) represents the distance between two clusters, which can be precomputed. Thus we can optimize the final objective recursively, gradually grouping Pins together to form clusters.
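The update is straightforward to express in code. A sketch of the Lance-Williams form of Ward's rule, taking the precomputed pairwise distances and cluster sizes as inputs:

```python
import math

def ward_update(d_ik, d_jk, d_ij, n_i, n_j, n_k):
    """Distance from the merged cluster (Ci U Cj) to Ck under Ward's criterion.

    d_ik, d_jk, d_ij are precomputed pairwise cluster distances;
    n_i, n_j, n_k are the cluster sizes.
    """
    n = n_i + n_j + n_k
    return math.sqrt(
        ((n_i + n_k) * d_ik**2 + (n_j + n_k) * d_jk**2 - n_k * d_ij**2) / n
    )

# Three singleton clusters on a line at 0, 2, and 3: after merging {0} and {2},
# the distance to {3} is sqrt(16/3).
print(ward_update(d_ik=3.0, d_jk=1.0, d_ij=2.0, n_i=1, n_j=1, n_k=1))
```

Because each new distance depends only on already-known distances and sizes, no raw feature vectors need to be revisited after the initial pairwise computation.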
We adopt it here for two major reasons. First, it shows better offline performance than several other clustering algorithms (e.g., k-means); we evaluate clusterings by how well they align with real sections created by other users. Second, while many clustering algorithms require a predefined number of clusters, Ward clustering does not. Since we don’t know how many different events or ideas are associated with each board, an algorithm which doesn’t need the number of clusters up front is preferable.
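To make the "no predefined number of clusters" point concrete, here is a toy agglomerative loop over one-dimensional points: it repeatedly merges the cheapest pair under Ward's criterion and stops once the cheapest merge exceeds a distance threshold, so the cluster count falls out of the data rather than being fixed up front. (The threshold, the points, and the greedy O(n³) loop are all illustrative, not production choices.)

```python
import itertools
import math

def ward_distance(a, b, pts):
    """Ward merge cost between two clusters, given as lists of indices into pts."""
    na, nb = len(a), len(b)
    mu_a = sum(pts[i] for i in a) / na
    mu_b = sum(pts[i] for i in b) / nb
    # sqrt(2 * na * nb / (na + nb)) * |centroid gap|: the minimum-variance cost.
    return math.sqrt(2.0 * na * nb / (na + nb)) * abs(mu_a - mu_b)

def agglomerate(pts, threshold):
    """Greedily merge clusters until the cheapest merge costs more than threshold."""
    clusters = [[i] for i in range(len(pts))]
    while len(clusters) > 1:
        d, i, j = min(
            (ward_distance(clusters[i], clusters[j], pts), i, j)
            for i, j in itertools.combinations(range(len(clusters)), 2)
        )
        if d > threshold:
            break  # no merge is cheap enough -- the cluster count emerges naturally
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Two obvious groups; note we never specify "2" anywhere.
points = [0.0, 0.1, 0.2, 10.0, 10.1]
print(sorted(sorted(c) for c in agglomerate(points, threshold=1.0)))
# → [[0, 1, 2], [3, 4]]
```

With k-means we would have had to guess k; here the threshold plays the role of "how distinct must two ideas be to deserve separate sections."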
Generating name suggestions
After we have cluster recommendations for a particular board, we need to find a human-readable label appropriate for use as a suggested section name in the upsell we show to the Pinner. To do so, we analyze the Pins in the cluster in a few steps.
A Pin is associated with a set of annotations. A single annotation is a real word or short phrase that captures the subject of the Pin. Annotations get computed by looking at different signals associated with the Pin, such as its title or the name of the board it belongs to. Machine learning algorithms evaluate those signals to generate the annotations. Additionally, the algorithms assign each annotation a score, which reflects how confident we are in the annotation’s accuracy.
To generate a name, we start by fetching all the annotations associated with all Pins in a given cluster. We then filter out all annotations with a score below a particular threshold to ensure we get annotations of high enough quality. Next, we rank all annotations in a cluster by both their score and how frequently they appear across the cluster’s Pins.
At this point we have a ranked list of annotations that we’re fairly confident accurately describes the cluster of Pins. There’s one issue left to address: How do we know these annotations are specific to this particular cluster and don’t describe all or most of the Pins in the board? To ensure the suggested name has enough specificity, we take the final step of filtering out annotations common across clusters from the same board.
Finally, we return the highest ranked annotation for the cluster as our suggestion for the new section’s name.
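Putting the naming steps together, here is a simplified sketch of the pipeline. The score threshold, the example annotations, and the tie-breaking heuristic (frequency first, then total score) are illustrative stand-ins for the production signals:

```python
from collections import Counter

MIN_SCORE = 0.5  # hypothetical quality threshold

def suggest_name(cluster_annotations, other_cluster_annotations):
    """Pick a section name for one cluster.

    cluster_annotations: (annotation, score) pairs across the cluster's Pins.
    other_cluster_annotations: annotations seen in the board's other clusters,
    used to filter out labels that describe the whole board.
    """
    # Step 1: drop low-confidence annotations.
    kept = [(a, s) for a, s in cluster_annotations if s >= MIN_SCORE]
    # Step 2: rank by in-cluster frequency, breaking ties by total score.
    freq = Counter(a for a, _ in kept)
    total = Counter()
    for a, s in kept:
        total[a] += s
    ranked = sorted(freq, key=lambda a: (freq[a], total[a]), reverse=True)
    # Step 3: keep only annotations specific to this cluster.
    specific = [a for a in ranked if a not in other_cluster_annotations]
    # Step 4: return the top-ranked surviving annotation, if any.
    return specific[0] if specific else None

cluster = [("veggie meals", 0.9), ("recipes", 0.8),
           ("veggie meals", 0.7), ("food", 0.3)]
others = {"recipes"}  # also appears in the board's other clusters
print(suggest_name(cluster, others))  # → "veggie meals"
```

"recipes" loses despite its high score because it describes the whole board, and "food" is dropped by the quality threshold, leaving "veggie meals" as the specific, high-confidence label.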
The current version of section suggestions can be seen as an aggregation of all of a Pinner’s actions on the Pin-board graph to create clusters. However, some Pinners might prefer more detailed sections while others prefer to cover a relatively broader topic for each section. One future direction is to automatically learn a clustering hyperparameter based on each Pinner’s created sections as well as similar Pinners’ created sections, which will enable more personalized section suggestions.
Currently, we suggest sections only on boards that do not already have sections, which makes it simpler to suggest a new section without the risk of colliding with an existing section on the board. However, in the future, we’d like to suggest Pinners add Pins to existing sections to make organizing boards more streamlined, even for users who are already aware of sections.
Acknowledgements: We’d like to give special thanks to Jessica Chen, Jacqueline Leung, Q Pinyokool, Steven Garcia, Thomas Thachil, Zhaohui Wu, Andrew Liu, Yitong Zhou, Aditya Pal, Pong Eksombatchai, Avantika Gomes, Tao Cheng, Chun-Wei Lee and Duo Zhang.
References
[1] Hamilton, Will, Zhitao Ying, and Jure Leskovec. “Inductive representation learning on large graphs.” Advances in Neural Information Processing Systems. 2017.
[2] Ward Jr., Joe H. “Hierarchical grouping to optimize an objective function.” Journal of the American Statistical Association 58.301 (1963): 236–244.