How Pinterest fights misinformation, hate speech, and self-harm content with machine learning
Vishwakarma Singh | Trust and Safety Machine Learning Lead & Dan Lee | Trust and Safety Data Scientist Lead
Pinterest is an inspiration network where people come for positivity and new ideas, and there’s nothing inspiring about harmful content. As a visual discovery engine with hundreds of billions of Pins, it’s essential we use the latest in machine learning to quickly eliminate harmful content and counter those with intent to distribute it.
To keep Pinterest safe and inspiring, we proactively take action on content that violates our community guidelines through moderator investigations and automated systems, in addition to fighting spam with a real-time rules engine. Our machine learning models identify content that violates our policies, from health misinformation to hate speech, self-harm, and graphic violence. Over the years we've also advanced our ability to detect similar images using Spark, LSH, and TensorFlow, and we've applied that work in Trust and Safety to take action at scale.
Since the fall of 2019, when we introduced machine learning models that automatically detect unsafe content before it's reported, reports of policy-violating content per impression have declined by 52%. And since April 2019, reports of self-harm content have decreased by 80%.
Fighting areas like misinformation is complex and always evolving, so we’re constantly working to improve our technology and policy approaches to keep our platform safe and inspiring using a mix of humans and machines.
Enforcing Policy with Machine Learning Models
When enforcing policies across Pins, we group together all Pins with similar images and identify them by a unique hash called an image-signature. Our machine learning models generate scores for each image-signature, which are stored in a RocksDB-based key-value store for online serving. We then apply the same enforcement decision to all Pins sharing an image-signature based on these scores.
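To make this concrete, here is a minimal sketch of signature-level enforcement. The store layout, category names, and the 0.9 threshold are illustrative assumptions, not our actual schema; an in-memory dict stands in for the RocksDB-backed store.

```python
THRESHOLD = 0.9  # hypothetical per-category enforcement threshold

# Stand-in for the key-value store: image-signature -> category scores
score_store = {
    "sig_abc123": {"safe": 0.02, "adult": 0.95, "hateful": 0.01},
    "sig_def456": {"safe": 0.97, "adult": 0.01, "hateful": 0.01},
}

def enforcement_decision(image_signature):
    """Return the violating category to enforce on, or None if safe."""
    scores = score_store.get(image_signature)
    if scores is None:
        return None  # no batch score yet; the online model covers this case
    violating = {c: s for c, s in scores.items()
                 if c != "safe" and s >= THRESHOLD}
    if not violating:
        return None
    return max(violating, key=violating.get)

# Every Pin sharing an image-signature gets the same decision.
pins = [("pin_1", "sig_abc123"), ("pin_2", "sig_abc123"),
        ("pin_3", "sig_def456")]
decisions = {pin_id: enforcement_decision(sig) for pin_id, sig in pins}
```

Keying decisions by image-signature rather than by Pin is what lets one model score cover every copy of a near-duplicate image.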
Since Pinners usually save thematically related Pins together as a collection on boards around topics like recipes, we also have a machine learning model that produces scores for boards, and we filter boards based on those scores.
Today our models detect and filter policy violations for adult content, hateful activities, medical misinformation, drugs, self-harm, and graphic violence. We plan to explicitly model other categories in our next model iteration.
We have complementary batch and online models to proactively detect policy-violating Pins. When a new Pin is created, our technology looks up the Pin’s image-signature to see if a batch model score is present. If it exists in the store, the corresponding scores are used for enforcement, otherwise we fall back on scores generated by the online model. A bird’s-eye view of the enforcement system is shown in figure 1.
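The batch-first, online-fallback lookup above can be sketched as follows. The store and model interfaces here are hypothetical simplifications of the real pipeline:

```python
# Precomputed batch scores keyed by image-signature (stand-in store).
batch_scores = {"sig_known": {"safe": 0.98, "self_harm": 0.02}}

def online_model_score(image_signature):
    # Placeholder for the real-time model, which scores Pins whose
    # image-signature has not yet been seen by the batch pipeline.
    return {"safe": 0.90, "self_harm": 0.10}

def scores_for_new_pin(image_signature):
    """Prefer precomputed batch scores; fall back to the online model."""
    cached = batch_scores.get(image_signature)
    if cached is not None:
        return cached, "batch"
    return online_model_score(image_signature), "online"
```

The batch path wins whenever it has a score because it is more precise; the online path exists so brand-new images are never left unscored while waiting for the next batch run.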
Machine Learning Models
Here is an overview of our machine learning models for detecting harmful Pins and boards.
Pin Batch Model
Our Pin batch scoring model is a feed-forward network, as shown in figure 2, that outputs a distribution over the six modeled violation categories and a safe category. The model consumes two features: PinSage embeddings and image text extracted via Optical Character Recognition (OCR). PinSage is a strongly informative representation of a Pin based on its keywords and image embedding, which are aggregated with a Graph Convolution Network using the bipartite graph of Pins and boards. We use standard supervised learning techniques to train interactively on a single GPU instance in Jupyter.
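A toy forward pass illustrating the model's shape: concatenated PinSage and OCR-text features in, a softmax over the six violation categories plus "safe" out. The dimensions, random weights, and single hidden layer are made-up stand-ins; the production model is trained, not randomly initialized.

```python
import numpy as np

CATEGORIES = ["adult", "hateful", "medical_misinfo", "drugs",
              "self_harm", "graphic_violence", "safe"]

rng = np.random.default_rng(0)
PINSAGE_DIM, OCR_DIM, HIDDEN = 256, 64, 128  # illustrative sizes

# Randomly initialized layers stand in for trained weights.
W1 = rng.normal(size=(PINSAGE_DIM + OCR_DIM, HIDDEN)) * 0.05
W2 = rng.normal(size=(HIDDEN, len(CATEGORIES))) * 0.05

def forward(pinsage_emb, ocr_emb):
    """One feed-forward pass producing a categorical distribution."""
    x = np.concatenate([pinsage_emb, ocr_emb])
    h = np.maximum(W1.T @ x, 0.0)         # ReLU hidden layer
    logits = W2.T @ h
    e = np.exp(logits - logits.max())     # numerically stable softmax
    return dict(zip(CATEGORIES, e / e.sum()))

dist = forward(rng.normal(size=PINSAGE_DIM), rng.normal(size=OCR_DIM))
```

The key property is that the output is a single distribution across all categories, so one threshold policy per category can drive enforcement.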
Our training set includes millions of human-reviewed Pins, consisting of both user reports and proactive model-based sampling from our corpus. Our Trust and Safety operations team assigns categories and takes action on violating content, which form the labels for training and evaluating our models.
Inference is performed daily using Spark, Spark SQL, and PySpark Pandas UDFs to score our entire corpus of billions of Pins. The core of our inference job is shown in figure 3.
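The heart of such a job is a batched scoring function. Below it is shown as a plain pandas function with a placeholder "model" (a sigmoid of the embedding mean), so the Spark session is omitted; in a real job, PySpark's `pandas_udf` decorator would apply a function with this signature to each batch of the corpus.

```python
import numpy as np
import pandas as pd

def score_batch(embeddings: pd.Series) -> pd.Series:
    """Score a batch of PinSage embeddings (one array per row).

    Returns one probability per Pin. The sigmoid-of-mean here is a
    placeholder for a real model's forward pass.
    """
    matrix = np.stack(embeddings.to_list())          # (batch, dim)
    probs = 1.0 / (1.0 + np.exp(-matrix.mean(axis=1)))
    return pd.Series(probs, index=embeddings.index)

# Two fake embeddings: a neutral one and a strongly activating one.
batch = pd.Series([np.zeros(4), np.full(4, 10.0)])
scores = score_batch(batch)
```

Pandas UDFs let the model be applied vectorized per partition instead of row by row, which is what makes daily scoring of billions of Pins tractable.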
Pin Online Model
To minimize the time it takes for Pins to flow through the offline pipelines to our scoring workflow, we employ a Lambda Architecture with an online variant of our Pin batch model (with identical architecture) which consumes embeddings from an online variant of PinSage. Our online model produces a categorical score distribution in real time for new Pins, enabling us to immediately prevent distribution of policy-violating Pins. The online model trades performance for speed: online PinSage does not use the Pin-board graph, so it is less precise than the batch version, but it is available in near real-time. Inference, triggered by events in Kafka, is performed in a Flink job using the TensorFlow Java library, and the output scores are persisted to our internal Galaxy platform for storage and serving.
Board Batch Model
We use a Pin model trained using only PinSage embeddings to generate content safety scores for boards. We construct an embedding for each board by aggregating the PinSage embeddings of the most recent Pins saved to it. These board embeddings are also in the PinSage embedding space, so we feed them into the PinSage-based Pin model to get a categorical score distribution for each board. This allows us to identify policy-violating boards without training a model for boards.
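The board-embedding step can be illustrated as below. The post says recent Pin embeddings are "aggregated" without specifying how, so the mean here, and the window of three recent saves, are assumptions; a mean does keep the result in the same PinSage embedding space.

```python
import numpy as np

N_RECENT = 3  # hypothetical window of most recent saves

def board_embedding(pin_embeddings):
    """Aggregate a board's most recent Pin embeddings into one vector."""
    recent = pin_embeddings[-N_RECENT:]
    return np.mean(recent, axis=0)

# Four saves in chronological order; only the last three are used.
pins = [np.array([1.0, 0.0]), np.array([0.0, 1.0]),
        np.array([1.0, 1.0]), np.array([1.0, 3.0])]
emb = board_embedding(pins)
```

Because `emb` lives in the same space as Pin embeddings, it can be fed directly into the PinSage-based Pin model, which is what avoids training a separate board model.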
We also fan out the board scores to image-signatures and use them to filter policy-violating Pins. An image-signature's score distribution is the category-wise average of the scores of all boards containing Pins with that image-signature, as shown in figure 4.
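A small sketch of that fan-out, with illustrative data: each image-signature inherits the category-wise average of the score distributions of the boards its Pins appear on.

```python
# Illustrative board score distributions and signature-to-board mapping.
board_scores = {
    "board_a": {"safe": 0.2, "adult": 0.8},
    "board_b": {"safe": 0.6, "adult": 0.4},
}
signature_to_boards = {"sig_1": ["board_a", "board_b"]}

def signature_scores(image_signature):
    """Category-wise average of the scores of boards holding this image."""
    boards = signature_to_boards[image_signature]
    categories = board_scores[boards[0]].keys()
    return {
        c: sum(board_scores[b][c] for b in boards) / len(boards)
        for c in categories
    }

scores = signature_scores("sig_1")
```

Averaging over boards means an image repeatedly saved to violating boards accumulates a high violation score even if the Pin model alone was uncertain about it.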
Unsafe content represents a very small percentage of impressions on Pinterest, which makes prevalence difficult and expensive to measure precisely. One way we measure success is by monitoring user reports with confirmed violations, which gives visibility into spikes of unsafe content, typically driven by new types of trending content that may pose a challenge to our models.
Pinterest is committed to making the internet a more positive and safe place. We invest heavily in machine learning to detect and act on problematic content at scale, leveraging multiple models to protect our users from harmful experiences. We are constantly working to build technologies and enforce policies that create a positive environment matching the intent behind why people come to Pinterest: to be inspired. This is the work running beneath those positive experiences, ensuring that harmful content is identified and removed and inspiring content rises to the top.
At a high level, our Trust and Safety engineering organization is made up of signals, tools, and platform teams. The signals team, comprising engineers, data scientists, and applied scientists, develops and launches machine learning signals; the platform team builds infrastructure to ensure enforcement at scale; and the tools team builds tools that enable subject matter experts to accurately review and label content.
We’re always working to improve our content safety technologies and practices, as well as working with others in the industry. We recently hosted a Trust and Safety Machine Learning Summit with talks from industry colleagues at companies including YouTube/Google, Facebook, LinkedIn, Snap, Yelp, and Microsoft, with hundreds of people in attendance. If you’re interested in the latest news and events from Pinterest Engineering, follow us on Twitter at @PinterestEng. And visit our careers site if you’re interested in joining our team and working on solutions to these challenges in trust and safety engineering.
Huge thanks to Yuanfang Song, Minli Zang, Oladipo Ositelu, Kate Flaming and Dennis Horte for contributing to the post! Thanks to Harry Shamansky for help with publishing this blog post!