Under the hood: PinQueue, a generic content review system

Pinterest Engineering Blog
Mar 11, 2016

Yuan Yao | Pinterest engineer, Infrastructure

People use Pinterest to discover, save and do things that inspire them. Every day, we connect millions of people to billions of Pins, and it’s our mission to present the highest quality content to Pinners. We use machine learning to filter out content that violates our policies for everything from Promoted Pins to spam, but manual moderation is still needed since algorithms can’t cover all of our sophisticated and ever-evolving content policies. Human inspection is always a valuable addition to machine learning pipelines. To facilitate the manual moderation process, we built PinQueue, a generic content review system.

A variety of teams at Pinterest use PinQueue.

To date, PinQueue has helped us process more than three million items in more than 600 queues.

Under the hood

PinQueue consists of an AngularJS web application as its front end and a RESTful web service implemented in Flask, backed by a single MySQL database.

The service exposes a set of RESTful APIs over HTTP, which both the web application and external systems use to communicate with it. It uses the SQLAlchemy ORM to interact with the database.

A bigger picture

PinQueue is designed to be a generic platform that supports the review of any type of content. Engineers and analysts can easily set up workflows on top of PinQueue with different upstream and downstream systems. Its service APIs and front end templates are highly pluggable and extensible.

At Pinterest, a typical content review pipeline built around PinQueue consists of Stingray as its data source and PinLater as its job execution system.

Stingray is a distributed stream processor and rule engine. We can define Stingray rules that respond to any type of event on Pinterest (e.g. an advertiser wants to promote a Pin). Once a rule is triggered, it extracts information from the event (e.g. a Pin ID or promotion spec ID) and sends it to the PinQueue database through the PinQueue API.

When our analysts come to PinQueue to review a Promoted Pin, for example, PinQueue contacts the corresponding services at Pinterest to retrieve the relevant information (e.g. details about the Pin, the board and the Pinner or advertiser). Analysts then review this information and make decisions through the PinQueue UI.

Once decisions are made, PinQueue sends them (as labels) to PinLater, an asynchronous job execution system, to be enforced (e.g. disapproving a Promoted Pin because it doesn’t meet our advertising standards).
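To make the flow above concrete, here is a minimal in-memory sketch of the upstream and downstream steps: a rule turns an event into a queued item, and a reviewer's decision becomes an enforcement job. Every name in it (`Rule`, `ReviewPipeline`, `on_event`, `decide`, the event fields) is hypothetical and chosen for illustration; none of it is the actual Stingray, PinQueue or PinLater API, and the plain Python lists only stand in for the database and the job queue. The middle step, fetching details from other services at review time, is left out.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Rule:
    """Hypothetical rule: matches one event type and extracts the IDs to review."""
    event_type: str
    queue: str
    extract: Callable[[dict], dict]


@dataclass
class ReviewPipeline:
    rules: list
    items: list = field(default_factory=list)  # stands in for the PinQueue database
    jobs: list = field(default_factory=list)   # stands in for the PinLater job queue

    def on_event(self, event: dict) -> None:
        """Upstream step: each matching rule sends the extracted IDs to a queue."""
        for rule in self.rules:
            if rule.event_type == event["type"]:
                self.items.append(
                    {"queue": rule.queue, "payload": rule.extract(event), "label": None}
                )

    def decide(self, index: int, label: str) -> None:
        """Downstream step: record the label and enqueue a job to enforce it."""
        item = self.items[index]
        item["label"] = label
        self.jobs.append({"action": label, **item["payload"]})


promote_rule = Rule(
    event_type="promote_pin",
    queue="promoted_pins",
    extract=lambda e: {"pin_id": e["pin_id"], "promotion_spec_id": e["promotion_spec_id"]},
)
pipeline = ReviewPipeline(rules=[promote_rule])
pipeline.on_event(
    {"type": "promote_pin", "pin_id": 42, "promotion_spec_id": 7, "advertiser": "acme"}
)
pipeline.on_event({"type": "pin_saved", "pin_id": 43})  # no rule matches, nothing is queued
pipeline.decide(0, "disapprove")
```

Only the extracted IDs travel with the item in this sketch, which mirrors the description above: the rest of the information is looked up when an analyst opens the item.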

A closer look

We have a variety of data models in PinQueue for categories like safety, spam, monetization and more. Each category represents a content type and contains a collection of queues.

  • Queue: Each queue is a collection of items that need to be reviewed for the same reason.
  • Item: Each item represents an event we need to review (e.g. a Pin is reported as inappropriate content, or an advertiser wants to promote a Pin) and contains all the information about this event that needs to be reviewed and acted on.
  • Widget: An item is a collection of widgets. A widget is a piece of information that can be interacted with. Using the previous example of a Pin being reported as inappropriate content, we’ll create an item to represent this event. This item will contain a Pin widget (the reported Pin), a board widget (the board that this Pin is saved to), a user widget (the Pinner who saved the Pin) and a reporter widget (the Pinner who reported this Pin).
  • Each type of widget defines its own template (i.e. information it contains and how to display it) and actions it supports. As an example, a board widget includes a board name and description, the number of Pins saved to the board, a reputation score of the board and a sample of 10 Pins on the board. Its actions include “ignore” (no action), “hide” (hide the board from public feeds) and “deactivate” (deactivate the board).
  • In addition to a single action on a single widget, item-level action is also supported through a predefined combination of widget-level actions (e.g. “deactivate all” as an item-level action will apply “deactivate” action to every widget within this item).
  • Reviewer: We also have a reviewer entity for authorized users of PinQueue. We have a reviewer-category mapping where we control which categories each reviewer has access to. We also have a reviewer-item mapping, where each item can only be assigned to one reviewer so that multiple reviewers can work simultaneously in the same queue without conflicts or duplicate work.
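The entities above could be modeled roughly as follows. This is a simplified sketch using plain dataclasses rather than PinQueue's actual SQLAlchemy models, which are not shown in this post; the class names, fields and the `claim_next` method are assumptions made for illustration. It covers widget-level actions, item-level actions and the one-reviewer-per-item assignment, and omits categories and the reviewer-category mapping.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Widget:
    kind: str                      # e.g. "pin", "board", "user", "reporter"
    data: dict
    actions: tuple = ("ignore",)   # actions this widget type supports
    decision: Optional[str] = None

    def apply(self, action: str) -> None:
        if action not in self.actions:
            raise ValueError(f"{self.kind} widget does not support {action!r}")
        self.decision = action


@dataclass
class Item:
    item_id: int
    widgets: list
    reviewer: Optional[str] = None  # at most one reviewer per item

    def apply_to_all(self, action: str) -> None:
        """Item-level action: apply one widget-level action to every widget."""
        unsupported = [w.kind for w in self.widgets if action not in w.actions]
        if unsupported:
            raise ValueError(f"{action!r} is not supported by: {unsupported}")
        for widget in self.widgets:
            widget.apply(action)


@dataclass
class Queue:
    name: str
    items: list = field(default_factory=list)

    def claim_next(self, reviewer: str) -> Optional[Item]:
        """Assign the first unassigned item to this reviewer, or return None."""
        for item in self.items:
            if item.reviewer is None:
                item.reviewer = reviewer
                return item
        return None


def reported_pin_item(item_id: int) -> Item:
    """Build the reported-Pin example: Pin, board, user and reporter widgets."""
    actions = ("ignore", "hide", "deactivate")
    return Item(item_id, [
        Widget("pin", {"pin_id": 1}, actions),
        Widget("board", {"name": "Recipes"}, actions),
        Widget("user", {"user_id": 10}, actions),
        Widget("reporter", {"user_id": 11}, actions),
    ])


queue = Queue("reported_pins", [reported_pin_item(1), reported_pin_item(2)])
first = queue.claim_next("alice")   # item 1
second = queue.claim_next("bob")    # item 2, never the item alice holds
third = queue.claim_next("carol")   # None: nothing left to claim
first.apply_to_all("deactivate")    # the "deactivate all" item-level action
```

The in-memory `claim_next` is only safe for a single process. With a shared database, the same guarantee would need to be enforced there, for example with a transaction or a uniqueness constraint on the reviewer-item mapping.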

Advanced features

PinQueue has a few advanced features:

  • Caching: We cache the data for static widgets instead of retrieving it through service calls at review time.
  • Filtering: We define filters that remove items that no longer need review (e.g. a reported Pin has already been deleted or a campaign containing the Promoted Pin is no longer valid).
  • Quality control: To ensure the precision and consistency of the reviews, we can sample some items to be reviewed simultaneously by multiple reviewers. If conflicts occur in their decisions, these items will be sent to a separate queue for discussion or supervisor review.
  • Metrics and monitoring: We collect metrics to monitor the review process, such as content volume, response time or decision distributions.
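The filtering and quality control features can be sketched in a few functions. This is an illustration under assumptions, not PinQueue's implementation: the function names, the item fields (`pin_deleted`, `campaign_valid`) and the fixed sampling rate are all hypothetical.

```python
import random
from collections import defaultdict


def needs_review(item: dict, filters) -> bool:
    """An item is kept only if no filter marks it as skippable."""
    return not any(f(item) for f in filters)


def sample_for_double_review(item_ids, rate: float, seed: int = 0) -> set:
    """Pick a random subset of items to be reviewed by more than one reviewer."""
    rng = random.Random(seed)
    ids = list(item_ids)
    return set(rng.sample(ids, round(len(ids) * rate)))


def find_conflicts(decisions) -> list:
    """Return sorted IDs of items whose reviewers chose different labels.

    `decisions` is an iterable of (item_id, reviewer, label) tuples.
    """
    labels = defaultdict(set)
    for item_id, _reviewer, label in decisions:
        labels[item_id].add(label)
    return sorted(item_id for item_id, seen in labels.items() if len(seen) > 1)


# Hypothetical filters matching the two examples in the list above.
filters = [
    lambda item: item.get("pin_deleted", False),
    lambda item: not item.get("campaign_valid", True),
]
items = [
    {"id": 1},
    {"id": 2, "pin_deleted": True},
    {"id": 3, "campaign_valid": False},
]
to_review = [item["id"] for item in items if needs_review(item, filters)]  # [1]

conflicts = find_conflicts([
    (1, "alice", "approve"), (1, "bob", "disapprove"),
    (2, "alice", "approve"), (2, "bob", "approve"),
])  # [1]: only item 1 would go to the discussion queue
```

Items returned by `find_conflicts` are the ones that would be routed to the separate queue for discussion or supervisor review.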

We believe a generic content review system like PinQueue is useful to many companies, and we plan to open-source PinQueue in the near future. Stay tuned!

Acknowledgements: PinQueue was built by Eric Conner, Jonathan Horovitz, and Yuan Yao. This team, as well as people from across the company, helped make this project a reality with their technical insights and invaluable feedback.
