Del Bao, Vikas Kumar: Software Engineer, Ads ML Infra
Zack Drach: Engineering Manager, Ads ML Infra
More than 320 million unique visitors come to Pinterest every month to discover inspiring content, which also gives advertisers a unique opportunity to get their products in front of people with commercial intent.
Curation of this personalized catalog requires accurate, real-time mapping of users’ short-term interests. Mapping long-term interests is equally important for showing relevant content, as short-term interests can be noisy.
Some typical use scenarios for user features:
- Retrieve ads using user embedding signals based on recently and previously engaged Pin embeddings (we map each Pin into an N-dimensional vector space, called the PinSage embedding) over various session windows.
- Boost ads relevance with user interests derived from search queries. The signal is produced by applying the FastText algorithm to a user’s search queries over a short or long period of time to collect a few significant interest categories, which are then applied to ranking on other surfaces, e.g., home feed.
To help machine learning engineers capture user propensity, we’ve developed a real-time user signal service that acts as a platform for consuming user engagement data, as well as computing and serving user features. With this new platform, developers spend minimal effort to build, test and experiment with new user signals using ML algorithms, and to provide personalized on-site and in-app user signals to our serving systems.
Since the launch, the platform has garnered popularity across the company and enabled several use cases for creator engagement, ads retrieval, shoppable in retrieval, and search ranking. The system reliably produces and serves user signals in real time at Pinterest scale with minimal infra cost.
Here we’ll share more on how we designed the system to achieve these goals.
Behind the scenes of what we built
Pillars of a user signal platform
The five pillars of a user signal platform are:
- Timeliness. A real-time feedback loop for events is crucial to showing fresh content. Take Black Friday for example: imagine the poor experience of a Pinner buying a pair of jeans, then receiving discount ads for the same pair.
- Flexible User Context. User context describes any relevant information that can be used to characterize the situation of a user. Pinterest content is tailored to the user context. The tailoring process can be informed by two types of information: short-term insights, which show intent, and long-term insights, which include demographics, behavior, and preferences over time. This requires the platform’s developer APIs to support rich session semantics.
- Scalability. Pinterest gets 1.2 million events per second from users. Personalization at scale is a challenge, so our system needs to treat every Pinner as an individual, not as part of an audience. This means computation resources are allocated at the user level, not per targeted user group.
- Developer Velocity. Pinterest is a data-driven company and we rely on experiments and A/B testing to make key decisions. Machine learning practitioners rely on the right abstraction to implement and validate their algorithms on the platform.
- Simple to Build. The system has many data fetching paths in order to produce the final signal. This usually involves async code and intricate futures/promises handling. We want to choose the right framework to build the core infra logic to make code simple and readable.
There are several design considerations to make the service performant and speed up developer velocity.
A Materializer is responsible for joining external data against user events. We steer towards designing a generic materializer container to ease developers’ coding efforts. Another goal is to reduce data fetching costs so that we can achieve minimum event processing delays. This is a must to achieve timeliness of event processing at the Pinterest scale.
We apply the separation of concerns principle: request specs and fetching execution are separated into two layers. A feature request is expressed as the data source, request key, and feature key. Data plumbing is powered by an execution engine. Adding a new feature boils down to specifying the request spec via the materializer APIs, which requires only a few lines of code.
Aggregation is the computation of stats from a session of user events. At 50k QPS per signal, the system cannot scale if it fully recomputes over all of a user’s events on every request, especially for a long-term user context, e.g., 90 days. The aggregator therefore adopts an incremental computing model with an intermediate state kept in a remote state store.
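To make the two layers concrete, here is a minimal sketch of a request spec and an execution engine. All names (`FeatureRequest`, `ExecutionEngine`, `pin_store`, `fetch_pins`) are hypothetical and chosen for illustration; the post does not show the actual materializer API, and the real engine is asynchronous.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
# A data source takes a batch of keys and returns the records it found.
DataSource = Callable[[List[Any]], Dict[Any, Dict[str, Any]]]


@dataclass(frozen=True)
class FeatureRequest:
    """Declarative spec: what to fetch, not how to fetch it (hypothetical)."""
    data_source: str                      # name of the external store
    request_key: Callable[[Event], Any]   # derives the lookup key from an event
    feature_key: str                      # field of the fetched record to keep


class ExecutionEngine:
    """Executes request specs, querying each data source once per event."""

    def __init__(self, sources: Dict[str, DataSource]):
        self._sources = sources

    def materialize(self, event: Event, requests: List[FeatureRequest]) -> Event:
        # Group lookup keys by source so that lookups are batched.
        keys_by_source: Dict[str, set] = {}
        for req in requests:
            keys_by_source.setdefault(req.data_source, set()).add(req.request_key(event))
        fetched = {src: self._sources[src](list(keys))
                   for src, keys in keys_by_source.items()}
        # Join the requested features onto a copy of the event.
        enriched = dict(event)
        for req in requests:
            record = fetched[req.data_source].get(req.request_key(event), {})
            enriched[req.feature_key] = record.get(req.feature_key)
        return enriched


# In-memory stand-in for an external Pin feature store (illustrative data).
PIN_STORE = {"pin42": {"categories": ["denim", "fashion"], "embedding": [0.1, 0.9]}}

def fetch_pins(keys: List[Any]) -> Dict[Any, Dict[str, Any]]:
    return {k: PIN_STORE[k] for k in keys if k in PIN_STORE}

engine = ExecutionEngine({"pin_store": fetch_pins})
requests = [
    FeatureRequest("pin_store", lambda e: e["pin_id"], "categories"),
    FeatureRequest("pin_store", lambda e: e["pin_id"], "embedding"),
]
enriched = engine.materialize(
    {"user_id": "u1", "pin_id": "pin42", "action": "click"}, requests)
```

In this sketch, adding a feature means appending one `FeatureRequest` line; the fetching logic does not change.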
The model has two benefits compared with re-computing every event at the request time:
- Event Store has a resource limit for the stored events (usually seven days). With incremental computation, a user’s historical events are persisted in the aggregation state, so the platform can serve signals that capture long-term user propensity.
- Latency is drastically improved because the previous computation is persisted in the state.
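The incremental model can be sketched as follows, assuming a simple per-category count as the aggregate. The names `AggState`, `aggregate`, and `serve`, and the use of plain dictionaries for the state store and event store, are assumptions for illustration only; the real stores are remote services.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Event = Tuple[int, str]  # (timestamp_seconds, category)


@dataclass
class AggState:
    """Intermediate state persisted between requests (hypothetical shape)."""
    counts: Dict[str, int] = field(default_factory=dict)
    last_ts: int = -1  # timestamp of the newest event already folded in


def aggregate(state: AggState, new_events: List[Event]) -> AggState:
    """Fold only the unseen events into a copy of the previous state."""
    counts = dict(state.counts)
    last_ts = state.last_ts
    for ts, category in new_events:
        if ts <= state.last_ts:
            continue  # already reflected in the state
        counts[category] = counts.get(category, 0) + 1
        last_ts = max(last_ts, ts)
    return AggState(counts, last_ts)


def serve(user_id: str,
          state_store: Dict[str, AggState],
          event_store: Dict[str, List[Event]]) -> AggState:
    """Request-time path: load state, read the delta, update, persist."""
    state = state_store.get(user_id, AggState())
    delta = [e for e in event_store.get(user_id, []) if e[0] > state.last_ts]
    new_state = aggregate(state, delta)
    state_store[user_id] = new_state
    return new_state


state_store: Dict[str, AggState] = {}
event_store = {"u1": [(100, "denim"), (105, "shoes")]}
first = serve("u1", state_store, event_store)

# Old events expire from the event store, but the state still remembers them.
event_store["u1"] = [(110, "denim")]
second = serve("u1", state_store, event_store)
```

This sketch assumes strictly increasing timestamps per user; a production version would need a more careful watermark to handle late or same-timestamp events.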
The figure below illustrates the computation paradigm.
View is the client-facing layer that is responsible for lightweight transformation of the state. We separate view and aggregator to address two requirements: flexible user context and scalability.
The separation of concerns principle applies here again: the state represents the heavy lifting of the computation, while a view transforms the state into a compact feature with customized session definitions suitable for the client, i.e., the ML and serving systems. Moreover, by sharing the state among multiple views, we avoid repeated computation and scale up the system.
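As an illustration of several views sharing one state, the sketch below assumes the state holds per-day category counts and that each view applies its own session window. The state layout and the `top_categories_view` function are hypothetical, not the platform’s actual View API.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical shared state: day index -> category -> engagement count.
State = Dict[int, Dict[str, int]]


def top_categories_view(state: State, today: int, window_days: int, k: int) -> List[str]:
    """Lightweight transform: top-k categories within a session window."""
    totals: Counter = Counter()
    for day, counts in state.items():
        if today - window_days < day <= today:
            totals.update(counts)  # adds the per-day counts
    return [category for category, _ in totals.most_common(k)]


shared_state: State = {
    100: {"denim": 5, "shoes": 1},
    99: {"shoes": 2},
    40: {"gardening": 20},
}

# Two views over the same state, differing only in session definition.
short_term = top_categories_view(shared_state, today=100, window_days=7, k=1)
long_term = top_categories_view(shared_state, today=100, window_days=90, k=2)
```

The expensive aggregation produces `shared_state` once; each view is a cheap read over it.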
Dagger and API abstraction
Development of the core infrastructure needs to be simple as well. The core logic of the framework we built relies heavily on data dependencies: external data sources for materialization, sequences of events, remote aggregator state, etc. The canonical programming model for this type of framework is based on the dependency inversion principle. Dependency injection passes a dependency (a service or data) into the object that uses it. We adopted Dagger2, a lightweight async DI framework open-sourced by Google and Square, to fulfill this need.
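The principle itself can be shown with plain constructor injection. This sketch does not use Dagger2 (a Java framework that generates the wiring at compile time); it only illustrates the idea with hand-written wiring, and the `EventStore`, `InMemoryEventStore`, and `SignalService` names are hypothetical.

```python
from typing import Dict, List, Protocol


class EventStore(Protocol):
    """Abstraction the service depends on (dependency inversion)."""
    def recent_events(self, user_id: str) -> List[str]: ...


class InMemoryEventStore:
    """One concrete implementation; a remote store could replace it."""
    def __init__(self, events: Dict[str, List[str]]):
        self._events = events

    def recent_events(self, user_id: str) -> List[str]:
        return self._events.get(user_id, [])


class SignalService:
    """Receives its dependency instead of constructing it."""
    def __init__(self, event_store: EventStore):
        self._event_store = event_store

    def event_count(self, user_id: str) -> int:
        return len(self._event_store.recent_events(user_id))


# Wiring happens in one place; a DI framework automates this step.
service = SignalService(InMemoryEventStore({"u1": ["click", "search"]}))
```

Because `SignalService` only knows the `EventStore` interface, tests can inject an in-memory store while production injects a remote one.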
On the other hand, the learning curve for Dagger2 is relatively high, so as a developer platform, we make our developer API dagger-agnostic. Specifically,
Aggregator and View API
Fit together: Architecture
The journey of events to a user signal
Here we present how user events flow through the signal production pipeline continuously and are piped to Pinterest ML & serving systems.
This starts with user engagement on the Pinterest website, such as clicking on a Pin or issuing search queries, which is sent to the backend and tracked with users’ permission. The raw engagement event is lightweight to make tracking feasible for the infrastructure. The next step enriches these events with rich content information (e.g., the Pin has various categories to denote its topic of interest). With enriched events, the pipeline can further condense the time sequence of events into a compact representation of a user’s instant, short-term and/or long-term propensity. This transformation is achieved via various aggregation and cutting-edge ML algorithms.
Here is a diagram showing how all the components fit together. We take a two-component architecture: async (offline) event-driven processing and online processing.
Asynchronous Event-driven Processing
The first step in the signal processing journey is to read the raw events and hydrate them with external content-based features. This consists of three components:
- Kafka: log messages consumed from Kafka contain trimmed-down information, such as userid, pinid, and action type.
- Materialization: a process that takes a user event and enriches it with content features (pin features, query features, etc.) from external data sources.
- Materialized event store: the materialized user events are written to a time sequence data store. It acts as an intermediary between the offline processing and the online query.
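The role of the materialized event store as an intermediary can be sketched with an in-memory, time-ordered store: the offline path writes enriched events, and the online path reads only events newer than a given timestamp. The `MaterializedEventStore` class and its methods are hypothetical stand-ins, not the actual data store.

```python
import bisect
from collections import defaultdict
from typing import Any, Dict, List


class MaterializedEventStore:
    """Per-user events kept sorted by timestamp (illustrative only)."""

    def __init__(self) -> None:
        self._timestamps: Dict[str, List[int]] = defaultdict(list)
        self._events: Dict[str, List[Dict[str, Any]]] = defaultdict(list)

    def write(self, user_id: str, ts: int, event: Dict[str, Any]) -> None:
        # Insert at the sorted position so out-of-order arrivals are handled.
        i = bisect.bisect_right(self._timestamps[user_id], ts)
        self._timestamps[user_id].insert(i, ts)
        self._events[user_id].insert(i, event)

    def read_since(self, user_id: str, ts: int) -> List[Dict[str, Any]]:
        # Return events strictly newer than ts, oldest first.
        i = bisect.bisect_right(self._timestamps[user_id], ts)
        return self._events[user_id][i:]


store = MaterializedEventStore()
# Offline path: enriched events arrive, possibly out of order.
store.write("u1", 105, {"pin_id": "p2", "action": "save", "categories": ["shoes"]})
store.write("u1", 100, {"pin_id": "p1", "action": "click", "categories": ["denim"]})
store.write("u1", 110, {"pin_id": "p3", "action": "click", "categories": ["denim"]})

# Online path: fetch only the increment after the last processed timestamp.
increment = store.read_since("u1", 100)
```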
Aggregation happens online at request time
- Aggregation: Aggregation can optionally take a pre-computed state and an incremental set of user events from the event store to compute user features. This is the phase where we designed a set of APIs and created a layer to decouple the ML algorithm space from the compute engine.
Server Performance for Scalability
Developer Velocity and Simple to Build
We introduced the core infrastructure in a month thanks to the adoption of the Dagger2 framework. Three engineers then developed five signals in about two weeks to turbo-charge some of our most critical ads retrieval and ranking initiatives.
Next we’ll develop a new data processing architecture to move aggregation to event time, and unify user signals across Pinterest to this real-time infrastructure.
Acknowledgments: Huge thanks to Siyang Xie for consulting on the design, Ning Zhang, Shu Zhang, Yiran Zhao, Tianbing Xu, Nishant Roy and the entire Ads ML Infra team, and George Yiu, Se Won Jang, Qingxian Lai, Connell Donaghy, Guodong Han, Jessica Chan, Saurabh Joshi, Sihan Wang who helped in improving and scaling the infrastructure.
We would also like to give special thanks to Arun Prasad, Yitong Zhou and Bo Zhao from the Content Personalization Team and John Zheng, Pihui Wei, Supeng Ge, Peifeng Yin, Crystal Lee, Xiaofang Chen from the Ads Retrieval Team for their collaboration in developing and onboarding an initial batch of user signals to the platform.
We’re building the world’s first visual discovery engine. More than 320 million people around the world use Pinterest to dream about, plan and prepare for things they want to do in life. Come join us!