Building a real-time user action counting system for ads
Del Bao, Software engineer, Ads Infrastructure
Guodong Han and Jian Fang, Software engineers, Serving System
The Pinterest ads team’s mission is to provide the best experience for both Pinners and advertisers. Our ads system is a real-time bidding system that delivers targeted ads based on a variety of attributes. One of the major behavioral attributes is user action counts, the count for a Pinner’s past clicks, impressions and other actions on ads. An important part of user action counts is frequency control, which sets a maximum number of times an ad is shown to a Pinner (impression counts over a period of time enforce this limit).
Because our system is real-time, we have to serve user action counts immediately upon request. That’s why we built Aperture, an in-house data storage and online events tracking service that achieves the ads user action counting requirements. In this post, we’ll explain Aperture and our rationale for a central service.
User action counting problem
Frequency control manages the delivery of ads to Pinners. By controlling the frequency, an advertiser can either prevent overexposure of the same ads or lift brand awareness (i.e. a brand’s increase in customer or audience perception). This use case requires several user impression counts for promoted Pins (ads) at various time ranges in real-time.
The frequency of an ad is defined as the number of times it’s viewed by a Pinner in a given timeframe. The action is an impression, and the time window can be one, seven or 30 days.
Fatigue happens when a Pinner becomes tired of viewing the same ad. This usually has negative impact on CTR (click-through rate). Our ads system has a training pipeline for the fatigue model for pCTR (predictive CTR). At the request time, several user counting features are fetched for the user fatigue model. A Pinner’s fatigue is measured at different campaign levels. Our campaign structure has four levels:
- Advertisers: the account
- Campaigns: houses of ad groups
- Ad groups: container of Promoted Pins
- Promoted Pins: ads
The required actions are impressions and clicks. First, our ads serving system fetches a Pinner’s past engagement counts over a time range for a list of ads candidates at four campaign levels. Then, the counting features are fed to fatigue models for CTR prediction.
We generalize the query pattern from these use cases as counts queried on a couple of dimensions. Each dimension has finite and discrete values. Each count is on every possible combination of dimension values. A query example is:
What is the Pinner’s impression count of promoted Pins 1,2,3 for the last week?
In this example, action (impression), time range (last seven days), promoted Pins (1,2,3) are dimensions in the query. A dimension is required if a count query request must provide one or more values for it.
The current dimensions we track for user action counts include:
- Action: click and impression are the two most widely used actions. The system is also extensible to any actions, like a saving a Pin.
- View type: view type is a term for the type of Pinner’s landing page. The majority of them are home feed, search and related Pins. An ads request is issued for a Pinner while the page is loading. Most of our use cases care either the action counts of the requested view type or across all view types. In the latter case, no view type is specified.
- Time range: counts are supported on arbitrary days, hours or minutes.
- Entity dimensions: a user action event on promoted Pins happens at all four levels. For example, when a Pinner clicks on an ad, the counts for the promoted Pin, its ad group, campaign and advertiser are all incremented respectively. Each entity level is an individual dimension.
User action counts request flow
When a Pinner loads a page of a view type on Pinterest, an ad request with user info is sent from the web server to the ads system in order to fill the ads spots (i.e. an opportunity to display an ad). The system will retrieve ads candidates from our inventory and subsequently determine which ads among the candidates to put in the designated spots. This behavior is called ad insertion.
As a Pinner interacts with promoted Pins, these actions are tracked by calling a tracking endpoint from the front-end. The server then logs the event to Kafka. A Kafka consumer will consume the messages and write the action events to a data store (Aperture). Ads backend servers request user action counts from Aperture after candidate retrieval.
User action counts are computed by tallying the number of events per-user. We consider two approaches to bridge the gap between the counts and the raw events.
Approach 1: client-side events counting
- Ads servers reach out to the data store to fetch raw events.
- The business components run ad-hoc logic to compute counts.
Approach 2: independent counting service
- The event data store has a counting layer to serve counts.
- The service exposes a set of generic counting APIs.
The table below compares the pros and cons of two approaches in detail.
Since the counting service solution surpasses client-side counting for most criteria, we went with an independent counting service approach.
Counting is based on the deduplicated events data for two reasons:
- The ads logging system has no guarantee of delivering the event message exactly once.
- The business logic requires the same action of a Pinner on a promoted Pin within a time range be counted only once. For example, an impression of a Pinner for the same insertion is only counted once within a day.
This requires the counting service to track raw events in order to dedup them. This also requires the counting computation to happen at the serving time, not at event ingestion time.
Moreover, there are various counting cases for different dimensions and time ranges. Pre-computing counts at ingestion time for all cases isn’t extensible. The service needs to have a flexible query interface for various counting cases.
The SLA for the counting service should be less than 8ms’s P99 latency for scanning and filtering users’ raw events with 200k qps at peak time.
Aperture as the counting service
We leverage Aperture, a service we built in-house. Compared with other data store services, Aperture is designed for time series data storage and online events tracking and serving. It supports low-latency events appending, deduplication, filtering and aggregation. Under the hood, the service uses RocksDB as the storage engine, and is powered by Helix-enabled Rocksplicator for cluster management, recovery and replication. We aim to open-source Aperture in later this year, and how it compares to other systems will be revealed then (our goal for this post is to explain its technology for solving the ads counting problem).
Aperture implements data storage schema and serving functions for time series event data at the user level. It abstracts a single event as a byte array with a predefined length. The event can be an ID or list of fields. Logically, a record is a key-value pair of a user and all events of the user in the past. As RocksDB retrieves and stores at the record level, storing all events of a user as a single record incurs high cost for the time range queries. Ideally, only events of the queried time range are supposed to be fetched from the disk.
In order to optimize query performance, Aperture bucketizes user events into separate records.
Key -> [event_id, …, event_id],
Key -> [event_id, …, event_id]
Key = [User Id] + [Time Bucket Type] + [Time Bucket Id]
Time Bucket Types: D(Day), H(Hour), M(Minute)
Specifically, all events falling into the same time range (represented by the time bucket ID) will go to one record. Thus, the RocksDB key is
[User Id] + [Time Bucket Type] + [Time Bucket Id]. When Aperture receives an event, it computes bucket IDs under the required types (day, hour and/or minute), and then writes back these three records, which is appended to the DB with the customized RocksDB’s merge operator.
Aperture’s aggregation API has SQL-like semantics. Here’s a table for the comparison
Note: because time range isn’t a dimension in the event, the query language doesn’t support multiple time ranges. A single aggregation request can only support one time range.
Aperture represents an event as a byte array, and is oblivious to the dimensions of events. It’s the client’s responsibility to define the event schema. For ads use cases, an event is a list of dimensions laid out one after another. Aperture’s aggregation API allows bitmask in the query to denote the corresponding bytes as the dimensions the operators apply on.
Every time a promoted Pin is inserted to a Pinboard, the ads system generates a unique insertion ID to identify it. An event is thus identified by the insertion ID, action and viewtype. Insertion ID is the additional dimension. Aperture dedups the event based on the full event bytes.
On the client-side, we ensure counts for all use cases are fetched in a single place. This way, we can manage the number of requests to Aperture.
We issue the requests to Aperture after the candidate retrieval, because:
- Count consumers reside after the candidate retrieval phase.
- We get a list of promoted Pin candidates after retrieval. We request counts only for these Pin candidates instead of all Pins the Pinner has engaged with. This reduces memory footprint and the bandwidth to the downstream servers.
We fill up a nested map for time range, entity type, entity ID and counts, and pass it to the downstream.
In order to deliver relevant, useful ads that match a Pinner’s interests, it’s important to accurately count the actions a user takes on promoted Pins. Our ads system leverages Aperture’s time series storage power and aggregation APIs for user action counting. We’re looking forward to onboarding more use cases, and sharing this technology with the open-source community in the future.
Acknowledgements: Huge thanks to Liquan Pei, Zack Drach, Caijie Zhang, Scott Zou, Hongjie Bai, Jinru He, Yining Wang and Bo Liu for the design and implementation advise.