The machine learning behind delivering relevant ads

Pinterest Engineering
Jul 20

Felix Fang | Software Engineer, Advertiser Solutions Group

Chi Xu | Software Engineer, Advertiser Solutions Group

Pinterest is where people go to plan and shop, making ideas and ads from brands helpful in taking Pinners from inspiration to action. It’s our goal to ensure ads continue to be additive and not intrusive on Pinterest. Because of the platform's unique and powerful first-party signals, advertisers can reach Pinners based on their interests, intent, and engagement.

To help in delivering the right ads to the right Pinners in an audience of hundreds of millions of people, we offer advertisers features to achieve relevance including Actalike (AAL) audiences, also known in the industry as Lookalike audiences. AAL audiences help advertisers reach potentially new users via audience expansion.

In this post, we’ll focus on the machine learning model component of relevant ads delivery and explain how we achieve high-quality audience expansion by combining universal user embedding representations with per-advertiser classifier models. We show that this combined approach outperforms both regression-based and similarity-based approaches.

Our Proposed Approach: User Embeddings + MLP Classifiers

AAL methods mainly fall into two categories: regression-based and similarity-based approaches [1, 2, 3, 4, 5, 6]. Regression-based approaches treat the task as a binary classification problem, train a model for each seed list offline, and use such models to score candidate users directly, as shown in Figure 2a. Similarity-based approaches learn seed list representations from a set of user embeddings and expand audiences based on nearest neighbors in terms of either Jaccard similarity, cosine similarity, or dot product, as shown in Figure 2b.

Figure 2a. Regression-based approach. This is a discriminative model, and a linear model is often used since the features are sparse.
Figure 2b. Similarity-based approach. This is a generative model, and approximate nearest neighbor searching is used for speed.
Figure 2c. Combined NN approach. This is a discriminative model, and using dense user embeddings allows using a neural network model.
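
To make the similarity-based approach concrete, here is a minimal sketch of nearest-neighbor expansion with cosine similarity. It assumes the seed list representation is simply the mean of the seed users' embeddings and uses a brute-force scan instead of approximate nearest neighbor search; both are illustrative simplifications, and the function names are hypothetical rather than taken from our production system.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def expand_by_similarity(seed_embeddings, candidates, k):
    """Return the k candidate user ids closest to the seed list.

    Assumption: the seed list is represented by the mean (centroid)
    of its users' embeddings. `candidates` maps user id -> embedding.
    """
    dim = len(seed_embeddings[0])
    n = len(seed_embeddings)
    centroid = [sum(e[i] for e in seed_embeddings) / n for i in range(dim)]
    ranked = sorted(
        candidates,
        key=lambda uid: cosine(candidates[uid], centroid),
        reverse=True,
    )
    return ranked[:k]

# Toy usage with 2-dimensional embeddings.
seed = [[1.0, 0.0], [0.9, 0.1]]
pool = {"a": [1.0, 0.05], "b": [0.0, 1.0], "c": [0.5, 0.5]}
top2 = expand_by_similarity(seed, pool, k=2)
```

At the scale of hundreds of millions of users a full scan like this is impractical, which is why approximate nearest neighbor search is used for speed.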

We found that regression-based and similarity-based approaches each have unique strengths: regression-based approaches perform supervised learning on each seed list and excel at encoding seed list structure (lower bias, higher variance), while similarity-based approaches perform user representation learning, which addresses data sparsity (lower variance, higher bias). To combine the strengths of both, we take pre-trained user embeddings and explicitly learn seed list classifiers on top of them. Because the pre-trained user embeddings are dense and already encode high-level information, the downstream classifier models converge faster and can be neural networks, which otherwise would not work well with sparse inputs. A similar idea can be found in [5]. Here is a summary comparing the different audience expansion approaches:

Table 1. Comparisons of regression-based, similarity-based, and combined NN approaches.

We directly feed normalized user embeddings trained using organic <user, pin> interactions to our per-advertiser models. More details about how universal user embeddings are trained can be found in [6]. For each advertiser, we reference the seed list as positive examples and use sampled monthly active users (MAUs) from the targeting country as negative examples, and build a binary classifier using a multi-layer perceptron (MLP) neural network, as shown in Figure 3:

Figure 3. Per-advertiser multi-head model training architecture. Models for advertisers share the same architecture.
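
As a rough illustration of the per-advertiser classifier, the sketch below trains a tiny MLP on normalized embeddings, with seed users as positives and sampled users as negatives. It is a toy under stated assumptions, not our production model: the single tanh hidden layer, the layer size, plain SGD, L2 normalization, and all names (`SeedListMLP`, `train_seed_classifier`) are hypothetical choices made for readability.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

class SeedListMLP:
    """One hidden tanh layer followed by a sigmoid output (assumed sizes)."""

    def __init__(self, dim, hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0.0, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        p = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, p

    def score(self, x):
        """Probability that the user with embedding x belongs to the seed list."""
        return self._forward(x)[1]

    def sgd_step(self, x, y, lr=0.1, weight=1.0):
        """One SGD update on (optionally weighted) binary cross-entropy."""
        h, p = self._forward(x)
        g = weight * (p - y)  # gradient of the loss w.r.t. the output logit
        for j, hj in enumerate(h):
            gh = g * self.w2[j] * (1.0 - hj * hj)  # backprop through tanh
            self.w2[j] -= lr * g * hj
            self.b1[j] -= lr * gh
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * gh * xi
        self.b2 -= lr * g

def train_seed_classifier(seed_embeddings, negative_embeddings, epochs=50, seed=0):
    """Seed users are positives (label 1); sampled users are negatives (label 0)."""
    data = [(l2_normalize(e), 1.0) for e in seed_embeddings]
    data += [(l2_normalize(e), 0.0) for e in negative_embeddings]
    model = SeedListMLP(dim=len(data[0][0]), seed=seed)
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            model.sgd_step(x, y)
    return model
```

Because every advertiser's model shares the same architecture, a routine like `train_seed_classifier` could in principle be run independently per seed list, which is what makes the training easy to parallelize.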

Next, we investigated how sample weighting could improve our model. Our rationale is that user engagement provides additional information on seed user membership. Unlike multi-task learning, we design the model to focus on the similarity prediction task only, and use a different sample weight for each positive example. Sample weights are derived from user engagement metrics such as number of impressions and click-through rate (CTR). This lets us treat each seed user differently and favor those with higher historical metrics. To reduce skew and bring the sample weights closer to a normal distribution, we apply a log transformation to metrics that are greater than 1; for metrics that are less than 1 (e.g. CTR), we first upscale them by 10⁶ and then apply the log transformation. We then use min-max scaling to normalize all sample weights to [0, 1] so that they are on the same scale. In our online alpha experiments, we found that this sample weighting improved revenue, number of impressions, and eCPM (effective cost per mille) by 0.82%, 0.38%, and 0.44%, respectively, in the audience act-alike targeting slice.
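
The weight transformation described above can be sketched in a few lines. The function name and the `upscale` parameter are hypothetical, and flooring values at 1 before the log (so zero-valued metrics do not break the transform) is our own assumption for the sketch.

```python
import math

def engagement_sample_weights(metric_values, upscale=1.0):
    """Log-transform an engagement metric, then min-max scale to [0, 1].

    Use upscale=1e6 for ratio metrics below 1 (e.g. CTR) and the
    default for count metrics above 1 (e.g. impressions).
    Assumption: values are floored at 1 before the log so that zero
    or tiny metrics map to a log of 0 instead of raising an error.
    """
    logged = [math.log(max(v * upscale, 1.0)) for v in metric_values]
    lo, hi = min(logged), max(logged)
    if hi == lo:
        # Assumption: if every user has the same metric, weight them equally.
        return [1.0] * len(logged)
    return [(x - lo) / (hi - lo) for x in logged]

# Impressions are already greater than 1, so no upscaling is needed.
impression_weights = engagement_sample_weights([10, 100, 1000])

# CTR values are below 1, so they are upscaled by 10^6 first.
ctr_weights = engagement_sample_weights([0.001, 0.01, 0.1], upscale=1e6)
```

Both toy inputs are evenly spaced on a log scale, so both produce weights of approximately 0, 0.5, and 1. Note that with plain min-max scaling the least engaged seed user receives a weight of exactly 0.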

To train the model, we optimize for the weighted binary cross-entropy loss function defined as the following:

where N denotes the number of samples and M denotes different engagement metrics. Based on these engagement metrics, we compute sample weights.
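
For readers who want something concrete, here is a sketch of a weighted binary cross-entropy in its standard textbook form: the per-sample log loss is multiplied by a sample weight and averaged over the N samples. It assumes a single weight per sample and a weight of 1.0 for negatives; how multiple engagement metrics are combined into that weight is not spelled out here, so treat this as an illustration rather than the exact production loss.

```python
import math

def weighted_bce(labels, probs, weights, eps=1e-12):
    """Mean of -w_i * [y_i * log(p_i) + (1 - y_i) * log(1 - p_i)] over N samples."""
    total = 0.0
    for y, p, w in zip(labels, probs, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

# One positive (seed user) and one negative (sampled user), both scored 0.5.
unweighted = weighted_bce([1, 0], [0.5, 0.5], [1.0, 1.0])
# Down-weighting the positive to 0.5 shrinks its contribution to the loss.
down_weighted = weighted_bce([1, 0], [0.5, 0.5], [0.5, 1.0])
```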

To evaluate the quality of audience expansion, we assume that those in a seed list are similar to each other. So we split a seed list into 90% and 10% of its users, train a model with 90% of the users, use the model to score all MAUs except the 90% training set, and examine how the 10% holdout users are ranked in the expansion.
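
The holdout protocol can be sketched as follows. The helper names are hypothetical, and the random shuffle with a fixed seed is an assumption about how the 90/10 split might be drawn.

```python
import random

def split_seed_list(seed_users, holdout_fraction=0.1, seed=0):
    """Shuffle the seed list and split it into (train, holdout) user ids."""
    users = list(seed_users)
    random.Random(seed).shuffle(users)
    n_holdout = max(1, round(len(users) * holdout_fraction))
    return users[n_holdout:], users[:n_holdout]

def candidates_for_scoring(all_maus, train_users):
    """All MAUs except the training users; holdout users stay in the pool."""
    train = set(train_users)
    return [u for u in all_maus if u not in train]

# Toy usage: 100 seed users inside a population of 1,000 MAUs.
maus = [f"user_{i}" for i in range(1000)]
seed_list = maus[:100]
train_users, holdout_users = split_seed_list(seed_list)
candidates = candidates_for_scoring(maus, train_users)
```

The holdout users remain among the candidates on purpose: a good model should rank them near the top of the expansion even though it never saw them during training.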

Audience expansion is essentially a ranking problem: find the k most similar users among all eligible candidate users. We rank all eligible users by their similarity scores against a seed list and choose the top k. We use recall@k and precision@k to measure how good an expansion list is. Those are defined as:

$$\text{recall@}k = \frac{|E@k \cap H|}{|H|}, \qquad \text{precision@}k = \frac{|E@k \cap H|}{|E@k \cap H| + |E@k \cap R|}$$

where E@k is the set of the top k users in the expansion list, H is the holdout set of users, and R is a randomly selected set of users from MAUs having the same size as H. We further average recall@k and precision@k over multiple values of k to measure the performance of a method across different target expansion sizes.
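
These two metrics translate directly into code. The sketch below follows the definitions above; the function names are hypothetical, and returning a precision of 0.0 when the top k contains neither holdout nor random users is our own convention for the empty case.

```python
def recall_precision_at_k(ranked_users, holdout, random_set, k):
    """Compute (recall@k, precision@k) for one expansion list.

    ranked_users: candidate user ids sorted by descending model score.
    holdout:      H, the held-out seed users.
    random_set:   R, random MAUs with the same size as H.
    """
    top_k = set(ranked_users[:k])           # E@k
    hits = len(top_k & set(holdout))        # |E@k ∩ H|
    random_hits = len(top_k & set(random_set))  # |E@k ∩ R|
    recall = hits / len(holdout)
    denom = hits + random_hits
    precision = hits / denom if denom else 0.0  # assumed convention when empty
    return recall, precision

def average_over_ks(ranked_users, holdout, random_set, ks):
    """Average recall@k and precision@k over several expansion sizes."""
    pairs = [recall_precision_at_k(ranked_users, holdout, random_set, k) for k in ks]
    avg_recall = sum(r for r, _ in pairs) / len(pairs)
    avg_precision = sum(p for _, p in pairs) / len(pairs)
    return avg_recall, avg_precision

# Toy ranking: "h*" are holdout users, "r*" are random users, "x" is neither.
ranking = ["h1", "r1", "h2", "x", "h3", "r2"]
H = {"h1", "h2", "h3"}
R = {"r1", "r2", "r3"}
recall_3, precision_3 = recall_precision_at_k(ranking, H, R, k=3)
```

In the toy ranking, the top 3 contains two holdout users and one random user, so recall@3 and precision@3 both come out to 2/3.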

We compared results from all approaches to better understand the strengths of the combined approach. In this evaluation, the regression-based models are implemented using logistic regression models that are trained on raw features. Similarity-based models are based on locality sensitive hashing (LSH). In all results, we used regression-based models as the baseline and showed relative gains compared to the baseline. Numbers in Figure 4 show that the combined NN approach outperforms both regression and similarity-based approaches in terms of both recall@k and precision@k.

Figure 4. Overall comparisons among regression-based, similarity-based, and combined NN models (Baseline: regression-based).

In addition to the overall comparison, we also evaluated models for different seed list sizes. Specifically, we grouped advertisers into five buckets according to seed list size: several thousands, tens of thousands, hundreds of thousands, several millions, and tens of millions. Numbers in Figure 5 and Figure 6 tell us that the regression-based and similarity-based models only perform relatively well on large and small seed lists, respectively. However, our proposed combined NN model outperforms both models across different seed list sizes. For the small seed lists, the combined model and the similarity-based model both benefit from the pre-trained universal user embeddings and thus avoid overfitting. The result is surprising for the large seed lists because we only use up to 200,000 positive samples to train the combined model, whereas for the regression-based model we use all the positive samples, i.e. millions of seed users. We believe using the pre-trained user embeddings as input also allows the model to converge faster. In addition, the MLP layers are able to capture nonlinear relationships and further boost the combined NN model’s performance.

Figure 5. Average recall@k for different seed list sizes among regression-based, similarity-based, and combined NN models (Baseline: regression-based).
Figure 6. Average precision@k for different seed list sizes among regression-based, similarity-based, and combined NN models (Baseline: regression-based).

Because the regression and similarity-based models complement each other well, our previous version of the AAL system was a hybrid solution that blended expansions from the classifier and similarity-based models. For the new version, we productionized the proposed combined model, which drastically simplifies the AAL system. As a result, we save both infrastructure and maintenance costs, and we speed up end-to-end runtime by more than 20%. We use Spark and Kubernetes to scale our system to support regularly trained models for seed lists and scoring users against each seed list (seed lists and users are on the order of 10⁵ and 10⁸, respectively).

In our online A/B test, the blended system (blending classifier and similarity-based models) is the control and the combined NN model is the treatment. Figure 7 shows that during our two-week online experiment, we observed statistically significant gains for AAL ads in users with ads impressions and users with revenue, which led to gains in AAL ads impressions and revenue (3.6% and 3.1%, respectively). While we observed a 5.7% drop in CTR for the AAL ads, we also saw an 8.3% drop in hide rate (HDR) and a 2.7% gain in good click ratio (GCR), which indicates that the quality of user engagement with AAL ads improved. We theorize that these metric movements are consistent with our offline evaluation: the combined NN models’ much improved recall translates to better user reach during the online experiment, which leads to more users with ads impressions and revenue.

Figure 7. Online A/B Test Results for Act-alike Ads (Baseline: Blended)

Conclusion

To discover high-quality actalike users for audience expansion targeting, we use the pre-trained universal user embeddings and build neural network classifiers for each seed list. This approach outperforms both the traditional regression-based models trained on raw features and similarity-based models based on the user embeddings, in terms of both recall and precision. We also use historical user engagement metrics to further boost the actalike expansion online metrics.

Future Work

The unification of the AAL system paves the way for many future AAL projects. At the time of writing, there have already been launches that further improve system efficiency and reduce infrastructure cost. The team is working on deeper modeling structures such as Deep Factorization Machine, as well as contextual AAL, to achieve higher quality AAL audience expansions.

To learn more about engineering at Pinterest, check out the rest of our Engineering Blog, and visit our Pinterest Labs site. To view and apply to open opportunities, visit our Careers page.

Acknowledgements

We would like to thank Paul Nunez, Jacob Gao, Scott Zou, Chongyuan Xiang, Xing Wei, and Stephanie deWet for their contributions. We also thank Longbin Chen, Aaron Yuen, Sean McCurdy, Mao Ye, and Roelof van Zwol for the leadership.

References

[1] Ma, Qiang & Wagh, Eeshan & Wen, Jiayi & Xia, Zhen & Ormandi, Robert & Chen, Datong. (2016). Score Look-Alike Audiences. 647–654. 10.1109/ICDMW.2016.0097.

[2] Doan, Khoa & Yadav, Pranjul & Reddy, Chandan. (2019). Adversarial Factorization Autoencoder for Look-alike Modeling. 2803–2812. 10.1145/3357384.3357807.

[3] Liu, Haishan & Pardoe, David & Liu, Kun & Thakur, Manoj & Cao, Frank & Li, Chongzhe. (2016). Audience Expansion for Online Social Network Advertising. 165–174. 10.1145/2939672.2939680.

[4] Jiang, J., Lin, X., Yao, J., & Lu, H. (2019). Comprehensive Audience Expansion based on End-to-End Neural Prediction. In J. Degenhardt, S. Kallumadi, U. Porwal, & A. Trotman (Eds.), Proceedings of the SIGIR 2019 Workshop on eCommerce, co-located with the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, eCom@SIGIR 2019, Paris, France, July 25, 2019 (Vol. 2410).

[5] Liu, Yudan & Ge, Kaikai & Zhang, Xu & Lin, Leyu. (2019). Real-time Attention Based Look-alike Model for Recommender System. 2765–2773. 10.1145/3292500.3330707.

[6] deWet, Stephanie & Ou, Jiafan. (2019). Finding Users Who Act Alike: Transfer Learning for Expanding Advertiser Audiences. 2251–2259. 10.1145/3292500.3330714.
