Debugging Ad Delivery At Pinterest

Pinterest Engineering
Pinterest Engineering Blog
Jun 24, 2022

Nishant Roy | Engineering Manager, Ads Serving Platform


Intro & Background

The Pinterest ads serving platform delivered >$2.5 billion in ad spend in 2021 from thousands of advertisers. Our customer operations team receives 600+ tickets on average every month from advertisers looking to understand their performance on our platform. One of the most common questions is why a particular advertiser or ad campaign is not fully utilizing its budget. Answering it requires a deep analysis of an ad recommendation system consisting of 5+ microservices, 1M+ lines of code, and 100+ active developers, serving >90 million requests every day. This blog describes how we built a system to swiftly answer these questions without requiring deep technical expertise or context. We had three main goals:

  • Improve advertiser satisfaction by reducing the time taken to resolve their issues
  • Automate the data analysis and generate recommendations for the advertiser to improve delivery rate
  • Cover all components in the system (indexing, budgeting/pacing, candidate generation, ranking, ad funnel, auction, etc.)

Design/Challenges

Data Coverage

The majority of the data available for debugging was at a request level and heavily sampled to reduce costs, which meant we did not have sufficient data to understand system behavior for all advertisers. We needed extensive coverage for all advertisers, with counts at all stages of the ad recommendation system, without incurring huge logging and storage costs.

Solution: Aggregated Data Pipeline on Druid

After researching our options, we chose to use Druid as our storage solution for the following reasons:

  • High ingestion rates in real time through Kafka: Each ad request may contain thousands of candidates, and we need to log data for all these IDs. We already use Kafka extensively for logging, so we were able to reuse much of our infrastructure with Druid. By pairing Kafka with Druid, our pipeline makes data available for debugging in near real time, with no more than a few minutes of delay.
  • Support for aggregation queries: Our expected query pattern is to retrieve counts for a certain ID during a given time period. Druid is optimized for time series data and provides great aggregation flexibility, so it meets our requirements.
  • Low-latency and storage cost: Druid allows us to provide real-time visibility into ID-level counts. Also, Druid supports aggregation at ingestion time, which minimizes the size of the data stored.
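The storage savings from ingestion-time aggregation can be illustrated with a small sketch. This is a hypothetical, simplified model of what Druid's rollup does (in practice, rollup is configured in the Druid ingestion spec, not written as application code), using event field names modeled on our logging schema:

```python
from collections import Counter

def rollup(events, bucket_ms=60_000):
    """Aggregate raw trim events into per-minute (campaign, stage) counts,
    mimicking Druid's ingestion-time rollup."""
    counts = Counter()
    for e in events:
        bucket = e["TimestampMs"] // bucket_ms * bucket_ms
        counts[(bucket, e["CampaignID"], e["TrimmedStage"])] += 1
    return [
        {"TimestampMs": t, "CampaignID": c, "stage": s, "TrimmedCount": n}
        for (t, c, s), n in sorted(counts.items())
    ]

# Three raw events within the same minute collapse into two stored rows,
# one per (campaign, stage) pair.
events = [
    {"TimestampMs": 1_000, "CampaignID": 1234, "TrimmedStage": "ad_relevance_stage"},
    {"TimestampMs": 2_000, "CampaignID": 1234, "TrimmedStage": "ad_relevance_stage"},
    {"TimestampMs": 3_000, "CampaignID": 1234, "TrimmedStage": "ad_budget_checker_stage"},
]
rows = rollup(events)
```

At ad-serving volumes, collapsing per-candidate events into per-bucket counts like this is what keeps the stored dataset small while preserving the aggregate queries we care about.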

Here’s an example of our logging schema and a Druid response:

Logging Schema (sample)

    {
      TimestampMs
      CampaignID
      AdvertiserID
      TrimmedStage
      ...
    }

Druid Response (sample)

    // For CampaignID = 1234
    {
      totalCounts: 500000, // total number of items logged within the time range
      idCounts: 20000,     // total number of entries with this CampaignID within the time range
      results: [
        { "stage": "ad_relevance_stage", "TrimmedCount": 1500 },
        { "stage": "ad_budget_checker_stage", "TrimmedCount": 550 },
        ...
      ]
    }

With this data, we can now answer questions such as:

  • At what stage is a campaign mostly getting trimmed out?
  • What is the average trim rate for each stage?
  • How has the campaign’s trim rate or insertion rate changed day-over-day?
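For instance, given two days of responses in the shape shown above (field names taken from the sample; the numbers here are illustrative), the per-stage trim rate and its day-over-day change fall out directly:

```python
def trim_rates(druid_response):
    """Per-stage trim rate: trimmed count / total entries for this CampaignID."""
    total = druid_response["idCounts"]
    return {r["stage"]: r["TrimmedCount"] / total for r in druid_response["results"]}

today = {"idCounts": 20000, "results": [
    {"stage": "ad_relevance_stage", "TrimmedCount": 1500},
    {"stage": "ad_budget_checker_stage", "TrimmedCount": 550},
]}
yesterday = {"idCounts": 18000, "results": [
    {"stage": "ad_relevance_stage", "TrimmedCount": 900},
    {"stage": "ad_budget_checker_stage", "TrimmedCount": 540},
]}

rates_now, rates_prev = trim_rates(today), trim_rates(yesterday)
worst_stage = max(rates_now, key=rates_now.get)  # stage trimming the most today
delta = {s: rates_now[s] - rates_prev.get(s, 0.0) for s in rates_now}  # day-over-day change
```

A spike in `delta` for one stage points at a system change (e.g., a code push to that stage), while a uniformly high `worst_stage` rate since campaign launch points at configuration.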

By answering these questions, we can quickly determine whether poor ad delivery is caused by a bug in the system (e.g., a code change to a certain stage) or by a bad campaign configuration (e.g., a very low bid or budget).

Microservices

Our ad delivery system consists of several different systems, each with its own responsibilities (candidate generation, trimming, scoring, bid/budget management, indexing, content safety filtering, etc). The root cause for weak campaign performance may lie in any one of these systems, which are owned by multiple different teams at Pinterest, making it challenging to triage and resolve delivery issues.

Solution: Unified Debugging UI

To simplify the debugging process, we built a unified UI to visualize data logged from all systems involved in the ad delivery process. Users can now input a campaign ID and, in just a few seconds, view detailed debug information across all systems. We can now easily answer questions such as:

  • Which stages had the highest trim rates?
  • Was there excessive content filtering?
  • Was the campaign not active in the index?
  • Were there changes to the budget?
  • Did the advertiser make any changes to their campaign settings?
Fig 1: Unified Debugging UI showing a campaign’s spend summary, with tabs for each system in the ad delivery process (retrieval, budgeting, etc.)
Fig 2: Trimmed counts for a given campaign at each stage of the ad funnel
Fig 3: A campaign’s actual spend pattern against its target spend and daily budget
Fig 4: Number of active vs. filtered pins from a campaign in the ads index

Accessibility to Non-Engineering Teams

To make delivery debugging an easy and scalable process, we wanted to empower non-engineering teams (such as customer ops and sales) to understand our systems and what steps can be taken to improve ad delivery.

Solution: One-Line Summary Diagnosis and Recommendation

Our Delivery Debugger provides simple, one-line summaries of the health of the campaign, along with a recommendation on changes that can be made to improve ad delivery. These summaries do not require a deep understanding of our technical systems, making it easier and faster to resolve some delivery issues since they don’t require the help of engineering teams. The summaries are pre-written and are chosen based on heuristic rules that we manually identified based on prior experience and our knowledge of the ad delivery systems.
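The rule engine behind these summaries can be approximated as an ordered list of (predicate, summary) pairs. The stage names, thresholds, and wording below are illustrative placeholders, not Pinterest’s actual rules:

```python
# Each rule pairs a predicate over per-stage trim rates with a pre-written
# summary and recommendation. Thresholds and text are hypothetical.
RULES = [
    (lambda r: r.get("reserve_price_trimmer", 0) > 0.5,
     "Most candidates are trimmed by the reserve price check. "
     "Recommendation: increase the bid value."),
    (lambda r: r.get("dedup_cap", 0) > 0.5,
     "Most candidates are trimmed by the dedup cap. "
     "Recommendation: refine the ad targeting criteria."),
]
FALLBACK = ("No funnel stage is over-trimming; the issue likely lies in a "
            "different system (e.g., indexing or budgeting).")

def diagnose(trim_rates):
    """Return the first matching pre-written summary, or the fallback."""
    for predicate, summary in RULES:
        if predicate(trim_rates):
            return summary
    return FALLBACK

print(diagnose({"reserve_price_trimmer": 0.8}))  # matches the first rule
print(diagnose({"dedup_cap": 0.1}))              # no rule fires: fallback
```

Because the summaries are pre-written and the predicates encode engineers’ prior debugging experience, a non-engineering user only ever sees the plain-language output, never the underlying rates.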

Fig 5: One-line summary showing which serving funnel stage (here, the reserve price trimmer) is over-trimming, with a suggestion to improve delivery (increase the bid value)
Fig 6: One-line summary showing which retrieval stage (here, the dedup cap) is over-trimming, with a suggestion to refine the ad targeting criteria
Fig 7: One-line summary showing that no funnel stage is over-trimming, suggesting the problem lies in a different system

In case a deeper technical understanding is required, we also added detailed documentation about the end-to-end system and about each stage in the ad funnel: what it does, and the recommended solutions if that stage behaves unexpectedly or undesirably.

Fig 8: Delivery Debugger technical documentation: an overview diagram, accessible within the tool, describing each component in the ad delivery stack
Fig 9: Definitions of the trimmers in the ad funnel and recommended solutions for each; these power the one-line summaries described above

Impact

In 2021, our customer ops team received a total of 7270 tickets, out of which only 78 needed to be escalated to the engineering team for further investigation. This means that 99% of all tickets were resolved without any support from the engineering team. Anecdotally, our partner on the customer ops team said that at least 60–70% of these tickets were resolved by their team independently thanks to the Delivery Debugger tool, and they rely on this tool on a daily basis.

Additionally, in 2021, it took an average of 47 hours to propose a first solution (sending an analysis and recommendation to the advertiser) and 67 hours to reach a full resolution.

Another clear indicator of the impact that Delivery Debugger has had is the raw volume of tickets that need to be escalated to the engineering team: 110 in 2019, 106 in 2020, and just 78 in 2021. As we continue to iterate on the tool, the need for dedicated engineering support is going down, allowing the team to spend more time on larger projects instead of debugging.

Finally, for the tickets that do get escalated to the engineering team, only 2% of bugs remain unresolved (Fig 10), and we have hit our fix SLA 90% of the time (Fig 11).

Our Delivery Debugger is now used by hundreds of Pinterest employees a month across both engineering and non-engineering teams to efficiently triage and resolve ad delivery bugs. We’re excited to continue making this tool more powerful to empower our teams to be even more effective in the future.

Fig 10: Breakdown of ad delivery bug ticket statuses, showing that only 2% of 278 tickets remain unresolved
Fig 11: Ad delivery bug fix SLA metrics, showing we hit the fix SLA for 90% of tickets over the last 4 months

Future Improvements

While we have added coverage for a lot of systems, there is still room for us to expand. Our next step will be to add extensive checks to triage budgeting issues and provide one-line summaries of how budgeting changes impacted delivery. After that, we plan to extend our tool to cover ads targeting configurations, to triage delivery issues caused by targeting specs that are too narrow or too wide. As our systems and product offerings get more complex, there is a continuous need to evolve our delivery debugger to maintain a high level of observability into our ad recommendation system.

Acknowledgements

This project is the result of the hard work from several engineers across multiple teams. A special thanks to Zeyu Wang who built this tool and the underlying framework over 3 years, and to Danyal Raza for taking over ownership and iterating to keep up with evolving business and product demands for the past year. Additionally, I would like to thank Sreshta Vijayaraghavan, Zack Drach, Alexander Rhee, Kenny Valdivia, Alejandro Iglesias, Mingsi Liu, Aniket Ketkar, Chengcheng Hu, Shawn Nguyen, Sameer Bhide, Andrei Curelea, Dan Xie, Haichen Liu, Tao Yang, Shu Zhang, Huiqing Zhou, Ang Xu, Filip Jaros, Weihong Wang, Yeming Shi, Keshava Subramanya, and many others on the engineering team for their contributions. I would also like to thank our partners on the customer ops team, Rivy Obinomen, Erika Martin, Katie Apple, and many others, for their continued support and feedback, to help improve the quality of the Delivery Debugger.

To learn more about engineering at Pinterest, check out the rest of our Engineering Blog and visit our Pinterest Labs site. To view and apply to open opportunities, visit our Careers page.
