Criteo’s Privacy Sandbox Market Testing

ELIAS SELMAN

Published in Criteo R&D Blog, Jun 27, 2024

TL;DR

Criteo implemented a rigorous Market Testing methodology to comprehensively understand the potential impact on publisher monetization if third-party cookies on Chrome were deprecated today.
If third-party cookie deprecation were rolled out today with the current version of the Privacy Sandbox, we could expect publisher revenues in Chrome to decrease by an average of 60% for publishers that have fully integrated the Privacy Sandbox. At the current level of supply integration in the environment, the average drop in publisher revenues rises to 78%.
We highlight to the CMA and the industry four methodological limitations and biases that, if not mitigated, can distort test results in favour of the treatment population, which has no third-party cookies.

1 Introduction

Criteo has been an active participant in testing Google’s Privacy Sandbox, contributing to forums such as W3C, presenting early analysis both publicly and privately, and providing continuous feedback to enhance the utility of proposed APIs while ensuring privacy preservation. Our early implementation of key Privacy Sandbox components, including Protected Audience API, Topics API, and Attribution Reporting API, has allowed us to conduct early testing on live traffic through Google’s “Origin Trials”. However, the inherent limitations of testing Privacy Sandbox solutions in an environment where third-party cookies still exist did not allow us to draw firm conclusions about the performance and business impact of third-party cookie deprecation in Chrome.

Therefore, we welcomed the industry-wide test initiated by the CMA and Google, which provided a unique opportunity to assess the impact of third-party cookie deprecation and to evaluate the Privacy Sandbox’s capabilities, and our implementation of them, in an online experiment.

We have implemented Privacy Sandbox APIs, with a special focus on Protected Audience API, adapting our product architecture and performance engines to optimize for this new environment.

2 Criteo’s Methodology

Criteo implemented a rigorous Market Testing methodology to comprehensively understand the potential impact on publisher monetization if third-party cookies on Chrome were deprecated today. This methodology also considers the implications of losing access to other identifiers that Chrome has publicly indicated it may block in the future, such as IP addresses and emails.

2.1 Scope of Test

In general terms we followed this rule to determine our testing scope:

In Scope of Market Testing: The test includes all elements reliant on Chrome or other external entities (i.e. SSPs) for integration, such as the lack of native inventory integration and un-integrated Protected Audience API supply. We deemed it essential to include these aspects as they accurately reflect the present state of the Privacy Sandbox and exert an influence on its efficacy.
Out of Scope of Market Testing: To ensure accurate overall results, the test did NOT include components that Criteo has not yet integrated with Protected Audience API, such as un-integrated advertisers and product offerings.

Our Market Testing was conducted at scale, across more than 80% of Criteo’s client campaigns, increasing statistical significance due to the reach of our network of 18,000 advertiser clients, over 1,200 premium publishers and approximately 100 million weekly impressions. Criteo’s scope can be summarized by the following table:

It is worth highlighting that Criteo decided not to restrict the scope to Protected Audience API enabled supply, ensuring the final results considered the impact of the limited supply access that we currently observe due to lack of publisher Privacy Sandbox adoption. We also note that scoping only on Google Ad Manager (GAM) or any individual SSP, though tempting, can lead to significant bias in test results (see section 5.2).

2.2 Testing Populations

2.2.1 Population definition:

Criteo followed the CMA’s proposed design 2 for the selection of the testing populations. The different populations, including labels and treatment, are summarized in Table 2:

Table 2: Summary of Criteo’s Testing Populations

2.2.2 Testing Label Propagation

Accessing Chrome testing labels depended in great part on SSPs forwarding them on the bid requests they send to Criteo. There was a strong mismatch between the share of labels we observed in bid requests and the size of each population as provided by Google. That mismatch, if not properly handled, could introduce a significant bias in the test (see section 5.3). To mitigate this, Criteo employed three different mechanisms to retrieve users’ labels and associate them with the correct testing population:

Labels found on bid request: Criteo used the labels forwarded by SSPs on the bid request.
Label storage: In Mode A, we store the label associated with a user ID (contained within the third-party cookie) upon observing the user on an advertiser’s website. When responding to a bid request lacking label information, we refer to this storage to retrieve the previously stored label. This method is applicable only to traffic with third-party cookies and is close to 100% effective in mitigating the bias. This approach allowed us to deduce that 45% of spending was unlabelled in Mode A.
At display time: During ad rendering, there are instances where we can retrieve the user’s label. This method can be applied in both Mode A and Mode B. However, its effectiveness is approximately 85%, primarily due to unsupported publisher configurations. This approach enabled us to estimate that 17% of spending was unlabelled in Mode B.
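
The fallback order described in this list can be sketched as follows. This is a minimal illustration under stated assumptions, not Criteo's implementation: the `BidRequest` fields, the `label_store` dictionary, the label strings and the function name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BidRequest:
    # Hypothetical fields, for illustration only.
    ssp_label: Optional[str] = None       # label forwarded by the SSP, if any
    cookie_user_id: Optional[str] = None  # third-party cookie ID (Mode A only)


def resolve_label(request, label_store, display_label=None):
    """Return (label, source), trying the three mechanisms in order."""
    # 1. Label forwarded by the SSP on the bid request.
    if request.ssp_label is not None:
        return request.ssp_label, "bid_request"
    # 2. Label stored earlier against the user ID (needs a third-party cookie).
    if request.cookie_user_id is not None:
        stored = label_store.get(request.cookie_user_id)
        if stored is not None:
            return stored, "label_storage"
    # 3. Label retrieved at display time, when the publisher setup allows it.
    if display_label is not None:
        return display_label, "display_time"
    return None, "unlabelled"


# Placeholder label names, not the exact strings sent by Chrome.
label_store = {"user-123": "control_1"}
print(resolve_label(BidRequest(ssp_label="treatment"), label_store))
print(resolve_label(BidRequest(cookie_user_id="user-123"), label_store))
print(resolve_label(BidRequest(), label_store, display_label="treatment"))
print(resolve_label(BidRequest(), label_store))
```

Recording the source alongside the label is a design choice of this sketch: it makes it possible to measure how much spend each mechanism recovers, which is how shares of unlabelled spend such as the 45% and 17% figures could be derived.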

2.2.3 Adjusting Metrics to Population Size

As with any A/B test involving unbalanced population sizes, it is essential to adjust observed metrics to the respective population size to make them comparable. In contexts where users have unique identifiers (such as third-party cookies), calculating the observed population size is relatively straightforward. However, this is not the case for Mode B populations.

Using the number of bid requests as a proxy is not a reliable approach because there is no standard for how SSPs forward requests. Some SSPs send one request for a group of inventories rendered on the same page, while others run one auction for each banner. Additionally, the varying proportions in which SSPs forward labels further complicate the estimation of the “true” population size.

Therefore, Criteo decided to use the theoretical proportion of Chrome browsers (information provided by Google) as the ground truth. Table 3 details how we used these values to normalize the observed metrics to the relative size of each testing population:

Table 3: Criteo’s population size regularization
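
As a rough illustration of this kind of normalization, the sketch below rescales each population's observed metric to the size of a reference population using theoretical shares. The shares and spend values are made-up placeholders, not the figures provided by Google or measured in the test, and the function name is hypothetical.

```python
def normalize_to_reference(observed, shares, reference):
    """Scale each population's metric to the reference population's size."""
    ref_share = shares[reference]
    return {
        population: value * ref_share / shares[population]
        for population, value in observed.items()
    }


# Illustrative numbers only: not the real population shares or spend.
shares = {"control_1": 0.0100, "control_2": 0.0025, "treatment": 0.0075}
observed_spend = {"control_1": 1000.0, "control_2": 20.0, "treatment": 150.0}

normalized = normalize_to_reference(observed_spend, shares, "control_1")
for population, value in normalized.items():
    print(population, round(value, 2))
```

After this step, the normalized values can be compared directly, since each one answers the question "what would this population have generated if it were the same size as the reference?".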

2.3 Testing at Constant Performance

Advertising performance is typically measured as Return on Ad Spend (ROAS), but it ultimately depends on the outcomes that marketers aim to achieve, ranging from website visits to sales. From empirical knowledge of the programmatic open web advertisement market dynamics, we understand that there are diminishing returns between advertiser spend and the performance achieved (see Figure 1). This implies that as advertisers increase their budgets, their ROAS decreases because the most profitable opportunities were already bought.

Moreover, advertisers usually operate with fixed performance targets and adjust their marketing budgets across channels (open internet, walled gardens, CTV, etc.) to maintain their desired performance levels. The real impact of third-party cookie deprecation on Open Web publishers hinges on how advertisers will adjust their budgets in response to performance changes, since they drive cash flow into the system and allocate it rationally among marketing channels.

We thus believe that enforcing constant performance across all populations (using Control 1 as a reference) is the only way to obtain reliable estimations of the long-term impact on the ecosystem (estimating the blue point in Figure 1). This method allows us to observe how much advertisers would need to spend in order to continue meeting their performance targets in this new post-third-party cookie environment, and therefore how much of their budgets would migrate to unimpacted channels like walled gardens.

Criteo’s approach consisted of building a feedback loop at the campaign level to match the performance of the Treatment and Control 2 groups to that of the Control 1 group. Due to the small population sizes, we used qualified visits as a proxy for performance across the board.
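
The article does not describe the controller itself, so the following is only a generic sketch of what one step of such a feedback loop could look like. It assumes a simple proportional update on a per-campaign bid multiplier and uses cost per qualified visit as the performance proxy; the function name, the `gain`, `floor` and `cap` parameters, and the update rule are all assumptions.

```python
def update_bid_multiplier(multiplier, spend, visits, ref_spend, ref_visits,
                          gain=0.5, floor=0.0, cap=2.0):
    """One proportional step toward the reference cost per qualified visit.

    Assumes the reference population has spend and visits greater than zero.
    """
    if visits == 0:
        # No qualified visits yet: back off, since cost per visit is unbounded.
        return max(floor, multiplier * (1.0 - gain))
    cost_per_visit = spend / visits
    ref_cost_per_visit = ref_spend / ref_visits
    # ratio < 1 means this population pays more per visit than the reference,
    # so the multiplier (and therefore spend) is reduced.
    ratio = ref_cost_per_visit / cost_per_visit
    adjusted = multiplier * (1.0 + gain * (ratio - 1.0))
    return min(cap, max(floor, adjusted))


# Treatment pays 10 per visit while Control 1 pays 5: bids are scaled down.
print(update_bid_multiplier(1.0, spend=100.0, visits=10,
                            ref_spend=100.0, ref_visits=20))
```

Because of diminishing returns, lowering bids removes the least profitable impressions first, so repeated steps move the population's performance toward the reference while revealing how much spend survives at that performance level.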

2.4 Focus on Publisher Revenues

Publisher Revenue at constant performance is the primary KPI we used to assess the long-term impact of a full third-party cookie deprecation rollout. Technically, Publisher Revenue at constant performance equates to Criteo’s Traffic Acquisition Cost (TAC) at constant performance. The conversion from TAC to publisher revenue is based on the following assumptions:

All publisher revenue streams are impacted in the same proportion as measured by Criteo.
This assumption is grounded in the fact that Criteo tested both upper-funnel and lower-funnel campaigns, and that all forms of digital advertising are optimized toward a performance KPI of some sort.
SSP fees remain equal.

2.5 Timeframe

Criteo’s stable testing period consisted of 8 consecutive weeks between March 18th and May 12th, 2024.

3 Results

The following section details the most important takeaways from our Market Testing results.

3.1 Global Results

If third-party cookie deprecation were rolled out today and replaced with the current version of the Privacy Sandbox, we could expect publisher revenues to decrease by up to 60% for publishers who have currently enabled Protected Audience API, and by up to 78% overall across the entire publisher ecosystem. Nevertheless, the Privacy Sandbox brings some incremental value to online advertising over purely contextual targeting as it exists today: by comparison, the drop in publisher revenues for Control 2 relative to Control 1 is 95%.

Table 4: Criteo’s Global Market Testing Results

Results at Constant Supply:

As mentioned, the global drop in publisher revenues observed between the Treatment and Control 1 populations is close to 78%. This value reflects the fact that Protected Audience API is currently enabled in only around 55% of the Chrome open-web inventory available through third-party cookies. We believe it is important to consider this impact, as there are no guarantees that the supply will catch up. Nonetheless, we have also studied how results would look if we only scoped on publishers that are currently enabled for Protected Audience API auctions, and we estimated a drop of 60% in publisher revenue between the Treatment population and Control 1.

3.2 Other Important Takeaways

3.2.1 Spend by SSPs

Because Google Ad Manager (GAM) is the only SSP that could enable Protected Audience API at scale (Privacy Sandbox integration has proven significantly more costly and complex for other SSPs), we observe that GAM’s captured share of publisher revenue in Treatment is roughly 3.6 times its share in Control 1 (83% versus 23%). The share of spend funnelled through GAM for each population is:

Control 1: 23% of publisher revenue goes through GAM.
Control 2: 47% of publisher revenue goes through GAM.
Treatment: 83% of publisher revenue goes through GAM.
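
The relationship between these shares can be checked directly from the three figures listed above. The short snippet below expresses the Treatment share as a multiple of the Control 1 share and as a difference in percentage points; the variable names are illustrative.

```python
# Share of publisher revenue going through GAM, as listed above.
gam_share = {"control_1": 0.23, "control_2": 0.47, "treatment": 0.83}

ratio = gam_share["treatment"] / gam_share["control_1"]
relative_increase = ratio - 1.0
point_increase = (gam_share["treatment"] - gam_share["control_1"]) * 100

print(f"{ratio:.2f}x, +{relative_increase:.0%}, +{point_increase:.0f} points")
# prints "3.61x, +261%, +60 points"
```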

3.2.2 Latency Analysis

Latency is one of the biggest challenges in Ad Tech and, by extension, for publishers, as it puts their monetization opportunities at risk and degrades user experience. For clarity, latency is defined here as the delay between the bid request and the display of the ad on the publisher’s page.

We compared the latency distribution across all scopes and over different SSPs. The results show that the median latency is 115% higher in Treatment than in Control 1 for GAM, and 152% higher for other SSPs.

This is concerning, as higher latency means that DSPs might time out at higher rates and that the user experience on publisher pages worsens. In addition, publishers will lose monetization opportunities, as a time-out usually means rendering blank banners or showing contextual ads that today are less valuable and generate less revenue.

In addition, we expect that as the Protected Audience API adoption increases, delays in client-side processing will also increase, which will in turn further increase latency.

4 Criteo’s View

Based on these results, we believe that the Privacy Sandbox does not currently ensure a properly functioning market. Our findings indicate that if third-party cookies were deprecated today:

Open-web publishers would lose their ability to effectively monetize their content and lose more than 3/5 of their Chrome advertising revenue.
Moreover, GAM would likely extend its market share and benefit from its dominant position in the Privacy Sandbox environment.

Therefore, we consider that the criteria outlined in Google’s Commitments to the CMA, specifically “b. impact on competition in digital advertising” and “c. impact on publishers and advertisers,” are currently not met.

Despite these results, Criteo believes that if Google and the industry focus on resolving some key limitations of the Privacy Sandbox, there is a path forward to make the Privacy Sandbox a viable alternative to third-party cookies, achieving both goals of improved user privacy protection and preserving the Open Web.

5 Warning on Potential Bias on Tester Results

In the coming days a number of testers will communicate results either publicly or privately to the CMA. We acknowledge that each tester might have different testing methods or scopes that might lead to discrepancies in results. Nevertheless, beyond simple methodological differences, we have found potential biases that, if not mitigated, can lead to serious misinterpretation of results. This section warns the industry of potential limitations that might be impacting other testers, skewing results in favour of the Treatment population.

5.1 DSPs testing at constant spend biases SSPs’ market test results

Some DSPs have chosen to test at constant spend (TAC) to evaluate the drop in performance for their advertisers after the rollout of third-party cookie deprecation in Chrome. However, testing at constant spend could severely bias the results observed by SSPs or publishers.

If DSPs test at constant spend, they equalize the TAC across all testing populations. Consequently, supply testers would see no difference in the revenue captured from that source of demand. This means that if those DSPs constitute a significant share of the SSPs’ or publishers’ revenue streams, they could incorrectly conclude that the rollout of third-party cookie deprecation will not result in significant drops in publisher revenue. Due to this potential bias, we advise the industry to exercise caution when analyzing results from supply-side testers.

5.2 Bias of scoping results on Google Ad Manager (GAM)

Numerous DSPs might be inclined to limit the test exclusively to GAM, as it is likely the only SSP operating extensively on Protected Audience API and forwarding labels at scale. However, this approach would skew the results, favouring the Treatment population by at least 100% relative to Control 1.

Table 7 shows that when focusing on GAM, we observe only a modest decrease of around 21% in publisher revenue between Treatment and Control 1. It is crucial not to interpret this figure as suggesting that if all SSPs improved their Privacy Sandbox integrations, drops in publisher revenue would be modest. In reality, the relative advantage of Google’s own SSP integration with the Privacy Sandbox allows GAM to channel additional TAC, improving the perceived performance of the Privacy Sandbox. The -21% figure is attained because GAM wins auctions that it otherwise would not win, thanks to its advantage in the Privacy Sandbox environment over SSPs competing for the exact same display opportunities (a setup frequently called side-by-side). In other words, as our global results show, the overall size of the market shrinks, but because GAM captures a much larger share of the supply traffic, the impact seems smaller. If all supply (publishers and SSPs) were to integrate with the Privacy Sandbox at scale, which is already an optimistic scenario today, the actual drop in publisher revenues would be closer to 60%.

In depth explanation:

The following figures illustrate this bias with a simplified example. Figure 2 sets the stage, showing an environment with just two publishers, each funneling $50 of advertising revenue in Control 1.

Figure 2: Example of Different Supply Scopes — (0) Setting the Stage

Publisher 1 is integrated through the Privacy Sandbox via GAM but also connected with a second SSP with no Protected Audience API integration. This means that all bid requests for this publisher are funneled through both SSPs in parallel, with the highest bid from each SSP competing in the final auction. This common phenomenon in the industry is known as side-by-side competition. We observe the following for Publisher 1:

In Control 1, on third-party cookie traffic, GAM and the other SSP share the revenue equally ($25 each).
In Treatment population, because GAM is integrated with the Protected Audience API, and Protected Audience API auctions tend to attract slightly higher bids than purely contextual advertisements, GAM cannibalises most of the side-by-side competition. As a result, Publisher 1 generates $20 through GAM in the Treatment population and only a residual ε through the other SSP.

The situation is simpler for Publisher 2. It is connected to only one SSP, which is not integrated with the Protected Audience API, therefore:

In Control 1, the exclusive SSP funnels $50 of revenue.
In the Treatment population, because the publisher relies solely on contextual targeting, it only generates 2ε of revenue.

Given this situation, there are three options on how market testers can scope results:

1. Scoping on GAM: As shown in Figure 3, if a tester scopes the results on GAM only, they observe a 20% decrease in publisher revenues between Control 1 and Treatment population. However, this does not reflect the real scenario for either publisher. The problem is that the variation in GAM revenues combines two opposing effects:

a. The value of GAM’s original traffic falls from $25 to $10. This is the effect we are trying to measure.

b. Because of its competitive advantage on Protected Audience API integration, GAM also captures a larger share of traffic by cannibalizing the other SSP, generating an additional $10.

Because measuring only the first effect is very complex in practice, scoping on GAM does not accurately measure the actual impact on publishers, but rather a mixed effect. Therefore, this measurement scope is biased and must be avoided, as it does not reflect the post-third-party cookie deprecation reality that we are trying to measure, either for Publisher 1 or for the environment as a whole.

Figure 3: Example of Different Supply Scopes — (1) Scoping on GAM

2. Scoping on Protected Audience API Integrated Publishers: This option, illustrated in Figure 4, measures only the shift in publisher revenues observed for Publisher 1. Here, we would use the $50 that the publisher generated in Control 1 as the benchmark and conclude that the publisher’s revenue decreases by 60%. While this approach is methodologically correct, it assumes that all publishers will eventually integrate with Protected Audience API, which we currently should not take for granted. Moreover, it is challenging to completely isolate publishers integrated with Protected Audience API, as many have only partial coverage of their inventories enabled.

Figure 4: Example of Different Supply Scopes — (2) Scoping on Protected Audience API enabled supply

3. Not Scoping on Supply: The final option, shown in Figure 5, evaluates the impact on the entire environment, regardless of integration with Protected Audience API. This approach simplifies the analysis and provides a realistic measure of the overall impact on publishers if third-party cookie deprecation occurred today. It reflects the true market dynamics and the potential revenue changes for the supply environment as a whole. Here we observe that publisher revenues fall by 80%.

Figure 5: Example of Different Supply Scopes — (3) No supply scoping
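
The three scopes can be reproduced numerically from the dollar amounts in the simplified example. The sketch below treats the residual contextual revenue ε as negligible (set to zero), which is an assumption made here to recover the rounded percentages; the dictionary layout and names are illustrative.

```python
EPS = 0.0  # residual contextual revenue (the ε of the example), assumed negligible

# revenue[publisher][ssp] = (Control 1, Treatment), in dollars
revenue = {
    "publisher_1": {"gam": (25.0, 20.0), "other_ssp": (25.0, EPS)},
    "publisher_2": {"exclusive_ssp": (50.0, 2 * EPS)},
}


def drop(rows):
    """Relative revenue drop from Control 1 to Treatment over the given rows."""
    control = sum(c for c, _ in rows)
    treatment = sum(t for _, t in rows)
    return 1.0 - treatment / control


scope_gam = drop([revenue["publisher_1"]["gam"]])
scope_pa_publishers = drop(list(revenue["publisher_1"].values()))
scope_all = drop([row for pub in revenue.values() for row in pub.values()])

print(f"(1) GAM only: {scope_gam:.0%}")                           # 20%
print(f"(2) PA-integrated publishers: {scope_pa_publishers:.0%}")  # 60%
print(f"(3) no supply scoping: {scope_all:.0%}")                   # 80%
```

The same $20 of Treatment revenue appears in all three numerators; only the Control 1 benchmark changes ($25, $50, $100), which is why the choice of scope alone moves the measured drop from 20% to 80%.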

5.3 Lack of Label Forwarding from SSPs

Label forwarding posed a significant challenge throughout the test period, as many SSPs either failed to forward testing labels entirely or forwarded significantly fewer than anticipated. While one might assume that this bias affected all testing populations equally, our analysis shows otherwise. Approximately 17% of the Treatment population’s spending occurred through unlabeled traffic, compared to 45% for Control 1. The main reason behind this difference is that while unlabeled traffic in Control 1 is usually valuable third-party cookie traffic, for the Treatment population it most likely comes from SSPs or publishers that do not enable Protected Audience API. This means that unlabeled traffic in the Treatment population is addressed purely through contextual targeting, which currently brings less value. Ultimately, this difference in value between unlabeled traffic, if left unaccounted for, could artificially inflate publisher revenue by 50% for the Treatment group relative to Control 1.

Figure 6: Labelled vs unlabelled spend on Treatment and Control populations
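
One way to read the roughly 50% figure is to assume that a tester simply drops unlabelled spend from both populations. Under that assumption (ours, not stated in the source), the comparison is distorted by the ratio of the labelled fractions, as the sketch below shows using the two unlabelled shares quoted above.

```python
# Share of spend on unlabelled traffic, from the figures quoted above.
unlabelled_share = {"control_1": 0.45, "treatment": 0.17}


def observed_fraction(population):
    """Fraction of true spend still visible if unlabelled traffic is ignored."""
    return 1.0 - unlabelled_share[population]


# Treatment keeps 83% of its spend while Control 1 keeps only 55%, so the
# Treatment / Control 1 revenue ratio is overstated by this factor.
inflation = observed_fraction("treatment") / observed_fraction("control_1") - 1.0
print(f"{inflation:.0%}")  # prints "51%"
```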

As detailed in section 2.2.2, Criteo implemented two mechanisms to mitigate this bias, allowing us to determine the share of spend from each population that was funneled through traffic where SSPs did not share a testing label.

We caution the industry about the significance of this bias and emphasize the importance of implementing mitigation strategies or, at least, of taking this effect into consideration when drawing conclusions from the results.

5.4 Third-party cookie IDs in bid requests for Mode B users

Mode B was designed to provide a population without cross-site IDs stored in third-party cookies. Nevertheless, during our test period, we observed that, without mitigations, both the Treatment and Control 2 populations had a significant share of publisher revenue derived from traffic where SSPs forward a third-party cookie ID, ranging from 30% to 50% of the total publisher revenue of the Treatment population.

After discussions with Chrome and the industry, we have gathered the following hypotheses as the potential sources of this traffic:

Users reactivating TPCs.
Publishers opting out of third-party cookies deprecation.
Residual cookie matching, with Criteo IDs having been synced with publishers’ first-party IDs by SSPs.

Despite this, Google has clearly communicated its intention to remove third-party cookies for web advertising use cases. Therefore, we deemed that, for adequate testing, third-party cookie traffic should not be included in the test.

If not mitigated, the remaining third-party cookies in Mode B can increase Treatment population publisher revenue by at least 100% relative to Control 1. To eliminate this bias, Criteo avoided using third-party cookie signals and forced a purely contextual bid whenever we received a Treatment or Control 2 labelled request. Additionally, in cases where the request was unlabelled but we retrieved a Treatment label at display time, we retrospectively excluded third-party cookie displays from the Treatment population when interpreting the results.
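
The two mitigations described above can be sketched as a bid-time rule and a reporting-time filter. This is a simplified illustration, not Criteo's implementation: the label strings are placeholders, and both function names and their parameters are hypothetical.

```python
MODE_B_LABELS = {"treatment", "control_2"}  # placeholder label names


def choose_bid_mode(request_label, has_third_party_cookie):
    """Bid-time rule: never use cookie signals on Mode B labelled requests."""
    if request_label in MODE_B_LABELS:
        return "contextual"
    return "cookie_based" if has_third_party_cookie else "contextual"


def keep_display_in_treatment(request_label, display_label, used_third_party_cookie):
    """Reporting-time filter for the Treatment population."""
    if request_label is None and display_label == "treatment":
        # Unlabelled request, label only discovered at display time:
        # keep the display only if no third-party cookie was used to bid.
        return not used_third_party_cookie
    # Labelled requests were already forced to a contextual bid.
    return request_label == "treatment"


print(choose_bid_mode("treatment", has_third_party_cookie=True))   # contextual
print(keep_display_in_treatment(None, "treatment", True))          # False
```

Splitting the logic in two reflects the timing constraint in the text: the bid-time rule can only act on labels present in the request, so displays whose label is learned later have to be filtered after the fact.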

Ultimately, the Privacy Sandbox Market Testing was full of inherent complexities, but we are confident that our rigorous methodology makes Criteo’s results trustworthy. Despite the grim outcomes for publishers that we observed, Criteo believes there is a path forward to make the Privacy Sandbox a viable solution for a functioning market.
