Data Clean Rooms: the Go-To Product in a Privacy-First Reality

Alexis Sukrieh
Published in Weborama
Sep 22, 2023

The “IDLess” World is already here

It’s been clear for some time: Browsers like Safari and Firefox have already abandoned third-party cookies. Now, Chrome, with its commanding 60% market share, is following suit, setting its sights on the Privacy Sandbox implementation by 2024. Once fully activated, we could observe an overwhelming 97% of web traffic without cross-site identification capabilities. Coupled with GDPR implications, the ‘identifiable inventory’ is even smaller in Europe. This isn’t a distant prediction; it’s today’s marketing environment.

We’re witnessing a paradigm shift: we’re transitioning from individual-based marketing to a cohort-based approach. This is a major change, and it redefines how we should see our industry if we want to adapt successfully to this new game.

First-Party Data and Data Clean Rooms (DCRs) are emerging as the key assets in this new playbook.

The Decline of Cross-Context Identifiers

In recent years, changes in digital marketing have largely been influenced by the reduction of cross-context identifiers. Among these, third-party cookies have been the most affected due to browser updates:

  • Safari (19.85%): Apple’s Safari browser, which holds a significant portion of users, started the trend in 2018. Its Intelligent Tracking Prevention (ITP) limited tracking capabilities by blocking third-party cookies and reducing the lifespan of first-party cookies.
  • Firefox (2.94%): Despite its smaller market share, Mozilla’s Firefox made changes in 2022. Through its Enhanced Tracking Protection (ETP) and Total Cookie Protection, it began blocking third-party cookies and restricting fingerprinting techniques.
  • Chrome (63.56%): Google Chrome, holding the lion’s share of the browser market, has also taken a major step towards privacy. It announced its Privacy Sandbox and plans a full implementation by 2024, which will stop cross-site cookies from functioning on the most used browser on the planet.

Browser Market Share Source: gstats.com (year 2022)

Currently, 23% of digital inventory is cookieless, mainly due to actions taken by Safari and Firefox. Once Chrome activates its Privacy Sandbox (set for a 2024 rollout), this will likely jump to 87% of all traffic.
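The two percentages above follow from the browser shares listed earlier. Here is a quick check in Python, using only the figures quoted in this article (the variable names are illustrative):

```python
# Browser market shares quoted above (percent of traffic, 2022).
shares = {"Safari": 19.85, "Firefox": 2.94, "Chrome": 63.56}

# Browsers that already block third-party cookies.
cookieless_today = shares["Safari"] + shares["Firefox"]

# Adding Chrome once cross-site cookies stop working there.
cookieless_after_chrome = cookieless_today + shares["Chrome"]

print(f"Cookieless today: {cookieless_today:.2f}%")
print(f"Cookieless after Chrome: {cookieless_after_chrome:.2f}%")
```

The exact sums are 22.79% and 86.35%. The 87% quoted above appears to come from adding Chrome’s share to the already rounded 23%.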

A Decline Amplified by GDPR and Browsing Practices

Beyond browser changes, research has highlighted GDPR’s effects on website visits and engagement. Analyzing 5,000 domains in Europe and the US, the study found that GDPR’s requirement for clear user consent before data collection means 15% fewer third-party cookies are set, because users do not provide consent.

Hence, at the time of this writing, about 38% (23% + 15%) of web traffic cannot be addressed with cookies. But with Chrome’s full Privacy Sandbox adoption, this could rise to an estimated 97% of web traffic being without cookies or user consent, making cross-site identification basically a thing of the past.

Keep in mind that these figures represent a (very) conservative estimate. In practice, a sizable number of Chrome users activate Private Mode or manually disable third-party cookies in their settings, further reducing the traffic advertisers can target. This can potentially double the estimated percentage of unaddressable web traffic, leaving as little as 30% of the open web accessible to advertisers.

Let me emphasize: This isn’t a hypothetical scenario for years down the line; it’s the very reality digital marketers are facing today.

Further Setbacks: The Fall of Mobile and Universal IDs

In addition to the decline of third-party cookies and the rise of consentless traffic, alternative ID solutions face significant hurdles. Measures like IP masking, the freezing of User Agents, detection of bounce trackers, and the limitations of browser fingerprinting techniques are undermining their effectiveness, making them very likely obsolete in the months to come.

Last but not least, Mobile IDs are also facing deprecation. Apple’s ATT (2021) has significantly limited their use, and Google’s upcoming release of a Privacy Sandbox for Android signals the end of the widespread use of GAID.

Data Collaboration and the Emergence of Data Clean Rooms

In a landscape where recognizing individuals across multiple websites or apps becomes increasingly challenging, as we just saw, businesses must adapt their strategies for audience engagement. Their only viable option is to build these strategies on their own digital assets, leveraging their First-Party data.

Why? Because with the disappearance of third-party cookies and fingerprinting techniques (or mobile IDs), First-Party data emerges as the sole dependable source for understanding and engaging with users. This data, collected directly from customer interactions on a brand’s properties, offers valuable insights into user behaviors, preferences, and interests.

However, First-Party data is not a silver bullet. It comes with two main issues to address:

  1. Scarcity: Logged traffic, where an email is available, is relatively rare. It accounts for just a small portion of total traffic, with estimates suggesting 5% to 15% on average, depending on the domain.
  2. Sensitivity: Emails are highly sensitive from a privacy perspective as they are classified as Personally Identifiable Information (PII). Therefore, they cannot be used as a new cross-domain ID without violating user privacy regulations and risking potential legal repercussions.

To address these challenges, Data Clean Rooms (DCRs) come into play. DCRs are specialized tools designed to facilitate collaboration while safeguarding user privacy. They enable organizations to work together on aggregated intersections of First-Party data without exposing or exporting PII, such as email addresses.

The Data Clean Room: the go-to product in the First-Party realm

Understanding Data Clean Rooms

DCRs are not just a buzzword or a guise for “privacy-washing” — they’re a concrete, security-focused product solution aimed at preserving user privacy while enabling businesses to leverage their data effectively.

A Data Clean Room is a secure environment that allows for the controlled sharing and processing of data, ensuring that no PII is exposed or exported.

The Golden Rule: No PII Outflow

The golden rule for a DCR is that it should never allow PII, such as IDs, email addresses, or any PII-related information, to exit the environment or be displayed. Failure to adhere to this rule disqualifies a system as a DCR.

Now, to really hit the mark, a DCR needs certain key features. Here are the top five must-haves for any solid DCR.

Data Fusion Without PII Exposure

At their core, DCRs have the capability to merge two or more First-Party data sets based on a common identifier, such as email addresses. However, as explained below, these common identifiers (or pivots) remain hidden and are never exported out of the DCR, ensuring that while data sets are combined, no individual-level data is ever exposed or exported.
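To make the idea concrete, here is a minimal sketch of a join on a hidden pivot. It assumes both parties hash normalized emails with a shared salt, which is only one possible technique and not necessarily what any given DCR product does. The function names and the salt are hypothetical.

```python
import hashlib

def pivot_key(email: str, shared_salt: str) -> str:
    """Normalize an email and hash it so the raw address is never compared."""
    normalized = email.strip().lower()
    return hashlib.sha256((shared_salt + normalized).encode("utf-8")).hexdigest()

def overlap_size(brand_emails, publisher_emails, shared_salt="demo-salt"):
    """Join two data sets on the hidden pivot and return only a count."""
    brand_keys = {pivot_key(e, shared_salt) for e in brand_emails}
    publisher_keys = {pivot_key(e, shared_salt) for e in publisher_emails}
    # The matched keys stay inside this function; only the size leaves.
    return len(brand_keys & publisher_keys)

brand = ["Alice@example.com", "bob@example.com", "carol@example.com"]
publisher = ["alice@example.com ", "dave@example.com", "carol@example.com"]
print(overlap_size(brand, publisher))
```

The caller learns how large the intersection is, but never which emails matched. With a sample this small, a real DCR would suppress the result entirely, since a count of a handful of users is itself revealing.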

Aggregated Segments

In DCRs, queries always yield grouped results rather than individual records. This ensures that participants in a DCR can gain general insights without targeting specific users. A foundational element to this is the k-anonymity principle. It ensures results from queries always come in groups of a minimum size, not as individual records or small groups of individuals. This way, data remains generalized enough to protect user identities.
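A minimum group size can be sketched in a few lines. This is a simplified illustration in the spirit of k-anonymity, not a full implementation of it. The threshold `K_MIN`, the field names, and the sample records are all hypothetical.

```python
from collections import Counter

K_MIN = 3  # hypothetical minimum group size

def aggregated_counts(records, group_field, k_min=K_MIN):
    """Count records per group and suppress groups smaller than k_min."""
    counts = Counter(r[group_field] for r in records)
    return {group: n for group, n in counts.items() if n >= k_min}

matched_users = [
    {"interest": "sports"}, {"interest": "sports"}, {"interest": "sports"},
    {"interest": "travel"}, {"interest": "travel"},
    {"interest": "travel"}, {"interest": "travel"},
    {"interest": "opera"},  # a single user: too small to report
]
print(aggregated_counts(matched_users, "interest"))
```

The "opera" group contains one user, so it is dropped from the result rather than reported.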

No PII Exposure or Export

Given that DCRs operate on aggregated data segments, they inherently prevent the exposure of PII. Since all queries are executed on these aggregated segments, no PII can ever be returned by design.

To maintain strict privacy standards, it’s essential that DCRs prevent any PII from leaving the platform. This stops participants from matching results with external data sets, which would violate core privacy guidelines.

Controlled Access

DCRs incorporate strict user permission systems and advanced security measures, ensuring that only approved individuals can access specific datasets, and always under well-defined conditions.

For instance, some queries within a DCR might have a predetermined expiration, meaning they become inaccessible after a set number of days. Also, to protect the data and maintain consistent results, the information used to run a query can be locked or set in an “immutable” state. This means that even if a participating party updates their data, the original query results remain consistent. This immutability is crucial as it prevents potential data triangulation attacks by a malicious participant in the DCR.
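Here is a minimal sketch of both ideas, expiration and immutability, assuming a query object that copies its input data when it is created. The class and function names and the 30-day default are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class FrozenQuery:
    """A query bound to an immutable copy of the data and an expiry date."""
    snapshot: tuple       # data frozen at creation time
    expires_at: datetime

    def run(self, now: datetime) -> int:
        if now >= self.expires_at:
            raise PermissionError("query has expired")
        return len(self.snapshot)

def create_query(live_rows: list, now: datetime, ttl_days: int = 30) -> FrozenQuery:
    # tuple(...) copies the rows, so later updates to live_rows are not seen.
    return FrozenQuery(snapshot=tuple(live_rows),
                       expires_at=now + timedelta(days=ttl_days))

created = datetime(2023, 9, 22)
rows = ["u1", "u2", "u3"]
query = create_query(rows, created)
rows.append("u4")  # the participant updates its data afterwards
print(query.run(created + timedelta(days=1)))  # still based on the snapshot
```

Without the snapshot, a participant could add a single record, rerun the query, and compare the two results to learn something about that one record. Freezing the input removes that possibility.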

Data Source Verification

DCRs prioritize data from verified and trusted sources. Before data is integrated into the system, it undergoes preprocessing. This step involves potentially identifying and flagging any columns as containing PII, or selecting specific columns to be accessible within the DCR. Participants also have the authority to stipulate conditions, like predefined query templates, under which their datasets can be accessed.

This focus on data integrity ensures that businesses can confidently derive insights from legitimate data while maintaining the protection and discretion of each participant’s contributed data.
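The preprocessing step described above could look like the following sketch. It assumes a very simple heuristic (a column is flagged as PII when every value looks like an email address) and an explicit allow-list of columns chosen by the data owner. Both the heuristic and the function names are illustrative.

```python
import re

EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def flag_pii_columns(rows):
    """Return the column names whose values all look like email addresses."""
    columns = rows[0].keys()
    return {c for c in columns
            if all(EMAIL_PATTERN.match(str(r[c])) for r in rows)}

def queryable_columns(rows, allowed):
    """Columns a partner may query: explicitly allowed and not flagged as PII."""
    return sorted(set(allowed) - flag_pii_columns(rows))

dataset = [
    {"email": "alice@example.com", "city": "Paris", "basket": 42},
    {"email": "bob@example.com", "city": "Lyon", "basket": 17},
]
print(flag_pii_columns(dataset))
print(queryable_columns(dataset, allowed=["email", "city"]))
```

Even though the owner listed "email" as allowed, the flag wins and only "city" remains queryable. The "basket" column stays hidden because it was never allowed in the first place.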

DCR Use Cases: How to Do Things in This New Era

There are plenty of potential use cases for DCRs, each offering strong opportunities to extract value while staying within the boundaries of privacy regulations. A deep dive into these use cases would deserve a dedicated article, but a brief exploration can highlight how things will be done in this Privacy-First era.

Insights

Consider a retail brand aiming to decode the preferences of their loyal customers without overstepping privacy boundaries. Through DCRs, this brand has the capability to merge its first-party data with that of a partner, say, a publisher boasting a sizable audience.

Utilizing logged-traffic as a basis, the overlapping of these datasets could unveil trends, such as segments of customers who are more inclined to make a purchase based on the content they engage with on the publisher’s sites.

Armed with these insights (and to stress, these are 100% free of PII), the brand is able to make informed inventory decisions, adjust its marketing approach, or perhaps roll out new product lines.

Ad Campaign Measurement

In a digital landscape where tracking pixels and cookies are gradually fading, determining the efficacy of ad campaigns is more and more challenging.

The once straightforward process of connecting users from ad impression to the ultimate conversion now encounters heavy obstacles. This is where DCRs, leveraging logged traffic data, come into play.

Take the example of an e-commerce platform launching a campaign for a new product line. With DCRs, it can analyze the logged traffic after ad display and compute conversion metrics, using the First-Party data of the large publishers running its advertisements. This makes it possible to measure the performance of a campaign, even without knowing who exactly was reached, by comparing the conversion rate of users who were exposed to the ad with that of users who were not.
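The comparison itself only needs aggregate counts. Here is a minimal sketch, with made-up numbers and hypothetical function names, of how relative lift could be computed from the counts a DCR returns:

```python
def conversion_rate(conversions: int, users: int) -> float:
    """Share of users in a group who converted."""
    return conversions / users if users else 0.0

def relative_lift(exposed, control):
    """Relative lift of the exposed group over the unexposed group."""
    exposed_rate = conversion_rate(*exposed)
    control_rate = conversion_rate(*control)
    if control_rate == 0:
        raise ValueError("control conversion rate is zero; lift is undefined")
    return (exposed_rate - control_rate) / control_rate

# (conversions, users) per group: made-up aggregate counts for illustration.
exposed = (120, 4000)  # 3.0% conversion rate
control = (80, 4000)   # 2.0% conversion rate
print(f"Relative lift: {relative_lift(exposed, control):.0%}")
```

No individual record is needed at any point: four aggregate numbers are enough. A real measurement would also check that the two groups are comparable and that the difference is statistically significant.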

Activation

How can one activate a DCR segment without exporting the PII?

This question is perhaps the most significant challenge in this new era. As it stands, the answer remains open-ended.

Notably, the IAB’s white paper titled “Open Private Join and Activation” delves deeply into this issue. The suggested approach is to export the PII intended for targeting while ensuring it stays encrypted throughout the supply chain. While promising, this method introduces a certain level of complexity: each participant in the process would need to adopt this encryption for it to be effective.

Alternatively, DCR vendors might offer an activation service that operates in real time. Upon receiving an encrypted PII-application key pair, it would return a set of approved DCR segment IDs. This method might be easier to set up. Moreover, it could incorporate a mechanism similar to the Privacy Sandbox’s Topics API, occasionally adding false segments. This slight distortion could enhance privacy by making the segment a little blurrier.
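Such a service could be sketched as follows. Everything here is hypothetical: the segment catalogue, the key store, and the 5% noise rate are illustrative values, and the lookup key is treated as an opaque string that has already been encrypted upstream.

```python
import random

# Hypothetical catalogue of segment IDs approved for activation.
ALL_SEGMENTS = ["seg_sport", "seg_travel", "seg_auto", "seg_food", "seg_tech"]

# Hypothetical store: opaque (already encrypted) key -> approved segments.
SEGMENTS_BY_KEY = {"a1f3": ["seg_sport", "seg_travel"], "9c7e": ["seg_food"]}

def activate(encrypted_key, noise_rate=0.05, rng=random):
    """Return segment IDs for a key, occasionally adding one false segment."""
    true_segments = SEGMENTS_BY_KEY.get(encrypted_key, [])
    response = list(true_segments)
    if rng.random() < noise_rate:
        decoys = [s for s in ALL_SEGMENTS if s not in true_segments]
        if decoys:
            response.append(rng.choice(decoys))
    return response

print(activate("a1f3", rng=random.Random(7)))
```

Because any single response may contain a false segment, the caller can never be sure that a given segment truly applies to a given key, while targeting remains accurate on average.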

Data Scarcity: the Key Challenge of all DCR Use Cases

A crucial factor to bear in mind when leveraging DCRs is the limitation of available first-party data.

With Machine Learning and Contextual targeting, we have two essential tools in addressing this challenge.

Machine Learning

Consider an advertiser aiming to broaden its DCR segment, which is inherently limited because it relies on the logged traffic shared with a Publisher partner.

Within the DCR tool, the advertiser can utilize machine learning algorithms that hinge on publisher signals. Leveraging the subset of users who engaged with their ads as positive examples (using shared logged traffic), they can identify similar users within the publisher’s data set who aren’t present in the advertiser’s own data, drawing from the publisher’s features.
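As a rough illustration of this lookalike idea, the sketch below uses a nearest-centroid similarity as a stand-in for a trained model. The feature vectors, user IDs, and the 0.9 threshold are hypothetical, and in a real DCR the resulting user list would stay inside the environment as a segment.

```python
import math

def centroid(vectors):
    """Average feature vector of the seed users."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def lookalikes(publisher_features, seed_ids, threshold=0.9):
    """Publisher users outside the seed set whose features resemble the seeds."""
    seed_center = centroid([publisher_features[u] for u in seed_ids])
    return sorted(
        user for user, features in publisher_features.items()
        if user not in seed_ids and cosine(features, seed_center) >= threshold
    )

# Hypothetical publisher signals per user: [sports, finance, cooking] affinity.
publisher_features = {
    "p1": [0.9, 0.1, 0.0],
    "p2": [0.8, 0.2, 0.1],
    "p3": [0.85, 0.1, 0.05],
    "p4": [0.0, 0.9, 0.3],
}
seeds = {"p1", "p2"}  # matched users who engaged with the ad
print(lookalikes(publisher_features, seeds))
```

Here "p3" is selected because its publisher-side signals closely resemble those of the seed users, even though it is absent from the advertiser’s own data. "p4" has a very different profile and is left out.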

Contextual Targeting

On the other hand, integrating Contextual segments with a DCR proves to be highly effective. This method capitalizes on the semantic context associated with the users in the DCR segment, enabling advertisers to target similar contextual areas across the web.

What stands out with this approach is its IDless nature. The reliance on logged-traffic is eliminated. This opens doors to a wider inventory spectrum, covering browsers like Safari and Firefox.

And here’s the kicker: no consent is required. There’s no use of PII, no pseudonymous IDs, and absolutely no trace of browsing history in signals used to target the ad. Targeting is completely user-agnostic, focusing solely on content.
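A minimal sketch of this contextual approach: derive the dominant content topics of a DCR segment in aggregate, then select pages by their own topics, with no user identifier involved at targeting time. The topic labels, URLs, and function names are hypothetical.

```python
from collections import Counter

def segment_topics(segment_page_topics, top_n=2):
    """Most frequent content topics among pages read by the DCR segment."""
    counts = Counter(topic for topics in segment_page_topics for topic in topics)
    return {topic for topic, _ in counts.most_common(top_n)}

def eligible_pages(pages, target_topics):
    """Pages whose own content matches the targeted topics; no user data involved."""
    return sorted(url for url, topics in pages.items()
                  if target_topics & set(topics))

# Topics of pages read by the segment, aggregated inside the DCR.
segment_reads = [["running", "nutrition"], ["running", "travel"],
                 ["running", "nutrition"]]
targets = segment_topics(segment_reads)

web_pages = {
    "https://example.com/marathon-tips": ["running"],
    "https://example.com/stock-news": ["finance"],
    "https://example.org/meal-prep": ["nutrition", "cooking"],
}
print(eligible_pages(web_pages, targets))
```

The second function takes only page content as input, which is why this kind of targeting works on any browser and for any visitor, logged in or not.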

Combining Top-Tier DCR Technology with User-Friendly No-Code Design

At Weborama, we’re rolling out a DCR app within our Data Intelligence Platform. I’m personally thrilled to be working on this innovative initiative with my team.

First, because it really matters to me (and to *a lot* of my colleagues) to commit to advancing the cause of privacy in our industry. We truly believe that the future of marketing has no choice but to meet high Privacy-First standards.

But this isn’t just about “meeting industry standards”; it’s about setting them.

Sure, adding a GDPR layer to an app might earn the ‘DCR’ label in some circles. But as I’ve pointed out in this piece, genuine DCRs involve so much more.

It’s a paradigm shift.

The online marketing landscape we once knew, built on individual data tracking, is coming to an end. This approach sometimes resulted in major data breaches and excessive tracking and retargeting, which logically raised public concern about privacy, as evidenced by incidents like Cambridge Analytica, to name the creepiest. Things are changing.

We’re transitioning to a world centered on cohorts, not individuals. With this shift comes enhanced privacy, but also the challenges of data scarcity and complexity.

DCRs? They’re the tools for this new era, guiding us through the maze where data is both precious and guarded. I’m glad to be both a witness to this evolution and, with Weborama’s offerings, a contributor to shaping it.

And secondly, what really motivates me in this journey is that there’s another layer to our innovation: usability.

While most DCRs demand a strong grasp of SQL or similar technical skills, at Weborama we’re pioneering a new approach.

We’re layering our state-of-the-art DCR technology (based on the powerful Snowflake DCR framework) with an intuitive no-code interface, truly making it accessible without compromising on its robust capabilities.

Interested in what we’re crafting? Drop us a line. We’d love to chat.
