Embarking on the Snowflake Data Clean Room

Alex Lei
Jul 8, 2022

So, what is a data clean room?

Some define a data clean room as:

A data clean room is an online platform where companies like Google, Amazon, and Disney can safely share data with advertisers without violating user privacy. They accomplish this by sharing aggregated data — or data that has been organized into groups or cohorts — rather than individual customer data. — Tinuiti

However, the Snowflake data clean room goes beyond this initial use case (although it can be used in this way). If we take a step back, the idea is more accurately defined as:

Data Clean Rooms (DCRs) are secure environments that enable multiple organizations (or divisions of an organization) to bring data together for joint analysis under defined guidelines and restrictions that keep the data secure. These guidelines control what data comes into the clean room, how the data within the clean room can be joined with other data in the clean room, the kinds of analytics that can be performed on the clean room data, and what data — if any — can leave the clean room environment.

What are some of the reasons why you would need a DCR?

  • Customers want to understand how their customer base overlaps with an external partner's, even though those shared customers are accessing different goods and services.
  • Security and governance measures around data continue to increase. This is especially true for customers working with personally identifiable information (PII).
  • Technologies that were traditionally used to identify unique customers are now being retired due to security concerns (e.g., third-party cookies).
  • The market is shifting from traditional advertising media to more digital formats, which brings new challenges for identification.

However, a growing number of customers on the Snowflake Data Cloud are finding other uses for this way of gaining insights, such as:

  • Organisations with information about infrastructure (such as architectural drawings, schematics, etc.) can share this with other organisations in a secure way.
  • Government entities can share data amongst themselves seamlessly, and can share information with private enterprises in a similar fashion.
  • Data scientists can enrich their datasets with anonymous curated data sources, providing better and more accurate machine learning models than ever before.

What are the components that make up a DCR solution?

More importantly, what should you be looking for in a DCR solution? This is probably the number one thing to determine, even before creating the DCR framework. Like Neo in The Matrix, you have to start with the question.

Photo by: https://screenrant.com/matrix-reloaded-architect-speech-choice-explained/

Snowflake customers have found that they can explore the following types of clean room scenarios:

  • Customer audience matching
  • Customer enrichment
  • Pre- and post-campaign analysis

Customer matching

How are you looking to match your customers with the external party?

There are several ways, from traditional match methods, such as a simple hashed email match, to more complex matching algorithms, such as a waterfall match (i.e., email, name, address, phone number, or combinations of these).

Snowflake allows you to easily create matching scenarios using SQL queries. A very simple example of this is a join between two customer tables on the hashed email address.

SELECT p1.field1, p2.field2
FROM party1.customers AS p1
INNER JOIN party2.customers AS p2
  ON p1.hashed_email_address = p2.hashed_email_address;

The benefit of this method of matching is that it can be made as complex as required.
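To make the waterfall idea more concrete, here is a minimal sketch in plain Python rather than Snowflake SQL. It is not the Snowflake implementation: the function names (`hash_key`, `prepare`, `waterfall_match`), the choice of SHA-256, the key order and the sample records are all assumptions made for illustration. Each party hashes its identifiers first, and the match tries email before falling back to phone.

```python
import hashlib

def hash_key(value):
    """Normalise (trim, lowercase) an identifier, then SHA-256 hash it."""
    return hashlib.sha256(value.strip().lower().encode("utf-8")).hexdigest()

def prepare(records, keys=("email", "phone")):
    """Replace raw identifiers with hashes so only hashes are ever compared."""
    return [
        {"id": r["id"], **{k: hash_key(r[k]) for k in keys if r.get(k)}}
        for r in records
    ]

def waterfall_match(party1, party2, keys=("email", "phone")):
    """Try each key in order; a party1 record keeps the first match it finds."""
    matches = {}
    for key in keys:
        # Index party2 by the hashed value of the current key.
        index = {r[key]: r["id"] for r in party2 if key in r}
        for r in party1:
            if r["id"] not in matches and r.get(key) in index:
                matches[r["id"]] = (index[r[key]], key)
    return matches

# Hypothetical sample data for the two parties.
party1 = prepare([
    {"id": "a1", "email": "Ann@Example.com ", "phone": "555-0100"},
    {"id": "a2", "email": "bob@old.example", "phone": "555-0101"},
    {"id": "a3", "email": "cy@example.com"},
])
party2 = prepare([
    {"id": "b1", "email": "ann@example.com", "phone": "555-0199"},
    {"id": "b2", "email": "bob@new.example", "phone": "555-0101"},
])

print(waterfall_match(party1, party2))
```

In this sample, a1 matches b1 on email (after normalisation), a2 only matches b2 once the waterfall falls through to phone, and a3 stays unmatched.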

Approved queries (templates)

What types of queries will you request, or allow the third party to submit, against your customer data?

For the SELECT statement above, this can be handled in several ways, including:

  • Dynamically requesting the required fields via a template.
  • An approval system, whereby submitted templates are checked against a list of ‘approved queries’ that are allowed to be executed.
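The two options above can be combined, and a rough Python sketch of that combination follows. Everything in it is hypothetical: the template name `customer_overlap`, the `p2.region` and `p2.age_band` columns, and the `render_request` helper are invented for illustration and are not part of Snowflake's clean room framework. A request names a template and the fields it wants, and both are checked against an allow-list before any SQL is produced.

```python
import string

# Hypothetical allow-list: template name -> (SQL template, columns a requester may ask for).
APPROVED_TEMPLATES = {
    "customer_overlap": (
        "SELECT $columns, COUNT(*) AS matched_customers "
        "FROM party1.customers AS p1 "
        "INNER JOIN party2.customers AS p2 "
        "ON p1.hashed_email_address = p2.hashed_email_address "
        "GROUP BY $columns",
        {"p2.region", "p2.age_band"},
    ),
}

def render_request(template_name, columns):
    """Return the SQL for an approved template, or raise if anything is off-list."""
    if template_name not in APPROVED_TEMPLATES:
        raise PermissionError(f"template {template_name!r} is not approved")
    sql, allowed = APPROVED_TEMPLATES[template_name]
    if not columns:
        raise ValueError("at least one column is required")
    rejected = [c for c in columns if c not in allowed]
    if rejected:
        raise PermissionError(f"columns not approved: {rejected}")
    # Only allow-listed column names ever reach the template.
    return string.Template(sql).substitute(columns=", ".join(columns))

print(render_request("customer_overlap", ["p2.region"]))
```

Because the requester can only pick from pre-approved names, free-form SQL never reaches the data owner's tables, and the only output of this template is an aggregate count per group.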

Request/Response system

How would the system respond? Are you waiting for a manual confirmation? Is there human interaction in this process? Will this be automated?

These are some of the things that need to be determined when you choose a particular solution. The benefit of using the Snowflake data clean room is that a lot of these underlying mechanics are already done for you.

For example, by using streams and tasks, new requests from the advertiser can be automatically checked and validated. Once confirmed, the request runs automatically on the publisher’s account, making the end-to-end interaction seamless.

What makes the Snowflake data clean room unique?

It’s the combination of all these factors, working together with zero movement of data, that makes the Snowflake data clean room a truly unique solution.

The Snowflake data clean room leverages existing functionality within the Snowflake environment. This makes it really easy to implement. In fact, there is a quickstart guide that shows you just how easy it is.

DCR in action

The features that are used to enable the data clean room include:

  • Secure data sharing: enables the secure sharing of information between parties without data movement.
  • Row access policies: used to match customer information without exposing PII.
  • Stored procedures: validate that the queries are authorised.
  • Streams and tasks: automate and monitor requests.
  • Global reach: the Snowflake Snowgrid (Snowflake’s cross-cloud, cross-region capability) means that you can access almost any data source. It also means you can create multi-party clean rooms.

What information do you need to make sure you’re successful in a DCR deployment?

Here are some things to consider to ensure that you maximise your investment in a data clean room framework.

  • Have a good idea of the types of questions you want answered. The data clean room framework is very flexible, meaning it has the potential to cause ambiguity in the answers provided. Having a clear sense of what you want is necessary for a successful outcome. For example, is the clean room designed for customer enrichment? If so, knowing what columns provide the most value is important.
  • Start with an idea of how you want to protect your sensitive data.
  • Think about how you want to match the external party’s records: is it via email, name, address, or something else?
  • Work with our professional services team to map out the use cases from end-to-end.

What’s next?

One thing that’s true at Snowflake is that innovation never stops. This is also true for the data clean room. At the Snowflake Summit (June 2022), the announcements around new developments in clean rooms included the expansion to the ‘global clean room’.

Shakhina Pulatova (Principal Product Manager) and Justin Langseth (Technical Director) both presented on the global data clean room in the data collaboration session keynote.

Some of the key announcements of Snowflake’s future developments include:

  • Our use of the data sharing platform to share applications on our Snowflake Marketplace.
  • The deployment of the global clean room as part of a shared application.
  • New security features such as projection constraints, which allow a column to be used in join, aggregate and where clauses but block its projection (i.e., stop that column's values from being displayed in the query results).
  • Use of differential privacy techniques to reduce the chance that individual records can be re-identified.
  • Did Shakhina just call Justin a ‘data boomer’? And what’s the tomato soup pipeline incident of 2007? So many questions!
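For a feel of what differential privacy means in practice, here is a small Python sketch of one textbook technique, the Laplace mechanism applied to a count. The announcement did not say which techniques Snowflake will use, so treat this purely as an illustration. The function names and the `epsilon` values are assumptions.

```python
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def noisy_count(true_count, epsilon, rng=None):
    """Release a count with Laplace noise of scale 1/epsilon.

    Adding or removing one person changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1/epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    return true_count + laplace_noise(1 / epsilon, rng)

# A smaller epsilon means more noise and stronger privacy.
for eps in (0.1, 1.0, 10.0):
    print(eps, round(noisy_count(1000, eps), 1))
```

The trade-off is visible in the loop: the reported count stays useful in aggregate, but no single customer's presence or absence can be read off the result with confidence.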

Customer success with DCR

The Snowflake Media Data Cloud enables Disney Advertising Sales’ innovative clean room data solution. (See the full story here.)

NBCUniversal: read how the Audience Insights Hub is helping NBCU unlock data interoperability between NBCUniversal and its advertising ecosystem partners. (Full story here.)
