How to create a Secure Audience Manager as a Snowflake Native App

A data clean room is a way for two or more brands to collaborate on their first-party data without exposing that data to each other. With the Snowflake Native App Framework, brands can build an app as one way to stand up a data clean room with each other. The framework allows one brand (A) to securely package data and code into an app, which the other brand (B) can then install into their Snowflake account. With this setup, Brand B can’t see Brand A’s data or code, and Brand A has no access to Brand B’s data.

A simple diagram showing how a Native App gets packaged and distributed; all data and code is protected!

In the example here, we’ll create a Secure Audience Manager app with Snowflake SQL, Snowpark Python, and Streamlit. Brand A is a publisher, and Brand B is an advertiser. The app itself, packaged and shared by the publisher, will help the advertiser build audiences to be activated on Brand A’s media. While the collaboration between the two will happen in the app’s clean room, the result of the collaboration will be an audience table that gets shared back from the advertiser to the publisher. This table will only contain the publisher’s identifier, which means it can’t be used for re-identification by the advertiser.

Packaging the Native App

I highly recommend looking at the Native App Framework Tutorial as a first step to familiarize yourself with the framework. For this example, I simply used a copy of the TPC-DS sample data, specifically tpcds_sf10tcl.customer, as my customer table. Every Snowflake account should have access to this dataset (the database name that I use will likely be different from yours).

For reference, please see the GitHub repo here!

First, we’ll create the application package and package in the data that needs to be shared in Snowflake. This can all be done via the Snowflake UI:

-- copy data from the TPC-DS sample dataset
CREATE DATABASE rmn_dev;
CREATE SCHEMA provider;
CREATE TABLE customer AS
SELECT *
FROM sfsalesshared_sfc_samples.tpcds_sf10tcl.customer;

-- create the application package
GRANT CREATE APPLICATION PACKAGE ON ACCOUNT TO ROLE accountadmin;
CREATE APPLICATION PACKAGE sam_app;
USE APPLICATION PACKAGE sam_app;
CREATE SCHEMA stage_content;

-- stage that will hold the app's code artifacts (manifest, setup.sql, Streamlit files)
CREATE OR REPLACE STAGE sam_app.stage_content.app_stage
FILE_FORMAT = (TYPE = 'csv' FIELD_DELIMITER = '|' SKIP_HEADER = 1);

CREATE SCHEMA IF NOT EXISTS provider;
USE SCHEMA provider;

-- secure view over the customer table; the data itself stays in rmn_dev
CREATE SECURE VIEW customer AS
SELECT *
FROM rmn_dev.provider.customer;

-- make the secure view available to the app's consumers
GRANT USAGE ON SCHEMA provider TO SHARE IN APPLICATION PACKAGE sam_app;
GRANT REFERENCE_USAGE ON DATABASE rmn_dev TO SHARE IN APPLICATION PACKAGE sam_app;
GRANT SELECT ON VIEW provider.customer TO SHARE IN APPLICATION PACKAGE sam_app;

The secure back-end functions and app tables get created via the setup.sql file in the repo. Here’s a quick walkthrough:

  1. custom_audience and audience_metadata get created as a way to store the audiences and the metadata that the app creates.
  2. vw_audience_metadata is created to share back to the publisher, along with custom_audience. The view is created to make sure the publisher doesn’t see the SQL that created the audience.
  3. The UDTF built on the packaged secure view is the secret to making the secure collaboration work; this is how the app securely joins both first-party datasets together (see the sketch after this list).
  4. The stored procedures get triggered from button clicks on the front-end Streamlit app: one to run a customer overlap, the other to write audiences into the custom_audience table.
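
To make item 3 concrete, here’s a minimal sketch of what that UDTF could look like in setup.sql. The function, schema, and column names here are illustrative assumptions, not copied from the repo; the idea is that the advertiser passes in an identifier, and the function only returns the publisher’s key when there’s a match:

-- illustrative sketch; names and columns are assumptions, not the repo's code
CREATE OR REPLACE SECURE FUNCTION app_schema.match_customer(email VARCHAR)
RETURNS TABLE (c_customer_sk NUMBER)
AS
$$
  SELECT c_customer_sk
  FROM package_shared.customer
  WHERE c_email_address = email
$$;

Because the function is secure and only returns the publisher’s key on a match, the advertiser can join against it row by row without ever being able to browse the publisher’s full customer list.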

As you can see, the secure data gets packaged via Snowflake commands in the UI, while just about everything else is packaged in the setup.sql file. This is because any changes to the publisher’s customer dataset are reflected automatically to the advertiser; no app changes or patches need to be made for that to happen.

Creating and packaging Streamlit

I’m not a developer, and that’s why I love Streamlit. It makes it very easy to put a simple UI on top of whatever I want to build. My first pass at this came almost directly from ChatGPT, and I honestly didn’t have to change much to get it working with Streamlit in Snowflake (currently in Public Preview). I built almost the whole thing in Snowflake’s Streamlit editor, which made it very easy to test my code in real time.

In my opinion, Streamlit should mostly be used for display; almost any action taken from Streamlit should be connected to a procedure or function that’s packaged into setup.sql, which makes future code changes easier. The Streamlit front-end is made up of a few major components:

Section 1 — Selecting the customer table and identifier

This is pretty standard in any UI that connects to Snowflake: the advertiser needs to point the app at the right customer data. Ideally, there is a customer table or view that houses the full known customer list. Using this table, along with an identifier of your choice, the advertiser can find the overlap with the publisher. The identifier doesn’t have to be PII; it can also be a pseudonymous identifier, like RampID. Overall, the publisher and advertiser just need to agree on an ID to use.

After the ID is selected, you can run a simple overlap, and the publisher even has the option to display “standard audiences”: popular audiences for that publisher where there is overlap. These can be things like “binge-watchers”, “shoe collectors”, or even “frequent upgraders”. There’s no limit to what can be displayed here, and something like this could be its own app (see Zeta Customer Growth Intelligence).
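
Behind that button, the overlap count could be as simple as the lateral join below, reusing the hypothetical match_customer UDTF sketched earlier. The advertiser-side table and column names, and the installed app name, are assumptions:

-- hypothetical overlap count; table, column, and app names are assumed
SELECT COUNT(DISTINCT m.c_customer_sk) AS overlap_count
FROM advertiser_db.crm.customers c,
     TABLE(sam_application.app_schema.match_customer(c.email)) m;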

Section 2 — Creating a custom audience

In this app, you can specify three different audience types: a customer audience to target, a suppression list to exclude, or a lookalike seed audience. The user on the advertiser’s side needs to understand SQL to use this app as is. Once an update frequency is set, this audience definition is saved in audience_metadata, and the audience output is written to the custom_audience table.
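
A minimal sketch of the procedure behind that save action might look like the following; the procedure name, parameters, and table layouts are my assumptions rather than the repo’s exact code:

-- illustrative sketch; names and table layouts are assumptions
CREATE OR REPLACE PROCEDURE app_schema.save_audience(
    audience_name VARCHAR, audience_sql VARCHAR, update_frequency VARCHAR)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
BEGIN
  -- record the definition so it can be displayed and refreshed later
  INSERT INTO app_schema.audience_metadata (name, definition_sql, frequency, created_at)
  VALUES (:audience_name, :audience_sql, :update_frequency, CURRENT_TIMESTAMP());

  -- materialize the audience (publisher IDs only) into custom_audience
  EXECUTE IMMEDIATE
    'INSERT INTO app_schema.custom_audience (audience_name, c_customer_sk) ' ||
    'SELECT ''' || audience_name || ''', c_customer_sk FROM (' || audience_sql || ')';

  RETURN 'Audience saved.';
END;
$$;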

Section 3 — Audience metadata display

In the last section, we display the audience metadata so the user can see the audiences that have been created, along with the count, SQL definition, and update frequency of each audience. I’ve not gotten to it yet, but eventually there will be a way to search, update, and delete records from this table. There will also be Snowflake Tasks that continuously update the custom_audience table based on the frequency set by the user.
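
Those refresh Tasks don’t exist yet, but a sketch of one could be as simple as this (the schedule, names, and refresh_audiences procedure are all hypothetical):

-- hypothetical serverless task; refresh_audiences would re-run saved definitions
CREATE OR REPLACE TASK app_schema.refresh_daily_audiences
  SCHEDULE = 'USING CRON 0 6 * * * UTC'
AS
  CALL app_schema.refresh_audiences('daily');

ALTER TASK app_schema.refresh_daily_audiences RESUME;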

Creating the shareback

Both the vw_audience_metadata view and the custom_audience table then need to get shared from the advertiser back to the publisher. The easiest and most efficient way to do this is via a Private Listing. For now, this needs to be done via the Snowflake UI, but soon, Snowflake will allow the app itself to create a shareback of this data to its provider. This share is what allows the publisher to incorporate the advertiser’s custom audiences into their platform. Another way to activate the data is through an activation partner; be sure to check out Snowflake’s latest Modern Marketing Data Stack to find a list of all our activation partners.
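
While the listing itself gets created in the UI today, the grants underneath it are plain SQL. A rough sketch of the share behind such a listing, assuming the results have been landed in a local results_db, could be:

-- hypothetical share backing the Private Listing; object names are assumed
CREATE SHARE sam_shareback;
GRANT USAGE ON DATABASE results_db TO SHARE sam_shareback;
GRANT USAGE ON SCHEMA results_db.results TO SHARE sam_shareback;
GRANT SELECT ON VIEW results_db.results.vw_audience_metadata TO SHARE sam_shareback;
GRANT SELECT ON TABLE results_db.results.custom_audience TO SHARE sam_shareback;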

Other Considerations

If you’re using an external activation partner (CDP, Reverse-ETL, etc.), the native app provider might instead give the option to return the advertiser’s specified ID, or more likely a pseudonymous identifier, like RampID or UID2. This would allow the advertiser to use their external activation partner, instead of activating directly with the provider or the provider’s SSP. In this model, the app provider is likely more focused on enrichment use cases, or helping to expand reach, rather than selling ad inventory. The most important thing here is for the advertiser to make sure they’re using the ID that they’re getting back in their impression data for measurement.

What if your data isn’t stored in Snowflake? That shouldn’t be a problem. Snowflake has made considerable progress in supporting open-source table formats, like Apache Iceberg, and it can also read data stored in External Tables. This means that neither the publisher nor the advertiser has to store their data in Snowflake to benefit from this app. While cross-cloud collaboration is definitely easier and more efficient today on Snowflake’s proprietary storage, using data external to Snowflake is also possible!
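
For example, an advertiser whose customer list lives in an Iceberg lakehouse could expose it to Snowflake without copying the data in, roughly like this (the external volume and catalog integration names are assumptions):

-- hypothetical Iceberg table over data in external object storage
CREATE ICEBERG TABLE crm.customers_iceberg
  EXTERNAL_VOLUME = 'crm_external_volume'
  CATALOG = 'crm_glue_catalog'
  CATALOG_TABLE_NAME = 'customers';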

Conclusion and Next Steps

The solution above is a simple one that I spent a few hours on — if you’re looking for an enterprise clean room application on top of Snowflake, I suggest looking at our partners, Habu or Samooha.

Overall, if you’re a brand that is interested in data clean rooms for activation, reach out to your Snowflake account rep. If you don’t have Snowflake, get in touch with us!
