Snowflake Data Clean Rooms: Yao’s Millionaires’ Problem in SQL

Screenshot from the webinar

In the Secure Data Sharing with Data Clean Rooms in the Data Cloud webinar, Jennifer Belissent — Principal Data Strategist at Snowflake — presents the Data Clean Room concept with the famous Yao’s Millionaires’ problem:

How can two millionaires (Alice and Bob) determine whose net worth is higher without telling each other how much money they each have

I’ll not go deep into the concept of a Data Clean Room here, using direct data shares and private data exchange. I will only present the concrete implementation in SQL of this problem, as it has been only briefly described by Jennifer in her great presentation (see the previous image). And I will also highlight some potential issues with the suggested solution.

How Does It Work

Both Alice and Bob keep the information about their wealth in a private vault. This could be a private database table with one cell only the owner can access.

Bob (the provider) will create and expose to Alice (the consumer) a secure function, which compares two numeric values and returns as sole information which is greater. This secure function is owned by Bob but must be trusted as well by Alice.

Alice cannot see how the function is defined, so Bob is safe. But to be trusted by Alice as well, only she’ll be able to pass the right key to internally decrypt her wealth figure. Bob’s function will use this key to access Alice’s vault through a secure view.

Yao’s Millionaires’ Problem in Snowflake SQL (Copyright © Cristian Scutaru)

An alternative to the secure view is a secure function, created and owned by Alice, that will help decrypt the information she passed to Bob’s function. Another common alternative, when the information to be passed by Alice to the secure function is granular and limited in size, she could encrypt her wealth amount on her side and pass it as a parameter, with the private key. An even more secure scenario could also include a public key, whose value is known by both Alice and Bob, but that never travels between the two parts, on function calls or other types of handshakes. They will both store it somewhere safe and use it when encrypting (by Alice) or decrypting (in Bob’s function).

The problem with these implementations is that the sensitive information (i.e. the wealth amount) must travel between the two parts on each function call, even if it’s encrypted. A better alternative would be to pass just a hash (i.e. partial information), and a secure function on Alice’s side will help both retrieve and decrypt the full information from Bob’s function.

The boundary including Bob’s secure function and Alice’s secure view can be seen as a data clean room here, which can be trusted by both Bob and Alice. In practice, Alice and Bob are likely represented by different entities, from separate Snowflake accounts, connected through direct data shares. To make it even safer, Snowflake recommends using a private data exchange, which is a “distributed room” activated by the provider.

Create Databases and Roles

All the code snippets below can be copied and executed into a Worksheet, in the web UI of a (free) Snowflake account. We’ll simply create an alice_db and a bob_db database, with Alice and Bob roles. Alice will later create her own schema in alice_db, and Bob in bob_db:

use role sysadmin;create or replace database alice_db;
create or replace database bob_db;
use role securityadmin;create or replace role alice;
grant role alice to user cscutaru; -- change with your own!
grant usage on warehouse compute_wh to role alice;
grant usage on database alice_db to role alice;
grant usage on database bob_db to role alice;
grant create schema on database alice_db to role alice;
create or replace role bob;
grant role bob to user cscutaru; -- change with your own!
grant usage on warehouse compute_wh to role bob;
grant usage on database bob_db to role bob;
grant usage on database alice_db to role bob;
grant create schema on database bob_db to role bob;

Create Database Objects for Alice

Alice will save her $1,100,000 wealth figure on a private table, that she will keep just for herself. However, she will expose this encrypted value to Bob through a secure view, we will later see why:

use role alice;
use warehouse compute_wh;
use database alice_db;
create schema alice_db.data;create table alice_table as
select 1100000 as wealth;
create secure view alice_view as
select encrypt(wealth, 'alice secret') as wealth
from alice_table;
-- this returns Alice's wealth, but encrypted
select * from alice_view;
-- this properly decrypts Alice's wealth, as text
select to_varchar(decrypt(
encrypt(1100000, 'alice secret'), 'alice secret'), 'utf-8');
-- allow Bob to query the view as well, but only encrypted
-- (and he will never know the key!)
grant usage on schema alice_db.data to role bob;
grant select on view alice_view to role bob;

Create Database Objects for Bob

Bob saves his $1,250,000 wealth figure on a similar table. And he creates a secure function that compares his number with the decrypted number from Alice’s secure view. However, Bob will never call this function (as it will fail), it is for Alice. Because only Alice can pass her own encryption key, as a parameter. Bob acts like a Provider here, and Alice is the Consumer — but the roles can be also reversed. Upon the function call, Alice gets the answer: “Bob is richer”.

use role bob;
use warehouse compute_wh;
use database bob_db;
create or replace schema bob_db.data;create table bob_table as
select 1250000 as wealth;
-- this returns Alice's wealth, but always encrypted
select * from alice_db.data.alice_view;
create secure function whos_richer(alice_key string)
returns string
as 'select case
when b.wealth > to_number(to_varchar(
decrypt(a.wealth, alice_key), ''utf-8''))
then ''Bob is richer''
else ''Alice is richer'' end
from bob_table b, alice_db.data.alice_view a';
-- this will fail
select whos_richer('i dunno pw');
-- but Alice will be able to use it
-- let's grant her access to call this function
grant usage on schema bob_db.data to role alice;
grant usage on function whos_richer(string) to role alice;

-- this will return 'Bob is richer'
use role alice;
select bob_db.data.whos_richer('alice secret');

How Alice COULD Find Out How Much Bob is Worth

The first problem with this scenario is that Alice can call this function as many times as she wants. And find out in just a few iterations how much Bob is worth using a basic Binary Search algorithm. Bob’s function should be adapted to prevent this, with added throttling or something else…

The following JavaScript stored procedure takes only a few seconds to return “Bob is worth around 1250000”, with an approximation of only $1,000! We encapsulated an inner function that updates Alice’s wealth value just to determine, after a few iterations, where Bob’s figure is. We start by determining a lower and upper range, then we slice this interval repeatedly until the range interval is less than 1000.

use role alice;
use warehouse compute_wh;
use schema alice_db.data;
create or replace procedure find_how_much_is_bob_worth()
returns string
language javascript
as $$
function set_and_call(wealth) {
// change Alice's wealth to some value
snowflake.execute({
sqlText: `update alice_table set wealth = ?`,
binds: [wealth]}).next();
// find out if Bob is (still) worth more
let secret = 'alice secret';
var rs = snowflake.execute({
sqlText: `select bob_db.data.whos_richer(?)`,
binds: [secret]});
rs.next();
return rs.getColumnValue(1);
};

// do until we find Bob's wealth is between some range interval
var lower, upper = 500000, s = 'Bob is richer';
while (s == 'Bob is richer') {
lower = upper;
upper = 2 * lower;
s = set_and_call(upper);
}
// apply binary search within the interval
var mid = (lower + upper) / 2;
while (upper - lower > 1000) {
if (set_and_call(mid) == 'Bob is richer')
lower = mid;
else
upper = mid;
mid = (lower + upper) / 2;
}
mid = Math.round(mid / 1000) * 1000;
return `Bob is worth around ${mid}`;
$$;
-- this will return 'Bob is worth around 1250000'
call find_how_much_is_bob_worth();

How Bob COULD Find Out How Much Alice is Worth

The problem with some “secure” views or UTFs in Snowflake is that they may hide the view or function definition from the wrong person. Alice, in this case, cannot see the definition of the secure function that Bob created. To eventually find out the name of the table where Bob keeps his wealth number. So it looks safe for Bob.

But Bob could recreate the function anytime and use some hack to save the decrypted value on Alice’s call. It is not easy to do this today, because unlike stored procedures UDFs have some serious limitations. But there is no guarantee this will remain the case. I don’t remember any guarantee in Snowflake’s documentation that UDFs can never have side effects.

If you were Alice, would you trust passing your private secret or key to someone else’s function that you don’t see what it does?

Conclusions

At Infostrux, we started looking at Data Clean Rooms in Snowflake as future implementation for a big and important customer. There are clearly many good features in Snowflake and important use cases the client has that we will implement.

Data Clean Rooms involve much more than secure views, secure functions, and roles. They involve secure direct data shares and private data exchange. The next phase here, for a full Data Clean Room implementation, would be to separate Alice and Bob into different Snowflake accounts, connected by direct shares. Then ask Snowflake to prepare for us a room using Data Exchange.

Look out for future posts from Infostrux Solutions, a Pure-Play Snowflake Partner, discussing other interesting aspects of the Snowflake Data Cloud. Check our Services here.

--

--

--

Infostrux is a Snowflake Services Partner, providing data cloud infrastructure for data-driven organizations.

Recommended from Medium

What I learned about the Big O Notation

What we discovered from upgrading Rails 4.2.x to Rails 5.0.x

Inset box-shadow rendering bug on Safari iOS 13.4

On iOS 13.4, a black vertical bar appears

Null safety: Easier in Scala than in Java

The Value Of Writing Pseudocode

## Internal Error (500) incident Postmotern

My Coding Bootcamp Survival Guide

MATLAB: My learning journey and some tips!!!

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Cristian Scutaru

Cristian Scutaru

Lead Data Architect, a seasoned expert in Snowflake Data Cloud. Recognized by Snowflake as their only ”Data Superhero” from Canada in 2022.

More from Medium

What Do I Do When My Snowflake Query Is Slow? Part 2: Solutions

Slow Snowflake Queries: Solutions — Rockset

How to Reverse Engineer a Snowflake Database Schema

Data Vault Loader Traps

Optimize Your Hierarchical Data with Snowflake, Part Two

Marshmellow Spaghetti Tower