Source: Pixabay

Dynamic Data Masks: Protect sensitive data with Snowflake (superhero edition)

Your customers trust you to protect their privacy and identities — and Snowflake makes this easy with Dynamic Data Masks. With this you can get rid of complex systems of secure views or copies of data: Just create policies that allow everyone to share the same tables and queries, while protecting sensitive data.

Watch on Youtube #SnowflakeBytes

Let’s say your customers are superheroes, and you have a table that contains data about them. You know a lot about your customers, and they trust you to protect their identities and privacy — not only from third parties, but also from people in your company that need this data to do their job, but not all of this:

For example, your lawyers need to see the real names of your heroes, but not their superhero identities. And customer service — they need to see the first name to talk to them, but not their full identity. Maybe they need to know how famous each of them is, but not necessarily the exact number. And so on.

So we need to solve this data problem, and we don’t want to make multiple copies of the data and we don’t want to create a complex system of views.

What we want here is dynamic masks. With dynamic masks, we can define policies in one place and Snowflake will make sure that people in your company can share tables and queries, but they will only see what they’re allowed to see, depending on their role.

So I wrote this policy in SQL that only allows people with the lawyer role to see the full name. Meanwhile people with the customer service role only see the first name — and no one else will be able to see the legal name of our customers:

Let’s try it out with a SELECT *: I get some data, and if I change my role, the same SELECT * gives me different results.

It’s the same table. It’s the same query. My role determines what I see. Even better, these masks are applied at query time. They even work with external tables that might live in your Data Lake.

And it’s that simple because Snowflake cares about your data and the privacy of your customers. And this is just a start. We will go deeper in the near future but in the meantime, check below the full SQL script to reproduce these results.

Script

For dynamic masks to work, you need to use Snowflake’s Enterprise edition or higher:

use role sysadmin;// Let's assume this database and a warehouse exists, otherwise create them.
use masks_db;
// Quick creation of base data
create or replace table customers
as
select $1[0]::string hero_name, $1[1]::string birth_name, replace($1[2], ',')::int fame, $1[3]::string birth_place
from (
select split(x.value, '|')
from table(split_to_table('Batman|Bruce Wayne|104,302|Gotham City
Iron Man|Tony Stark|64,098|Manhattan
Wolverine|James Howlett|43,482|Alberta
Black Widow|Natasha Romanoff|42,453|Volgograd
Catwoman|Selina Kyle|40,763|Gotham City
Magneto|Max Eisenhardt|35,691|Dusseldorf
Doctor Manhattan|Jonathan Osterman|30,225|Heidelberg
Kitty Pryde|Katherine Anne Pryde|28,226|Deerfield
Spawn|Albert Francis Simmons|28,080|Detroit
Daredevil|Matthew Michael Murdock|27,574|New York City
Martian Manhunter|J\’onn J\‘onzz|27,547|Mars
Luke Cage|Carl Lucas|24,799|New York City
Storm|Ororo Munroe|24,221|New York City
Wolfsbane|Rahne Sinclair|23,413|Ullapool
Colossus|Пётр Николаевич Распутин|22,823|Siberia
Legion|David Charles Haller|22,769|Haifa
Tim Drake|Timothy Jackson Drake|21,957|Gotham City
Sunspot|Roberto da Costa|21,872|Rio de Janeiro
Elasti-Girl|Rita Farr|20,650|
Polaris|Lorna Dane|20,532|San Francisco
Zatanna|Zatanna Zatara|19,782|Gotham City
Emma Frost|Emma Grace Frost|19,427|Boston
Cable|Nathan Christopher Charles Summers|18,894|Westchester County
Nightcrawler|Kurt Wagner|18,886|Bavaria
Spider-Woman|Gwendolyne Maxine Stacy|18,375|Forest Hills
Cyclops|Scott Summers|18,344|Anchorage
Psylocke|Elizabeth Braddock|18,168|Maldon', '\n')) x
)
;
// securityadmin can create roles
use role securityadmin;
create role marketing;
create role cust_service;
create role lawyer;
// our user 'felipe' will be able to use these roles
grant role marketing to user felipe;
grant role cust_service to user felipe;
grant role lawyer to user felipe;
// create a role that we'll use for common permissions
create role masks_demo_role;
grant role masks_demo_role to role marketing;
grant role masks_demo_role to role cust_service;
grant role masks_demo_role to role lawyer;
// back to sysadmin, to grant permissions to tables and create policies
use role sysadmin;
grant usage on database masks_db to masks_demo_role;
grant usage on schema masks_db.public to masks_demo_role;
grant select on table masks_db.public.customers to masks_demo_role;
// creating and applying "empty" policies, making it easier to modify later
create or replace masking policy mask_hero as (val string) returns string -> val;
create or replace masking policy mask_name as (val string) returns string -> val;
create or replace masking policy mask_fame as (val int) returns int -> val;
alter table customers modify column hero_name set masking policy mask_hero;
alter table customers modify column birth_name set masking policy mask_name;
alter table customers modify column fame set masking policy mask_fame;
// these are the real policy implementations, we can play here with their code
alter masking policy mask_hero set body ->
case
when current_role() in ('MARKETING', 'SYSADMIN') then val
else ''
end;
alter masking policy mask_name set body ->
case
when current_role() in ('LAWYER', 'SYSADMIN') then val
when current_role() in ('CUST_SERVICE') then regexp_substr(val, '[^ ]*')
else ''
end;
alter masking policy mask_fame set body ->
case
when current_role() in ('MARKETING', 'SYSADMIN') then val
when current_role() in ('CUST_SERVICE') then pow(10, round(log(10, val)))::int
else -1
end;
// try the different roles and see what results they get
// https://i.imgur.com/q1jGMZe.gif
use role marketing;
use role cust_service;
use role lawyer;
select *
from customers;
// introspection: what policies were created and applied
use role sysadmin;
describe table customers;
show masking policies;

Next steps

How JetBlue Secures and Protects Data Using dbt and Snowflake

Want more?

I’m Felipe Hoffa, Data Cloud Advocate for Snowflake. Thanks for joining me on this adventure. You can follow me on Twitter and LinkedIn, and check reddit.com/r/snowflake for the most interesting Snowflake news.

--

--

Felipe Hoffa
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Data Cloud Advocate at Snowflake ❄️. Originally from Chile, now in San Francisco and around the world. Previously at Google. Let’s talk data.