Snowflake Tag-Based Masking

A tag based masking policy combines the object tagging and masking policy features to allow a masking policies to be set on a tag.

Object tagging along with data classification offers native capabilities to easily address these use cases by creating metadata tag association with desired objects such as columns, tables, views, databases, warehouses and more.

Each tag created can be assigned a tag value (tag attributes). The following example applies a cost_center tag with ‘finance’ and ‘sales’ tag values to dedicated virtual warehouses. The organisation can then easily break down credit consumption and usage based on department for reporting purposes.

create tag cost_center;
alter warehouse finance_wh set tag cost_center = 'finance';
alter warehouse sales_wh set tag cost_center = 'sales';

The broader use case for object tagging is to apply business context using tags to identify data objects as sensitive or PII. For organisations, knowing where data resides is often the first steps to protecting it with the appropriate access controls.

For protection of data, dynamic data masking policy enables obfuscation of column values with masked value for users who does not have access to the true value of the column.

Let’s take a look to see how you can assign dynamic data masking policies against tags within Snowflake.

use role tag_admin;
use schema DG_DEMO.PUBLIC;
create tag person_name;
create tag person_age;
create tag match_key;
alter table CUSTOMERS_MASK modify column NAME set tag person_name = 'high' ;
alter table CUSTOMERS_MASK modify column AGE set tag person_age = 'low' ;
alter table CUSTOMERS_MASK modify column ZIP set tag match_key = 'medium';

A tag administrator will create tags (person name, person age and match key). Then the tags are associated to columns on a table (CUSTOMERS_MASK).

Once associated, Snowflake system function get_tag will return the tag values set against each column.

select system$get_tag('person_name', 'customers_mask.name', 'column');
select system$get_tag('person_age', 'customers_mask.AGE', 'column');
select system$get_tag('match_key', 'customers_mask.ZIP', 'column');

In addition to this, the following database level query will return all of the tags associated on every column in a table or view.

select * from table (DG_DEMO.information_schema.TAG_REFERENCES_ALL_COLUMNS( 'customers_mask' , 'table' ));

Now, let’s look at how you can protect these column values by creating dynamic data masking policies.

create or replace masking policy mask_name_simple as
(val string) returns string ->
case
when current_role() in ('DBA_DG_DEMO') then val
else '**masked**'
end;

create or replace masking policy mask_age_simple as
(val integer) returns integer ->
case
when current_role() in ('DBA_DG_DEMO') then val
else -999999
end;

create or replace masking policy mask_key_simple as
(val string) returns string ->
case
when current_role() in ('DBA_DG_DEMO') then val
else AGAINST(val)
end;

Masking policies are similar to functions where you can define role base access with masking logic. The above policies will mask column value from users that are not assigned the DBA_DG_DEMO role.

  1. mask string column values and return **masked**
  2. mask integer column values and return -999999
  3. mask a column value and return a hash value (using MD5 function)

Further to this, Snowflake UDF (User Defined Functions) using SQL, Javascript, Java, Scala/Python* can be used to formulate the masked value.

With the masking policies now created, we can now easily apply the masking policies on a tag which will apply the masking policies to all associated column values referenced by the tag.

alter tag person_name set masking policy mask_name_simple;
alter tag person_age set masking policy mask_age_simple;
alter tag match_key set masking policy mask_key_simple;

This feature will allow customers to apply protecting policies at scale without the need of applying them on specific table columns. Multiple masking policies can be set on a tag (only if each policy has a different data type).

Now, I will query the CUSTOMER_MASK table using different roles

USE ROLE DBA_DG_DEMO;
SELECT * FROM CUSTOMERS_MASK limit 10;
CUSTOMER_MASK as DBA
USE ROLE DEV_DG_DEMO;
SELECT * FROM CUSTOMERS_MASK limit 10;
CUSTOMER_MASK as DEV

Using a DEV_DG_DEMO role, the NAME, AGE and ZIP columns are all masked due to the dynamic masking policies set on these columns.

For organisations looking to protect their data from different roles but still want to retain analytical value for these users. Using hash functions or UDF can obfuscate the true value of the column but the hash output can still be used for matching or joins!

--

--