A step-by-step tutorial to Row and Column Access policies in Snowflake
Snowflake, as a Data Warehouse, offers many options to share the databases, tables, and views with users, and you might want to control what to show or not when your data is sensitive. Let’s look at how we can set up data masking on columns and rows.
This article is freely inspired by the one from Pierre-Louis Bescond (my manager at Roquette) who described the same mechanisms applied with BigQuery and encouraged me to reproduce it with Snowflake.
Context
A Human Resources table contains information about its employees in several columns. Some of them are sensitive, for example, the salaries.
There are different sensitivities among the columns and rows, and we want to give different access levels to three types of employees.
The three roles are:
- Human resources: they should have unrestricted access to all data columns. However, access to rows should be restricted so that HR representatives can only view information related to their respective country.
- Data Scientists: they are allowed to access only certain selected and anonymized columns for modeling purposes.
- Data Citizens: they should have limited access to columns and should only be permitted to create high-level queries and perform basic analysis.
We will configure column access with masking policies and tags, as well as row access using row access policies.
We will apply these policies to our table. Note that these rules will remain valid when the table is cloned or shared, because the governance in Snowflake is global and uniform across the platform.
Also, governance extends beyond dynamic data masking and row access policies. Other features are not covered in this article, such as classification, object dependencies, access history, and anonymization. To learn how to make the most of the available options, I suggest referring to the documentation of Snowflake.
Create a Snowflake trial account
Feel free to skip this part if you already have a Snowflake account.
The first step is to open a trial account in Snowflake, choosing at least the Enterprise edition, as some of the features we will use in this tutorial aren’t available in the Standard edition.
You can choose any cloud provider, for the demo, but, in an enterprise context, you might want to keep the same cloud provider you already use. Follow the links below for more information about this.
Supported Cloud Platforms — Snowflake Documentation
Supported Cloud Regions — Snowflake Documentation
As soon as you have subscribed to the trial account, you will receive a welcome email.
The account URL is composed like this:
https://<organization name>-<account>.snowflakecomputing.com
Note: you might have one organization for your business entity, grouping multiple accounts in different regions.
Once you log in with your username and user password, you’ll land on another URL that is composed like this:
https://app.snowflake.com/<region>.<cloud-provider>/<locator>/
Load a table from a CSV file through SnowSQL
For this first step, we need to install SnowSQL.
To do this, follow the instructions:
Installing SnowSQL — Snowflake Documentation
We also need the data, which is available from Pierre-Louis Bescond’s Google Drive.
Once you’ve installed SnowSQL, let’s run it and upload the table.
In the terminal, type:
snowsql -a <organization name>-<account> -u <username>
Then type your password.
snowsql -a hswhmsm-hv09133 -u arthurtelders
>>Password:
>>* SnowSQL * v1.2.24
>>Type SQL statements or !help
Now you can type SQL statements.
The statements below tell Snowflake to:
- use the default warehouse,
arthurtelders#(no warehouse)@(no database).(no schema)>use warehouse COMPUTE_WH;
+----------------------------------+
| status |
|----------------------------------|
| Statement executed successfully. |
+----------------------------------+
1 Row(s) produced. Time Elapsed: 0.333s
- create the database EMPLOYEES,
arthurtelders#COMPUTE_WH@(no database).(no schema)>create database EMPLOYEES;
+------------------------------------------+
| status |
|------------------------------------------|
| Database EMPLOYEES successfully created. |
+------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.341s
- create the table SALARIES with its definition,
arthurtelders#COMPUTE_WH@EMPLOYEES.PUBLIC>create or replace table SALARIES(
First_Name string,
Last_Name string,
Department string,
Country string,
email string,
Salary integer,
Currency string,
Salary_Positioning integer,
last_perf integer,
seniority integer,
commute_time integer);
+--------------------------------------+
| status |
|--------------------------------------|
| Table SALARIES successfully created. |
+--------------------------------------+
1 Row(s) produced. Time Elapsed: 0.387s
- put the file in an internal stage and zip it,
arthurtelders#COMPUTE_WH@EMPLOYEES.PUBLIC>put file:///home/azureuser/dev/snowflake_tuto/fake_employees_data/salaries.csv @employees.public.%salaries;
+--------------+-----------------+-------------+-------------+--------------------+--------------------+----------+---------+
| source | target | source_size | target_size | source_compression | target_compression | status | message |
|--------------+-----------------+-------------+-------------+--------------------+--------------------+----------+---------|
| salaries.csv | salaries.csv.gz | 647077 | 136496 | NONE | GZIP | UPLOADED | |
+--------------+-----------------+-------------+-------------+--------------------+--------------------+----------+---------+
1 Row(s) produced. Time Elapsed: 1.626s
- copy the contents of the internal stage in the table.
arthurtelders#COMPUTE_WH@EMPLOYEES.PUBLIC>copy into salaries from @%salaries
file_format = (
type = csv
field_delimiter = ','
skip_header = 1
)
on_error = 'skip_file';
+-----------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
| file | status | rows_parsed | rows_loaded | error_limit | errors_seen | first_error | first_error_line | first_error_character | first_error_column_name |
|-----------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------|
| salaries.csv.gz | LOADED | 8600 | 8600 | 1 | 0 | NULL | NULL | NULL | NULL |
+-----------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
1 Row(s) produced. Time Elapsed: 1.627s
Create the roles and the resources for each role
Now, in the Snowflake UI, let’s open a worksheet and we will create the roles and the resources needed for each of the 3 roles “HUMANRESOURCES”, “DATASCIENTIST” and “DATACITIZEN”.
We also create a warehouse for each role.
Each warehouse is configured using the smallest size (Snowflake warehouses deliver the computing power, and the sizes go from X-SMALL to 4XLARGE, and their computing power and prices double each time from 1 credit per hour to 128 credits per hour).
The billing is per second, after the first minute, and I’ve set the warehouses to suspend after one minute.
Finally, we create a user of the Human resources from the UK.
-- Initialization
use role ACCOUNTADMIN;
use warehouse COMPUTE_WH;
-- Create 3 different roles
create or replace role "HUMANRESOURCES";
create or replace role "DATASCIENTIST";
create or replace role "DATACITIZEN";
-- Create one warehouse for each of the 3 roles
create or replace warehouse HR_XSMALL_WH with
warehouse_size = 'X-SMALL'
auto_suspend = 60
auto_resume = true
initially_suspended = true;
create or replace warehouse DS_XSMALL_WH with
warehouse_size = 'X-SMALL'
auto_suspend = 60
auto_resume = true
initially_suspended = true;
create or replace warehouse DC_XSMALL_WH with
warehouse_size = 'X-SMALL'
auto_suspend = 60
auto_resume = true
initially_suspended = true;
-- Create a user of the Human resources based in the UK
create user HUMANRESOURCES_USER_UK PASSWORD = '******'
DEFAULT_WAREHOUSE = 'HR_XSMALL_WH'
default_namespace = 'employees'
must_change_password = true;
Column policies
Masking policies
Now let’s create masking policies. Each making policy is assigned to a data type (integer, string…).
The SQL syntax to create a masking policy is composed like this:
create masking policy <policy name> as (val <input datatype>) returns <output datatype> ->
case
when current_role() in (<List of authorized roles>) then val
else <output value if not authorized>
end;
For the different levels of sensitivity, we’ll create those masking policies:
- Critical Information (High): integers are converted to 0, and strings are converted to ‘******’ if the current role doesn’t belong to HUMANRESOURCES.
use database employees;
-- Masking policy for Critical information (High) -> integer returns O and string returns ******
create masking policy hr_sensitive_high_integer_mask as (val integer) returns integer ->
case
when current_role() in ('HUMANRESOURCES') then val
else 0
end;
create masking policy hr_sensitive_high_string_mask as (val string) returns string ->
case
when current_role() in ('HUMANRESOURCES') then val
else '******'
end;
- Restricted Information (Medium): the same as above but if the role doesn’t belong to HUMANRESOURCES or DATASCIENTIST.
-- Masking policy for Restricted information (Medium) -> integer returns O and string returns ******
create masking policy hr_sensitive_medium_integer_mask as (val integer) returns integer ->
case
when current_role() in ('HUMANRESOURCES','DATASCIENTIST') then val
else 0
end;
create masking policy hr_sensitive_medium_string_mask as (val string) returns string ->
case
when current_role() in ('HUMANRESOURCES', 'DATASCIENTIST') then val
else '******'
end;
- Identification Information (Names): strings are converted to ‘******’ if the current role doesn’t belong to HUMANRESOURCES.
-- Masking policy for Identification information (Names) -> string returns ******
create masking policy hr_sensitive_identification_name_string_mask as (val string) returns string ->
case
when current_role() in ('HUMANRESOURCES') then val
else '******'
end;
- Identification Information (Email): strings are hashed with SHA256 if the current role doesn’t belong to HUMANRESOURCES.
-- Masking policy for Identification information (Emails) -> string returns hashed value
create masking policy hr_sensitive_identification_email_string_mask as (val string) returns string ->
case
when current_role() in ('HUMANRESOURCES') then val
else sha2(val)
end;
And verify the masking policies were created.
show masking policies;
It is worth noting that masking values can be hardcoded such as ‘******’, but they can be more sophisticated by using regular expressions, user-defined functions (UDFs), or external functions. These options provide greater flexibility in implementing custom masking logic tailored to specific use cases.
Tags
Tags allow us to group multiple masking policies and can be assigned to multiple columns.
For each level of sensitivity, we create a tag, to which we’ll set the corresponding policies:
-- Tag for Critical information (High)
create or replace tag hr_sensitive_high;
alter tag hr_sensitive_high set masking policy HR_SENSITIVE_HIGH_INTEGER_MASK;
alter tag hr_sensitive_high set masking policy HR_SENSITIVE_HIGH_STRING_MASK;
-- Tag for Restricted information (Medium)
create or replace tag hr_sensitive_medium;
alter tag hr_sensitive_medium set masking policy HR_SENSITIVE_MEDIUM_INTEGER_MASK;
alter tag hr_sensitive_medium set masking policy HR_SENSITIVE_MEDIUM_STRING_MASK;
-- Tag for Identification information (Names)
create or replace tag hr_sensitive_identification_name;
alter tag hr_sensitive_identification_name set masking policy HR_SENSITIVE_IDENTIFICATION_NAME_STRING_MASK;
-- Tag for Identification information (Emails)
create or replace tag hr_sensitive_identification_email;
alter tag hr_sensitive_identification_email set masking policy HR_SENSITIVE_IDENTIFICATION_EMAIL_STRING_MASK;
Assign the tags to the columns
Finally, we modify the columns, assigning them the corresponding tag.
-- table
alter table if exists salaries modify column First_Name set tag hr_sensitive_identification_name = 'tag-based policies';
alter table if exists salaries modify column Last_Name set tag hr_sensitive_identification_name = 'tag-based policies';
alter table if exists salaries modify column email set tag hr_sensitive_identification_email = 'tag-based policies';
alter table if exists salaries modify column Salary set tag hr_sensitive_high = 'tag-based policies';
alter table if exists salaries modify column Salary_Positioning set tag hr_sensitive_medium = 'tag-based policies';
alter table if exists salaries modify column last_perf set tag hr_sensitive_medium = 'tag-based policies';
alter table if exists salaries modify column seniority set tag hr_sensitive_medium = 'tag-based policies';
alter table if exists salaries modify column commute_time set tag hr_sensitive_medium = 'tag-based policies';
Row policies
Now, let’s set up a row policy to put a filter on the Country.
We want to give access only to the information about the employees of the UK to our UK-based user from Human Resources.
We set a condition when the current user is our user “HUMANRESOURCES_USER_UK”, and the value is “UK”.
We must also add conditions for the roles DATASCIENTIST and DATACITIZEN, as we do not want to restrict them on row access.
create or replace row access policy COUNTRY_FILTER as (country string) returns boolean ->
case
when 'HUMANRESOURCES_USER_UK' = current_user() and country = 'UK' then True
when 'DATASCIENTIST' = current_role() then True
when 'DATACITIZEN' = current_role() then True
else False
end;
ALTER table if exists salaries add row access policy COUNTRY_FILTER on (Country);
show row access policies;
Grant permissions on objects
We will grant usage to the different roles on their warehouse, the database, the schema and the table, and grant the roles to users.
For our example, we grant the roles DATASCIENTIST and DATACITIZEN to me, and we grant the role HUMANRESOURCES to our user from the Human resources in UK.
-- DATASCIENTIST
grant usage on warehouse DS_XSMALL_WH to role DATASCIENTIST;
grant usage on database employees to role DATASCIENTIST;
grant usage on schema employees.public to role DATASCIENTIST;
grant select on all tables in schema employees.public to role DATASCIENTIST;
grant role DATASCIENTIST to user ARTHURTELDERS;
-- DATACITIZEN
grant usage on warehouse DC_XSMALL_WH to role DATACITIZEN;
grant usage on database employees to role DATACITIZEN;
grant usage on schema employees.public to role DATACITIZEN;
grant select on all tables in schema employees.public to role DATACITIZEN;
grant role DATACITIZEN to user ARTHURTELDERS;
-- HUMANRESOURCES (UK)
grant usage on warehouse HR_XSMALL_WH to role HUMANRESOURCES;
grant usage on database employees to role HUMANRESOURCES;
grant usage on schema employees.public to role HUMANRESOURCES;
grant select on all tables in schema employees.public to role HUMANRESOURCES;
grant role HUMANRESOURCES to user HUMANRESOURCES_USER_UK;
Verify that we have correctly set up the policies
This query will return all the policies that are assigned to a specific object and their status, in our case, the table “employees.public.salaries”:
select policy_name, ref_column_name, tag_name, policy_status
from table(
employees.information_schema.policy_references(
ref_entity_name => 'employees.public.salaries',
ref_entity_domain => 'table'
)
);
Finally, we will use the different roles to verify that we can only access the data that we are allowed to access with each role.
Data Scientist role
use role DATASCIENTIST;
use warehouse DS_XSMALL_WH;
--
select *
from EMPLOYEES.PUBLIC.SALARIES
order by COUNTRY, LAST_NAME, FIRST_NAME
limit 50;
As expected, the column “salaries” is filled with 0, the names are masked, and emails are hashed, but I still have access to columns Salary_Positioning, last_perf, seniority, and commute_time. So I’ll be able to work with this data to create models, without having access to highly sensitive information or personal information.
Data Citizen role
use role DATACITIZEN;
use warehouse DC_XSMALL_WH;
--
select *
from EMPLOYEES.PUBLIC.SALARIES
order by COUNTRY, LAST_NAME, FIRST_NAME
limit 50;
This role is even more restrictive, so more columns are filled with 0.
Human resources role
We created a different user for this role, so let’s connect with his credentials and change the password!
And run a query on the table.
use role HUMANRESOURCES;
use warehouse HR_XSMALL_WH;
select *
from EMPLOYEES.PUBLIC.SALARIES
order by COUNTRY, LAST_NAME, FIRST_NAME;
As expected, our HR user from the UK can only access the rows of his/her country, but still has access to sensitive information such as salaries.
Conclusion
We’ve seen how to set up column access policies using tags and row access policies. Congratulations if you made it this far!
One last thing, we will now delete the resources we created for the tutorial, and you’ll see that to drop a policy, you need first to unset it from the tag or the column because once they are tied, they are protected, so we cannot drop them!
-- DELETE RESOURCES
use role ACCOUNTADMIN;
-- COLUMNS: unset masking policies from the tags
alter tag if exists hr_sensitive_high unset masking policy HR_SENSITIVE_HIGH_INTEGER_MASK;
alter tag if exists hr_sensitive_high unset masking policy HR_SENSITIVE_HIGH_STRING_MASK;
alter tag if exists hr_sensitive_medium unset masking policy HR_SENSITIVE_MEDIUM_INTEGER_MASK;
alter tag if exists hr_sensitive_medium unset masking policy HR_SENSITIVE_MEDIUM_STRING_MASK;
alter tag if exists hr_sensitive_identification_email unset masking policy HR_SENSITIVE_IDENTIFICATION_EMAIL_STRING_MASK;
alter tag if exists hr_sensitive_identification_name unset masking policy HR_SENSITIVE_IDENTIFICATION_NAME_STRING_MASK;
-- COLUMNS: drop masking policies
drop masking policy if exists HR_SENSITIVE_HIGH_INTEGER_MASK;
drop masking policy if exists HR_SENSITIVE_HIGH_STRING_MASK;
drop masking policy if exists HR_SENSITIVE_MEDIUM_INTEGER_MASK;
drop masking policy if exists HR_SENSITIVE_MEDIUM_STRING_MASK;
drop masking policy if exists HR_SENSITIVE_IDENTIFICATION_EMAIL_STRING_MASK;
drop masking policy if exists HR_SENSITIVE_IDENTIFICATION_NAME_STRING_MASK;
-- COLUMNS: drop the tags
drop tag if exists hr_sensitive_high;
drop tag if exists hr_sensitive_medium;
drop tag if exists hr_sensitive_identification_name;
drop tag if exists hr_sensitive_identification_email;
-- COLUMNS: check if the masking policies were deleted
show masking policies;
-- ROWS: drop row access policy
alter table if exists salaries drop row access policy COUNTRY_FILTER;
drop row access policy if exists COUNTRY_FILTER;
-- ROWS: check if the row access policy was deleted
show row access policies;
-- Drop roles
drop role if exists "HUMANRESOURCES";
drop role if exists "DATASCIENTIST";
drop role if exists "DATACITIZEN";
-- Drop user
drop user HUMANRESOURCES_USER_UK;
-- Drop warehouses
drop warehouse HR_XSMALL_WH;
drop warehouse DS_XSMALL_WH;
drop warehouse DC_XSMALL_WH;
show warehouses;
I would like to thank my manager Pierre-Louis Bescond for encouraging me to write this article, and for taking the time to review it, as well as Audrey Delou and Sébastien Denche from Snowflake for their help with proofreading and the additional information they provided me.
Thanks to my friend Liam O’Prey for allowing me to use a beautiful picture he took in Iceland.