A step-by-step tutorial to Row and Column Access policies in Snowflake

Snowflake, as a Data Warehouse, offers many options for sharing databases, tables, and views with users, and when your data is sensitive you may want to control exactly what each user can see. Let’s look at how we can set up data masking on columns and rows.

Arthur Telders
Roquette Data & Adv. Analytics
12 min read · Feb 27, 2023


Photo by Liam O’Prey @liamlumiere

This article is freely inspired by the one from Pierre-Louis Bescond (my manager at Roquette) who described the same mechanisms applied with BigQuery and encouraged me to reproduce it with Snowflake.

Context

A Human Resources table contains information about a company’s employees in several columns. Some of them are sensitive, for example the salaries.
The columns and rows have different levels of sensitivity, and we want to give different access levels to three types of employees.

Image by Pierre-Louis Bescond

The three roles are:

  • Human resources: they should have unrestricted access to all data columns. However, access to rows should be restricted so that HR representatives can only view information related to their respective country.
  • Data Scientists: they are allowed to access only certain selected and anonymized columns for modeling purposes.
  • Data Citizens: they should have limited access to columns and should only be permitted to create high-level queries and perform basic analysis.

We will configure column access with masking policies and tags, as well as row access using row access policies.

We will apply these policies to our table. Note that these rules will remain valid when the table is cloned or shared, because the governance in Snowflake is global and uniform across the platform.

Also, governance extends beyond dynamic data masking and row access policies. Other features are not covered in this article, such as classification, object dependencies, access history, and anonymization. To learn how to make the most of the available options, I suggest referring to the documentation of Snowflake.

Create a Snowflake trial account

Feel free to skip this part if you already have a Snowflake account.

The first step is to open a trial account in Snowflake, choosing at least the Enterprise edition, as some of the features we will use in this tutorial aren’t available in the Standard edition.

You can choose any cloud provider for the demo, but in an enterprise context you might want to keep the cloud provider you already use. Follow the links below for more information.

Supported Cloud Platforms — Snowflake Documentation

Supported Cloud Regions — Snowflake Documentation

As soon as you have subscribed to the trial account, you will receive a welcome email.

Log in to Snowflake — Image from Author (screenshot)

The account URL is composed like this:

https://<organization name>-<account>.snowflakecomputing.com

Note: you might have one organization for your business entity, grouping multiple accounts in different regions.

Once you log in with your username and password, you’ll land on another URL that is composed like this:

https://app.snowflake.com/<region>.<cloud-provider>/<locator>/

Load a table from a CSV file through SnowSQL

For this first step, we need to install SnowSQL.

To do this, follow the instructions:

Installing SnowSQL — Snowflake Documentation

We also need the data, which is available from Pierre-Louis Bescond’s Google Drive.

Once you’ve installed SnowSQL, let’s run it and upload the table.

In the terminal, type:

snowsql -a <organization name>-<account> -u <username>

Then type your password.

snowsql -a hswhmsm-hv09133 -u arthurtelders
>>Password:
>>* SnowSQL * v1.2.24
>>Type SQL statements or !help

Now you can type SQL statements.
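A side note: rather than typing the account and username every time, SnowSQL can read a named connection from its configuration file (typically ~/.snowsql/config). The connection name demo below is just an example:

```
[connections.demo]
accountname = <organization name>-<account>
username = <username>
```

You can then start a session with snowsql -c demo.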

The statements below tell Snowflake to:

  • use the default warehouse,
arthurtelders#(no warehouse)@(no database).(no schema)>use warehouse COMPUTE_WH;

+----------------------------------+
| status |
|----------------------------------|
| Statement executed successfully. |
+----------------------------------+
1 Row(s) produced. Time Elapsed: 0.333s
  • create the database EMPLOYEES,
arthurtelders#COMPUTE_WH@(no database).(no schema)>create database EMPLOYEES;

+------------------------------------------+
| status |
|------------------------------------------|
| Database EMPLOYEES successfully created. |
+------------------------------------------+
1 Row(s) produced. Time Elapsed: 0.341s
  • create the table SALARIES with its definition,
arthurtelders#COMPUTE_WH@EMPLOYEES.PUBLIC>create or replace table SALARIES(
First_Name string,
Last_Name string,
Department string,
Country string,
email string,
Salary integer,
Currency string,
Salary_Positioning integer,
last_perf integer,
seniority integer,
commute_time integer);

+--------------------------------------+
| status |
|--------------------------------------|
| Table SALARIES successfully created. |
+--------------------------------------+
1 Row(s) produced. Time Elapsed: 0.387s
  • put the file in an internal table stage, compressing it with gzip along the way,
arthurtelders#COMPUTE_WH@EMPLOYEES.PUBLIC>put file:///home/azureuser/dev/snowflake_tuto/fake_employees_data/salaries.csv @employees.public.%salaries;

+--------------+-----------------+-------------+-------------+--------------------+--------------------+----------+---------+
| source | target | source_size | target_size | source_compression | target_compression | status | message |
|--------------+-----------------+-------------+-------------+--------------------+--------------------+----------+---------|
| salaries.csv | salaries.csv.gz | 647077 | 136496 | NONE | GZIP | UPLOADED | |
+--------------+-----------------+-------------+-------------+--------------------+--------------------+----------+---------+
1 Row(s) produced. Time Elapsed: 1.626s
  • copy the contents of the internal stage in the table.
arthurtelders#COMPUTE_WH@EMPLOYEES.PUBLIC>copy into salaries from @%salaries
file_format = (
type = csv
field_delimiter = ','
skip_header = 1
)
on_error = 'skip_file';

+-----------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
| file | status | rows_parsed | rows_loaded | error_limit | errors_seen | first_error | first_error_line | first_error_character | first_error_column_name |
|-----------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------|
| salaries.csv.gz | LOADED | 8600 | 8600 | 1 | 0 | NULL | NULL | NULL | NULL |
+-----------------+--------+-------------+-------------+-------------+-------------+-------------+------------------+-----------------------+-------------------------+
1 Row(s) produced. Time Elapsed: 1.627s
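As a quick sanity check, the row count in the table should match the rows_loaded value reported by COPY:

```sql
-- Should return 8600, matching the COPY INTO output above
select count(*) from employees.public.salaries;
```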

Create the roles and the resources for each role

Now, in the Snowflake UI, let’s open a worksheet and create the three roles “HUMANRESOURCES”, “DATASCIENTIST”, and “DATACITIZEN”, along with the resources each one needs.

We also create a warehouse for each role.

Each warehouse is configured with the smallest size. (Snowflake warehouses provide the computing power; sizes go from X-SMALL to 4X-LARGE, with computing power and price doubling at each step, from 1 credit per hour up to 128 credits per hour.)

Billing is per second, with a one-minute minimum each time a warehouse resumes, and I’ve set the warehouses to auto-suspend after one minute of inactivity.

Finally, we create a UK-based Human Resources user.

-- Initialization

use role ACCOUNTADMIN;
use warehouse COMPUTE_WH;

-- Create 3 different roles

create or replace role "HUMANRESOURCES";
create or replace role "DATASCIENTIST";
create or replace role "DATACITIZEN";

-- Create one warehouse for each of the 3 roles

create or replace warehouse HR_XSMALL_WH with
warehouse_size = 'X-SMALL'
auto_suspend = 60
auto_resume = true
initially_suspended = true;

create or replace warehouse DS_XSMALL_WH with
warehouse_size = 'X-SMALL'
auto_suspend = 60
auto_resume = true
initially_suspended = true;

create or replace warehouse DC_XSMALL_WH with
warehouse_size = 'X-SMALL'
auto_suspend = 60
auto_resume = true
initially_suspended = true;

-- Create a user of the Human resources based in the UK

create user HUMANRESOURCES_USER_UK PASSWORD = '******'
DEFAULT_WAREHOUSE = 'HR_XSMALL_WH'
default_namespace = 'employees'
must_change_password = true;

Column policies

Masking policies

Now let’s create the masking policies. Each masking policy is defined for a specific data type (integer, string, and so on).

The SQL syntax to create a masking policy is composed like this:

create masking policy <policy name> as (val <input datatype>) returns <output datatype> ->
case
when current_role() in (<List of authorized roles>) then val
else <output value if not authorized>
end;

For the different levels of sensitivity, we’ll create those masking policies:

  • Critical Information (High): integers are converted to 0, and strings are converted to ‘******’ if the current role doesn’t belong to HUMANRESOURCES.
use database employees;

-- Masking policy for Critical information (High) -> integer returns 0 and string returns ******

create masking policy hr_sensitive_high_integer_mask as (val integer) returns integer ->
case
when current_role() in ('HUMANRESOURCES') then val
else 0
end;

create masking policy hr_sensitive_high_string_mask as (val string) returns string ->
case
when current_role() in ('HUMANRESOURCES') then val
else '******'
end;
  • Restricted Information (Medium): the same as above but if the role doesn’t belong to HUMANRESOURCES or DATASCIENTIST.
-- Masking policy for Restricted information (Medium) -> integer returns 0 and string returns ******

create masking policy hr_sensitive_medium_integer_mask as (val integer) returns integer ->
case
when current_role() in ('HUMANRESOURCES','DATASCIENTIST') then val
else 0
end;

create masking policy hr_sensitive_medium_string_mask as (val string) returns string ->
case
when current_role() in ('HUMANRESOURCES', 'DATASCIENTIST') then val
else '******'
end;
  • Identification Information (Names): strings are converted to ‘******’ if the current role doesn’t belong to HUMANRESOURCES.
-- Masking policy for Identification information (Names) -> string returns ******

create masking policy hr_sensitive_identification_name_string_mask as (val string) returns string ->
case
when current_role() in ('HUMANRESOURCES') then val
else '******'
end;
  • Identification Information (Email): strings are hashed with SHA256 if the current role doesn’t belong to HUMANRESOURCES.
-- Masking policy for Identification information (Emails) -> string returns hashed value

create masking policy hr_sensitive_identification_email_string_mask as (val string) returns string ->
case
when current_role() in ('HUMANRESOURCES') then val
else sha2(val)
end;

Then verify that the masking policies were created:

show masking policies;
Result of query “show masking policies“ — Image from Author (screenshot)

It is worth noting that masking values can be hardcoded such as ‘******’, but they can be more sophisticated by using regular expressions, user-defined functions (UDFs), or external functions. These options provide greater flexibility in implementing custom masking logic tailored to specific use cases.
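As an illustration (not part of this tutorial’s setup), a partial email mask that hides the local part but keeps the domain visible could look like this; the policy name hr_partial_email_mask is hypothetical:

```sql
-- Hypothetical: mask everything before the @, keep the domain visible
create masking policy hr_partial_email_mask as (val string) returns string ->
case
when current_role() in ('HUMANRESOURCES') then val
else regexp_replace(val, '^[^@]+', '*****')
end;
```

With this policy, john.doe@example.com would appear as *****@example.com to unauthorized roles.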

Tags

Tags allow us to group multiple masking policies and can be assigned to multiple columns.

For each level of sensitivity, we create a tag, to which we’ll set the corresponding policies:

-- Tag for Critical information (High)
create or replace tag hr_sensitive_high;
alter tag hr_sensitive_high set masking policy HR_SENSITIVE_HIGH_INTEGER_MASK;
alter tag hr_sensitive_high set masking policy HR_SENSITIVE_HIGH_STRING_MASK;

-- Tag for Restricted information (Medium)
create or replace tag hr_sensitive_medium;
alter tag hr_sensitive_medium set masking policy HR_SENSITIVE_MEDIUM_INTEGER_MASK;
alter tag hr_sensitive_medium set masking policy HR_SENSITIVE_MEDIUM_STRING_MASK;

-- Tag for Identification information (Names)
create or replace tag hr_sensitive_identification_name;
alter tag hr_sensitive_identification_name set masking policy HR_SENSITIVE_IDENTIFICATION_NAME_STRING_MASK;

-- Tag for Identification information (Emails)
create or replace tag hr_sensitive_identification_email;
alter tag hr_sensitive_identification_email set masking policy HR_SENSITIVE_IDENTIFICATION_EMAIL_STRING_MASK;

Assign the tags to the columns

Finally, we modify the columns, assigning them the corresponding tag.

-- table
alter table if exists salaries modify column First_Name set tag hr_sensitive_identification_name = 'tag-based policies';
alter table if exists salaries modify column Last_Name set tag hr_sensitive_identification_name = 'tag-based policies';
alter table if exists salaries modify column email set tag hr_sensitive_identification_email = 'tag-based policies';
alter table if exists salaries modify column Salary set tag hr_sensitive_high = 'tag-based policies';
alter table if exists salaries modify column Salary_Positioning set tag hr_sensitive_medium = 'tag-based policies';
alter table if exists salaries modify column last_perf set tag hr_sensitive_medium = 'tag-based policies';
alter table if exists salaries modify column seniority set tag hr_sensitive_medium = 'tag-based policies';
alter table if exists salaries modify column commute_time set tag hr_sensitive_medium = 'tag-based policies';
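If you want to double-check that a tag landed on a column, SYSTEM$GET_TAG returns the tag’s value for a given object (here the Salary column of our table):

```sql
-- Returns 'tag-based policies' if the tag is set on the column, NULL otherwise
select system$get_tag('hr_sensitive_high', 'employees.public.salaries.salary', 'COLUMN');
```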

Row policies

Now, let’s set up a row policy to put a filter on the Country.

We want our UK-based Human Resources user to see only the rows about UK employees.

The policy returns TRUE when the current user is “HUMANRESOURCES_USER_UK” and the Country value is “UK”.

We must also add conditions for the roles DATASCIENTIST and DATACITIZEN, as we do not want to restrict them on row access.

create or replace row access policy COUNTRY_FILTER as (country string) returns boolean ->
case
when 'HUMANRESOURCES_USER_UK' = current_user() and country = 'UK' then True
when 'DATASCIENTIST' = current_role() then True
when 'DATACITIZEN' = current_role() then True
else False
end;

ALTER table if exists salaries add row access policy COUNTRY_FILTER on (Country);

show row access policies;
Result of query “show row access policies“ — Image from Author (screenshot)
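Hard-coding user names in the policy works for a demo but scales poorly. As a variant, the policy logic can be driven by a mapping table; the sketch below is hypothetical (the table country_mapping and the policy COUNTRY_FILTER_MAPPED are not part of this tutorial):

```sql
-- Hypothetical mapping table: which user may see which country
create table country_mapping (user_name string, country string);
insert into country_mapping values ('HUMANRESOURCES_USER_UK', 'UK');

create row access policy COUNTRY_FILTER_MAPPED as (row_country string) returns boolean ->
current_role() in ('DATASCIENTIST', 'DATACITIZEN')
or exists (
select 1 from country_mapping m
where m.user_name = current_user()
and m.country = row_country
);
```

Adding an HR user for another country then only requires inserting a row into the mapping table, rather than editing the policy.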

Grant permissions on objects

We will grant usage to the different roles on their warehouse, the database, the schema and the table, and grant the roles to users.

For our example, we grant the roles DATASCIENTIST and DATACITIZEN to my own user, and we grant the role HUMANRESOURCES to our Human Resources user from the UK.

-- DATASCIENTIST

grant usage on warehouse DS_XSMALL_WH to role DATASCIENTIST;
grant usage on database employees to role DATASCIENTIST;
grant usage on schema employees.public to role DATASCIENTIST;
grant select on all tables in schema employees.public to role DATASCIENTIST;

grant role DATASCIENTIST to user ARTHURTELDERS;

-- DATACITIZEN

grant usage on warehouse DC_XSMALL_WH to role DATACITIZEN;
grant usage on database employees to role DATACITIZEN;
grant usage on schema employees.public to role DATACITIZEN;
grant select on all tables in schema employees.public to role DATACITIZEN;

grant role DATACITIZEN to user ARTHURTELDERS;

-- HUMANRESOURCES (UK)

grant usage on warehouse HR_XSMALL_WH to role HUMANRESOURCES;
grant usage on database employees to role HUMANRESOURCES;
grant usage on schema employees.public to role HUMANRESOURCES;
grant select on all tables in schema employees.public to role HUMANRESOURCES;

grant role HUMANRESOURCES to user HUMANRESOURCES_USER_UK;

Verify that we have correctly set up the policies

This query will return all the policies that are assigned to a specific object and their status, in our case, the table “employees.public.salaries”:

select policy_name, ref_column_name, tag_name, policy_status
from table(
employees.information_schema.policy_references(
ref_entity_name => 'employees.public.salaries',
ref_entity_domain => 'table'
)
);
Verify that we have correctly set up the policies — Image from Author (screenshot)

Finally, we will use the different roles to verify that we can only access the data that we are allowed to access with each role.

Data Scientist role

use role DATASCIENTIST;
use warehouse DS_XSMALL_WH;
--

select *
from EMPLOYEES.PUBLIC.SALARIES
order by COUNTRY, LAST_NAME, FIRST_NAME
limit 50;

As expected, the column “Salary” is filled with 0s, the names are masked, and the emails are hashed, but I still have access to the columns Salary_Positioning, last_perf, seniority, and commute_time. So I can use this data to build models without seeing highly sensitive or personal information.

Data Citizen role

use role DATACITIZEN;
use warehouse DC_XSMALL_WH;
--

select *
from EMPLOYEES.PUBLIC.SALARIES
order by COUNTRY, LAST_NAME, FIRST_NAME
limit 50;

This role is even more restrictive, so more columns are masked (integers replaced by 0, strings by ‘******’).

Human resources role

We created a separate user for this role, so let’s log in with its credentials and change the password!

And run a query on the table.

use role HUMANRESOURCES;

use warehouse HR_XSMALL_WH;

select *
from EMPLOYEES.PUBLIC.SALARIES
order by COUNTRY, LAST_NAME, FIRST_NAME;

As expected, our HR user from the UK can only access the rows for their own country, but still has access to sensitive information such as salaries.

Conclusion

We’ve seen how to set up column access policies using tags and row access policies. Congratulations if you made it this far!

One last thing: we will now delete the resources created for this tutorial. Note that to drop a policy, you must first unset it from the tag or the column; once attached, a policy is protected and cannot be dropped.

-- DELETE RESOURCES

use role ACCOUNTADMIN;

-- COLUMNS: unset masking policies from the tags

alter tag if exists hr_sensitive_high unset masking policy HR_SENSITIVE_HIGH_INTEGER_MASK;
alter tag if exists hr_sensitive_high unset masking policy HR_SENSITIVE_HIGH_STRING_MASK;
alter tag if exists hr_sensitive_medium unset masking policy HR_SENSITIVE_MEDIUM_INTEGER_MASK;
alter tag if exists hr_sensitive_medium unset masking policy HR_SENSITIVE_MEDIUM_STRING_MASK;
alter tag if exists hr_sensitive_identification_email unset masking policy HR_SENSITIVE_IDENTIFICATION_EMAIL_STRING_MASK;
alter tag if exists hr_sensitive_identification_name unset masking policy HR_SENSITIVE_IDENTIFICATION_NAME_STRING_MASK;

-- COLUMNS: drop masking policies

drop masking policy if exists HR_SENSITIVE_HIGH_INTEGER_MASK;
drop masking policy if exists HR_SENSITIVE_HIGH_STRING_MASK;
drop masking policy if exists HR_SENSITIVE_MEDIUM_INTEGER_MASK;
drop masking policy if exists HR_SENSITIVE_MEDIUM_STRING_MASK;
drop masking policy if exists HR_SENSITIVE_IDENTIFICATION_EMAIL_STRING_MASK;
drop masking policy if exists HR_SENSITIVE_IDENTIFICATION_NAME_STRING_MASK;

-- COLUMNS: drop the tags

drop tag if exists hr_sensitive_high;
drop tag if exists hr_sensitive_medium;
drop tag if exists hr_sensitive_identification_name;
drop tag if exists hr_sensitive_identification_email;

-- COLUMNS: check if the masking policies were deleted

show masking policies;

-- ROWS: drop row access policy

alter table if exists salaries drop row access policy COUNTRY_FILTER;

drop row access policy if exists COUNTRY_FILTER;

-- ROWS: check if the row access policy was deleted

show row access policies;

-- Drop roles

drop role if exists "HUMANRESOURCES";
drop role if exists "DATASCIENTIST";
drop role if exists "DATACITIZEN";

-- Drop user

drop user HUMANRESOURCES_USER_UK;

-- Drop warehouses

drop warehouse HR_XSMALL_WH;
drop warehouse DS_XSMALL_WH;
drop warehouse DC_XSMALL_WH;

show warehouses;

I would like to thank my manager Pierre-Louis Bescond for encouraging me to write this article, and for taking the time to review it, as well as Audrey Delou and Sébastien Denche from Snowflake for their help with proofreading and the additional information they provided me.

Thanks to my friend Liam O’Prey for allowing me to use a beautiful picture he took in Iceland.
