Snowflake: Security to protect Sensitive Data (GDPR) in Access Layer

Applying GDPR methodology with Snowflake Image generated by AI (at https://pixlr.com/)

On previous articles, we saw how the Snowflake Security Framework SSFW can help us to determine what would be the different components that we have to take into consideration to apply security on our Data Platform.

But on this post, we are going to see a practical example how to apply security to a near real case to our Sensitive Data in the visualization part of our Data Architecture. We are going to use some Snowflake Security best practices applying different security features explained on the SSFW - Access Layer, in order to align the different GDPR requirements, like Masking policies based on Tags and Row Access policies.

SSFW — Snowflake Security Framework — Access Layer

We will implement these security capabilities based on the below case:

  • We will have to manage sensitive data exposure to the different departments, in order to provide visualization access to the BI tool enterprise or other third party services that access to the data
  • It will be implemented a Role Based Access Control in a hierarchical mode aligned with the different enterprise organizations levels.
  • Each department will only be able to access to its data, but the on-top hierarchical managment departments will be able to see the underlined department information.
  • The data it will be based on PII data and no PII data. In complex scenarios, you could implement specific categories of PII data

Our Scenario

We are going to describe the different steps to be followed to achieve a managed End-2-End sensitive data access.

We are going to cover on the Access Layer the below security components, on that sequence order:

1- Locate/Generate table to securize.

2- Role Management:

  • Designing the RBAC
  • Starting generating the role hierarchy

3- Authorization Objects: Applying privileges

4- Masking Policies:

  • Generate a Database for policies
  • Generate TAG and Masking Policies
  • Apply the MP to the TAG
  • Checking the list of MP applied
  • Checking results of the Dynamic Data Masking applied.

5- Access Policies:

  • Generate Mapping table
  • Generate Row Access Policy
  • Apply the Row Access Policy to CUSTOMER table.
  • Checking results of the Row Access policy applied.

We will assume the below premises:

  • You have already provided access to the Network Layer to the users
  • On the Access Layer — Role Management, you will have the USERs provisioned / created, so you will have to apply to the different users the roles generated here. It won’t be covered Authentication Management, Sessions and Secured Shared.
  • The Encryption Layer will not be covered on this post. Coming soon.

1) Locate/Generate table to securize

  • We are going to generate our dummy CUSTOMER Data Table, this is not part of the security components.
  • We will assume that database and schema are already generated.
  • In a real case, this table would be already generated.

We are going to import some data from the SNOWFLAKE_SAMPLE_DATA share database.

It’s important to import data from different segments of data, that we will need on the future sections to apply access policies to the information. Due to this, we will load dummy information from each Nation Department data.

The table will be stored in the database and schema where we place the information that can be consumed by users/applications.

So, it will look like this:

USE ROLE SYSADMIN;


CREATE OR REPLACE TABLE EDW.DC_DATA_IM.CUSTOMER_TABLE
AS
SELECT TOP 5 *,CURRENT_DATE()::TIMESTAMP_LTZ DT_LOAD_REFERENCE FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1000.CUSTOMER WHERE C_NATIONKEY =1
UNION
SELECT TOP 5 *,CURRENT_DATE()::TIMESTAMP_LTZ DT_LOAD_REFERENCE FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1000.CUSTOMER WHERE C_NATIONKEY =21
UNION
SELECT TOP 5 *,CURRENT_DATE()::TIMESTAMP_LTZ DT_LOAD_REFERENCE FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1000.CUSTOMER WHERE C_NATIONKEY =22;

2) Role Management

2.1 Designing the RBAC

At this moment, we have to know about what is the approach in order to expose specific data to determined stakeholders. So for our case, we will use the below diagram:

The nomenclator of the roles generated used above is based on the below:

XX_LLLL_YY_PPPP

  • XX: FR — Functional Role / AR — Access Role
  • LLLL: Level of the functional hierarchy. Here we will have 3 levels.
  • YY: Role with access to the area information YY
  • PPPP: Indicates if the role has visibility to the PII information. It will be there NO_PII_USER and PII_USER

We can clearly see how we diferentiate between Functional roles and Access Roles. The Funtional roless will be specified in a hierarchical mode, so finally it will be applied some specific AccessRoles to them.

In our use case, we will provide access to our dummy CUSTOMER_TABLE, to the different roles.

All these commands will be executed by USERADMIN/SECURITYADMIN role, but depending on the company case, the best recommendation would be to create an specific ROLE (delegated role) who has the appropriate privileges to the different actions.

2.2 Starting generating the MAIN role hierarchy

We are going to create all the roles:

USE ROLE USERADMIN;

-- CREATION OF ALL ACCESS AND FUNCTIONAL ROLES
CREATE ROLE AR_PII_USER;
CREATE ROLE AR_NO_PII_USER;

CREATE ROLE FR_LVL0;

CREATE ROLE FR_LVL1;
CREATE ROLE FR_LVL1_NOPII;

CREATE ROLE FR_LVL2_A1_PII;
CREATE ROLE FR_LVL2_A1_NOPII;

CREATE ROLE FR_LVL2_A2_PII;
CREATE ROLE FR_LVL2_A2_NOPII;

This below part, it will generate our department RBAC hierarchy:

USE ROLE SECURITYADMIN;

--GENERATION OF THE HIERARCHY

GRANT ROLE FR_LVL0 TO ROLE SYSADMIN;

GRANT ROLE FR_LVL1 TO ROLE FR_LVL0;
GRANT ROLE FR_LVL1_NOPII TO ROLE FR_LVL0;

GRANT ROLE FR_LVL2_A1_PII TO ROLE FR_LVL1;
GRANT ROLE FR_LVL2_A1_NOPII TO ROLE FR_LVL1;

GRANT ROLE FR_LVL2_A2_PII TO ROLE FR_LVL1;
GRANT ROLE FR_LVL2_A2_NOPII TO ROLE FR_LVL1;

GRANT ROLE FR_LVL2_A1_NOPII TO ROLE FR_LVL1_NOPII;
GRANT ROLE FR_LVL2_A2_NOPII TO ROLE FR_LVL1_NOPII;

GRANT ROLE AR_PII_USER TO ROLE FR_LVL2_A1_PII;
GRANT ROLE AR_PII_USER TO ROLE FR_LVL2_A2_PII;

GRANT ROLE AR_NO_PII_USER TO ROLE FR_LVL2_A1_NOPII;
GRANT ROLE AR_NO_PII_USER TO ROLE FR_LVL2_A2_NOPII;

3) Authorization Objects

On this part, we will apply all the needed privileges to our securable objects.

USE ROLE SECURITYADMIN;

--WE ASSIGN WAREHOUSE ONLY TO THE ACCESS ROLES USERS:

GRANT USAGE ON WAREHOUSE COMPUTE_WH TO ROLE AR_PII_USER;
GRANT USAGE ON WAREHOUSE COMPUTE_WH TO ROLE AR_NO_PII_USER;

--USAGE ON DATABASES AND SCHEMAS
GRANT USAGE ON DATABASE EDW TO ROLE AR_PII_USER;
GRANT USAGE ON DATABASE EDW TO ROLE AR_NO_PII_USER;
GRANT USAGE ON SCHEMA EDW.DC_DATA_IM TO ROLE AR_PII_USER;
GRANT USAGE ON SCHEMA EDW.DC_DATA_IM TO ROLE AR_NO_PII_USER;

--SELECT ON TABLE CUSTOMER_TABLE
GRANT SELECT ON TABLE EDW.DC_DATA_IM.CUSTOMER_TABLE TO ROLE AR_PII_USER;
GRANT SELECT ON TABLE EDW.DC_DATA_IM.CUSTOMER_TABLE TO ROLE AR_NO_PII_USER;

4) Masking policies

On this part, we will cover all the masking policies management we will have to take into consideration.

Important: We will see along the different sections 2 (create some new role according to the POLICIES functionality) and 3 (applying the privileges to the differents roles and functionalities).

4.1 Generate a Database and POLICY admin for policies

Taking into consideration that the policies are schema objects, it’s a good practice to centralize all the policies into an only Database/Schema. This scenario try to avoid the below case: Imagine the case you would have to use one policy across the different databases/schemas into your company:

  • You shouldn’t have N policies the same one for each database schema with the same logic. Some change on that policy, it will require N changes on all your database.
  • If you have stored your policies in one database/schema together with the data, if the database is dropped it will affect to the rest of the masking information that uses that policies.
  • It doesn’t scale in managable terms and it has a risk that some policies stay not aligned with the rest ones.
USE ROLE SYSADMIN;

--WE CREATE SCHEMA WHERE WE WANT TO STORE DATAGOVERNANCE POLICIES
CREATE DATABASE DATAGOVERNANCE;
CREATE SCHEMA DATAGOVERNANCE.POLICIES;

USE ROLE SECURITYADMIN;

--WE WILL CREATE A POLICY ADMINISTRATOR

create ROLE POLICYADMIN;

--GRANT USAGE ON DATABASE SCHEMA EDW.DC_DATA_IM
GRANT USAGE ON DATABASE DATAGOVERNANCE TO ROLE POLICYADMIN;
GRANT USAGE ON SCHEMA DATAGOVERNANCE.POLICIES TO ROLE POLICYADMIN;

--GRANT USAGE ON DATABASE SCHEMA EDW.DC_DATA_IM
GRANT USAGE ON DATABASE EDW TO ROLE POLICYADMIN;
GRANT USAGE ON SCHEMA EDW.DC_DATA_IM TO ROLE POLICYADMIN;
GRANT USAGE ON WAREHOUSE COMPUTE_WH TO ROLE POLICYADMIN;

-- grant CREATE AND APPLY masking policy privileges to the POLICYADMIN custom role.
GRANT CREATE MASKING POLICY ON SCHEMA DATAGOVERNANCE.POLICIES TO ROLE POLICYADMIN;

USE ROLE ACCOUNTADMIN;

GRANT APPLY MASKING POLICY ON ACCOUNT TO ROLE POLICYADMIN;

-- grant APPLY TAGS to POLICYADMIN on ACCOUNT
GRANT APPLY TAG ON ACCOUNT TO ROLE POLICYADMIN;

-- put POLICYADMIN on hierarchy under SYSADMIN
GRANT ROLE POLICYADMIN TO ROLE SYSADMIN;

4.2 Generate TAG and Masking Policies

We will generate an only one tag for all the masking policies one for each type. Remember that a masking policy only can return the value of the same input format value (the column format one).

USE ROLE SYSADMIN;

-- We create a only one tag
CREATE TAG DATAGOVERNANCE.POLICIES.pii_fields; --ONE ONLY, but different value tags

USE ROLE POLICYADMIN;

--WE CREATE A MASKING POLICIES, ONE FOR EACH DATA TYPE

CREATE MASKING POLICY DATAGOVERNANCE.POLICIES.masking_policy AS (val STRING) RETURNS STRING ->
CASE
WHEN IS_GRANTED_TO_INVOKER_ROLE('AR_PII_USER') THEN val
ELSE '*******'
END;

CREATE MASKING POLICY DATAGOVERNANCE.POLICIES.masking_policy_num AS (val NUMERIC) returns NUMERIC ->
CASE
WHEN IS_GRANTED_TO_INVOKER_ROLE('AR_PII_USER') THEN val
ELSE -1::NUMERIC
END;

CREATE MASKING POLICY DATAGOVERNANCE.POLICIES.masking_policy_dt AS (val TIMESTAMP_LTZ) returns TIMESTAMP_LTZ ->
CASE
WHEN IS_GRANTED_TO_INVOKER_ROLE('AR_PII_USER') THEN val
ELSE NULL::TIMESTAMP_LTZ
END;

4.3 Apply the MP to the TAG

Our intention would be to overide the same tag with all the masking policies. Snowflake it will detect what masking policy to use, on each case, checking their format!

USE ROLE POLICYADMIN;

--WE APPLY THE DIFFERENT MASKING POLICIES ON THE TAG
ALTER TAG DATAGOVERNANCE.POLICIES.pii_fields SET
MASKING POLICY DATAGOVERNANCE.POLICIES.masking_policy;
ALTER TAG pii_fields SET
MASKING POLICY DATAGOVERNANCE.POLICIES.masking_policy_dt;
ALTER TAG pii_fields SET
MASKING POLICY DATAGOVERNANCE.POLICIES.masking_policy_num;

--WE APPLY TO THE DIFFERENT SENSITIVE DATA THE TAG
ALTER TABLE EDW.DC_DATA_IM.CUSTOMER_TABLE MODIFY COLUMN C_CUSTKEY
SET TAG DATAGOVERNANCE.POLICIES.pii_fields = 'IDENTIFIER';
ALTER TABLE EDW.DC_DATA_IM.CUSTOMER_TABLE MODIFY COLUMN C_ACCTBAL
SET TAG DATAGOVERNANCE.POLICIES.pii_fields = 'SENSITIVE_FINANCE';
ALTER TABLE EDW.DC_DATA_IM.CUSTOMER_TABLE MODIFY COLUMN C_NAME
SET TAG DATAGOVERNANCE.POLICIES.pii_fields = 'PERSON_NAME';
ALTER TABLE EDW.DC_DATA_IM.CUSTOMER_TABLE MODIFY COLUMN C_ADDRESS
SET TAG DATAGOVERNANCE.POLICIES.pii_fields = 'ADDRESS';
ALTER TABLE EDW.DC_DATA_IM.CUSTOMER_TABLE MODIFY COLUMN C_PHONE
SET TAG DATAGOVERNANCE.POLICIES.pii_fields = 'PHONE_NUMBER';
ALTER TABLE EDW.DC_DATA_IM.CUSTOMER_TABLE MODIFY COLUMN DT_LOAD_REFERENCE
SET TAG DATAGOVERNANCE.POLICIES.pii_fields = 'DATE_FINANCE';

4.4 Checking the list of MP applied

We are going to check, if there are some missing masking policies.

--WE CHECK WHAT TAGS HAVE BEEN APPLIED ON WHAT COLUMNS
SELECT *
FROM TABLE(
EDW.INFORMATION_SCHEMA.TAG_REFERENCES_ALL_COLUMNS(
'EDW.DC_DATA_IM.CUSTOMER_TABLE',
'table'
));
--WE CHECK WHAT POLICIES HAVE BEEN APPLIED ON WHAT COLUMNS 
-- THROUGH THE DIFFERENT TAGS
SELECT *
FROM TABLE(
EDW.INFORMATION_SCHEMA.POLICY_REFERENCES(
REF_ENTITY_NAME => 'EDW.DC_DATA_IM.CUSTOMER_TABLE',
REF_ENTITY_DOMAIN => 'table'
)
);

4.4 Checking the results of the Dynamic Data Masking applied

Now, we are going to check with the different Functional roles, that we can see the masked and/or unmasked data information.

-- WE WILL CHECK with NO PII role, THAT PII DATA IS HIDDEN
USE ROLE FR_LVL2_A2_NOPII;
USE WAREHOUSE COMPUTE_WH;
SELECT * FROM EDW.DC_DATA_IM.CUSTOMER_TABLE;
-- WE WILL CHECK with PII role, THAT PII DATA IS SHOWED
USE ROLE FR_LVL2_A2_PII;
USE WAREHOUSE COMPUTE_WH;
SELECT * FROM EDW.DC_DATA_IM.CUSTOMER_TABLE;

We will see that we can see all the fields information, but all the rows including the nation departments that this FR Level2 SHOULDN’T see! So we will apply the new security features on next section!

5) ACCESS POLICIES

5.1 Generate Mapping Table

This table it will be useful in order to provide an only place to manage the access to specific information. Being more flexible and scalable.


USE ROLE SYSADMIN;

--DROP TABLE DATAGOVERNANCE.POLICIES.ACCOUNT_NATION_ROLE;
CREATE OR REPLACE TABLE DATAGOVERNANCE.POLICIES.ACCOUNT_NATION_ROLE (
account_role varchar,
nation_key int
);


INSERT INTO DATAGOVERNANCE.POLICIES.ACCOUNT_NATION_ROLE
VALUES('FR_LVL1',1);
INSERT INTO DATAGOVERNANCE.POLICIES.ACCOUNT_NATION_ROLE
VALUES('FR_LVL1',21);
INSERT INTO DATAGOVERNANCE.POLICIES.ACCOUNT_NATION_ROLE
VALUES('FR_LVL1',22);
INSERT INTO DATAGOVERNANCE.POLICIES.ACCOUNT_NATION_ROLE
VALUES('FR_LVL1_NOPII',1);
INSERT INTO DATAGOVERNANCE.POLICIES.ACCOUNT_NATION_ROLE
VALUES('FR_LVL1_NOPII',21);
INSERT INTO DATAGOVERNANCE.POLICIES.ACCOUNT_NATION_ROLE
VALUES('FR_LVL1_NOPII',22);
INSERT INTO DATAGOVERNANCE.POLICIES.ACCOUNT_NATION_ROLE
VALUES('FR_LVL2_A1_PII',21);
INSERT INTO DATAGOVERNANCE.POLICIES.ACCOUNT_NATION_ROLE
VALUES('FR_LVL2_A1_NOPII',21);
INSERT INTO DATAGOVERNANCE.POLICIES.ACCOUNT_NATION_ROLE
VALUES('FR_LVL2_A2_PII',22);
INSERT INTO DATAGOVERNANCE.POLICIES.ACCOUNT_NATION_ROLE
VALUES('FR_LVL2_A2_NOPII',22);

USE ROLE SECURITYADMIN;

GRANT SELECT ON TABLE DATAGOVERNANCE.POLICIES.ACCOUNT_NATION_ROLE
TO ROLE POLICYADMIN;

USE ROLE POLICYADMIN;
SELECT * FROM DATAGOVERNANCE.POLICIES.ACCOUNT_NATION_ROLE ORDER BY 1;

5.2 Generate Row Access Policy using Mapping Table

USE ROLE SECURITYADMIN;
--GRANT CREATE ROW ACCESS POLICY
GRANT CREATE ROW ACCESS POLICY ON SCHEMA DATAGOVERNANCE.POLICIES
TO ROLE POLICYADMIN;
USE ROLE ACCOUNTADMIN;
--GRANT APPLY ROW ACCESS POLICY
GRANT APPLY ROW ACCESS POLICY ON ACCOUNT TO ROLE POLICYADMIN;

USE ROLE POLICYADMIN;
--WE CREATE ROW ACCESS POLICIES USING THE MAPPING TABLE
create or replace row access policy
DATAGOVERNANCE.POLICIES.ACCOUNT_NATION_ROLE as (id_nation_key int)
returns boolean ->
'FR_LVL0' = current_role()
or exists (
select 1 from DATAGOVERNANCE.POLICIES.ACCOUNT_NATION_ROLE
where ACCOUNT_ROLE = current_role()
and NATION_KEY = id_nation_key
);

5.3 Apply the ROW ACCESS POLICY to the table

--WE APPLY THE DIFFERENT ROW ACCESS POLICIES ON THE TABLE
ALTER TABLE EDW.DC_DATA_IM.CUSTOMER_TABLE
ADD ROW ACCESS POLICY DATAGOVERNANCE.POLICIES.ACCOUNT_NATION_ROLE
ON (C_NATIONKEY);

5.4 CHECK ACCESIBILITY to CUSTOMER table

We are going to check, with the different roles that users will use, what type of visualization access will have each one of them.

USE ROLE ACCOUNTADMIN;
SELECT current_role() ROLE_USED,* FROM EDW.DC_DATA_IM.CUSTOMER_TABLE;

With the role ACCOUNTADMIN we can’t see the data! A good point!

USE ROLE FR_LVL0;
SELECT current_role() ROLE_USED,* FROM EDW.DC_DATA_IM.CUSTOMER_TABLE;

With the role FR_LVL0 we can see the data, and all Nation departments! A good point!

  USE ROLE FR_LVL1_NOPII;
SELECT current_role() ROLE_USED,* FROM EDW.DC_DATA_IM.CUSTOMER_TABLE;

With the role FR_LVL1 with no PII access, we can’t see the PII data, but all Nation departments under LVL 1! A good point!

USE ROLE FR_LVL2_A2_PII;
SELECT current_role() ROLE_USED,* FROM EDW.DC_DATA_IM.CUSTOMER_TABLE;

With the role FR_LVL21 with PII access, we can see the PII data, but ONLY its 21 Nation department! A good point!

USE ROLE FR_LVL2_A2_NOPII;
SELECT current_role() ROLE_USED,* FROM EDW.DC_DATA_IM.CUSTOMER_TABLE;

With the role FR_LVL21 with no PII access, we can’t see the PII data, but ONLY its 21 Nation department! A good point!

So finally, we could check all the roles, but if you do by your own you will see that all is in its place. :)

Conclusions

We have seen that you can apply differents access security methods in order to restrict the access to the data to different roles (users). The most important here, and my recommenadtion, would be think about your Security Strategy design (you can follow an overview on the Snowflake security here).

On this article we have covered only some components of the Access Layer. And for example, we have used some masking and access policies, but I encourage you in order to check the rest of ones: Projection policies (Access type) that hide the columns or Aggregation policies (Masking type) that provides partial results using aggregation functions. You can check the details of the Access Layer here.

About me

Subject Matter Expert on different Data Technologies, with 20+ years of experience in Data Adventures. Snowflake Squad Team Founder and experienced Snowflake Data Jedi&Saiyan. As a Data Vault Certified Practitioner, I have been leading Data Vault Architecures using Metadata Driven methodologies.

If you want to know more in detail about the aspects seen here, or other ones, you can follow me on medium || Linked-in here.

I hope you have joined and this can help you!

--

--

Cesar Segura
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

SME @ SDG Group || Snowflake Architect || Snowflake Squad Spotlight Member || CDVP Data Vault