Exploring Snowflake Data Governance with Power BI

In this post, we’ll delve into Snowflake's data governance features and explore how column and row-level security protections within Snowflake can be leveraged by popular BI platforms such as Power BI. Specifically, we’ll review Snowflake's new Governance UI, object tagging, data classification, dynamic data masking, tag-based masking, and row access policies. Lastly, we’ll test our data access policies via a DirectQuery dataset in Power BI.

Intro

Before I joined Snowflake, I spent the first decade-plus of my career in the Microsoft data ecosystem, primarily as a data warehousing and BI architect for consulting firms. During my time as a consultant, I heard a quote (a maxim actually) that has lived rent-free in my head ever since.

Data should be transformed as far upstream as possible, and as far downstream as necessary.

My fellow Power BI practitioners likely recognize this quote as Roche’s Maxim of Data Transformation, coined by Matthew Roche, PM on the Power BI Customer Advisory Team (CAT).

While Roche’s Maxim focuses on data transformations, a case can be made that it can also apply to data protections, such as column and row-level security. Implementing data governance policies directly in Snowflake ensures that data is protected downstream, both within Power BI and for a myriad of additional organizational use cases. For example, your data will remain protected when consumed by other BI platforms, data science workloads, connected applications, ad-hoc analytics, and more. Snowflake has a rich set of data governance capabilities that aim to help customers in three main categories:

  1. Knowing your data
  2. Protecting your data
  3. Connecting your data to your ecosystem

With Roche’s Maxim in mind, let’s review some of Snowflake’s robust governance features, then put them to the test with Power BI.

NOTE: Many of the SQL scripts used in this post were derived from the Tasty Bytes - Zero to Snowflake - Data Governance Quickstart. This is an excellent hands-on lab to quickly get up and running with Snowflake’s governance capabilities. Spin up a free trial and test these features for yourself!

Snowflake’s new Governance UI

Earlier this month Snowflake released the public preview of the new Governance interface in Snowsight. The Governance interface provides Snowflake admins with an overview of object tagging and data protections applied to tables and columns within their account. From the Dashboard summary (shown below), admins can drill into detailed reports (Tagged Objects screen) to home in on specific objects (e.g., databases, schemas), tags, and data access policies.

The new Snowsight Governance Interface!

From the Tagged Objects screen (shown below), users can click on a specific record and Snowsight automatically opens a new tab taking them to the underlying object (e.g., the Table Details page or Columns page). This allows data stewards to assign tags and access policies quickly and easily without writing any SQL.

Detailed page accessed by drilling/clicking on one of the dashboard summary cards. Clicking on a record navigates directly to the underlying object where you can apply Snowflake governance features directly in the UI.

In the next few sections, we’ll explore specific Snowflake Governance capabilities like object tagging and data access policies. We’ll also work through an example of tagging and protecting tables and columns in our Snowflake dimensional model that feeds a downstream Power BI DirectQuery dataset.

Data Sensitivity & Access Visibility

Object Tagging
So what is a tag anyway? Tags are Snowflake objects that can be assigned to other Snowflake objects (e.g., a table, column, or virtual warehouse). From a usage perspective, tags can be used by data stewards to track sensitive data, monitor resource consumption, and audit activities, among other use cases.
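Note that tags aren’t limited to tables and columns. As a quick illustration (the warehouse name below is hypothetical), a tag could attribute warehouse spend to a cost center:

```sql
--hypothetical example: use a tag to attribute warehouse spend to a cost center
CREATE TAG IF NOT EXISTS cost_center;
ALTER WAREHOUSE powerbi_wh SET TAG cost_center = 'analytics';
```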

We’ll start our example by creating Snowflake tags to categorize our dimensional model at the table level.

/*
----------------------------
-- Tagging Tables --
----------------------------
*/

--create tag for "table type" with a constrained set of allowed values
CREATE TAG table_type
ALLOWED_VALUES 'fact', 'dimension', 'security';

--simple information_schema query to generate ALTER TABLE commands based on table naming convention
SELECT
    'ALTER TABLE ' || CONCAT_WS('.', table_catalog, table_schema, table_name) || ' SET TAG table_type = ' ||
    CASE
        WHEN LEFT(table_name, 4) = 'DIM_' THEN '''dimension'''
        WHEN LEFT(table_name, 4) = 'FACT' THEN '''fact'''
        ELSE '''security'''
    END || ';'
FROM INFORMATION_SCHEMA.TABLES
WHERE table_schema = 'POWERBI'
ORDER BY table_name;


--assign table_type tag to tables
ALTER TABLE FROSTBYTE_TASTY_BYTES.POWERBI.DIM_CUSTOMER SET TAG table_type = 'dimension';
ALTER TABLE FROSTBYTE_TASTY_BYTES.POWERBI.DIM_DATE SET TAG table_type = 'dimension';
ALTER TABLE FROSTBYTE_TASTY_BYTES.POWERBI.DIM_FRANCHISE SET TAG table_type = 'dimension';
ALTER TABLE FROSTBYTE_TASTY_BYTES.POWERBI.DIM_LOCATION SET TAG table_type = 'dimension';
ALTER TABLE FROSTBYTE_TASTY_BYTES.POWERBI.DIM_MENU_ITEM SET TAG table_type = 'dimension';
ALTER TABLE FROSTBYTE_TASTY_BYTES.POWERBI.DIM_TIME SET TAG table_type = 'dimension';
ALTER TABLE FROSTBYTE_TASTY_BYTES.POWERBI.DIM_TRUCK SET TAG table_type = 'dimension';
ALTER TABLE FROSTBYTE_TASTY_BYTES.POWERBI.FACT_CUSTOMER_METRICS SET TAG table_type = 'fact';
ALTER TABLE FROSTBYTE_TASTY_BYTES.POWERBI.FACT_ORDER_AGG SET TAG table_type = 'fact';
ALTER TABLE FROSTBYTE_TASTY_BYTES.POWERBI.FACT_ORDER_DETAIL SET TAG table_type = 'fact';
ALTER TABLE FROSTBYTE_TASTY_BYTES.POWERBI.FACT_ORDER_HEADER SET TAG table_type = 'fact';
ALTER TABLE FROSTBYTE_TASTY_BYTES.POWERBI.ROW_POLICY_MAP SET TAG table_type = 'security';

Next, we’ll create additional tags to help us track sensitive data points like Personally Identifiable Information (PII) stored in table columns. We’ll then apply these tags to sensitive columns in our customer dimension table.

/*
----------------------------
-- Tagging Columns --
----------------------------
*/

--create tags for PII fields like name, phone number, email, and date of birth
CREATE OR REPLACE TAG frostbyte_tasty_bytes.powerbi.pii_name_tag
COMMENT = 'PII Tag for Name Columns';

CREATE OR REPLACE TAG frostbyte_tasty_bytes.powerbi.pii_phone_number_tag
COMMENT = 'PII Tag for Phone Number Columns';

CREATE OR REPLACE TAG frostbyte_tasty_bytes.powerbi.pii_email_tag
COMMENT = 'PII Tag for E-mail Columns';

CREATE OR REPLACE TAG frostbyte_tasty_bytes.powerbi.pii_dob_tag
COMMENT = 'PII Tag for Date of Birth Columns';

--assign PII tags to sensitive columns in our customer dimension table
ALTER TABLE frostbyte_tasty_bytes.powerbi.dim_customer
MODIFY COLUMN customer_first_name
SET TAG frostbyte_tasty_bytes.powerbi.pii_name_tag = 'First Name';

ALTER TABLE frostbyte_tasty_bytes.powerbi.dim_customer
MODIFY COLUMN customer_last_name
SET TAG frostbyte_tasty_bytes.powerbi.pii_name_tag = 'Last Name';

ALTER TABLE frostbyte_tasty_bytes.powerbi.dim_customer
MODIFY COLUMN customer_full_name
SET TAG frostbyte_tasty_bytes.powerbi.pii_name_tag = 'Full Name';

ALTER TABLE frostbyte_tasty_bytes.powerbi.dim_customer
MODIFY COLUMN customer_last_name_first_name
SET TAG frostbyte_tasty_bytes.powerbi.pii_name_tag = 'Full Name';

ALTER TABLE frostbyte_tasty_bytes.powerbi.dim_customer
MODIFY COLUMN customer_phone_number
SET TAG frostbyte_tasty_bytes.powerbi.pii_phone_number_tag = 'Phone Number';

ALTER TABLE frostbyte_tasty_bytes.powerbi.dim_customer
MODIFY COLUMN customer_email
SET TAG frostbyte_tasty_bytes.powerbi.pii_email_tag = 'E-mail Address';

ALTER TABLE frostbyte_tasty_bytes.powerbi.dim_customer
MODIFY COLUMN customer_dob
SET TAG frostbyte_tasty_bytes.powerbi.pii_dob_tag = 'Date of Birth';
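With the tags in place, we can confirm they landed where we expect. One way (a sketch; run it in the context of the frostbyte_tasty_bytes database) is the INFORMATION_SCHEMA table function TAG_REFERENCES_ALL_COLUMNS, which returns every column-level tag on a table:

```sql
--view all tags assigned to columns of our customer dimension
SELECT tag_database, tag_schema, tag_name, column_name, tag_value
FROM TABLE(
    frostbyte_tasty_bytes.information_schema.tag_references_all_columns(
        'frostbyte_tasty_bytes.powerbi.dim_customer', 'table'
    )
);
```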

NOTE: Once we have our tags created, we can assign them via the Snowsight UI (as opposed to writing a bunch of ALTER TABLE commands), as shown below.

Object editing in the Snowsight UI: easily assign tags, row access policies, and masking policies to your tables, views, and columns.

Data Classification
Snowflake also supports Classification, in which a Snowflake function analyzes columns in a table, schema, or entire database and outputs Snowflake-defined system tags to help track sensitive and private data points.

Using Snowflake Classification is a three-step process:

  1. A data steward or engineer calls the EXTRACT_SEMANTIC_CATEGORIES function to analyze columns in a table and generate the classification output.
  2. They then review the classification output to determine if any modifications should be made.
  3. Lastly, the data steward or engineer applies the system tags, either manually or by calling the ASSOCIATE_SEMANTIC_CATEGORY_TAGS stored procedure.

We’ll use the steps listed above to continue the object-tagging example by letting Snowflake generate additional system tags for our customer dimension table.

Analyze

SELECT EXTRACT_SEMANTIC_CATEGORIES('frostbyte_tasty_bytes.powerbi.dim_customer');

Review

Easily review the results of the function using Snowsight’s results pane
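If you’d rather review the output with SQL than scroll through JSON in the results pane, the OBJECT returned by EXTRACT_SEMANTIC_CATEGORIES can be flattened into one row per analyzed column (a pattern adapted from the Snowflake documentation):

```sql
--flatten the classification output: one row per analyzed column
SELECT
    t.key::VARCHAR AS column_name,
    t.value:"privacy_category"::VARCHAR AS privacy_category,
    t.value:"semantic_category"::VARCHAR AS semantic_category,
    t.value:"extra_info":"probability"::VARCHAR AS probability
FROM TABLE(FLATTEN(EXTRACT_SEMANTIC_CATEGORIES('frostbyte_tasty_bytes.powerbi.dim_customer')::VARIANT)) AS t;
```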

Apply

CALL ASSOCIATE_SEMANTIC_CATEGORY_TAGS(
    'frostbyte_tasty_bytes.powerbi.dim_customer',
    EXTRACT_SEMANTIC_CATEGORIES('frostbyte_tasty_bytes.powerbi.dim_customer')
);

After calling the ASSOCIATE_SEMANTIC_CATEGORY_TAGS stored procedure, we can see that Snowflake system tags have been applied to our customer columns as well.

Column-Level Security

Now that we have a better pulse on our data thanks to Snowflake Object Tagging and Data Classification, we can explore the different types of data protection capabilities that Snowflake provides. This section focuses on Column-Level Security (CLS).

Dynamic Data Masking Policies

Dynamic Data Masking (DDM) is a column-level security feature that obfuscates or “masks” data at query time. With DDM, sensitive data is stored at rest in plain text but is obfuscated on the fly when queries are executed. Masking policies offer many benefits, including centralized policy management, reusability across thousands of fields, and simplified change management (e.g., only having to modify a single policy).

Continuing our example, we’ll now create dynamic data masking policies that will define which roles are able to see the plain-text value, a partially masked value, or a fully masked value.

--create a masking policy that will allow admins to see full-text names, but obfuscate for every other role
CREATE OR REPLACE MASKING POLICY frostbyte_tasty_bytes.powerbi.name_mask AS (val STRING) RETURNS STRING ->
    CASE
        WHEN CURRENT_ROLE() IN ('SYSADMIN', 'ACCOUNTADMIN', 'BI_ANALYST_GLOBAL') THEN val
        ELSE '******'
    END;

--create a masking policy that will allow admins to see full-text phone numbers,
--area codes only for global analysts, and fully masked values for every other role
CREATE OR REPLACE MASKING POLICY frostbyte_tasty_bytes.powerbi.phone_mask AS (val STRING) RETURNS STRING ->
    CASE
        WHEN CURRENT_ROLE() IN ('SYSADMIN', 'ACCOUNTADMIN') THEN val
        WHEN CURRENT_ROLE() IN ('BI_ANALYST_GLOBAL') THEN CONCAT(LEFT(val, 3), '-***-****')
        ELSE '***-***-****'
    END;

--create a masking policy that will only display the type of email provider for unauthorized roles
CREATE OR REPLACE MASKING POLICY frostbyte_tasty_bytes.powerbi.email_mask AS (val STRING) RETURNS STRING ->
    CASE
        WHEN CURRENT_ROLE() IN ('SYSADMIN', 'ACCOUNTADMIN') THEN val
        ELSE CONCAT('******', '@', SPLIT_PART(val, '@', -1))
    END;

--create a masking policy that will truncate birthdates to the first day of the month for unauthorized roles
CREATE OR REPLACE MASKING POLICY frostbyte_tasty_bytes.powerbi.dob_mask AS (val DATE) RETURNS DATE ->
    CASE
        WHEN CURRENT_ROLE() IN ('SYSADMIN', 'ACCOUNTADMIN', 'BI_ANALYST_GLOBAL') THEN val
        ELSE DATE_TRUNC('month', val)
    END;

Tag-Based Masking Policies
Now that we have tagged our sensitive columns and have four new masking policies in place, we can choose to keep these as separate objects OR we can marry these concepts together and create tag-based masking policies! Tag-based masking policies have many benefits such as assigning multiple masking policies for different data types to a single tag.

--alter the tags created previously and set the appropriate masking policy for each
ALTER TAG frostbyte_tasty_bytes.powerbi.pii_name_tag
SET MASKING POLICY frostbyte_tasty_bytes.powerbi.name_mask;

ALTER TAG frostbyte_tasty_bytes.powerbi.pii_phone_number_tag
SET MASKING POLICY frostbyte_tasty_bytes.powerbi.phone_mask;

ALTER TAG frostbyte_tasty_bytes.powerbi.pii_email_tag
SET MASKING POLICY frostbyte_tasty_bytes.powerbi.email_mask;

ALTER TAG frostbyte_tasty_bytes.powerbi.pii_dob_tag
SET MASKING POLICY frostbyte_tasty_bytes.powerbi.dob_mask;
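One nuance worth noting: a masking policy is bound to a specific data type, and a single tag can hold one policy per data type. If, hypothetically, a TIMESTAMP birthdate column were tagged later, a companion policy could be added to the same tag:

```sql
--hypothetical: add a TIMESTAMP policy alongside the DATE policy on the same tag
CREATE OR REPLACE MASKING POLICY frostbyte_tasty_bytes.powerbi.dob_ts_mask AS (val TIMESTAMP) RETURNS TIMESTAMP ->
    CASE
        WHEN CURRENT_ROLE() IN ('SYSADMIN', 'ACCOUNTADMIN') THEN val
        ELSE DATE_TRUNC('month', val)
    END;

ALTER TAG frostbyte_tasty_bytes.powerbi.pii_dob_tag
SET MASKING POLICY frostbyte_tasty_bytes.powerbi.dob_ts_mask;
```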


--view tag-based masking policies on our customer dimension
SELECT ref_database_name, ref_schema_name, ref_entity_name, ref_column_name, policy_kind, policy_name, tag_name
FROM TABLE(
    frostbyte_tasty_bytes.information_schema.policy_references(
        ref_entity_domain => 'table',
        ref_entity_name => 'frostbyte_tasty_bytes.powerbi.dim_customer'
    )
);

Since our masking policies have been set on our tags, we can now see that those policies are also applied to the customer dimension columns that were previously tagged.

Using the Snowsight UI, we can see all of the tags and masking policies that have been created on our customer dimension table.
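Before heading to Power BI, we can also sanity-check the masking directly in a Snowsight worksheet by toggling roles (this sketch assumes the roles below exist in your account and are granted to your user):

```sql
--run the same query under two roles and compare the masked output
USE ROLE BI_ANALYST_GLOBAL;
SELECT customer_full_name, customer_phone_number, customer_email, customer_dob
FROM frostbyte_tasty_bytes.powerbi.dim_customer
LIMIT 5;

USE ROLE SYSADMIN;
SELECT customer_full_name, customer_phone_number, customer_email, customer_dob
FROM frostbyte_tasty_bytes.powerbi.dim_customer
LIMIT 5;
```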

Row-Level Security

The last Snowflake data governance feature we’ll look at is the row access policy.

Row-Access Policies
Row access policies are Snowflake objects used to implement row-level security (RLS) within Snowflake. Typically, row access policies are used in tandem with a mapping table to dynamically determine access to records in a query result.

Building upon our example, we’ll now implement a row-level security solution by following the steps below:

  1. Create a security table that will “map” Snowflake roles to the subset of records they should have access to.
  2. Populate the security / “mapping” table.
  3. Create a row access policy that will reference the security table and the user’s current role.
  4. Apply our row access policy to our fact tables to restrict what users are able to see at query time based on their role.

--create a security or "mapping" table that will map roles to the subset of records they have access to
CREATE OR REPLACE TABLE frostbyte_tasty_bytes.powerbi.row_policy_map
(
    role STRING,
    location_skey NUMBER
);

--populate the relevant role-to-location mappings
INSERT INTO frostbyte_tasty_bytes.powerbi.row_policy_map (role, location_skey)
SELECT
    CASE
        WHEN location_region = 'EMEA' THEN 'BI_ANALYST_EMEA'
        WHEN location_region = 'North America' THEN 'BI_ANALYST_NA'
        WHEN location_region = 'APAC' THEN 'BI_ANALYST_APAC'
    END AS role,
    location_skey
FROM frostbyte_tasty_bytes.powerbi.dim_location;


--create the row access policy; the policy argument is named loc_skey so it cannot be
--shadowed by the location_skey column of the mapping table inside the subquery
CREATE OR REPLACE ROW ACCESS POLICY frostbyte_tasty_bytes.powerbi.dim_location_policy
AS (loc_skey NUMBER) RETURNS BOOLEAN ->
    --list of roles that will not be subject to the policy
    CURRENT_ROLE() IN ('ACCOUNTADMIN', 'SYSADMIN', 'BI_ANALYST_GLOBAL')
    OR EXISTS
    --reference our mapping table from above to handle RLS
    (
        SELECT rp.role
        FROM frostbyte_tasty_bytes.powerbi.row_policy_map rp
        WHERE rp.role = CURRENT_ROLE()
          AND rp.location_skey = loc_skey
    );

--apply the row access policy to our dim_location and fact tables
ALTER TABLE frostbyte_tasty_bytes.powerbi.dim_location
ADD ROW ACCESS POLICY frostbyte_tasty_bytes.powerbi.dim_location_policy ON (location_skey);

ALTER TABLE frostbyte_tasty_bytes.powerbi.fact_order_detail
ADD ROW ACCESS POLICY frostbyte_tasty_bytes.powerbi.dim_location_policy ON (location_skey);

ALTER TABLE frostbyte_tasty_bytes.powerbi.fact_order_header
ADD ROW ACCESS POLICY frostbyte_tasty_bytes.powerbi.dim_location_policy ON (location_skey);

ALTER TABLE frostbyte_tasty_bytes.powerbi.fact_order_agg
ADD ROW ACCESS POLICY frostbyte_tasty_bytes.powerbi.dim_location_policy ON (location_skey);

Using the Snowsight UI, we can see the table-level tags created in the beginning, as well as the row access policies created in the previous step.
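As with masking, a quick role-toggle in a worksheet confirms the row access policy is filtering as expected before we test in Power BI (assuming the roles exist and are granted to your user; the global role should return substantially more rows):

```sql
--row counts should shrink when querying under a region-scoped role
USE ROLE BI_ANALYST_GLOBAL;
SELECT COUNT(*) FROM frostbyte_tasty_bytes.powerbi.fact_order_header;

USE ROLE BI_ANALYST_EMEA;
SELECT COUNT(*) FROM frostbyte_tasty_bytes.powerbi.fact_order_header;
```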

Testing with Power BI

We’ve reached the grand finale! We’ll now verify that the data access policies we created work as intended and propagate downstream to our Power BI DirectQuery dataset.

When connecting to Snowflake from Power BI, you can use Single Sign-On (SSO) or Snowflake credentials (username and password). In this example, I am using Snowflake credentials to connect to my demo account. Because of this, it is important to parameterize our connection string to Snowflake (specifically the Snowflake role), as this will allow us to easily toggle roles and validate that our data access policies are working.

NOTE: While SSO was not used in this example, our friends at phData have an excellent blog post titled Enabling Azure AD SSO in Snowflake with Power BI. I highly encourage you to check this out if your organization is looking to configure SSO.

Parameterizing your Power BI Datasets
Every time I create a new Power BI dataset against Snowflake, the first thing I do is create Power BI parameters that contain the Snowflake connection information. Once again, phData has another excellent post on How to Parameterize Snowflake Data Sources in Power BI. Rather than walking through all of the same steps in their post, I will illustrate how I normally parameterize my Power BI datasets.

I usually begin by creating five parameters, three of which are optional depending on how your data is stored and whether you’re using SSO:

  1. Snowflake Account — your Snowflake account URL
  2. Snowflake Warehouse — the virtual warehouse used to run Power BI queries against
  3. Snowflake Role (optional) — since we’re not using SSO in this scenario, parameterizing the Snowflake Role specifies the role context that queries from Power BI run under
  4. Snowflake Database (optional) — the database where all of the data resides. Only use this if all of the data for your dataset resides in a single database.
  5. Snowflake Schema (optional) — same caveats as parameterizing your database.

The five parameters are outlined above.

Next, you need to modify each query to reference the parameters you created. This ensures that if and when you toggle the parameters in the future (e.g., change a warehouse or role) the individual dataset tables inherit the change. To “wire up” your queries and parameters you’ll need to open the Advanced Editor as shown below. From there, you can tweak the Snowflake connection in the M code to reference your parameters.
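As a sketch, the M for a single table source might end up looking like the following (the parameter names match the list above; the exact navigation steps and `Kind` fields depend on your account and tables, so treat this as illustrative rather than copy-paste ready):

```m
let
    // connect using the parameters rather than hard-coded values
    Source = Snowflake.Databases(
        SnowflakeAccount,
        SnowflakeWarehouse,
        [Role = SnowflakeRole]
    ),
    // navigate: database -> schema -> table
    Database = Source{[Name = SnowflakeDatabase, Kind = "Database"]}[Data],
    Schema = Database{[Name = SnowflakeSchema, Kind = "Schema"]}[Data],
    DimCustomer = Schema{[Name = "DIM_CUSTOMER", Kind = "Table"]}[Data]
in
    DimCustomer
```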

Once completed, you can use Power Query’s built-in Query Dependencies view to validate that all the queries are referencing the connection parameters.

Lastly, once our dataset is published to a Power BI workspace, we can then access and change our parameter values at any time within the dataset settings. Simply change the Snowflake Role parameter, hit ‘Apply’, then refresh your report page to watch the change take effect.

We’ll change our Snowflake Role parameter to validate that our data access policies are working.

The Power BI Dataset and Report

The Power BI dataset/model mirrors the schema in Snowflake. The DirectQuery dataset has four fact tables and seven dimensions, and the model was designed by following the suggested best practices for DirectQuery mode with Snowflake. The corresponding report is a live connection to the DirectQuery dataset and is composed of an Order Summary page, with a hidden drill-through to line-level details that contain customer PII data points.

Testing Column-Level Security

Using the BI_ANALYST_GLOBAL Role, we see that we have full access to the customer's name and a partially-masked phone number and email address. This matches the masking policies we created earlier.

Changing our Snowflake Role parameter to use the BI_ANALYST_NA Role, we see that our customer name and phone number are completely redacted, but we still see a partially-masked email address. This, too, matches the masking policies we created earlier.

Testing Row-Level Security

Using the BI_ANALYST_GLOBAL Role, we have full access to all records in the dimensional model. We can see our breakdown of sales by all cities around the globe.

Changing our Snowflake Role parameter to use the BI_ANALYST_EMEA Role, we are now restricted to only the order records that occurred within our European locations. This is roughly 1/3 of the total dataset back in Snowflake.

Conclusion

Snowflake’s rich set of data governance capabilities makes it easy for data teams to understand and protect their data. Object Tagging and Classification allow data stewards to identify and catalog sensitive data, while data masking and row access policies ensure that data doesn’t end up in the wrong hands. Snowflake’s new Governance interface simplifies the entire data governance process and eliminates the need to write mountains of SQL and memorize syntax. Lastly, keeping Roche’s Maxim in mind and centralizing logic upstream in the data warehouse, we saw how protections within Snowflake seamlessly propagate downstream to a Power BI dataset and connected reports.

[Special thanks to Matthew Roche and my longtime friend and former colleague, Lindsay Pinchot. Matthew for his Maxim of Data Transformation. I have (and continue) to point customers to his blog whenever the topic arises. Lindsay for always being my personal Editor-in-Chief and making me sound smarter than I actually am.]


Josh Crittenden
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Sales Engineer @ Snowflake | Former DW & Power BI Architect | Husband | Father | Coffee Addict | Viewpoints are my own