Exploring Snowflake Data Governance with Power BI
In this post, we’ll delve into Snowflake's data governance features and explore how column and row-level security protections within Snowflake can be leveraged by popular BI platforms such as Power BI. Specifically, we’ll review Snowflake's new Governance UI, object tagging, data classification, dynamic data masking, tag-based masking, and row access policies. Lastly, we’ll test our data access policies via a DirectQuery dataset in Power BI.
Intro
Before I joined Snowflake, I spent the first decade-plus of my career in the Microsoft data ecosystem, primarily as a data warehousing and BI architect for consulting firms. During my time as a consultant, I heard a quote (a maxim actually) that has lived rent-free in my head ever since.
Data should be transformed as far upstream as possible, and as far downstream as necessary.
My fellow Power BI practitioners likely recognize this quote as Roche’s Maxim of Data Transformation, coined by Matthew Roche, PM on the Power BI Customer Advisory Team (CAT).
While Roche’s Maxim focuses on data transformations, a case can be made that it can also apply to data protections, such as column and row-level security. Implementing data governance policies directly in Snowflake ensures that data is protected downstream, both within Power BI and for a myriad of additional organizational use cases. For example, your data will remain protected when consumed by other BI platforms, data science workloads, connected applications, ad-hoc analytics, and more. Snowflake has a rich set of data governance capabilities that aim to help customers in three main categories:
- Knowing your data
- Protecting your data
- Connecting your data to your ecosystem
With Roche’s Maxim in mind, let’s review some of Snowflake’s robust governance features, then put them to the test with Power BI.
NOTE: Many of the SQL scripts used in this post were derived from the Tasty Bytes — Zero to Snowflake — Data Governance Quickstart. This is an excellent hands-on lab to quickly get you up and running with Snowflake’s governance capabilities. Spin up a free trial and test these features for yourself!
Snowflake’s new Governance UI
Earlier this month, Snowflake released the public preview of the new Governance interface in Snowsight. The Governance interface provides Snowflake admins with an overview of object tagging and data protections applied to tables and columns within their account. From the Dashboard summary (shown below), admins can drill into detailed reports (the Tagged Objects screen) to home in on specific objects (e.g., databases, schemas), tags, and data access policies.
From the Tagged Objects screen (shown below), users can click on a specific record and Snowsight automatically opens a new tab taking you to that underlying object (e.g., Table Details page or Columns page). This allows data stewards to assign tags and access policies quickly and easily without writing any SQL.
In the next few sections, we’ll explore specific Snowflake Governance capabilities like object tagging and data access policies. We’ll also work through an example of tagging and protecting tables and columns in our Snowflake dimensional model that feeds a downstream Power BI DirectQuery dataset.
Data Sensitivity & Access Visibility
Object Tagging
So what is a tag anyway? Tags are Snowflake objects that can be assigned to another Snowflake object (e.g., table, column, virtual warehouse). From a usage perspective, tags can be used by data stewards to track sensitive data, monitor resource consumption, and audit activities, to name a few.
We’ll start our example by creating Snowflake tags to categorize our dimensional model at the table level.
/*
----------------------------
-- Tagging Tables --
----------------------------
*/
--create tag for "table type"
CREATE TAG table_type
    ALLOWED_VALUES 'fact', 'dimension', 'security';
--simple information_schema query to generate ALTER TABLE commands based on table naming convention
SELECT
    'ALTER TABLE ' || CONCAT_WS('.', table_catalog, table_schema, table_name) || ' SET TAG table_type = ' ||
    CASE
        WHEN LEFT(table_name, 4) = 'DIM_' THEN '''dimension'''
        WHEN LEFT(table_name, 4) = 'FACT' THEN '''fact'''
        ELSE '''security'''
    END
    || ';'
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'POWERBI'
ORDER BY table_name;
--assign table_type tag to tables
ALTER TABLE FROSTBYTE_TASTY_BYTES.POWERBI.DIM_CUSTOMER SET TAG table_type = 'dimension';
ALTER TABLE FROSTBYTE_TASTY_BYTES.POWERBI.DIM_DATE SET TAG table_type = 'dimension';
ALTER TABLE FROSTBYTE_TASTY_BYTES.POWERBI.DIM_FRANCHISE SET TAG table_type = 'dimension';
ALTER TABLE FROSTBYTE_TASTY_BYTES.POWERBI.DIM_LOCATION SET TAG table_type = 'dimension';
ALTER TABLE FROSTBYTE_TASTY_BYTES.POWERBI.DIM_MENU_ITEM SET TAG table_type = 'dimension';
ALTER TABLE FROSTBYTE_TASTY_BYTES.POWERBI.DIM_TIME SET TAG table_type = 'dimension';
ALTER TABLE FROSTBYTE_TASTY_BYTES.POWERBI.DIM_TRUCK SET TAG table_type = 'dimension';
ALTER TABLE FROSTBYTE_TASTY_BYTES.POWERBI.FACT_CUSTOMER_METRICS SET TAG table_type = 'fact';
ALTER TABLE FROSTBYTE_TASTY_BYTES.POWERBI.FACT_ORDER_AGG SET TAG table_type = 'fact';
ALTER TABLE FROSTBYTE_TASTY_BYTES.POWERBI.FACT_ORDER_DETAIL SET TAG table_type = 'fact';
ALTER TABLE FROSTBYTE_TASTY_BYTES.POWERBI.FACT_ORDER_HEADER SET TAG table_type = 'fact';
ALTER TABLE FROSTBYTE_TASTY_BYTES.POWERBI.ROW_POLICY_MAP SET TAG table_type = 'security';
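Once the tags are assigned, we can confirm them with the INFORMATION_SCHEMA table function TAG_REFERENCES. A quick sanity check (shown here for the customer dimension; swap in any table name):

```sql
--verify the table_type tag assigned to a given table
SELECT tag_name, tag_value, level
FROM TABLE(
    frostbyte_tasty_bytes.information_schema.tag_references(
        'frostbyte_tasty_bytes.powerbi.dim_customer', 'table'
    )
);
```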
Next, we’ll create additional tags to help us track sensitive data points like Personally Identifiable Information (PII) stored in table columns. We’ll then apply these tags to sensitive columns in our customer dimension table.
/*
----------------------------
-- Tagging Columns --
----------------------------
*/
--create tags for PII fields like name, phone number, email, and date of birth
CREATE OR REPLACE TAG frostbyte_tasty_bytes.powerbi.pii_name_tag
COMMENT = 'PII Tag for Name Columns';
CREATE OR REPLACE TAG frostbyte_tasty_bytes.powerbi.pii_phone_number_tag
COMMENT = 'PII Tag for Phone Number Columns';
CREATE OR REPLACE TAG frostbyte_tasty_bytes.powerbi.pii_email_tag
COMMENT = 'PII Tag for E-mail Columns';
CREATE OR REPLACE TAG frostbyte_tasty_bytes.powerbi.pii_dob_tag
COMMENT = 'PII Tag for Date of Birth Columns';
--assign PII tags to sensitive columns in our customer dimension table
ALTER TABLE frostbyte_tasty_bytes.powerbi.dim_customer
MODIFY COLUMN customer_first_name
SET TAG frostbyte_tasty_bytes.powerbi.pii_name_tag = 'First Name';
ALTER TABLE frostbyte_tasty_bytes.powerbi.dim_customer
MODIFY COLUMN customer_last_name
SET TAG frostbyte_tasty_bytes.powerbi.pii_name_tag = 'Last Name';
ALTER TABLE frostbyte_tasty_bytes.powerbi.dim_customer
MODIFY COLUMN customer_full_name
SET TAG frostbyte_tasty_bytes.powerbi.pii_name_tag = 'Full Name';
ALTER TABLE frostbyte_tasty_bytes.powerbi.dim_customer
MODIFY COLUMN customer_last_name_first_name
SET TAG frostbyte_tasty_bytes.powerbi.pii_name_tag = 'Full Name';
ALTER TABLE frostbyte_tasty_bytes.powerbi.dim_customer
MODIFY COLUMN customer_phone_number
SET TAG frostbyte_tasty_bytes.powerbi.pii_phone_number_tag = 'Phone Number';
ALTER TABLE frostbyte_tasty_bytes.powerbi.dim_customer
MODIFY COLUMN customer_email
SET TAG frostbyte_tasty_bytes.powerbi.pii_email_tag = 'E-mail Address';
ALTER TABLE frostbyte_tasty_bytes.powerbi.dim_customer
MODIFY COLUMN customer_dob
SET TAG frostbyte_tasty_bytes.powerbi.pii_dob_tag = 'Date of Birth';
NOTE: Once we have our tags created, we can now assign them via the Snowsight UI (as opposed to writing a bunch of ALTER TABLE commands) as shown below.
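To audit the column-level tags we just assigned, the companion table function TAG_REFERENCES_ALL_COLUMNS returns every tag on every column of a table in one call:

```sql
--list every tag assigned to any column of the customer dimension
SELECT column_name, tag_name, tag_value
FROM TABLE(
    frostbyte_tasty_bytes.information_schema.tag_references_all_columns(
        'frostbyte_tasty_bytes.powerbi.dim_customer', 'table'
    )
);
```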
Data Classification
Snowflake also supports Classification, in which a Snowflake function analyzes the columns in a table, schema, or entire database and outputs Snowflake-defined tags (system tags) to help track sensitive and private data points.
Using Snowflake Classification is a three-step process:
1) A data steward or engineer calls the EXTRACT_SEMANTIC_CATEGORIES function to analyze columns in a table and generate the classification output.
2) They then review the classification output to determine if any modifications should be made.
3) Lastly, the data steward or engineer applies the system tags. This can be done manually or by calling the ASSOCIATE_SEMANTIC_CATEGORY_TAGS stored procedure.
We’ll use the steps listed above to continue the object-tagging example by letting Snowflake generate additional system tags for our customer dimension table.
Analyze
SELECT EXTRACT_SEMANTIC_CATEGORIES('frostbyte_tasty_bytes.powerbi.dim_customer');
Review
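EXTRACT_SEMANTIC_CATEGORIES returns a JSON object keyed by column name. One way to review it in tabular form is to flatten that object; the field names below (privacy_category, semantic_category, extra_info.probability) follow the classification output structure documented by Snowflake:

```sql
--flatten the classification output for easier review
SELECT
    f.key AS column_name,
    f.value:"privacy_category"::STRING AS privacy_category,
    f.value:"semantic_category"::STRING AS semantic_category,
    f.value:"extra_info":"probability"::STRING AS probability
FROM TABLE(FLATTEN(
    EXTRACT_SEMANTIC_CATEGORIES('frostbyte_tasty_bytes.powerbi.dim_customer')
)) AS f;
```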
Apply
CALL ASSOCIATE_SEMANTIC_CATEGORY_TAGS(
'frostbyte_tasty_bytes.powerbi.dim_customer',
EXTRACT_SEMANTIC_CATEGORIES('frostbyte_tasty_bytes.powerbi.dim_customer')
);
Column-Level Security
Now that we have a better pulse on our data thanks to Snowflake Object Tagging and Data Classification, we can explore the different types of data protection capabilities that Snowflake provides. This section focuses on Column-Level Security (CLS).
Dynamic Data Masking Policies
Dynamic Data Masking (DDM) is a column-level security feature that obfuscates or “masks” data at query time. With DDM, sensitive data is stored at rest in plain text but is obfuscated on the fly when queries are executed. There are many benefits to using masking policies, including centralizing policy management, reusability across thousands of fields, and simplifying change management (e.g., only having to modify a single policy).
Continuing our example, we’ll now create dynamic data masking policies that will define which roles are able to see the plain-text value, a partially masked value, or a fully masked value.
--create a masking policy that will allow admins to see full-text names, but obfuscate for every other role
CREATE OR REPLACE MASKING POLICY frostbyte_tasty_bytes.powerbi.name_mask AS (val STRING) RETURNS STRING ->
    CASE
        WHEN CURRENT_ROLE() IN ('SYSADMIN', 'ACCOUNTADMIN', 'BI_ANALYST_GLOBAL') THEN val
        ELSE '******'
    END;
--create a masking policy that will allow admins to see full-text phone numbers,
--area codes only for global analysts, and fully masked for every other role
CREATE OR REPLACE MASKING POLICY frostbyte_tasty_bytes.powerbi.phone_mask AS (val STRING) RETURNS STRING ->
    CASE
        WHEN CURRENT_ROLE() IN ('SYSADMIN', 'ACCOUNTADMIN') THEN val
        WHEN CURRENT_ROLE() IN ('BI_ANALYST_GLOBAL') THEN CONCAT(LEFT(val, 3), '-***-****')
        ELSE '***-***-****'
    END;
--create a masking policy that will only display the type of email provider for unauthorized roles
CREATE OR REPLACE MASKING POLICY frostbyte_tasty_bytes.powerbi.email_mask AS (val STRING) RETURNS STRING ->
    CASE
        WHEN CURRENT_ROLE() IN ('SYSADMIN', 'ACCOUNTADMIN') THEN val
        ELSE CONCAT('******', '@', SPLIT_PART(val, '@', -1))
    END;
--create a masking policy that will truncate birthdates to the first day of the month for unauthorized roles
CREATE OR REPLACE MASKING POLICY frostbyte_tasty_bytes.powerbi.dob_mask AS (val DATE) RETURNS DATE ->
    CASE
        WHEN CURRENT_ROLE() IN ('SYSADMIN', 'ACCOUNTADMIN', 'BI_ANALYST_GLOBAL') THEN val
        ELSE DATE_TRUNC('month', val)
    END;
Tag-Based Masking Policies
Now that we have tagged our sensitive columns and have four new masking policies in place, we can keep these as separate objects, OR we can marry the two concepts together and create tag-based masking policies! Tag-based masking policies have many benefits, such as the ability to assign multiple masking policies (one per data type) to a single tag.
--alter the tags created previously and set the appropriate masking policy for each
ALTER TAG frostbyte_tasty_bytes.powerbi.pii_name_tag
SET MASKING POLICY frostbyte_tasty_bytes.powerbi.name_mask;
ALTER TAG frostbyte_tasty_bytes.powerbi.pii_phone_number_tag
SET MASKING POLICY frostbyte_tasty_bytes.powerbi.phone_mask;
ALTER TAG frostbyte_tasty_bytes.powerbi.pii_email_tag
SET MASKING POLICY frostbyte_tasty_bytes.powerbi.email_mask;
ALTER TAG frostbyte_tasty_bytes.powerbi.pii_dob_tag
SET MASKING POLICY frostbyte_tasty_bytes.powerbi.dob_mask;
--view tag-based masking policies on our customer dimension
SELECT ref_database_name, ref_schema_name, ref_entity_name, ref_column_name, policy_kind, policy_name, tag_name
FROM TABLE(
frostbyte_tasty_bytes.information_schema.policy_references(
ref_entity_domain => 'table',
ref_entity_name => 'frostbyte_tasty_bytes.powerbi.dim_customer'
)
);
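With the policies bound to tags, a quick way to see them in action is to query the customer dimension under different roles (this assumes the analyst roles from this example exist and have SELECT privileges on the table):

```sql
--as an admin, values are returned in plain text
USE ROLE SYSADMIN;
SELECT customer_full_name, customer_phone_number, customer_email
FROM frostbyte_tasty_bytes.powerbi.dim_customer
LIMIT 5;

--as a global analyst, names are visible, phone numbers show the area code only,
--and e-mail addresses show the provider only
USE ROLE BI_ANALYST_GLOBAL;
SELECT customer_full_name, customer_phone_number, customer_email
FROM frostbyte_tasty_bytes.powerbi.dim_customer
LIMIT 5;
```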
Row-Level Security
The last Snowflake data governance feature we’ll look at is the row access policy.
Row-Access Policies
Row access policies are Snowflake objects used to implement row-level security (RLS) within Snowflake. Typically, row access policies are used in tandem with a mapping table to dynamically determine access to records in a query result.
Building upon our example, we’ll now implement a row-level security solution by following the steps below:
- Create a security table that will “map” Snowflake roles to the subset of records they should have access to.
- Populate the security / “mapping” table.
- Create a row access policy that will reference the security table and the user’s current role.
- Apply our row access policy to our fact tables to restrict what users are able to see at query time based on their role.
--create a security or "mapping" table that will map roles to the subset of records they have access to
CREATE OR REPLACE TABLE frostbyte_tasty_bytes.powerbi.row_policy_map
(
role STRING,
location_skey NUMBER
);
--populate the relevant role to location mappings
INSERT INTO frostbyte_tasty_bytes.powerbi.row_policy_map (role, location_skey)
SELECT
    CASE
        WHEN location_region = 'EMEA' THEN 'BI_ANALYST_EMEA'
        WHEN location_region = 'North America' THEN 'BI_ANALYST_NA'
        WHEN location_region = 'APAC' THEN 'BI_ANALYST_APAC'
    END AS role,
    location_skey
FROM frostbyte_tasty_bytes.powerbi.dim_location;
--create a row access policy that exempts privileged roles and otherwise checks the mapping table
CREATE OR REPLACE ROW ACCESS POLICY frostbyte_tasty_bytes.powerbi.dim_location_policy
AS (location_skey NUMBER) RETURNS BOOLEAN ->
CURRENT_ROLE() IN
--list of roles that will not be subject to the policy
(
'ACCOUNTADMIN','SYSADMIN', 'BI_ANALYST_GLOBAL'
)
OR EXISTS
--reference our mapping table from above to handle RLS
(
SELECT rp.role
FROM frostbyte_tasty_bytes.powerbi.row_policy_map rp
WHERE 1=1
AND rp.role = CURRENT_ROLE()
AND rp.location_skey = location_skey
);
--apply the row access policy to our dim_location and fact tables
ALTER TABLE frostbyte_tasty_bytes.powerbi.dim_location
ADD ROW ACCESS POLICY frostbyte_tasty_bytes.powerbi.dim_location_policy ON (location_skey);
ALTER TABLE frostbyte_tasty_bytes.powerbi.fact_order_detail
ADD ROW ACCESS POLICY frostbyte_tasty_bytes.powerbi.dim_location_policy ON (location_skey);
ALTER TABLE frostbyte_tasty_bytes.powerbi.fact_order_header
ADD ROW ACCESS POLICY frostbyte_tasty_bytes.powerbi.dim_location_policy ON (location_skey);
ALTER TABLE frostbyte_tasty_bytes.powerbi.fact_order_agg
ADD ROW ACCESS POLICY frostbyte_tasty_bytes.powerbi.dim_location_policy ON (location_skey);
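The effect of the row access policy can be verified directly in Snowflake before moving to Power BI (again assuming the analyst roles exist and have SELECT privileges):

```sql
--a global analyst is exempt from the policy and sees every location
USE ROLE BI_ANALYST_GLOBAL;
SELECT COUNT(*) FROM frostbyte_tasty_bytes.powerbi.dim_location;

--a regional analyst only sees rows for locations mapped to their role
USE ROLE BI_ANALYST_EMEA;
SELECT COUNT(*) FROM frostbyte_tasty_bytes.powerbi.dim_location;
```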
Testing with Power BI
We’ve reached the grand finale! We’ll now verify that the data access policies we created work as intended and propagate downstream to our Power BI DirectQuery dataset.
When connecting to Snowflake from Power BI, you can use Single Sign-On (SSO) or Snowflake credentials (username and password). In this example, I am using Snowflake credentials to connect to my demo account. Because of this, it is important to parameterize our connection to Snowflake (specifically the Snowflake role), as this will allow us to easily toggle roles to validate that our data access policies are working.
NOTE: While SSO was not used in this example, our friends at phData have an excellent blog post titled Enabling Azure AD SSO in Snowflake with Power BI. I highly encourage you to check this out if your organization is looking to configure SSO.
Parameterizing your Power BI Datasets
Every time I create a new Power BI dataset against Snowflake, the first thing I do is create Power BI parameters that contain the Snowflake connection information. Once again, phData has another excellent post on How to Parameterize Snowflake Data Sources in Power BI. Rather than walking through all of the same steps in their post, I will illustrate how I normally parameterize my Power BI datasets.
I usually begin by creating five parameters, three of which are optional depending on how your data is stored and whether you’re using SSO:
- Snowflake Account — your Snowflake account URL
- Snowflake Warehouse — the virtual warehouse used to run Power BI queries against
- Snowflake Role (optional) — since we’re not using SSO in this scenario, parameterizing the Snowflake role specifies the role context under which queries from Power BI run
- Snowflake Database (optional) — the database where all of the data resides. Only use this if all of the data for your dataset resides in a single database.
- Snowflake Schema (optional) — same caveats as parameterizing your database.
Next, you need to modify each query to reference the parameters you created. This ensures that if and when you toggle the parameters in the future (e.g., change a warehouse or role) the individual dataset tables inherit the change. To “wire up” your queries and parameters you’ll need to open the Advanced Editor as shown below. From there, you can tweak the Snowflake connection in the M code to reference your parameters.
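As a rough sketch, the M for one parameterized table might look like the following. The parameter names match the list above (written with M’s #"…" syntax), and DIM_CUSTOMER is just one example table from our model; your generated M will differ slightly depending on how you built the query:

```powerquery
let
    // connect using the parameters instead of hard-coded values
    Source = Snowflake.Databases(
        #"Snowflake Account",
        #"Snowflake Warehouse",
        [Role = #"Snowflake Role"]
    ),
    // navigate to the parameterized database and schema, then to the table
    Database = Source{[Name = #"Snowflake Database", Kind = "Database"]}[Data],
    Schema = Database{[Name = #"Snowflake Schema", Kind = "Schema"]}[Data],
    DIM_CUSTOMER = Schema{[Name = "DIM_CUSTOMER", Kind = "Table"]}[Data]
in
    DIM_CUSTOMER
```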
Once completed, you can use Power Query’s built-in Query Dependencies view to validate that all the queries are referencing the connection parameters.
Lastly, once our dataset is published to a Power BI workspace, we can then access and change our parameter values at any time within the dataset settings. Simply change the Snowflake Role parameter, hit ‘Apply’, then refresh your report page to watch the change take effect.
The Power BI Dataset and Report
The Power BI dataset/model mirrors the schema back in Snowflake. The DirectQuery dataset has four fact tables and seven dimensions. The model was designed by following the suggested best practices for DirectQuery mode with Snowflake. The corresponding report is a live connection to the DirectQuery dataset and consists of an Order Summary page with a hidden drill-through to line-level details that contain customer PII data points.
Testing Column-Level Security
Using the BI_ANALYST_GLOBAL Role, we see that we have full access to the customer's name and a partially-masked phone number and email address. This matches the masking policies we created earlier.
Changing our Snowflake Role parameter to use the BI_ANALYST_NA Role, we see that our customer name and phone number are completely redacted, but we still see a partially-masked email address. This, too, matches the masking policies we created earlier.
Testing Row-Level Security
Using the BI_ANALYST_GLOBAL Role, we have full access to all records in the dimensional model. We can see our breakdown of sales by all cities around the globe.
Changing our Snowflake Role parameter to use the BI_ANALYST_EMEA Role, we are now restricted to only the order records that occurred within our European locations. This is roughly 1/3 of the total dataset back in Snowflake.
Conclusion
Snowflake’s rich set of data governance capabilities makes it easy for data teams to understand and protect their data. Object tagging and Classification allow data stewards to identify and catalog sensitive data, while data masking and row access policies ensure that data doesn’t end up in the wrong hands. Snowflake’s new Governance interface simplifies the entire data governance process and eliminates the need to write mountains of SQL and memorize syntax. Lastly, keeping Roche’s Maxim in mind and centralizing logic further upstream in the data warehouse, we saw how protections within Snowflake seamlessly propagate downstream to a Power BI dataset and connected reports.
[Special thanks to Matthew Roche and my longtime friend and former colleague, Lindsay Pinchot. Matthew for his Maxim of Data Transformation. I have pointed (and continue to point) customers to his blog whenever the topic arises. Lindsay for always being my personal Editor-in-Chief and making me sound smarter than I actually am.]