Data Classification with Snowflake — Part 1

Athavale Mandar
4 min read · Feb 19, 2024

--

Welcome to my first-ever blog on Snowflake features. Learning never stops with Snowflake, so I subscribed to the Snowflake blog by email. It is one more avenue for staying up to date, alongside the Snowflake documentation, LinkedIn, YouTube and WhatsApp. In this blog, I cover a recent feature rollout that Snowflake announced through its blog email. The email looks something like this :)

I started exploring the article and eventually tried the feature hands-on in my free trial account. Here is the gist of this feature.

  1. This is a native Snowsight UI feature to run, review and apply data classification.
  2. Good news: it is in public preview, so you can quickly check it out yourself using the sample script below.
  3. This is a table-level feature that can be applied from schema navigation within Snowsight.

Now let’s see this feature in action.

1. Prepare sample data. Here is a sample script that you can try yourself in a trial account. It creates exact replicas of two BIG Snowflake sample data tables, CUSTOMER and PART. I chose these tables so that I could do performance benchmarking as well.
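-- Run as ACCOUNTADMIN in a trial account
use role accountadmin;
-- Creating the database also makes it the current database (schema PUBLIC)
create database demo_classify;
-- Copy the large TPC-H sample tables (scale factor 1000)
create table customer as select * from snowflake_sample_data.tpch_sf1000.customer;
create table part as select * from snowflake_sample_data.tpch_sf1000.part;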
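
As a quick sanity check before classifying, the row counts of the two copies can be inspected. This is a minimal sketch and not part of the original script; it assumes the session is still using DEMO_CLASSIFY.PUBLIC and has an active warehouse.

```sql
-- Count the rows in each copied table (assumes current schema is DEMO_CLASSIFY.PUBLIC)
select 'CUSTOMER' as table_name, count(*) as row_count from customer
union all
select 'PART' as table_name, count(*) as row_count from part;
```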

2. Now let’s see how to perform Data Classification on this dataset. In the revamped Snowsight UI, navigate through Data → Databases → DEMO_CLASSIFY → PUBLIC.

3. Now click on the 3-dot menu beside the Create button and click on “Classify and Tag Sensitive Data” as shown below. Please note that configuring this option starts at the schema level.

4. On the next screen, you select the tables to which you want to apply Data Classification. By default, ALL tables are selected.

5. I decided to run it for the two tables separately in order to do performance benchmarking. I started with the CUSTOMER table, and the process took less than a minute to complete for 150M records. The UI looks like this after this step.

6. Now click on “View Results”. I played around with this UI a bit and observed that it has classified the C_PHONE column as an IDENTIFIER column, which seems right for a retail customer.

7. If we do not wish to keep this classification, we can change it using the dropdown. Note that Snowflake has identified the data in column C_PHONE as having PHONE_NUMBER semantics.

8. The last step is to complete the classification. After this, the UI looks as below.
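
The tags on the table's columns can also be inspected from a worksheet. This is a minimal sketch, assuming that completing the classification in the UI applied Snowflake's classification tags to the CUSTOMER columns; the original walkthrough does not show this query. It uses the INFORMATION_SCHEMA table function TAG_REFERENCES_ALL_COLUMNS.

```sql
-- Assumption: step 8 applied classification tags to DEMO_CLASSIFY.PUBLIC.CUSTOMER
select column_name, tag_name, tag_value
from table(
    demo_classify.information_schema.tag_references_all_columns(
        'demo_classify.public.customer', 'table'
    )
)
order by column_name, tag_name;
```

If tags were applied, C_PHONE should appear here with values matching what the UI showed in steps 6 and 7.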

9. In Snowflake, a feature is useless if it cannot be enforced with RBAC. To check this, I tried performing this activity with the SYSADMIN role after granting it only SELECT privileges on both tables. The result was unexpected, and the error message was not clear: even after SELECT was granted on both tables, it said “Please switch to a role with SELECT or OWNERSHIP privileges.” I hope this is fixed by the time this feature is GA.

-- Let SYSADMIN reach the schema, then grant only SELECT on the two tables
grant usage on database demo_classify to role sysadmin;
grant usage on schema public to role sysadmin;
grant select on table customer to role sysadmin;
grant select on table part to role sysadmin;
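
To confirm exactly which privileges SYSADMIN holds before retrying the classification, the grants on each table can be listed. This is a small sketch added for illustration, using fully qualified names so it does not depend on the current schema.

```sql
-- List every privilege granted on each table, including the grantee role
show grants on table demo_classify.public.customer;
show grants on table demo_classify.public.part;
```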

10. After transferring OWNERSHIP of both tables to SYSADMIN, the feature works well. We can see that the classification status of both tables is Reviewed.
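
The ownership transfer in step 10 could be done along these lines. This is a sketch under assumptions: the original post does not show the statements used, and it presumes the tables are still owned by ACCOUNTADMIN, the role that created them in step 1.

```sql
-- Run as the current owner of the tables (assumed to be ACCOUNTADMIN)
use role accountadmin;
-- COPY CURRENT GRANTS keeps the SELECT grants issued earlier in place
grant ownership on table demo_classify.public.customer to role sysadmin copy current grants;
grant ownership on table demo_classify.public.part to role sysadmin copy current grants;
```

COPY CURRENT GRANTS is used here because the tables already carry outbound SELECT grants; REVOKE CURRENT GRANTS is the alternative if those grants should be dropped instead.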

In the next blog, we will see how to perform this action, monitor the progress, view the results and utilize this tagging, all using native SQL. Stay tuned!

You can read the original blog communicated via Email here.
