Approach to Data Security in Snowflake Part 2 — Snowflake and Custom Data Classification

Ramesh Sanap · Published in clouddataplatform · Mar 30, 2024

Disclaimer: This blog does not delve into the technical intricacies of writing SQL scripts for data classification, which are better explained in the Snowflake documentation. The purpose of this blog is to explain the architecture and approach to data classification, especially in scenarios where enterprise data classification does not align with Snowflake-defined categories, a common occurrence in large-scale enterprises.

Part 1 — Introduction to Snowflake Data Security

Part 3 — Custom Data Classification using TAG automation approach

In Part 2 of this blog series on Data Security in Snowflake, we explore Snowflake Data Classification and Custom Data Classification. This segment elaborates on the significance of data classification within Snowflake and how it can be managed effectively using the various features the platform offers.

Every organization deals with a myriad of data types, spanning personal information, financial records, trade secrets, research findings, and patents. Without robust data governance and security measures encompassing all systems, data security remains a pressing concern.

The foundational step towards fortifying data security lies in data classification — the process of categorizing data based on its characteristics. This involves identifying whether data pertains to individuals, subjects, finances, trades, organizational secrets, or other specific domains.

Of paramount importance is personally identifiable information (PII), which in many jurisdictions is subject to stringent regulations such as GDPR or CCPA that mandate how enterprises must handle such data.

Understanding the overall approach to classification in Snowflake

Snowflake Data Classification
  1. Approach to data classification
  2. Snowflake data classification using Snowflake's classification functions (a minimal sketch follows this list)
  3. PRIVACY CATEGORIES as defined by Snowflake: IDENTIFIER, QUASI_IDENTIFIER, and SENSITIVE
  4. SEMANTIC CATEGORIES covering values such as PII, account, address, salary, and geography information
  5. A custom Snowflake classifier created with Snowflake's CREATE CUSTOM_CLASSIFIER, defining a custom SEMANTIC CATEGORY that ultimately maps to a Snowflake PRIVACY CATEGORY
  6. Semi-custom data classification (termed Custom Classification by Snowflake), where the enterprise defines its own classification for semantics Snowflake does not map and ties each to a specific privacy category; for example, the semantic category may be Medical Code and the privacy category SENSITIVE
  7. Enterprise custom classification automation, independent of Snowflake's classification, covering a custom data governance model with broader categories such as trade secrets, financial figures, and secret codes
  8. An end-to-end automation process that enables the enterprise to define custom classification without using Snowflake's classification functions or CREATE CUSTOM_CLASSIFIER. The enterprise retains greater control with its own classification, which can be extended to Data Mesh capabilities where domain owners control classification based on their governance model
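As a minimal sketch of items 2 to 6, the built-in classification functions and a custom classifier can be combined roughly as follows. The table HR.EMPLOYEE.STAFF, the classifier name MEDICAL_CODES, and the regex pattern are hypothetical placeholders; verify the exact syntax and current function names against the Snowflake documentation.

```sql
-- Built-in classification (items 2-4): extract semantic and privacy
-- categories for a table, then apply Snowflake's system tags.
SELECT EXTRACT_SEMANTIC_CATEGORIES('HR.EMPLOYEE.STAFF');

CALL ASSOCIATE_SEMANTIC_CATEGORY_TAGS(
  'HR.EMPLOYEE.STAFF',
  EXTRACT_SEMANTIC_CATEGORIES('HR.EMPLOYEE.STAFF')
);

-- Custom classifier (items 5-6): define an enterprise-specific semantic
-- category (medical codes) and map it to a Snowflake privacy category.
CREATE OR REPLACE SNOWFLAKE.DATA_PRIVACY.CUSTOM_CLASSIFIER medical_codes();

CALL medical_codes!ADD_REGEX(
  'MEDICAL_CODE',                          -- custom semantic category
  'SENSITIVE',                             -- Snowflake privacy category
  '[A-TV-Z][0-9]{2}(\\.[0-9A-Z]{1,4})?'    -- illustrative ICD-10-like value pattern
);
```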
Typical flow of the approach

The diagram above illustrates how an overarching approach can be implemented to define custom data classification using Snowflake TAGS and TAG-based policies.

The first step involves utilizing the existing Data Classification Matrix that the Data Governance team has developed over time within the enterprise. This includes understanding and creating TAG definitions (the tag objects themselves, not tag assignments on columns) that can be used to classify fields, as sketched below.
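As an illustration of this first step, tag definitions derived from a classification matrix might look like the sketch below. The schema GOVERNANCE.TAGS and the allowed values are hypothetical placeholders for whatever the Data Governance team has defined.

```sql
-- Tag definitions derived from the enterprise Data Classification Matrix.
-- Database, schema and allowed values are placeholders.
CREATE TAG IF NOT EXISTS GOVERNANCE.TAGS.DATA_CLASSIFICATION
  ALLOWED_VALUES 'PUBLIC', 'INTERNAL', 'CONFIDENTIAL', 'RESTRICTED'
  COMMENT = 'Enterprise-wide classification level from the governance matrix';

CREATE TAG IF NOT EXISTS GOVERNANCE.TAGS.SEMANTIC_CATEGORY
  ALLOWED_VALUES 'SALARY', 'MEDICAL_CODE', 'TRADE_SECRET', 'FINANCIAL_FIGURE'
  COMMENT = 'Enterprise-specific semantic category of the field';
```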

The second step is to analyze fields semantically, by field name or by sampled values, in the same spirit as Snowflake's own extract-semantics capability. For instance, if a field is named "employee_monthly_earning," Snowflake's extract semantics may not recognize it as salary data, whereas the custom semantic process can flag it for tags such as "SALARY" and "SENSITIVE."
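A minimal sketch of such a name-based custom scan is shown below; the pattern list and the proposed categories are hypothetical enterprise rules, not output of Snowflake's own classification.

```sql
-- Name-based custom semantic scan over a database's column metadata.
-- The pattern-to-category mapping is an illustrative enterprise rule.
SELECT
  c.table_catalog,
  c.table_schema,
  c.table_name,
  c.column_name,
  'SALARY'    AS proposed_semantic_category,
  'SENSITIVE' AS proposed_privacy_category
FROM HR.INFORMATION_SCHEMA.COLUMNS AS c
WHERE c.column_name ILIKE ANY ('%SALARY%', '%EARNING%', '%COMPENSATION%');
```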

The third step entails reviewing the output of the custom semantic process and deciding whether or not to associate the proposed tags with each field.
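One way to support this review (purely illustrative; the CLASSIFICATION_REVIEW table and its columns are hypothetical constructs, not Snowflake objects) is to stage the proposed tags and let data owners approve or reject each one:

```sql
-- Hypothetical staging table holding the output of the custom semantic scan.
CREATE TABLE IF NOT EXISTS GOVERNANCE.REVIEW.CLASSIFICATION_REVIEW (
  table_name        STRING,
  column_name       STRING,
  semantic_category STRING,
  privacy_category  STRING,
  approved          BOOLEAN,        -- NULL = pending review
  reviewed_by       STRING,
  reviewed_at       TIMESTAMP_NTZ
);

-- A data owner approves (or rejects) a proposed classification.
UPDATE GOVERNANCE.REVIEW.CLASSIFICATION_REVIEW
SET approved    = TRUE,
    reviewed_by = CURRENT_USER(),
    reviewed_at = CURRENT_TIMESTAMP()
WHERE table_name = 'STAFF'
  AND column_name = 'EMPLOYEE_MONTHLY_EARNING';
```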

The fourth step involves applying the output of the custom semantic extraction to the fields so that TAG-based policies can automatically manage masking on the classified fields according to each role's visibility.
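A sketch of this final step follows, reusing the hypothetical tags from the first step; the role DATA_ADMIN and the table and column names are placeholders. Note that a tag can carry one masking policy per data type, so a numeric column would need a NUMBER variant of the policy.

```sql
-- Apply the approved classification tags to the column.
ALTER TABLE HR.EMPLOYEE.STAFF
  MODIFY COLUMN EMPLOYEE_MONTHLY_EARNING
  SET TAG GOVERNANCE.TAGS.SEMANTIC_CATEGORY   = 'SALARY',
          GOVERNANCE.TAGS.DATA_CLASSIFICATION = 'RESTRICTED';

-- Tag-based masking policy: every string column carrying the tag is masked
-- automatically for roles outside the allowed list.
CREATE MASKING POLICY IF NOT EXISTS GOVERNANCE.TAGS.MASK_RESTRICTED_STRING
  AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('DATA_ADMIN') THEN val
    ELSE '***MASKED***'
  END;

ALTER TAG GOVERNANCE.TAGS.DATA_CLASSIFICATION
  SET MASKING POLICY GOVERNANCE.TAGS.MASK_RESTRICTED_STRING;
```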

The third blog in this series will cover the overall custom classification approach built on the four steps above. It is important to note that this outlines the architecture, not the technical steps to achieve it. For technical implementation at the field level, please refer to the Snowflake documentation mentioned in the disclaimer. Alternatively, for detailed technical steps, please reach out via comments on this blog series.
