Unlocking the Power of Data Governance and Cost Control: Insights from Snowflake Data Cloud World Tour 2023

Vijayalakshmi Sridharan
Published in BI3 Technologies
Sep 29, 2023

In this blog post, I’ll break down the key takeaways from the Data Governance & Privacy Session held at the Snowflake Data Cloud World Tour 2023 in Bangalore this September.

Let’s explore the vital insights that can assist you in protecting and managing your data effectively.

Pillars of Data Governance

1. Know Your Data: Classify data, tag sensitive data, track its flow, and audit its usage.
2. Protect Your Data: Secure access to sensitive and regulated data at scale with granular access policies.
3. Connect Your Ecosystem: Seamlessly extend data governance capabilities to data sharing and to business continuity across regions and clouds.

Challenges & Expectations in Data Quality

  1. Compliance Risk: Proactively know when data freshness, validity, uniqueness, or correctness dimensions fall beyond acceptable thresholds
  2. Business Risk: Programmatically validate freshness, nullness, duplicates, and primary keys in the data pipeline workflow
  3. Credibility: Reuse definitions of quality metrics at scale, and evaluate quality metrics consistently across workloads

Data Quality Monitoring (Private Preview Soon)

Automated Metrics Measurement

Use out-of-the-box metrics or create custom metrics that are standardized across your organization.
Example metrics: Freshness, Volume, Accuracy, Statistics.

Measuring Data Quality:

  • Automate periodic measurement of defined metrics for critical data
  • Snowflake handles measurement incrementally in the background
  • No need for manual task specification

Monitoring Data Quality:

  • Automatically capture metrics for alerting, troubleshooting, and visualization
  • Snowflake centralizes metrics in a dedicated table
  • Data remains secure, and Snowflake itself does not access it

Efficiency and Reusability:

  • Define metrics once for consistent use across various objects
  • No management overhead with tasks or jobs
  • Cost-effective as it evaluates incremental data only

Snowflake Partner Ecosystem:

  • Snowflake provides Data Quality Monitoring building blocks
  • Partners can extend and leverage these solutions

Data Metric Function:

Define your own custom data quality metrics with a new schema-level object

-- Counts rows whose age falls in the plausible range 0 to 200
CREATE DATA METRIC FUNCTION VALID_AGE(ARG_T TABLE(ARG_C NUMBER))
RETURNS NUMBER AS
'SELECT COUNT_IF(ARG_C BETWEEN 0 AND 200) FROM ARG_T';
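A DMF only runs once it is attached to a table that has a measurement schedule. The sketch below follows the DMF syntax in Snowflake's current documentation (the feature was still heading to preview at the time of this talk, so details may have changed); the table `hr.public.employees`, its numeric `age` column, and the assumption that `VALID_AGE` was created in `hr.public` are all hypothetical.

```sql
-- Hypothetical table hr.public.employees with a NUMBER column AGE.
-- Measure every DMF attached to this table once an hour.
ALTER TABLE hr.public.employees
  SET DATA_METRIC_SCHEDULE = '60 MINUTE';

-- Attach the custom VALID_AGE function (assumed to live in hr.public)
-- and the built-in NULL_COUNT system DMF to the AGE column.
ALTER TABLE hr.public.employees
  ADD DATA METRIC FUNCTION hr.public.valid_age ON (age);
ALTER TABLE hr.public.employees
  ADD DATA METRIC FUNCTION SNOWFLAKE.CORE.NULL_COUNT ON (age);

-- Measurements are collected centrally and can be queried like any view.
SELECT measurement_time, metric_name, value
FROM SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS
WHERE table_name = 'EMPLOYEES'
ORDER BY measurement_time DESC;
```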

Query Constraint Policies (Private Preview)

Query Constraint policies give organizations powerful tools for ensuring data privacy, particularly within data clean rooms.

These policies effectively restrict the types of queries that can be executed on protected data, ensuring sensitive information remains secure.

Aggregation Constraint Policy:

This policy safeguards individual privacy by allowing only queries that aggregate rows into sufficiently large groups, not queries that return individual records.

Example: Users are permitted to query sales data using GROUP BY to obtain total sales figures for different regions. However, they cannot access individual transaction details.
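A minimal sketch of the sales example as an aggregation policy, assuming a hypothetical `sales_db.public.transactions` table with `region` and `amount` columns and an arbitrary minimum group size of 5. The syntax follows Snowflake's documented `CREATE AGGREGATION POLICY` command; treat the names and the threshold as placeholders.

```sql
-- Hypothetical table: sales_db.public.transactions(region, amount, ...).
-- Every query must aggregate; groups with fewer than 5 rows
-- are not reported on their own.
CREATE OR REPLACE AGGREGATION POLICY sales_db.public.min_group_5
  AS () RETURNS AGGREGATION_CONSTRAINT ->
    AGGREGATION_CONSTRAINT(MIN_GROUP_SIZE => 5);

ALTER TABLE sales_db.public.transactions
  SET AGGREGATION POLICY sales_db.public.min_group_5;

-- Allowed: one aggregated row per region.
SELECT region, SUM(amount) AS total_sales
FROM sales_db.public.transactions
GROUP BY region;

-- Rejected: would return individual transactions.
-- SELECT * FROM sales_db.public.transactions;
```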

Projection Constraint Policy:

The Projection Constraint policy permits the use of a column in a WHERE clause or a join operation while preventing the exposure of that column in the query result.

Example: An analyst can join a customer table with another table based on the customer’s social security number. However, they are restricted from including the customer’s social security number in the final query result.
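The SSN example might look like this as a projection policy. The `crm.public.customers` and `crm.public.loyalty` tables and their columns are hypothetical; the policy syntax follows Snowflake's documented `CREATE PROJECTION POLICY` command.

```sql
-- Hypothetical tables: crm.public.customers(ssn, name, ...)
-- and crm.public.loyalty(ssn, points, ...).
CREATE OR REPLACE PROJECTION POLICY crm.public.no_project
  AS () RETURNS PROJECTION_CONSTRAINT ->
    PROJECTION_CONSTRAINT(ALLOW => false);

ALTER TABLE crm.public.customers
  MODIFY COLUMN ssn SET PROJECTION POLICY crm.public.no_project;

-- Allowed: SSN is used only as a join key.
SELECT c.name, l.points
FROM crm.public.customers c
JOIN crm.public.loyalty l ON c.ssn = l.ssn;

-- Rejected: SSN would appear in the result.
-- SELECT ssn, name FROM crm.public.customers;
```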

Data Access Policies (Public Preview)

Tag-based Masking

  • Assign tags to data masking policies for specific columns
  • Provides granular control over data protection

Schema-wide Tag-based Masking

  • Extend masking policies to all current and future schema objects
  • Streamlines and simplifies data protection management
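A sketch of both levels of tag-based masking, assuming hypothetical objects: a `governance.tags` schema for the tag, a `governance.policies` schema for the policy, a `PII_READER` role allowed to see raw values, and a `crm.public` schema containing a `customers` table. The masking policy is bound to the tag once; the tag is then set either on a single column or on the whole schema.

```sql
-- Hypothetical names throughout; the governance schemas must already exist.
CREATE TAG IF NOT EXISTS governance.tags.pii;

-- Only sessions using the PII_READER role see raw values.
CREATE OR REPLACE MASKING POLICY governance.policies.mask_pii_string
  AS (val STRING) RETURNS STRING ->
    CASE
      WHEN IS_ROLE_IN_SESSION('PII_READER') THEN val
      ELSE '***MASKED***'
    END;

-- Bind the policy to the tag: tagged STRING columns are masked.
ALTER TAG governance.tags.pii
  SET MASKING POLICY governance.policies.mask_pii_string;

-- Column level: tag one column.
ALTER TABLE crm.public.customers
  MODIFY COLUMN email SET TAG governance.tags.pii = 'email';

-- Schema wide: current and future tables in the schema inherit the tag.
ALTER SCHEMA crm.public SET TAG governance.tags.pii = 'schema_default';
```

Because the policy is typed `STRING`, the schema-level tag masks only string columns; a tag can carry one masking policy per data type, so numeric or date columns would need their own policies on the same tag.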

Conditional Masking Policy

A conditional masking policy is a security measure used in databases to mask sensitive data unless specific conditions are met.

Here’s an example:

  • Department managers can see all salaries in their department without any masking.
  • Anyone who is not the manager of a given department, including managers of other departments, can see only their own salary or their own department’s salary information; all other salaries are masked.
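One way to express the salary rule is a conditional masking policy whose extra arguments come from other columns in the same row via `USING`. This sketch assumes a hypothetical `hr.public.employees` table with `salary`, `dept`, and `login_name` columns, plus a hypothetical mapping table `hr.public.dept_managers` recording which role manages which department. It covers the manager rule and the own-salary rule, and returns `NULL` for masked values.

```sql
-- Hypothetical objects:
--   hr.public.employees(login_name, dept, salary, ...)
--   hr.public.dept_managers(manager_role, dept)
CREATE OR REPLACE MASKING POLICY hr.public.salary_mask
  AS (sal NUMBER, row_dept STRING, row_login STRING) RETURNS NUMBER ->
    CASE
      -- Managers of the row's department see every salary in it.
      WHEN EXISTS (
        SELECT 1
        FROM hr.public.dept_managers m
        WHERE m.manager_role = CURRENT_ROLE()
          AND m.dept = row_dept
      ) THEN sal
      -- Employees always see their own salary.
      WHEN row_login = CURRENT_USER() THEN sal
      -- All other salaries are masked.
      ELSE NULL
    END;

-- The first USING column is masked; the others feed the conditions.
ALTER TABLE hr.public.employees
  MODIFY COLUMN salary
  SET MASKING POLICY hr.public.salary_mask USING (salary, dept, login_name);
```

The policy arguments are deliberately named differently from the mapping table's columns: inside the subquery, an unqualified `dept` would resolve to `m.dept` rather than to the policy argument.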

Dynamic Masking on External Tables

You can now assign a masking policy to virtual columns in external tables, either directly or through tag-based masking.

Modularize Policies and Boost Performance

  • UDFs called within row access policies and masking policies reuse the result cache
  • Memoizable functions cache the result of a UDF call, so policies can reuse it instead of recomputing it
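A sketch of the memoizable pattern, assuming a hypothetical mapping table `governance.public.role_region_map(role_name, region)` and a hypothetical `sales_db.public.transactions` table. The zero-argument memoizable function returns the whole mapping as an array of `role:region` strings, and the row access policy checks membership with `ARRAY_CONTAINS`. Memoizable UDFs carry restrictions on arguments and return types, so check the current documentation before relying on this exact shape.

```sql
-- Hypothetical mapping table: governance.public.role_region_map(role_name, region).
-- MEMOIZABLE caches the result, so the mapping table is not
-- re-queried for every row the policy evaluates.
CREATE OR REPLACE FUNCTION governance.public.allowed_role_regions()
  RETURNS ARRAY
  MEMOIZABLE
  AS
  $$
    SELECT ARRAY_AGG(role_name || ':' || region)
    FROM governance.public.role_region_map
  $$;

-- A row is visible only if "<current role>:<row region>" is in the mapping.
CREATE OR REPLACE ROW ACCESS POLICY governance.public.region_rap
  AS (row_region STRING) RETURNS BOOLEAN ->
    ARRAY_CONTAINS(
      (CURRENT_ROLE() || ':' || row_region)::VARIANT,
      governance.public.allowed_role_regions()
    );

ALTER TABLE sales_db.public.transactions
  ADD ROW ACCESS POLICY governance.public.region_rap ON (region);
```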

Advanced Data Access Audit

  1. Masking Policy Assigned to Queried Data: Audit whether a sensitive column accessed in a table or view had a masking policy at the time it was queried
  2. Tag and Policy Modification History: Audit who modified tag and policy associations, and when, including:
  • Tag value changes
  • Policy body changes
  • Assignment modifications
  3. Table Schema Change History: Track new tables in sensitive schemas or new columns added to monitored tables, so you can take immediate action such as triggering classification
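For schema change auditing, DDL details surface in the `OBJECT_MODIFIED_BY_DDL` column of the `ACCESS_HISTORY` view. A rough sketch of a weekly review query follows; access history requires Enterprise Edition or higher, and the JSON keys used here reflect our reading of the documented structure, so verify them against your account.

```sql
-- DDL events (new tables, new columns, tag and policy changes)
-- from the last 7 days. ACCOUNT_USAGE views have ingestion latency,
-- so the newest events may not appear yet.
SELECT query_start_time,
       user_name,
       object_modified_by_ddl:"objectDomain"::STRING  AS object_domain,
       object_modified_by_ddl:"objectName"::STRING    AS object_name,
       object_modified_by_ddl:"operationType"::STRING AS operation
FROM SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY
WHERE object_modified_by_ddl IS NOT NULL
  AND query_start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
ORDER BY query_start_time DESC;
```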

Snowsight

Interactive SQL editor with charts

  • Fast and Responsive Querying — A fast, desktop-quality editor on the web
  • Smart Autocomplete — Contextual suggestions based on your query and SQL dialect, such as aliases and functions
  • Write Less SQL — Use a date picker to simplify data selection
  • Interactive Results — Preview data fast no matter how many rows a query returns
  • Automatic Stats — Interactive stats for all columns help you catch errors and spot trends without follow-up queries

Data Governance UI (General Availability Soon)

Gain Insights: At-a-glance summary of tagged assets and protection status

Take Action: Follow intuitive UI to apply tags and data policies to sensitive objects

Report Compliance: Narrow down to the desired database or schema level to generate reports
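Outside the UI, a similar report can be approximated in SQL from the `POLICY_REFERENCES` view. This is not what the Data Governance UI runs internally, only a hedged stand-in; `SALES_DB` is a hypothetical database name.

```sql
-- Every masking, row access, and other policy attached to objects
-- in the hypothetical SALES_DB database.
SELECT ref_schema_name,
       ref_entity_name,
       ref_column_name,
       policy_kind,
       policy_name
FROM SNOWFLAKE.ACCOUNT_USAGE.POLICY_REFERENCES
WHERE ref_database_name = 'SALES_DB'
ORDER BY ref_schema_name, ref_entity_name, ref_column_name;
```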

Classification UI (Private Preview)

A user-friendly interface for easily accepting or rejecting Snowflake’s recommended semantic category tags, simplifying the process of organizing data into a preferred schema.

For example, it suggests tags like “Name” for the “Name” field and “Email” for the “Email” field.
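Classification can also be triggered from SQL. This sketch assumes the `SYSTEM$CLASSIFY` stored procedure with its `auto_tag` option (which may postdate this talk) and a hypothetical `hr.public.employees` table. With `auto_tag`, recommendations are applied as system tags without a review step; the UI is what lets you accept or reject each one.

```sql
-- Classify the hypothetical table and apply the recommended
-- semantic and privacy category tags automatically.
CALL SYSTEM$CLASSIFY('hr.public.employees', {'auto_tag': true});

-- Inspect which tags ended up on each column.
SELECT column_name, tag_name, tag_value
FROM TABLE(
  hr.INFORMATION_SCHEMA.TAG_REFERENCES_ALL_COLUMNS('hr.public.employees', 'table')
);
```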

UK, Australia, Canada-based PII Classification (Private Preview)

Snowflake now supports additional categories and grouping for UK, Australia, and Canada-based data, enhancing privacy and compliance.

Cybersecurity

Security Analytics

Challenges in Cyber Security

1. Data Silos: Centralizing all security-relevant data is challenging and cost-prohibitive
2. Manual Processes: Reviewing noisy alerts and manually validating controls slows down reaction time
3. Uninformed Decisions: Leadership and engineers acting without data-driven metrics and prioritized findings
4. Limited Resources: Organizations cannot hire security experts fast enough to keep up with demand

Security Data Lake

  • Maintain ownership of your data
  • Centralize security and business data to add context
  • Flexibility for data portability and different tools
  • SQL access broadens usability for additional threat detection

Cyber Use Cases

Bring Your Own (BYO) Snowflake

  • Single Source of Truth — All logs, assets, and configurations are analyzed together removing silos and reducing complexity
  • Transparent Pricing and Cost Savings — Store near-unlimited amounts of data at affordable cloud rates and pay Snowflake only for the compute resources you use
  • Faster Detection and Response to Threats — Centralized Next-Gen SIEM solution streamlines investigation and acts as an extension of the customer’s Data Cloud
  • Simplify Data Security and Governance — Single copy of data enables consistent implementation of security and privacy controls for data protection
  • Cloud Platform Agnostic — Retain your data in the cloud platform of choice without compromising on threat detection and response capabilities

Optimization Features

Price & Performance

  • Improved storage compression by 30%
  • Search Optimization Service — Efficient data structure (search access path in the background) managed by Snowflake in a serverless fashion
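Enabling the service means choosing which lookups it should accelerate. A sketch, assuming a hypothetical `security.public.logs` table with a high-cardinality `user_id` column; search optimization requires Enterprise Edition or higher and adds storage and serverless compute costs, so estimating first is worthwhile.

```sql
-- Hypothetical table: security.public.logs(user_id, event_time, ...).
-- Estimate the build and maintenance cost before enabling.
SELECT SYSTEM$ESTIMATE_SEARCH_OPTIMIZATION_COSTS(
  'security.public.logs', 'EQUALITY(user_id)');

-- Build the search access path for equality lookups on user_id.
ALTER TABLE security.public.logs
  ADD SEARCH OPTIMIZATION ON EQUALITY(user_id);

-- Selective point lookups like this are what the service speeds up.
SELECT *
FROM security.public.logs
WHERE user_id = 'u-12345';
```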

Tagging for Cost Optimization


Tagging objects and accounts in Snowflake allows you to categorize them for better management and resource allocation.

You can tag warehouses and accounts by department (e.g., Sales, Finance) or environment (e.g., Dev, Prod) to identify their purpose and usage at a glance, which makes it easier to attribute costs and keep them under control in your Snowflake environment.
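A sketch of tagging warehouses and attributing their compute credits, assuming a hypothetical `admin.tags` schema, a `cost_center` tag, and warehouses `sales_wh` and `finance_wh`. The attribution query joins the `WAREHOUSE_METERING_HISTORY` and `TAG_REFERENCES` views in `SNOWFLAKE.ACCOUNT_USAGE`, both of which lag behind real time.

```sql
-- Hypothetical tag restricted to known cost centers.
CREATE TAG IF NOT EXISTS admin.tags.cost_center
  ALLOWED_VALUES 'sales', 'finance';

ALTER WAREHOUSE sales_wh   SET TAG admin.tags.cost_center = 'sales';
ALTER WAREHOUSE finance_wh SET TAG admin.tags.cost_center = 'finance';

-- Compute credits per cost center for the previous calendar month.
-- The tag filter sits in the ON clause so untagged warehouses still appear.
SELECT COALESCE(t.tag_value, 'untagged') AS cost_center,
       SUM(m.credits_used_compute)      AS compute_credits
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY m
LEFT JOIN SNOWFLAKE.ACCOUNT_USAGE.TAG_REFERENCES t
  ON m.warehouse_id = t.object_id
 AND t.domain = 'WAREHOUSE'
 AND t.tag_name = 'COST_CENTER'
WHERE m.start_time >= DATE_TRUNC('MONTH', DATEADD(MONTH, -1, CURRENT_DATE()))
  AND m.start_time <  DATE_TRUNC('MONTH', CURRENT_DATE())
GROUP BY 1
ORDER BY compute_credits DESC;
```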

Multi-Cluster Warehouse Controls

  1. Standard Scaling Policy
  • Optimizes for minimizing queuing over conserving credits
  • Each successive cluster waits 20 seconds after the previous one has started before spinning up

  2. Economy Scaling Policy
  • Optimizes for conserving credits by keeping running clusters fully loaded
  • New clusters are spun up only if the system estimates there will be enough jobs to keep the cluster busy for at least 6 minutes
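The scaling policy is set per warehouse. A minimal sketch with a hypothetical `reporting_wh`; the cluster counts and auto-suspend value are arbitrary, and multi-cluster warehouses require Enterprise Edition or higher.

```sql
-- Hypothetical multi-cluster warehouse that favors credit savings.
CREATE WAREHOUSE IF NOT EXISTS reporting_wh
  WAREHOUSE_SIZE    = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4
  SCALING_POLICY    = 'ECONOMY'
  AUTO_SUSPEND      = 60      -- seconds of inactivity before suspending
  AUTO_RESUME       = TRUE;

-- Switch to STANDARD when low queuing matters more than credits.
ALTER WAREHOUSE reporting_wh SET SCALING_POLICY = 'STANDARD';
```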

Snowflake Performance Index

The Snowflake Performance Index (SPI) gauges improvements in Snowflake performance for customers by tracking a consistent set of customer workloads comprising millions of jobs each month.


Automatic Clustering

Automatic Clustering is a serverless service that reclusters tables as needed, so you don’t need to supply a virtual warehouse. It does, however, consume Snowflake credits.
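Automatic Clustering takes effect once a table has a clustering key. A sketch, assuming a hypothetical `sales_db.public.transactions` table where `sale_date` and `region` are common filter columns; the final query shows where the credits it consumes are reported.

```sql
-- Defining a clustering key on the hypothetical table enables Automatic Clustering.
ALTER TABLE sales_db.public.transactions CLUSTER BY (sale_date, region);

-- Check how well the table is clustered on its key.
SELECT SYSTEM$CLUSTERING_INFORMATION('sales_db.public.transactions');

-- Pause and resume the serverless reclustering (and its credit usage).
ALTER TABLE sales_db.public.transactions SUSPEND RECLUSTER;
ALTER TABLE sales_db.public.transactions RESUME RECLUSTER;

-- Credits consumed by Automatic Clustering over the last 7 days.
SELECT table_name, SUM(credits_used) AS credits
FROM SNOWFLAKE.ACCOUNT_USAGE.AUTOMATIC_CLUSTERING_HISTORY
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
GROUP BY table_name
ORDER BY credits DESC;
```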

Snowpark Optimized Workloads for ML

Snowpark-optimized warehouses offer 16 times more memory per node than standard Snowflake virtual warehouses.

They are ideal for tasks with substantial memory needs, like machine learning training via a stored procedure on a single virtual warehouse node.
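Choosing a Snowpark-optimized warehouse comes down to setting `WAREHOUSE_TYPE`. This sketch assumes a hypothetical warehouse name and a hypothetical training procedure `ml_db.public.train_model`. Setting `MAX_CONCURRENCY_LEVEL = 1` is a commonly recommended way to give one job the node's full memory, which matches the single-node training scenario above.

```sql
-- Hypothetical warehouse for memory-heavy, single-node ML training.
CREATE WAREHOUSE IF NOT EXISTS ml_training_wh
  WAREHOUSE_SIZE        = 'MEDIUM'
  WAREHOUSE_TYPE        = 'SNOWPARK-OPTIMIZED'
  MAX_CONCURRENCY_LEVEL = 1
  AUTO_SUSPEND          = 60;

-- Run the hypothetical training stored procedure on it.
USE WAREHOUSE ml_training_wh;
CALL ml_db.public.train_model('ml_db.public.training_data');
```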

Budgets (Public Preview Soon)

Budgets help customers monitor warehouse and serverless usage, such as automatic clustering, materialized views, search optimization, pipes, and replication.
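A hedged sketch of a custom budget, using the `SNOWFLAKE.CORE.BUDGET` class as documented after the feature reached preview; method names may have changed since, and `admin.budgets`, `analytics_budget`, `analytics_wh`, and the 500-credit limit are hypothetical.

```sql
-- Create a budget object in a hypothetical admin.budgets schema.
CREATE SNOWFLAKE.CORE.BUDGET admin.budgets.analytics_budget();

-- Monthly spending limit, in credits.
CALL admin.budgets.analytics_budget!SET_SPENDING_LIMIT(500);

-- Track a hypothetical warehouse against this budget.
CALL admin.budgets.analytics_budget!ADD_RESOURCE(
  SELECT SYSTEM$REFERENCE('WAREHOUSE', 'analytics_wh', 'SESSION', 'applybudget'));
```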

Warehouse Utilization (Private Preview)

Provides customers more visibility into their warehouse utilization, so they can better estimate the capacity and size their warehouses need.

For instance, if a medium-sized warehouse is underutilized, downsizing to a small or extra-small warehouse could result in cost savings.
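The preview feature itself is not shown here, but a similar signal is already available from the `WAREHOUSE_LOAD_HISTORY` view. A rough sketch, assuming a hypothetical `reporting_wh`; treat the averages as a starting point for investigation rather than a sizing rule.

```sql
-- Average running and queued load per warehouse over the last 14 days.
SELECT warehouse_name,
       AVG(avg_running)     AS avg_running,
       AVG(avg_queued_load) AS avg_queued
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_LOAD_HISTORY
WHERE start_time >= DATEADD(day, -14, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY avg_running;

-- If a MEDIUM warehouse is consistently lightly loaded and never queues,
-- try one size smaller and compare query times.
ALTER WAREHOUSE reporting_wh SET WAREHOUSE_SIZE = 'SMALL';
```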

Snowflake also continually upgrades its underlying hardware for better performance.

SnowLens — SIS (Streamlit In Snowflake)

SnowLens is a Snowflake native app built with SIS (Streamlit In Snowflake) that lets users explore the critical aspects of data observability, usage tracking, and monitoring, providing valuable insights into maintaining the health and utility of their Snowflake environment.


Benefits:

  1. Enhanced Visibility: Get a complete view of your Snowflake account.
  2. User-Friendly: SnowLens is built with Streamlit for a smooth and intuitive interface.
  3. Data Security: Your data never leaves Snowflake, ensuring security and compliance.
  4. Operational Insights: Gain valuable insights for cost optimization.

Partners can extend the SnowLens code with additional features and then sell the enhanced solution on the Snowflake Marketplace to generate revenue.

Conclusion

In conclusion, Snowflake Data Cloud World Tour 2023 emphasized data governance through the pillars of knowing, protecting, and connecting data, allowing organizations to classify, secure, and share data effectively. Snowflake introduced data quality monitoring and robust data privacy tools, and showcased the interactive SQL editor in Snowsight.

Snowflake also strengthened cybersecurity with a Security Data Lake and BYO Snowflake, and added optimization features. SnowLens, a native app, provides insights into data observability and usage tracking, with opportunities for partners to extend its functionality.

Overall, Snowflake Data Cloud World Tour 2023 showcased a comprehensive ecosystem for data governance, quality, security, and optimization.

Among the above features introduced in Snowflake Data Cloud World Tour 2023, which one excites you the most?

Vijayalakshmi Sridharan
BI3 Technologies

Passionate Data Engineer with expertise in Snowflake, C#, Azure, AWS, and Power BI. Continuous Learner. LinkedIn ID: vijayalakshmi-sridharan