Snowflake Aggregation Policies

Fru
DemoHub Tutorials
Published in
4 min readJun 28, 2024

Note: This tutorial is self-contained, with sample data and scripts provided to ensure you can follow the exercise in your practice environment. For more tutorials, visit tutorials.demohub.dev

In the world of data sharing and analysis, safeguarding sensitive information while enabling meaningful insights is a constant challenge. Snowflake’s Aggregation Policies offer a powerful mechanism to strike this balance. By defining rules that enforce minimum levels of aggregation in query results, you can protect individual-level data while still allowing users to extract valuable knowledge from summarized information.

What are Aggregation Policies?

Aggregation policies are schema-level objects in Snowflake that control how data can be accessed and queried from a table or view. They serve as gatekeepers, ensuring that queries adhere to specific aggregation requirements to prevent the exposure of individual records.

Key Concepts

  • Aggregation-Constrained: A table or view with an aggregation policy assigned to it.
  • Minimum Group Size: The minimum number of rows that must be aggregated in a query result.
  • Remainder Group: A group in the query result containing rows that don’t meet the minimum group size; their grouping key value is set to NULL.

How It Works

When a query is executed on an aggregation-constrained table, Snowflake performs these checks:

  1. Aggregation Requirement: The query must aggregate data, either through an allowed aggregation function (e.g., SUM, AVG, COUNT) or by using GROUP BY.
  2. Minimum Group Size: Each group created by the query must contain at least the specified minimum number of rows.

If a query violates either of these requirements, Snowflake blocks it to protect the privacy of individual records.

A Practical Example: Sales Data Analysis

Let’s explore how to use Snowflake Aggregation Policies in a practical scenario.

⛁ Sample Data Model: salesdb-data-model

Setup the SalesDB sample data model and proceed with the rest of the exercises. This model provides a robust foundation for tracking customers, buyers, clients, and sales opportunities.

-- Assign the database schema
use schema salesdb.custs;

Let’s delve into how this Snowflake feature can supercharge your data workflow:

Protecting Sales Transactions

Let’s consider your sales data model (SalesDB.Custs). Suppose you want to prevent users from accessing individual customer transactions in the Opportunities table. You decide that results should be aggregated to a minimum of 5 opportunities.

1. Creating the Policy:

CREATE OR REPLACE AGGREGATION POLICY custs.min_opportunities
AS
() RETURNS AGGREGATION_CONSTRAINT -> AGGREGATION_CONSTRAINT(MIN_GROUP_SIZE => 3);

2. Assigning the Policy:

ALTER TABLE custs.Opportunities SET AGGREGATION POLICY min_opportunities;

3. Testing the Policy:

-- This will fail (less than 3 rows)
SELECT * FROM custs.Opportunities WHERE OPPORTUNITYID = 1003;

-- This will succeed (aggregates to 3 or more rows)
SELECT LeadSource, COUNT(*) AS NumOpportunities
FROM custs.Opportunities
GROUP BY LeadSource;

This will run the test to ensure the aggregation is enforced, and in the first case, it fails due to aggregation policy enforcement.

Advanced Use Cases

Multiple Conditions: You can define more complex policies based on multiple criteria. For example, a policy could require a minimum of 3 rows and an aggregated amount exceeding $50,000.

Here’s an example query that aggregates six records from the Opportunities table, complying with the min_opportunities policy:

SELECT
LeadSource,
COUNT(*) AS NumberOfOpportunities,
AVG(Amount) AS AverageOpportunityValue
FROM custs.Opportunities
GROUP BY LeadSource;

This will run the aggregation of the data based on LeadSource. The result will only include the rows where the count of rows is greater than or equal to 3, based on the defined policy.

Important Considerations and Limitations

  • Granularity: Choose the minimum group size carefully, balancing data utility with privacy protection.
  • Not Foolproof: While aggregation policies are effective, determined adversaries might still deduce information with targeted queries.
  • Not for Entity-Level Protection: Policies protect individual records, not the entire entity. If multiple records exist for the same entity, some information might still be revealed.
  • Limited Query Types: Certain query constructs like PIVOT, recursive CTEs, and window functions are not allowed.
  • Joins and Set Operators: Carefully consider the impact of aggregation policies when using joins and set operators like UNION ALL.

Removing Aggregation Policies

ALTER TABLE custs.Opportunities UNSET AGGREGATION POLICY;

Discovering Aggregation Policies

-- Show aggregation policies in your account
SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.AGGREGATION_POLICIES ORDER BY POLICY_NAME;

Conclusion

Snowflake Aggregation Policies are a powerful tool for maintaining data privacy while still enabling insightful analysis. By defining appropriate aggregation rules and incorporating them into your data governance strategy, you can confidently share data while safeguarding sensitive information. This enhances the trust your customers and partners have in your data governance practices.

For Further Information:

Refer to Snowflake’s official documentation for detailed information and advanced usage scenarios: https://docs.snowflake.com/en/user-guide/aggregation-policies

Originally published at https://tutorials.demohub.dev.

--

--

Fru
DemoHub Tutorials

Technologists | Leader | Educator. I transform tech jargon, complex concepts into plain English and ignite curiosity. Disclaimer: All views are my own.