Governance through “aggregation policy” in Snowflake

In this blog, I would be discussing one of the new features around data governance i.e., “Aggregation policy”. Data governance in Snowflake is crucial for ensuring that data within the platform is accurate, secure, consistent, and accessible to authorized users. Effective data governance provides a framework for managing data assets, aligning them with business goals, and complying with regulatory requirements. Hence these feature additions helps in amplifying this platform’s capability on Data governance.

What is an “aggregation policy” in Snowflake ?

A schema-level feature is called an “aggregate policy” which regulates the kinds of queries that can access data from a view or table.

When a table has an aggregation policy in place, queries against that table are unable to deliver information from a single record because the queries must aggregate data into groups of a minimum size in order to return results.

Aggregation-constrained refers to a table or view that has an assigned aggregation policy.

Why this feature is very important ?

This feature could be extremely useful in enterprise. Aggregation policies are beneficial when you want to answer aggregate-level questions without directly accessing individual-level personal information.

Consider a table that contains employee email addresses. If a data engineer or analyst is asked, “How many people have email addresses that has email address which has a domain of “gmail” they would not be able to answer this question without access to individual rows. This is because personal information might be excluded row by row due to row access policies. But this question can still become relevant for the analysts as they want to get aggregated information without having to access individual rows. This is exactly where “aggregation policies” come into play.

Let us see them in action now:

In the demonstration, I would be covering 2 ways in which the aggregation policies can be used.

  1. Governance through same account.
  2. Governance through shares.

Test data preparation:

Let us create some synthetic data for doing this demonstration. If we see below then I am creating an employee department table and the idea is to group 4 employee against each department and leave some with < 4 to also test the negative scenarios.

USE ROLE SYSADMIN;
CREATE DATABASE DEMO_DTB;
CREATE SCHEMA DEMO_DTS;
CREATE OR REPLACE WAREHOUSE XSMALL_DEMO_01 WAREHOUSE_SIZE = 'XSMALL';


CREATE OR REPLACE TABLE employee_table_01 (
employee_id DECIMAL,
firstname VARCHAR,
lastname VARCHAR,
salary DECIMAL,
departmentid DECIMAL,
phone_number VARCHAR
);

INSERT INTO
employee_table_01 (
employee_id,
firstname,
lastname,
salary,
departmentid,
phone_number
)
VALUES
(1, 'John', 'Doe', 50000, 101, '123-456-7890'),
(2, 'Jane', 'Doe', 55000, 101, '234-567-8901'),
(3, 'Kate', 'Kyle', 55000, 101, '234-567-8902'),
(4, 'Prince', 'Kyle', 55000, 101, '234-560-8902'),
(5, 'Jim', 'Smith', 60000, 102, '345-678-9012'),
(6, 'Jenny', 'Smith', 65000, 102, '456-789-0123'),
(7, 'Jenny', 'Roy', 65000, 102, '456-789-0121'),
(8, 'Kyle', 'Smith', 65000, 102, '456-789-0193'),
(9, 'Patt', 'Roy', 65000, 103, '456-789-0126'),
(10, 'Joan', 'Johnson', 70000, 103, '567-890-1234'),
(11, 'Joe', 'Johnson', 75000, 104, '678-901-2345');

SELECT * FROM employee_table_01 ORDER BY departmentid;

Use Case 1 : Governance within same account through aggregation policy

Below is the method and the code snippet on how we can apply the aggregation policy within the same account. In this example both the negative and the positive scenarios are explained.

/* let us now create some custom role "HR_ADMIN" */

USE ROLE USERADMIN;
CREATE ROLE HR_ADMIN;

USE ROLE SYSADMIN;
GRANT ALL ON DATABASE demo_dtb TO ROLE USERADMIN;

GRANT ALL ON DATABASE DEMO_DTB TO ROLE HR_ADMIN;
GRANT ALL ON SCHEMA DEMO_DTS TO ROLE HR_ADMIN;
GRANT ALL ON TABLE DEMO_DTB.DEMO_DTS.EMPLOYEE_TABLE_01 TO ROLE HR_ADMIN;
GRANT ALL ON WAREHOUSE XSMALL_DEMO_01 TO ROLE HR_ADMIN;

/* let us apply the aggregation policy to the tables in scope for this POC */
USE ROLE SYSADMIN;
USE DATABASE DEMO_DTB;
USE SCHEMA DEMO_DTS;

USE ROLE SECURITYADMIN;

CREATE OR REPLACE AGGREGATION POLICY emp_agg_policy
AS () RETURNS AGGREGATION_CONSTRAINT ->
CASE
WHEN CURRENT_ROLE() = 'HR_ADMIN'
THEN NO_AGGREGATION_CONSTRAINT()
ELSE AGGREGATION_CONSTRAINT(MIN_GROUP_SIZE => 4)
END;



/* Setting up new the table by applying aggregation policies */

SELECT * FROM DEMO_DTB.DEMO_DTS.EMPLOYEE_TABLE_01;
ALTER TABLE employee_table_01 SET aggregation POLICY emp_agg_policy;

USE ROLE HR_ADMIN;
USE ROLE SECURITYADMIN;
GRANT ROLE HR_ADMIN TO USER SOMEN2024;

/* let us test the data sets now */

--This below query would yield output record by record --
USE ROLE HR_ADMIN;
SELECT * FROM DEMO_DTB.DEMO_DTS.EMPLOYEE_TABLE_01 order by DEPARTMENTID;

/* Let us see the data by using the role of ACCOUNTADMIN */
USE ROLE ACCOUNTADMIN;

--Failed Queries(by using the role other than "HR_ADMIN" role) --
SELECT * FROM DEMO_DTB.DEMO_DTS.EMPLOYEE_TABLE_01; --> This would FAIL.
SELECT MAX(SALARY) FROM DEMO_DTB.DEMO_DTS.EMPLOYEE_TABLE_01 WHERE DEPARTMENTID='101'; --> This would FAIL.
SELECT MIN(SALARY) FROM DEMO_DTB.DEMO_DTS.EMPLOYEE_TABLE_01 WHERE DEPARTMENTID='101'; --> This would FAIL.

--- Success Avg & Sum of queries(these would pass) ---
SELECT AVG(SALARY) FROM DEMO_DTB.DEMO_DTS.EMPLOYEE_TABLE_01 WHERE DEPARTMENTID='101';
SELECT SUM(SALARY) FROM DEMO_DTB.DEMO_DTS.EMPLOYEE_TABLE_01 WHERE DEPARTMENTID='101';
SELECT COUNT(DISTINCT DEPARTMENTID) FROM DEMO_DTB.DEMO_DTS.EMPLOYEE_TABLE_01;
SELECT COUNT(DISTINCT DEPARTMENTID) FROM DEMO_DTB.DEMO_DTS.EMPLOYEE_TABLE_01;

Aggregation policy and ACCOUNTADMIN roles.

Use Case 2: Governance through shares :

Over here we demonstrate how through shares the consumer account can get all quality & aggregated insights without having direct access to individual rows.

--Provider account steps:
USE ROLE accountadmin;
CREATE SHARE demo_share_01;

GRANT USAGE ON DATABASE DEMO_DTB TO SHARE demo_share_01;
GRANT USAGE ON SCHEMA DEMO_DTS TO SHARE demo_share_01;
GRANT SELECT ON TABLE DEMO_DTB.DEMO_DTS.EMPLOYEE_TABLE_01 TO SHARE demo_share_01;

ALTER SHARE demo_share_01 ADD ACCOUNTS= GF52976; ---> where GF52976 is the new account that got created.



--Consumer account steps:
select current_account();
show shares;
describe share DEMO_SHARE_01;
DESC SHARE KZPDHXU.BE25061.DEMO_SHARE_01;

CREATE DATABASE DEMO_CONSUMER_DB FROM SHARE KZPDHXU.BE25061.DEMO_SHARE_01;

Queries from Consumer account(this is explained in the snapshot)
Consumer account & aggregation policy behavior

Some of the boundary cases:

Over here through the aggregation policy we would see how various kind of aggregation functions work.

a. MAX function:

This do not work. Below is the snippet:

With MAX

b. MIN function:

This do not work. Below is the snippet:

With MIN

c. AVG, with selected column:

This is an interesting use case, as we see the below snapshot, we have provided the aggregation policy to be grouped as a record of 4, hence for the ones where the grouping is not 4, over there it returns as NULL.

Non-Group members value

Benefits & Key data points w.r.t Aggregation policies:

Below are the key benefits/data points w.r.t “aggregation policies” in Snowflake

Some of the benefits & Key points

SUMMARY:

I have shared some of the key features around governance and aggregation policy while this has a potential to unlock a lot of use cases, we also need to keep watching Snowflake for continuously enhancing their product capabilities.

Please keep reading my blogs it is only going to encourage me in posting more such content. You can find me on LinkedIn by clicking here and on Medium here. Happy Learning :)

Awarded consecutively as “Data Superhero by Snowflake for year 2024 & 2023”. Links: here

Disclaimer: The views expressed here are mine alone and do not necessarily reflect the view of my current, former, or future employers.

--

--

Somen Swain
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Snowflake Data Superhero 2024 & 2023 | 4XSnowpro Certified | AWS Solution Architect Associate | Cloud Computing| Principal-Data Engineering at LTIMindtree