Snowflake Data Metric Functions and Alerts

Athavale Mandar
4 min readApr 27, 2024

--

Photo by Agence Olloweb on Unsplash

In the journey of continuous innovation, Snowflake recently introduced a cool feature for performing Data Quality Checks — Data Metric Functions. These functions help customers to measure various data quality metrics for their own datasets. There are system (Snowflake) defined and customer defined Data Metric Functions . As of now, below system Data Metric Functions are available in public preview.

DATA_METRIC_SCHEDULED_TIME,FRESHNESS,DUPLICATE_COUNT,NULL_COUNT,ROW_COUNT,UNIQUE_COUNT

In this article, we would be creating our own custom data metric function to validate Email IDs for a customer. Later, we will read those results and the anomalies would be sent over Email using Snowflake Alerts & send_email functionalities.

Step 1: Setup access control and setup sample data

In order to create data metric function and then view its evaluation results, you would need below privileges:

  1. EXECUTE DATA METRIC FUNCTION ON ACCOUNT
  2. APPLICATION ROLE SNOWFLAKE.DATA_QUALITY_MONITORING_VIEWER
  3. DATABASE ROLE SNOWFLAKE.DATA_METRIC_USER

Make sure that, you grant these privileges to necessary role before starting on this feature.

Let’s now create a demo table and insert some sample data

use schema sf_demo.public;
use warehouse poc_wh;

CREATE OR REPLACE TABLE customers (
account_number NUMBER(38,0),
city VARCHAR(16777216),
country VARCHAR(16777216),
first_name VARCHAR(16777216),
last_name VARCHAR(16777216),
email VARCHAR(16777216),
phone VARCHAR(16777216)
);



INSERT INTO customers (account_number, city, country, email, first_name, last_name, phone)
VALUES
(1589420, 'Mumbai', 'IN', 'sham@example.com', 'Sham', '.', 1234567890),
(2834123, 'Delhi', 'IN', 'gopal@example.com', 'Gopal', '.', 3641252911),
(4829381, 'Kolkata', 'IN', 'Mandar@example.com', 'Mandar', '.', 3641252912),
(9821802, 'Chennai', 'IN', 'Netra@example.com', 'Netra', '.', 1234567891),
(8028387, 'Pune', 'IN', 'Mrunalexample', 'Mrunal', '.', 1012023030);

You can observe that, Email ID for Mrunal does not contain symbol @

Now this is definitely a data quality issue and it must be reported to relevant authority.

Step 2: Setup custom Data Metric Function (DMF)

Let’s create a custom Data Metric Function that identified incorrect/incomplete Email addresses

CREATE OR REPLACE DATA METRIC FUNCTION
invalid_email_count (ARG_T table(ARG_C1 STRING))
RETURNS NUMBER AS
'SELECT COUNT_IF(FALSE = (
ARG_C1 REGEXP ''^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$''))
FROM ARG_T';

In this code block,

invalid_email_count is the name of the custom data metric function

ARG_T is the table name that would be setup when we attach this data metric function to a table.

ARG_C1 is the name of the column on which this data metric function performs desired data quality checks — again when it is attached to a table

We are using REGEXP to verify the Email ID and its format — it should include @ and selected characters / numbers after .

Below code block shows, how an independent metric function can be attached to a certain table and a column inside. I am highlighting attach because, when created, data metric function is an independent entity. Unless it is set as a property of a table, it does not perform any data quality checks. This also means that, we can create a single metric function and attach/set it to multiple tables that contain similar data. This makes this feature highly scalable and invokes lesser maintenance.

alter table customers set DATA_METRIC_SCHEDULE = '5 minute';

alter table customers add data metric function sf_demo.public.invalid_email_count on (email);

Step 3: Verify DMF setup and observe results of Data Quality Checks

After this setup is complete, It is time to verify that data metric function is set correctly on given table and then view the results of these data quality checks. Use below code block to observe the data quality check results.

-- check DMF status
SELECT * FROM TABLE(INFORMATION_SCHEMA.DATA_METRIC_FUNCTION_REFERENCES(
REF_ENTITY_NAME => 'sf_demo.public.customers',
REF_ENTITY_DOMAIN => 'TABLE'));

-- Check DMF results
SELECT scheduled_time, measurement_time, table_name, metric_name, value
FROM SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS
WHERE TRUE
AND METRIC_NAME = 'INVALID_EMAIL_COUNT'
AND TABLE_NAME = 'CUSTOMERS'
AND METRIC_DATABASE = 'SF_DEMO'
order by measurement_time desc;

Below are the results of DMF attached on table customers.

As mentioned earlier, same DMF can also be attached to another table to perform Data Quality checks on Email ID column as below.

CREATE OR REPLACE TABLE vendors (
vendor_id NUMBER(38,0),
city VARCHAR(16777216),
country VARCHAR(16777216),
name VARCHAR(16777216),
email_id VARCHAR(16777216),
phone VARCHAR(16777216)
);

insert into vendors values
(1,'Bhivandi','IN','Super Dryers','superdry@example.com','9133001122'),
(2,'Ghaziabad','IN','Electronics Automation Services','EAS@example.in','91223344'),
(3,'Gorakhpur','IN','Natural Foods Ltd','nfl@example.in','91998877'),
(4,'Bhopal','IN','Electrical Manufactirers','bhopalemsexample.in','91778877');

alter table vendors add data metric function sf_demo.public.invalid_email_count on (email_id);

alter table vendors set DATA_METRIC_SCHEDULE = '5 minute';

Step 4: Setup Snowflake Alerts for Data Quality Anomalies

This is the last block of the puzzle. Let’s inform business users / back office teams about this anomalies in the Email addresses.

Here is the code to setup and resume Snowflake alert

CREATE OR REPLACE ALERT invalid_email_alert
WAREHOUSE = poc_wh
SCHEDULE = '5 minute'
IF( EXISTS (
SELECT
CASE
WHEN ((SELECT max(value)
FROM SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS
WHERE TRUE
AND METRIC_NAME = 'INVALID_EMAIL_COUNT'
AND TABLE_NAME = 'CUSTOMERS'
AND METRIC_DATABASE = 'SF_DEMO'
and MEASUREMENT_TIME > dateadd('hour',-1,current_timestamp())
order by measurement_time desc) >= 1)
THEN 1
ELSE NULL
END)) THEN
CALL SYSTEM$SEND_EMAIL(
'test',
'mandar@example.in',
'Data Quality Alert : Problems with Email - CAUTION',
'Hello Team,\n It is observed that, there are invalid Email addresses for more than 1 customers and vendors. Kindly verify and correct to avoid incomplete Email communications \n\n Data Governance Officer '
);

alter alert invalid_email_alert resume;

Email has been received after the alert was executed on schedule to highlight data quality issues as below:

Conclusion : It can be concluded that, Data Metric Functions, tightly integrated with tables and Snowflake alerts can help identify and communicate Data Quality issues seamlessly and with minimal maintenence.

--

--