How Snowflake’s IT team uses external functions

As a Data Engineer on Snowflake’s IT team, it’s my team’s job to build and monitor the ingest pipelines from various external sources and write back the enriched data from Snowflake to source systems. One Snowflake feature that is particularly useful for our work is external functions.

The external functions feature in Snowflake makes it possible to call code that is executed outside Snowflake (a remote service). An external function is like a regular UDF, but it calls a cloud serverless compute service such as AWS Lambda. It can pass parameters to the remote service and accepts the data returned by the remote service, which can then be consumed in Snowflake.
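Before an external function can be created, an API integration has to be in place that tells Snowflake which proxy endpoints it may call. Below is a minimal sketch of what this looks like for an AWS API Gateway endpoint (the role ARN and URL are placeholders, not our real values):

CREATE OR REPLACE API INTEGRATION sfc_datasci_api_gateway
API_PROVIDER = aws_api_gateway
API_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-external-function-role' -- placeholder ARN
API_ALLOWED_PREFIXES = ('https://*****.execute-api.us-west-2.amazonaws.com/')
ENABLED = TRUE;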

External functions open new ways to implement various workflows for Ingestion, Reverse ETL, and Alerting in Snowflake.

Data Ingestion

Below is a high-level diagram of our API data ingestion architecture using external functions.

We use Tasks, Stored Procedures, and External Functions to implement external API data ingestion. Let’s walk through how we use each of these components:

  1. Tasks schedule the job. If there is pagination or control-flow logic, the task triggers a Stored Procedure call; if not, the task calls the external function directly in a SQL query.
  2. Stored Procedures implement the pagination/control-flow logic, call the external function, consume the JSON response from the external function, flatten it, and ingest it into the target tables.
  3. External Functions take API parameters such as the endpoint URL, POST data, and query parameters and pass them to the remote service, which makes the actual API call to retrieve the data and returns the response to Snowflake.
  4. In some cases, the API response is not directly consumable in Snowflake SQL, for example, CSV files. In those cases, the remote service can upload the files to S3, and Snowpipe takes it from there (sketched below).
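As a rough sketch of that last step, a Snowpipe over the S3 location the remote service writes to could look like the following (the stage, pipe, and table names here are illustrative assumptions, not our actual objects):

-- Assumes @it_api_csv_stage points at the S3 prefix the remote service uploads CSV files to
create or replace pipe it_api_csv_pipe auto_ingest = true as
copy into raw_api_csv_landing
from @it_api_csv_stage
file_format = (type = csv skip_header = 1);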

Slack Ingestion Use case

Using external functions, the Snowflake IT team ingests Slack user data. This data allows us to send personal notifications to employees.

By using external functions to ingest this data, we minimize the number of moving pieces and don’t have to manage any infrastructure. We also avoid the extra cost of, and dependency on, external orchestrators for building the ingestion pipelines.

Let’s take a look at how we do this.

External Function

Create an external function with the required parameters in the HEADERS section:

CREATE OR REPLACE SECURE EXTERNAL FUNCTION slack_get_api_call(url string)
RETURNS VARIANT
RETURNS NULL ON NULL INPUT
VOLATILE
MAX_BATCH_ROWS=1
API_INTEGRATION=sfc_datasci_api_gateway
HEADERS=
(
'api-url'='{0}',
'api-method'='get',
'secrets'='Slack_API_Key'
)
as 'https://*****.execute-api.us-west-2.amazonaws.com/it-data-ingest-lambda-stage/';

Retrieve API Response In SQL

Call the external function in a SQL query to visualize the JSON response:

select "VALUE" output_json from (
select slack_get_api_call('https://slack.com/api/users.list?token={{Slack_API_Key}}')[0]:members val
) s, lateral flatten(input => val);

Stored Procedure

We need to paginate through the API response to fetch the complete dataset. We use a stored procedure to implement the pagination logic and flatten the raw JSON before ingesting it into the target table:

CREATE OR REPLACE PROCEDURE load_slack_users()
RETURNS VARIANT
LANGUAGE JAVASCRIPT
AS
$$
var next_url = 'https://slack.com/api/users.list?token={{Slack_API_Key}}';
var insert_snippet;

// Empty the target table before the full reload
var del_cmd = "truncate table slack_users";
var del_stmt = snowflake.createStatement({ sqlText: del_cmd });
del_stmt.execute();

// Page through the Slack API until there is no next_cursor
while (next_url != null) {
  var cmd = "select return[0]:members members, return[0]:response_metadata:next_cursor from (select slack_get_api_call(?) return) s";
  var stmt = snowflake.createStatement({ sqlText: cmd, binds: [next_url] });
  var result1 = stmt.execute();
  result1.next();
  var next_cursor = result1.getColumnValue(2);
  insert_snippet = result1.getColumnValue(1);
  var snippet = JSON.stringify(insert_snippet);

  // Flatten the members array and insert one row per Slack user
  var stmt_insert = snowflake.createStatement({
    sqlText: `insert into slack_users
select $1:id::string slack_id, $1:name::string slack_user_name, $1:profile:display_name::string display_name, $1:profile:email::string email, $1:deleted::boolean deleted_user, $1:is_admin::boolean is_admin, $1:is_app_user::boolean is_app_user, $1:is_bot::boolean is_bot, $1:is_owner::boolean is_owner, $1:is_primary_owner::boolean is_primary_owner, $1:is_restricted::boolean is_restricted, $1:is_ultra_restricted::boolean is_ultra_restricted, $1:profile:display_name_normalized::string display_name_normalized, $1:profile:first_name::string first_name, $1:profile:last_name::string last_name, $1:profile:real_name::string real_name, $1:profile:real_name_normalized::string real_name_normalized, $1:profile:phone::string phone, $1:profile:team::string team, $1:profile:title::string title, $1:team_id::string team_id, $1:tz::string timezone, $1:enterprise_user:teams::variant enterprise_user_teams
from (select VALUE output_json from (select parse_json(?) val) s, lateral flatten(input => val)) S`,
    binds: [snippet]
  });
  stmt_insert.execute();

  // An empty next_cursor means the last page has been fetched
  next_url = (next_cursor == '' ? null : 'https://slack.com/api/users.list?token={{Slack_API_Key}}&cursor=' + next_cursor);
}
$$;
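The procedure can also be invoked on demand, which is handy when testing changes before the scheduled task picks it up:

call load_slack_users();
select count(*) from slack_users;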

Task

Schedule a task to refresh the slack_users table:

create or replace task slack_user_refresh
WAREHOUSE = 'SNOWHOUSE'
SCHEDULE = 'USING CRON 0 5 * * * America/Los_Angeles'
as
call load_slack_users();

In the above example, the scheduled task runs every day at 5 AM and reloads the slack_users table with the latest data from the Slack API.
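One detail worth remembering: tasks are created in a suspended state, so the task needs to be resumed before it starts running on its schedule:

alter task slack_user_refresh resume;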

Alerting

Our team also leverages external functions for alerting. For instance, we create incidents in ServiceNow when there is a failure in one of our data pipelines.

Alerting with external functions is very similar to the above example. Let’s walk through what creating a ServiceNow alert looks like.

External Function

CREATE OR REPLACE SECURE EXTERNAL FUNCTION it_servicenow_send(url string, post_data string, auth_headers string, secrets string)
RETURNS VARIANT
VOLATILE
MAX_BATCH_ROWS=1
API_INTEGRATION=sfc_datasci_api_gateway
HEADERS=(
'api-url'='{0}',
'api-method'='POST',
'headers'='{2}',
'api-post-data'='{1}',
'secrets'='{3}',
'api-post-data-string'='true'
)
as 'https://*****.execute-api.us-west-2.amazonaws.com/it-data-ingest-lambda-stage/';

Below is an example call to the external function creating a problem task in ServiceNow:

INSERT INTO it_snowdesk_problem_alert_log
SELECT CURRENT_DATE() alert_date,
  it_servicenow_send(
    'https://****.snowflake.com/api/now/table/u_snowflake_problem',
    To_varchar(Object_construct(
      'u_category', u_category, 'u_subcategory', u_subcategory,
      'u_problem_statement', u_problem_statement,
      'u_impact', u_impact, 'u_urgency', u_urgency,
      'u_assignment_group', u_assignment_group,
      'u_configuration_item', cmdb_ci,
      'u_description', u_description,
      'u_alert_type', 'SNOWDESK.01', 'u_contact_type', 'Integration')),
    Concat('{"Authorization": "Basic {{', 'svc_snowdesk_sec', '}}", "Content-Type":"application/json"}'),
    'svc_snowdesk_sec') -- secrets argument: the secret referenced by the {{...}} placeholder above
FROM snowdesk_problem_alert_source;
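This post doesn’t show how snowdesk_problem_alert_source is built; one way such a source could be derived from task failures is sketched below (the category, impact, and assignment values are purely illustrative assumptions):

-- Illustrative only: surface task failures from the last 24 hours as alert rows
select
  'Data Pipeline' u_category,
  'Task Failure' u_subcategory,
  name || ' failed' u_problem_statement,
  '3' u_impact,
  '3' u_urgency,
  'IT Data Engineering' u_assignment_group,
  'Snowflake' cmdb_ci,
  error_message u_description
from table(information_schema.task_history(
  scheduled_time_range_start => dateadd('hour', -24, current_timestamp()),
  result_limit => 1000))
where state = 'FAILED';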

Reverse ETL with External Functions

Our team uses Snowflake as the hub for all relevant data. Within Snowflake we have a unified view of all our application data, allowing our team to more effectively manage user access and support our employees. After we’ve enriched data within Snowflake, we often need to write data back to SaaS applications, ranging from Okta to ServiceNow. This process is called reverse ETL.

We use external functions to write back the data directly from Snowflake to relevant applications.
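As a rough illustration of the pattern (the external function, endpoint, and table names here are hypothetical, not our production objects), a write-back can be as simple as selecting the enriched rows and posting them through an external function:

-- Hypothetical example: push enriched employee attributes back to Okta
select it_saas_post_api_call(
  'https://example.okta.com/api/v1/users/' || okta_id,
  to_varchar(object_construct('profile', object_construct('department', department, 'title', title))))
from enriched_employee_attributes;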

We use stored procedures to execute the reverse ETL logic and a task to schedule the job. We’ll be diving more into what this looks like in a future blog post. Stay tuned!

Our team uses external functions for a range of use cases. They help us minimize the infrastructure we manage and avoid the additional cost of external orchestration for our ingestion pipelines. We’d love to hear how you’re using Snowflake’s external functions. Let us know in the comments.
