StarSnow: HTTP Client for Snowflake SQL
Generic Snowflake functions to interact with web APIs directly from SQL statements.
Snowflake is an extremely SQL-friendly database: you can ingest, transform, and access your structured and semi-structured data directly from your SQL code. However, as a cloud-only data platform, it has some fundamental restrictions: all insecure operations like accessing local files or establishing network connections are unsupported from UDFs and stored procedures. There is no build function for calling external APIs or downloading web data from pure SQL. The only exceptions from this constraint are external functions, that allow calling AWS lambda functions, but nothing else.
This is why I built StarSnow, the first general-purpose HTTP client for Snowflake SQL.
Introducing StarSnow
StarSnow allows you to download external web content and invoke complex REST APIs from a simple interface. The package contains two functions, a simplified one for basic get
requests and a regular one that allows more tailored requests to be constructed.
Simple HTTP GET
Let’s take a look at a basic use case. We want to check the Snowflake platform’s status information from our ETL processes. To do that, I need to get data from Snowflake’s Statuspage.
First, let’s try it with the simplified get function, called STARSNOW_REQUEST_GET
. The first — and only — parameter is the URL: this is where we get the data, and on successful execution, it will return with the contents of that page as string/varchar.
tfoldi@TEST_DB.PUBLIC> select STARSNOW_REQUEST_GET('https://status.snowflake.com/api/v2/status.json') as snowflake_status;+--------------------------------+
| SNOWFLAKE_STATUS
|--------------------------------|{"page":{"id":"94s7z8vpy1n8","name":"Snowflake","time_zone":"America/Los_Angeles","updated_at":"2020-12-23T17:44:16.380-08:00","url":"https://status.snowflake.com"},"status":{"description":"All Systems Operational","indicator":"none"}}
1 Row(s) produced. Time Elapsed: 3.349s
This was easy, isn’t it?
But what if I want only the status text itself? This is when I need to switch to the regular function, called STARSNOW_REQUEST
. Unlike the previous STARSNOW_REQUEST_GET
function, this will return with a variant
type, thus we can start accessing its elements directly from the select
statement:
tfoldi@TEST_DB.PUBLIC> select STARSNOW_REQUEST('https://status.snowflake.com/api/v2/status.json', NULL):data:status:description as snowflake_status;+---------------------------+ | SNOWFLAKE_STATUS |
|---------------------------|
| "All Systems Operational" |
+---------------------------+
1 Row(s) produced. Time Elapsed: 1.739s
This was pretty neat, too. Since the Statuspage API server responded with a JSON object, our function automatically parsed it and made part of the returned variant
object. Then, accessing some part of the response was as easy as adding :data:status:description
to the SELECT
’s field definition to get only that part.
Handling parameters — overcoming Snowflake’s function parameter limitations
And now a little bit of an intermezzo. Life is not full of simple get requests, so we need a fancy way to pass different options and parameters to our function. Now the hard part is, Snowflake does not support optional parameters or named parameter passing. While we can overload functions, that will end up with dozen of functions with all the different permutations, which will make everything pretty inconvenient. So what’s the best practice in Snowflake? The answer is object
parameters.
Let’s imagine that we want to pass named parameters to our functions such as method
, parameters
, body
and so on. In some other databases, we can use named parameters but in Snowflake, objects
are our best option. A function created with:
create function function_with_many_options(params object) returns float as [...]
can be invoked as:
select function_with_many_options(
object_construct( 'param1', 'foo'
, 'param2', 3.14
, 'param10', current_timestamp(2)));
While we have only one parameter for our function, invokers can pass any number of options inside that single parameter as key-value pairs.
Thus, the second parameter of our STARSNOW_REQUEST
is an object
, that maps perfectly to an Axios request configuration object, allowing us to define dynamically request method, query parameters, request body, proxy server, and so on.
Ingesting Snowflake stock prices directly from a REST API
Let’s look at another use case, where we want to ingest data directly from a REST API to a Snowflake Table. Previously, we had to download the data from the API, stage it in Snowflake, then copy
it to the final table. For large data volumes, it makes sense, but it is a fairly big overhead for small data.
For getting the stock prices, I am using Alpha Vantage API. It is completely free, you can get an API key here.
Now we can start building our query. The URL we need to invoke is:
https://www.alphavantage.co/query?function=TIME_SERIES_DAILY_ADJUSTED&symbol=SNOW&apikey=apikey
We can start constructing
our options parameter and add the necessary flattening code to transform the returned JSON
/variant
to column, then persist to a real table.
Executed directly from SnowSight, we can immediately see the results:
Wow, we just invoked, transformed and ingested REST API data to Snowflake with a single SQL statement!
Snowflake and Slack Integration
The last example is about HTTP POST requests with a lively use case: sending a Slack message from Snowflake. We use this feature for exception handling. For example, if some of the data pipeline fails, Snowflake can inform us on one of our ChatOps channels.
To start, create a new Webhook Integration, following the steps here.
Then execute the following statement in your Snowflake instance:
select STARSNOW_REQUEST('https://hooks.slack.com/<your_webhook_id>',
OBJECT_CONSTRUCT('method', 'post',
'data', OBJECT_CONSTRUCT(
'text', 'hello from snowflake')));
And this is how your friendly snowflake bot says hi to you on slack:
Instead of just saying ‘hi’, you can also send the result of a select
statement.
Deploying StarSnow
For deployment, I used Serverless along with my Serverless Snowflake plugin which automates every step of the installation.
First, you need to install Serverless:
npm install -g serverless
Then download and compile StarSnow sources:
git clone https://github.com/starschema/starsnow_request
npm install
Set up the following environment variables:
SNOWFLAKE_ACCOUNT
SNOWFLAKE_USERNAME
SNOWFLAKE_DATABASE
SNOWFLAKE_PASSWORD
SNOWFLAKE_ROLE
(make sure that the role hasCREATE INTEGRATION
privilege)SNOWFLAKE_SCHEMA
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
And finally, start deploying the lambda functions and external functions with:
serverless deploy
At this point, you should have everything set up in your environment.
Conclusion
While Snowflake doesn’t have an HTTP Client built-in, with StarSnow you can still interact with REST APIs. Web scraping, ingesting results from external web services or sending Teams and Slack messages are now possible from pure Snowflake SQL. If you need any help to set it up, just reach out to me on LinkedIn or Twitter.
Feel free to use or contribute to StarSnow at its GitHub project page: https://github.com/starschema/starsnow_request
Update: StarSnow is now fully supported in Snowpark, making it easy to use with Python too.