Enhancing Your Data: Historical Weather Data Using Snowflake External Network Access

External Network Access is a powerful Snowflake feature that opens up a whole world of opportunities for enhancing your data. It allows us to easily and securely call APIs outside Snowflake, without having to host the calling code on another cloud service like AWS.

External Network Access offers a compelling alternative to External Functions, which have been available since June of 2020. External Functions allow us to call, for example, an AWS Lambda function. The possibilities with External Functions are pretty limitless, but using them requires the effort and overhead of establishing the proxy service, and of course hosting the remote service itself. External Network Access, on the other hand, features easy, straightforward setup in situations where you already have an API you want to call: one of your own, a public one, or one you subscribe to. In this exercise we'll use External Network Access to call a free weather API and enhance fictional sales data.

Note: External Network Access is currently available in paid accounts only, and is not available in the Azure Government regions.

Imagine we are a vendor of a consumer product that might be sensitive to weather conditions, such as ice cream (or umbrellas!). Our end goal is to take a set of sales data and correlate the sales with weather conditions. We have:

  • Records representing each sale
  • Each sale has a date and a store location
  • For the sake of convenience, I have geocoded the store locations ahead of time

To find the weather on the day of each sale, we’ll use a free API available from Visual Crossing. There are plenty of weather APIs available, but Visual Crossing provides a free API that includes historical weather records. Since our sales are in the past, historical data is what we are after. If you want to follow along with your own data, grab a key from their free tier, which allows 1000 calls a day, and includes 50 years of historical data.
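
Before wiring anything into Snowflake, it helps to see the shape of the request we will be building. Here is a minimal sketch of the Visual Crossing timeline URL construction; the coordinates, date, and key below are placeholders, not real account values:

```python
# Sketch: build a Visual Crossing "timeline" request for one day of
# historical weather. All values below are placeholders for illustration.

def build_timeline_url(latitude: str, longitude: str, date: str, key: str) -> str:
    """Build a single-day historical weather request URL."""
    base = "https://weather.visualcrossing.com/VisualCrossingWebServices/rest/services/timeline"
    # Passing the same date twice restricts the response to that one day.
    return (f"{base}/{latitude},{longitude}/{date}/{date}"
            f"?unitGroup=metric&key={key}&contentType=json")

url = build_timeline_url("40.7128", "-74.0060", "2023-03-01", "YOUR_KEY")
print(url)
```

This is the same URL our Snowflake function will assemble later, just factored out so you can inspect it locally.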

Here are the steps to create a function in Snowflake that will query the API and return a summary of the weather for a specific location on a specific day.

Step 1: Establish A Network Rule

First, we need to establish a rule that lets us access a website outside Snowflake:

CREATE OR REPLACE NETWORK RULE visual_crossing_weather_rule
MODE = EGRESS
TYPE = HOST_PORT
VALUE_LIST = ('weather.visualcrossing.com');

This rule lets us access weather.visualcrossing.com, which is where our API is located.

Step 2: Store The Access Key

Like many web APIs, the Visual Crossing weather API requires a "key": a unique token linked to your account. Strictly speaking you can hard-code the key into the API call, but this is a real security no-no, for obvious reasons. Snowflake has a robust security architecture, so let's take advantage of it:

CREATE OR REPLACE SECRET visual_crossing_api_key
TYPE = GENERIC_STRING
SECRET_STRING = 'YOUR_ACCESS_KEY_HERE';

"Secrets" are an interesting Snowflake feature that allows us to essentially encode a piece of information like a password, token, or OAuth key, after which it can only be retrieved programmatically. This helps secure our code. There are several types of secrets that can be stored, and different ways of consuming them. The basics of Secrets functionality are described here. Retrieving a Secret using Python is described here.
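
The _snowflake module that reads secrets is only importable inside Snowflake's Python runtime, but the retrieval pattern is easy to sketch locally with a stand-in. The stub class and secret value below are purely illustrative:

```python
# Illustrative stand-in for the _snowflake module, which exists only
# inside Snowflake's Python runtime. The secret value is a placeholder.
class FakeSnowflakeModule:
    _secrets = {"cred": "YOUR_ACCESS_KEY_HERE"}

    @staticmethod
    def get_generic_secret_string(name: str) -> str:
        # Inside Snowflake, this returns the SECRET_STRING of the secret
        # bound to the given name in the function's SECRETS clause.
        return FakeSnowflakeModule._secrets[name]

token = FakeSnowflakeModule.get_generic_secret_string("cred")
print(token)
```

The key point is that the handler code never contains the key itself, only the name it was bound to.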

Step 3: Create An Access Integration

The next step is to set up an object that allows access to the network rule established in Step 1:

CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION visual_crossing_weather_api_integration
ALLOWED_NETWORK_RULES = (visual_crossing_weather_rule)
ALLOWED_AUTHENTICATION_SECRETS = (visual_crossing_api_key)
ENABLED = TRUE;

Note the use of the two objects we have created so far:

  • visual_crossing_weather_rule, which is our network rule that allows access to the Visual Crossing website
  • Our “secret”, called visual_crossing_api_key

Step 4: Create Our Function

Finally, we create a function that will take 3 parameters:

  • Latitude of our store (from the store geocoding)
  • Longitude of our store (also from the store geocoding)
  • The date we are trying to find weather conditions for, which is also the date of our sales

CREATE OR REPLACE FUNCTION visual_crossing_historical_weather(lat STRING, lon STRING, weather_date STRING)
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = 3.8
HANDLER = 'get_weather'
EXTERNAL_ACCESS_INTEGRATIONS = (visual_crossing_weather_api_integration)
SECRETS = ('cred' = visual_crossing_api_key)
PACKAGES = ('snowflake-snowpark-python', 'requests')
AS
$$
import _snowflake
import requests

session = requests.Session()

def get_weather(latitude, longitude, date):
    token = _snowflake.get_generic_secret_string('cred')
    url = "https://weather.visualcrossing.com/VisualCrossingWebServices/rest/services/timeline/" + latitude + "," + longitude + "/" + date + "/" + date + "?unitGroup=metric&key=" + token + "&contentType=json"
    response = session.get(url)
    return response.json()['days'][0]['conditions']
$$;
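
You can sanity-check the handler's parsing logic outside Snowflake by feeding it a canned response. The payload below is a trimmed, hypothetical example of what the timeline endpoint returns, with only the fields we read included:

```python
# Sketch: exercise the handler's JSON parsing against a canned response.
# This payload is a trimmed, hypothetical example of a timeline response.
sample_payload = {
    "days": [
        {"datetime": "2023-03-01", "conditions": "Rain, Partially cloudy"}
    ]
}

def extract_conditions(payload: dict) -> str:
    # Mirrors the last line of get_weather: first day, 'conditions' field.
    return payload["days"][0]["conditions"]

print(extract_conditions(sample_payload))  # Rain, Partially cloudy
```

If the API returns no days for the requested date, indexing [0] would raise an IndexError, so production code may want to guard that lookup.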

A few things to note here:

  • Since we are using a Secret, we need to declare it with the SECRETS keyword and associate a name with it (in this case, 'cred')
  • To retrieve the API token from the Secret, we use the get_generic_secret_string function, passing it the name we gave it ('cred'). This retrieves the value we passed to SECRET_STRING when we created the Secret in Step 2
  • Every API has a different signature; in this case we build a URL from the parameters passed to the function, then issue a simple GET to obtain a JSON response. From that response we access the 'days' array, take the first 'day' (index 0), and retrieve its 'conditions' attribute. A good description of the Visual Crossing API is here.

Step 5: Query Our Data

Now we are ready to use the function we have created to get a summary of the weather for each day in our sales data.

Here are my 5 stores:

For each store, I have sales data for each day of the month of March, 2023:

Let’s update column WEATHER_SUMMARY using our new function visual_crossing_historical_weather to show what the weather was like in each location for each day:

UPDATE SALES
SET WEATHER_SUMMARY = visual_crossing_historical_weather(STORES.LATITUDE, STORES.LONGITUDE, SALES.SALES_DATE)
FROM STORES
WHERE SALES.STORE_ID = STORES.STORE_ID;
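
Once WEATHER_SUMMARY is populated, the correlation itself is just a group-by over the conditions. Here is a sketch of that aggregation in Python over invented rows; the store IDs, unit counts, and conditions are made up for illustration:

```python
# Sketch: average units sold per weather condition, over invented rows.
from collections import defaultdict

sales = [
    {"store_id": 1, "conditions": "Clear", "units": 40},
    {"store_id": 1, "conditions": "Rain", "units": 95},
    {"store_id": 1, "conditions": "Rain", "units": 105},
]

totals, counts = defaultdict(int), defaultdict(int)
for row in sales:
    totals[row["conditions"]] += row["units"]
    counts[row["conditions"]] += 1

averages = {cond: totals[cond] / counts[cond] for cond in totals}
print(averages)  # {'Clear': 40.0, 'Rain': 100.0}
```

In Snowflake the same question is a SELECT with GROUP BY WEATHER_SUMMARY and an AVG over the sales figures.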

Whatever they are selling at Store 1, it seems inversely sensitive to clear weather!

And that’s it! In a few easy steps we can connect our data in Snowflake to virtually any API we want, and enhance our data with functions we can call as part of a regular SELECT statement. This is a great feature, and as always Snowflake makes it easy to implement.
