Geocode Addresses in Snowflake with External Access Integration

View from a warming hut in Royal Gorge, CA March 23, 2023

TL;DR:

Snowflake’s new external access feature enables direct communication with third-party APIs. This article shows how to use it to geocode addresses by calling the AWS Location Service directly from a Snowflake UDF.

Introduction

Previously, calling an external API in Snowflake required you to create an external function that called a remote service via a cloud provider’s API gateway. My earlier article stepped through this process for the purpose of geocoding addresses in Snowflake via the AWS Location Service.

The new external access feature drastically simplifies this process, by enabling you to create a UDF that calls the API directly — no need to set up an API gateway and a remote service such as AWS Lambda or Azure Function. All you need is the API endpoint and credentials, and everything else can be set up directly in Snowflake. This article steps through this new method step by step.

Note: The external integration feature is in Private Preview as of this writing and is only available in selected Snowflake regions. If you would like to try it, please request access from your account team.

Step-by-step guide to geocoding in Snowflake with AWS Location Service

1. Setup in AWS

  • Using AWS IAM, create a new user and associate it with a policy that provides access to AWS Location Service
  • Then, using the Security Credentials tab in the user’s IAM screen, create a new Access Key with Access Secret and download them.

2. Setup in Snowflake

Access to an external API in Snowflake is implemented through a new type of integration object called External Access integration, which ties together the network rule containing allowed API endpoints, and authentication credentials that are stored as secrets. The network rule and secrets are schema-level objects and external access integration is an account-level object.

--setup testing database and schema
USE ROLE SYSADMIN;
CREATE OR REPLACE DATABASE TEST;
USE SCHEMA TEST.PUBLIC;

--create network rule and secrets
CREATE OR REPLACE NETWORK RULE TEST.PUBLIC.AWS_LOCATIONSERVICE_RULE
MODE = EGRESS
TYPE = HOST_PORT
VALUE_LIST = ('places.geo.us-west-2.amazonaws.com');

CREATE OR REPLACE SECRET TEST.PUBLIC.AWS_LS_ACCESS_KEY
TYPE = GENERIC_STRING
SECRET_STRING = 'XXXXXXXXXXXXXXX'; --your AWS Access Key

CREATE OR REPLACE SECRET TEST.PUBLIC.AWS_LS_SECRET_KEY
TYPE = GENERIC_STRING
SECRET_STRING = 'XXXXXXXXXXXXXXX'; --your AWS Secret Key

--create external access integration
USE ROLE ACCOUNTADMIN;
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION AWS_LS_INTEGRATION
ALLOWED_NETWORK_RULES = (AWS_LOCATIONSERVICE_RULE)
ALLOWED_AUTHENTICATION_SECRETS = (AWS_LS_ACCESS_KEY, AWS_LS_SECRET_KEY)
ENABLED = true;

--grant usage on the integration to sysadmin so that they can leverage it in UDF
GRANT USAGE ON INTEGRATION AWS_LS_INTEGRATION TO ROLE SYSADMIN;

3. Create a User-Defined Function (UDF)

You are now ready to create a UDF that will make an API call and return the result. In this example, I am only returning the latitude and longitude.

--create UDF
USE ROLE SYSADMIN;
USE WAREHOUSE COMPUTE_WH;

CREATE OR REPLACE FUNCTION TEST.PUBLIC.GEOCODE_ADDRESS(ADDRESS STRING)
RETURNS VARIANT
LANGUAGE PYTHON
RUNTIME_VERSION = 3.8
HANDLER = 'get_location'
EXTERNAL_ACCESS_INTEGRATIONS = (AWS_LS_INTEGRATION)
PACKAGES = ('requests','boto3')
SECRETS = ('access_key' = AWS_LS_ACCESS_KEY, 'secret_key' = AWS_LS_SECRET_KEY)
AS
$$
import _snowflake
import requests
import json
import boto3

access_key = _snowflake.get_generic_secret_string('access_key')
secret_key = _snowflake.get_generic_secret_string('secret_key')
region_name = 'us-west-2'

def get_location(ADDRESS):
# Create an AWS Location client using the provided credentials
session = boto3.Session(
aws_access_key_id=access_key,
aws_secret_access_key=secret_key,
region_name=region_name
)
location_client = session.client('location')

# Perform geocoding request
response = location_client.search_place_index_for_text(
IndexName='explore.place',
Text=ADDRESS
)

# Extract latitude and longitude
try:
latitude = response["Results"][0]["Place"]["Geometry"]["Point"][1]
longitude = response["Results"][0]["Place"]["Geometry"]["Point"][0]

result = {
"latitude": latitude,
"longitude": longitude
}
except (KeyError, IndexError) as e:
# Handle the error when latitude or longitude is missing
result = {}

return result

$$;

4. Test the function

We are now ready to test the function. Let’s get the coordinates for Snowflake’s headquarters in Bozeman, MT.

SELECT TEST.PUBLIC.GEOCODE_ADDRESS('106 E Babcock St, Suite 3A, Bozeman, MT') 
AS COORDINATES;

That’s it!

Thanks to Snowflake’s new external access feature, you are now ready to geocode data with AWS Location Service without the need to set up an API gateway and a remote service outside of Snowflake. A similar approach can be taken for other external APIs. Note that AWS’ limits regarding the number of API calls still apply, so if you have a lot of records to process, you may need to break up your dataset into batches.

Thanks for reading this article. Feel free to reach out with questions and let me know your thoughts.

--

--

Daria Rostovtseva
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Daria is a Sales Engineer at Snowflake helping government customers better serve the public by leveraging data