Snowflake External Network Access

This article is part of a series titled “Weekly Pain Points”, in which I discuss challenges encountered in my job and the solutions I’ve found. These are not necessarily the only or the best solutions, but rather an opportunity to share the fruits of my reflections. I welcome other ideas, solutions, or comments to further enrich the analysis.

Generated with DALL-E

This week, my “pain point” doesn’t directly relate to a work task but rather to an external topic I decided to explore in my spare time. Indeed, I recently came across a LinkedIn post by Mr. Eric Poilvet, Head of GSI Sales Engineering at Snowflake, discussing the topic of External Network Access that allows direct access to external APIs from Snowflake.

This post immediately caught my attention as it’s a very interesting feature that I needed at some point but had to find another solution to resolve my issue.

In his post and the comments, Mr. Eric Poilvet proposed an interesting use case involving using external access to develop Data Apps with the aim of facilitating the retrieval and querying of data from the data.gouv.fr platform on Snowflake.

So, I decided to play along and execute this use case, initially focusing on the following challenge: how to use Snowflake’s external access to query external APIs (e.g., data.gouv.fr)?

ℹ️ For information

The data.gouv.fr platform is an initiative by the French government aimed at making public data accessible to everyone. It serves as a central portal where various government organizations publish datasets in areas such as the economy, health, environment, etc. The platform’s data is accessible in different ways, either through direct download (csv, excel) or via an API, which is the option that interests me here.

External Network Access (Accès au réseau externe)

Before delving into the topic of External Access, it’s relevant to highlight the added value of this functionality. Indeed, it’s simply an API integration, a task one could accomplish through a traditional Python script. However, the true difference lies in the ability to perform this operation directly from Snowflake, eliminating the need for an external server. No more searching for alternative solutions to deploy and run our program. The program will be accessible and run directly on Snowflake, leveraging the computational power and all other features offered by this platform.

Implementation of External Network Access

The setup of an external network access is done as follows:

  • Create a network rule to represent the external network location
  • Create, if necessary, a secret to store connection credentials for the external network (API_KEY, OAUTH_TOKEN, etc.)
  • Create an external access integration, aggregating the secret and network rule
  • Create a UDF (User-Defined Function) or procedure with the EXTERNAL_ACCESS_INTEGRATIONS parameter to query the external API

Use-case: Integration of the API Adresse and Recherche d’entreprises from Data Gouv

In this example, we will interface Snowflake with two APIs (Address and Company Search) available on the data.gouv.fr platform and develop two simple functions to allow our users to query these services. For our example, we will need a database named “DATAGOUV” in which all our objects will be created.

api.gouv.fr

Step 1: Creation of a database “DATAGOUV”

create database demo comment = 'external access labs';

Step 2: Implementation of the network rule towards the datagouv portal

The HOST_PORT type indicates that the network rule will allow outgoing network traffic to the domains specified in the VALUE_LIST parameter. When TYPE = HOST_PORT, the MODE must be set to EGRESS.

create or replace network rule datagouv_apis_network_rule
mode = egress
type = host_port
value_list = ('www.data.gouv.fr', 'api-adresse.data.gouv.fr', 'recherche-entreprises.api.gouv.fr');

Step 3: Get an API KEY

For our example, we don’t need an API key as the Address and Company Search APIs are freely accessible. In case you need one, you will generally need to follow the procedure indicated in the API documentation to authenticate with it.

Step 4: Creation of a secret with the API key

Similar to step 3, creating a secret is not essential for our example. However, in the following steps, we will proceed as if we have created one to demonstrate code versions that require this parameter.

create or replace secret datagouv_api_key
type = generic_string
secret_string = 'INSERT YOUR API_KEY';

Step 5: Creation of external access

This step involves grouping the allowed network rules and authorized secrets. To create external access, a role with the CREATE INTEGRATION privilege is required.

use role accountadmin;
create or replace external access integration datagouv_apis_access_integration
allowed_network_rules = (datagouv_apis_network_rule)
allowed_authentication_secrets = (datagouv_api_key)
enabled = true;

Step 6: Creation of a user-defined function

Once the network rule and external access are defined, they can be used in functions or procedures to access authorized domains. To do this, you need to add the EXTERNAL_ACCESS_INTEGRATIONS parameter, which can contain one or more external access integrations, and include the SECRETS parameter when it is necessary to pass a token in the API request.

# geocadage d'une adresse
create or replace function search_address(query string)
returns variant
language python
runtime_version = 3.8
handler = 'main'
external_access_integrations = (datagouv_apis_access_integration)
secrets = ('cred' = datagouv_api_key)
packages = ('requests')
as
$$
import requests
import json
req_session = requests.Session()
def main(query):

BASE_URL = 'https://api-adresse.data.gouv.fr/search'

params = {
'q': query,
'limit': 5
}
response = req_session.get(BASE_URL, params=params)
if response.status_code == 200:
j = response.json()
if len(j.get('features')) > 0:
first_result = j.get('features')[0]
lon, lat = first_result.get('geometry').get('coordinates')
first_result_all_infos = { **first_result.get('properties'), **{"lon": lon, "lat": lat}}
return first_result_all_infos
else:
return 'No result'
else:
return 'Error'
$$;

The function above allows querying the “/search/” entry point of the Address API and retrieving information about the address passed as a parameter.

results from search_address function

In this second example, the function “recherche_entreprises” allows querying the Company Search API, taking parameters such as the company name, address, executives, or elected officials.

create or replace function recherche_entreprises(query string)
returns variant
language python
runtime_version = 3.8
handler = 'main'
external_access_integrations = (datagouv_apis_access_integration)
secrets = ('cred' = datagouv_api_key)
packages = ('requests')
as
$$
import requests
req_session = requests.Session()
def main(query):

BASE_URL = 'https://recherche-entreprises.api.gouv.fr/search'
params = {
'q': query
}
response = req_session.get(BASE_URL, params=params)
if response.status_code == 200:
return response.json()
else:
return 'Error'
$$;

Therefore, we can use this function and retrieve all available information about a given company.

Voilà! The two functions are now live and available to everyone simply through a SELECT statement.

Conclusion

This example case turned out to be very interesting and useful, as it allowed me to understand and implement the various objects needed for integrating Snowflake with external APIs. This functionality provides a centralized data infrastructure around Snowflake, eliminating the need to move data.

Moreover, the implemented functions above can be made available to all users, allowing them to query these services without worrying about the implementation details.

However, I observed some limitations related to the fact that it’s currently not possible to import all Python libraries natively into Snowflake (only those available on Anaconda). Therefore, integrating third-party APIs with libraries outside Anaconda requires a bit more effort.

Resources

The opinions expressed in this article are strictly my own and do not engage anyone else.

--

--

Thierno Diallo
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Tech Lead Data 🎓. Snowflake Enthusiast 💙 & Certified 🥇. ETL/ELT Expert ⚙️. Data Enthusiast 📈