Pulling Data from an External API into Snowflake with Python

Recently, Snowflake released a new feature in preview, External Network Access, which allows code running on warehouses to access anything on the public internet (which is allowed).

I put together a really simple example which pulls data from jsonplaceholder.typicode.com which offers a completely free json API.

This will use a Python UDTF to pull records from the API and allow it to be queried directly from SQL on a Snowflake warehouse.

First, I added a network rule which allows traffic to the api endpoints.

CREATE OR REPLACE NETWORK RULE api_network_rule
MODE = EGRESS
TYPE = HOST_PORT
VALUE_LIST = ('jsonplaceholder.typicode.com');

I then added that Network Rule to a new External Access Integration which can be used by other objects in Snowflake. In this case, I will use it in a Python UDTF.

CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION apis_access_integration
ALLOWED_NETWORK_RULES = (api_network_rule)
ENABLED = true;

And I created a python UDTF which uses Python requests package to pull data from the API. This UDTF specifies the External Access Integration so it can have network access to the destination needed.

Here is an example of the data from the API:

[
{
"userId": 1,
"id": 1,
"title": "delectus aut autem",
"completed": false
},
{
"userId": 1,
"id": 2,
"title": "quis ut nam facilis et officia qui",
"completed": false
},
{
"userId": 1,
"id": 3,
"title": "fugiat veniam minus",
"completed": false
}
]

And the code I wrote to request and parse the data and return the rows in Python:

CREATE OR REPLACE FUNCTION get_todos(completed boolean) returns
table (userId number, id number, title varchar, completed boolean)
LANGUAGE PYTHON
RUNTIME_VERSION = 3.8
HANDLER = 'ApiData'
EXTERNAL_ACCESS_INTEGRATIONS = (apis_access_integration)
PACKAGES = ('requests')
AS
$$
import requests

class ApiData:
def process(self, completed):
data = requests.get("https://jsonplaceholder.typicode.com/todos").json()
for row in data:
if row["completed"] == completed:
yield (row["userId"], row["id"], row["title"], row["completed"])
$$;

I added some filtering into the function so I could query it for completed and non-completed todos as that seems like something that would be useful.

To test this capability, I just ran a simple select on the new udtf.

select * from table(get_todos(FALSE));

The data is now easily queryable in Snowflake and could be joined/stored as needed.

External Network Access is extremely useful to add integration options to external sources. I can’t wait to see what you build!

--

--

Brad Culberson
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Data Platform Architect at Snowflake specializing in Data Applications. Designed & built many data applications in Marketing and Health & Fitness.