Python data ingestion with Snowpark, in 5 steps

Can we skip using ETL / Data Integration tools?

Sometimes. Maybe. At what cost?

It is holiday season, you’ve tested the API you need, it isn’t so easy, you’ve got complex requirements. Your DI tool has a limited rest connector and there’s no native connector yet.

You need to get the data in to Snowflake — and you need to do it quickly. Can your ETL/DI tool handle the job well enough? Is it better than your own short code? Is it going to cost a lot of DI credits (many companies pay per: x million rows / x GB) to process that data for ingestion?

💡 Try loading it through Snowpark, where you have control over what is loaded and how.

What is Snowpark?

Snowpark is a set of capabilities — you do not need to “know Snowpark”.

Snowpark offers the opportunity to write code in familiar languages like Python, Java, or Scala, directly within Snowflake. With Snowpark being a core Snowflake capability, it coincides with Snowlake’s enhanced security, reduced complexity, and increased efficiency. Whether you’re performing data engineering, data science, or machine learning tasks, Snowpark streamlines these processes, making Snowflake not just a data warehouse, but a comprehensive data processing powerhouse. Snowpark represents a significant step towards more flexible, efficient, and integrated data operations.

Data App: User-driven data ingest

1. Sourcing test data

I’m a property nerd at times, so Zoopla (🏡British Zillow, unofficially) tends to be my go-to API, since it updates frequently… I’m also tired by the ever-unchanging Pokémon and Star Wars APIs.

Find more APIs here: https://github.com/public-apis/public-apis

2. Connecting to an API in Python

Lets keep it super simple, we’re using the requests library in Python to connect to the Zoopla API (you do need to register for an API key).

Stuck with Python? Consider using ChatGPT or Bard (links have sample output of connecting to the Joke API).

See below for the python used. For my use case, I have setup the python function to support search by location.

import requests
import json
def fetch_data(areainput):
api_key
endpoint = 'api/v1/property_listings.js' # Updated to .js for JSON endpoint
areainput = areainput if areainput else "London" # Default to 'London' if areainput is None or empty
params = {'area': areainput, 'listing_status': 'sale'}
url = f'https://api.zoopla.co.uk/{endpoint}'
params['api_key'] = api_key # Append your API key to the parameters

try:
response = requests.get(url, params=params)
response.raise_for_status() # This will raise an HTTPError if the request fails

if response.status_code == 200:
# The response is already in JSON, so no need to convert
json_data = response.json()
return json.dumps(json_data) # Convert Python dict back to JSON string for consistency
except requests.HTTPError as http_err:
print(f"HTTP error occurred: {http_err}")
except Exception as err:
print(f"An error occurred: {err}")

3. Preparing to connect Snowflake to external addresses

This is quite simple and generally done in 2 steps when working with public apis. We’ll be creating a Network Rule, which as of 16th January 2024, is in public preview.

IP/DNS address should be specified (DNS is best practice).

-- 1a. Setup network rule to access external source
CREATE OR REPLACE NETWORK RULE ZOOPLA_RULE
MODE = EGRESS
TYPE = HOST_PORT
VALUE_LIST = ('api.zoopla.co.uk','52.48.177.41','52.49.107.252','54.195.120.119');

-- 1b. Setup access intregration, using the rules set
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION ZOOPLA_ExternalAccess
ALLOWED_NETWORK_RULES = (ZOOPLA_RULE)
ENABLED=TRUE;

4. Creating the Snowpark UDF (User Defined Function)

This step is the critical “I’m using Snowpark” step. Look at the code at step 2, then look back here. You can see we have only added 8 lines of boiler-plate code — now you know Snowpark. 🥳

Always:

  • Specify libraries
  • Import libraries (sometimes required even if native)
  • Use secrets to protect your API key
CREATE OR REPLACE FUNCTION Zoopla_API(areainput string)
returns string
language python
runtime_version=3.11
handler = 'fetch_data'
external_access_integrations=(ZOOPLA_ExternalAccess)
packages = ('requests', 'snowflake-snowpark-python' )
as
$$
session = requests.Session() #ensures that TCP connections are reused, else user may exhaust it
import requests
import json

def fetch_data(areainput):
api_key
endpoint = 'api/v1/property_listings.js' # Updated to .js for JSON endpoint
areainput = areainput if areainput else "London" # Default to 'London' if areainput is None or empty
params = {'area': areainput, 'listing_status': 'sale'}
url = f'https://api.zoopla.co.uk/{endpoint}'
params['api_key'] = api_key # Append your API key to the parameters

try:
response = requests.get(url, params=params)
response.raise_for_status() # This will raise an HTTPError if the request fails

if response.status_code == 200:
# The response is already in JSON, so no need to convert
json_data = response.json()
return json.dumps(json_data) # Convert Python dict back to JSON string for consistency
except requests.HTTPError as http_err:
print(f"HTTP error occurred: {http_err}")
except Exception as err:
print(f"An error occurred: {err}")

# Call the function to get JSON data
#json_data = fetch_data(areainput)
$$;

Check your function(s).

⚠ The function AND parameters make it unique. If you change your mind about how many parameters you’ll need, you’ll end up with duplicate functions i.e. Zoopla_API(), Zoopla_API(X,Y), Zoopla_API(X,Y,Z).

Use the code below to check your functions

SHOW FUNCTIONS LIKE 'Zoopla_API';

5. Running the function & storing data

Creating a table to add data to:

create or replace TABLE DEMO_DATASET.UK_PROPERTY.ZOOPLA_OUTPUT_RAW (
OUTPUT VARCHAR(16777216),
LOAD_TIME TIMESTAMP_LTZ(9)
);

Creating a stored procedure to load the data to the new, empty table.

CREATE OR REPLACE PROCEDURE insert_zoopla_data(area_input VARCHAR DEFAULT  'London' )
RETURNS STRING
LANGUAGE SQL
AS
$$
BEGIN
INSERT INTO zoopla_output_raw
(
SELECT Zoopla_API( :area_input ) AS output, CURRENT_TIMESTAMP() AS LOAD_TIME
);

RETURN 'Data inserted successfully';
END;
$$;

Now for the live demo

You’re going to do the demo 😎

Open the embed link below, name a place in the UK, and see a new point added to the map! If the map looks too busy, use the list tab to see new individual listings.

--

--