Deploying Custom Python Packages from GitHub to Snowflake
UPDATE: As of Nov 7, 2022, Snowpark for Python is generally available (GA).
Overview
Snowpark provides a way for developers to extend the Snowflake platform’s functionality by writing code in languages like Python, Java, and Scala. In this blog, we will review a way to deploy custom pure-Python packages from a GitHub repo to Snowflake using GitHub Actions.
More importantly, when these packages are used to create User-Defined Functions (UDFs) in Snowpark Python, this approach makes it easy to maintain and deploy your custom code across several UDFs at once, automagically, with no manual intervention. Another benefit of this setup is that it decouples the custom package(s) from the script that uploads the package to Snowflake and reports the names of UDFs that use the updated package.
User-Defined Function
To demonstrate how the entire flow comes together, let’s consider this simple UDF in Snowpark Python. (This script is run in Snowsight.)
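A minimal sketch of what such a script might look like (the stage name @dash_udfs and the connection_parameters dict are placeholders, not from the original; line positions match the notes below):

```python
from snowflake.snowpark import Session

# `connection_parameters` is a placeholder dict with your account details
session = Session.builder.configs(connection_parameters).create()

session.add_import("@dash_udfs/do_something_cool.py")  # stage name + filename

def greet_me(name: str) -> str:
    import do_something_cool

    return do_something_cool.greetings(name)

session.udf.register(greet_me, name="greet_me", is_permanent=True,
                     stage_location="@dash_udfs", replace=True)
```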
Script Notes 👇
- The Python package do_something_cool.py is added as a dependency (line 6). Note that the imports statement includes the entire path: the name of the stage and the filename
- That package is then imported (line 9) and its greetings() method is called (line 11), passing along the parameter passed into this UDF
And this is how you can call this UDF in SQL:
SELECT greet_me('Dash');
… which will output
Hello, Dash! How are you doing today? It's great to see you today!
Now let’s see how we can automate the flow of reflecting the updates to the Python package do_something_cool.py added as a dependency (line 6) in this UDF using GitHub Actions.
Setup: GitHub Repository 1
Let’s say you have a GitHub repository where you maintain this Python package, one that you and your team members contribute to and collaborate on.
Here’s “do something cool” 😜
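A minimal version of the package, with a greetings() method matching the output shown above (for illustration only; the real repo may differ):

```python
# do_something_cool.py -- minimal sketch of the custom package
def greetings(name: str) -> str:
    # Build the greeting returned by the greet_me UDF
    return f"Hello, {name}! How are you doing today? It's great to see you today!"
```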
This repo is also where you’d set up GitHub Actions.
Here’s what the GitHub Actions workflow file (workflow details below) would look like:
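A sketch of such a workflow, assuming the deploy script is named update_packages.py and lives in Repo 2 (the repo path, filename, and secret names are placeholders):

```yaml
# .github/workflows/deploy_package.yml -- hypothetical sketch
name: Deploy custom package to Snowflake

on:
  push:
    branches: [ main ]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Set up Python 3.8
        uses: actions/setup-python@v4
        with:
          python-version: '3.8'
      - name: Install dependencies
        run: pip install requests snowflake-connector-python
      - name: Download deploy script from Repo 2
        run: curl -O https://raw.githubusercontent.com/<owner>/<repo-2>/main/update_packages.py
      - name: Deploy updated code
        env:
          SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
          SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_USER }}
          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
          SNOWFLAKE_ROLE: ${{ secrets.SNOWFLAKE_ROLE }}
          SNOWFLAKE_WAREHOUSE: ${{ secrets.SNOWFLAKE_WAREHOUSE }}
          SNOWFLAKE_DATABASE: ${{ secrets.SNOWFLAKE_DATABASE }}
          SNOWFLAKE_SCHEMA: ${{ secrets.SNOWFLAKE_SCHEMA }}
        run: python update_packages.py
```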
Workflow details 👇
- Triggering Event: on push to the main branch when the code is updated
- Workflow Steps:
— 1. Set up Python 3.8
— 2. Install the dependencies requests and snowflake-connector-python required to deploy the updated package code
— 3. Download the script (from Repo 2 — see below) that will upload the updated custom code to Snowflake
— 4. Set up environment variables (database, warehouse, schema, user, password, and role) stored in GitHub Secrets. These variables are used by the Snowflake Python Connector in the script (downloaded in step 3) to connect to Snowflake
— 5. Run the script (downloaded in step 3) to deploy the updated code
[Code on GitHub: Sample custom package | GitHub Actions workflow file]
Setup: GitHub Repository 2
This repo contains the Python script (see details below) that is downloaded and run by the GitHub Actions workflow set up in Repo 1 whenever updates are made to the package on the main branch of Repo 1.
Here’s what the script would look like:
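A hypothetical sketch of the script; the URL, JSON field names, and result-column positions are assumptions, not the original implementation:

```python
# update_packages.py -- hypothetical sketch of the deploy script
import os

import requests
import snowflake.connector

# Placeholder: raw URL of the packages JSON file in this repo (Repo 2)
PACKAGES_JSON_URL = "https://raw.githubusercontent.com/<owner>/<repo-2>/main/packages.json"

# 1. Connect to Snowflake with the env vars populated from GitHub Secrets
conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    role=os.environ["SNOWFLAKE_ROLE"],
    warehouse=os.environ["SNOWFLAKE_WAREHOUSE"],
    database=os.environ["SNOWFLAKE_DATABASE"],
    schema=os.environ["SNOWFLAKE_SCHEMA"],
)
cur = conn.cursor()

# 2. Download the list of packages to update
packages = requests.get(PACKAGES_JSON_URL).json()["packages"]

for pkg in packages:
    # 3. Fetch the latest version off main in Repo 1 and PUT it to the stage
    with open(pkg["name"], "w") as f:
        f.write(requests.get(pkg["url"]).text)
    cur.execute(
        f"PUT file://{pkg['name']} @{pkg['stage']} OVERWRITE=TRUE AUTO_COMPRESS=FALSE"
    )

# 4. Examine each UDF's imports and print the ones that use an updated package
cur.execute("SHOW USER FUNCTIONS")
for row in cur.fetchall():
    # "arguments" column, e.g. GREET_ME(VARCHAR) RETURN VARCHAR; position may vary
    signature = row[8].split(" RETURN")[0]
    cur.execute(f"DESCRIBE FUNCTION {signature}")
    props = {name: value for name, value in cur.fetchall()}
    if any(pkg["name"] in (props.get("imports") or "") for pkg in packages):
        print(f"UDF with updated package: {signature}")
```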
Script steps 👇
— 1. Connect to Snowflake using Snowflake Python Connector and the environment variables set using GitHub Secrets
— 2. Download the list of packages (see JSON format below) to be updated
— 3. For each package, download the latest version off main branch from Repo 1 and upload it to the Snowflake stage
— 4. Loop through all the UDFs, examine their imports, and print out names of UDFs that use the updated package
List of Packages
This repo also contains a JSON file that has a list of packages (referenced in step 2 above) to be updated when the update_packages.py script is run.
Here’s the file format:
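A hypothetical example of the format, assuming the fields are name, url, and stage (the real file may differ):

```json
{
  "packages": [
    {
      "name": "do_something_cool.py",
      "url": "https://raw.githubusercontent.com/<owner>/<repo-1>/main/do_something_cool.py",
      "stage": "dash_udfs"
    }
  ]
}
```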
Note: The stage in the JSON file refers to a stage in Snowflake where the file is uploaded.
Testing
To test, update do_something_cool.py in Repo 1 (for example, change the greeting message to “Welcome, how is your day going?”) and push the change to the main branch. At that point you should see a build kick off, and if it succeeds (i.e., the code/syntax is correct, all of the referenced files and paths exist, the connection params for Snowflake are valid, etc.), the build output should show the package upload and the names of the UDFs that use it.
Then, running this SQL again…
SELECT greet_me('Dash');
… will output
Hello, Dash! Welcome, how is your day going?
Note: This setup may not work for all scenarios, and it does not include automated tests that run before code is merged into the main branch. (Definitely something to consider in production environments.) That said, a similar setup can also provide a good foundation for deploying ML models used across UDFs.
That’s it!
Thanks for your time, and feel free to follow me on Twitter and LinkedIn where I share demo videos, code snippets, and other interesting artifacts specifically around Snowpark.