Deploy Snowflake code with GitHub Actions

One of the most powerful features of Snowflake is the ability to write code and functions that run alongside your data. User-defined functions (UDFs) and stored procedures can be written in languages like Java, Scala, and Python (private preview) to do anything from transformation and validation to something as advanced as scoring data with a machine learning model.

As I’ve been writing functions and procedures in Snowflake, I wanted to be able to manage and deploy my code backed by git and a CI/CD workflow. It turns out, getting something like GitHub Actions to deploy your UDFs and stored procedures didn’t take a ton of work! Here’s how I did it. While I used Python and GitHub Actions for this example, the same patterns work for Java, Scala, and JavaScript, and for any CI/CD tool (with minor modifications for things like compiling Java/Scala code as part of the pipeline).

End-to-end workflow from code change -> commit -> deploy

Objective: I want to be able to write a stored procedure in Python. I don’t want to manually update my stored procedure through SnowSQL; rather, whenever I make a git commit, it should kick off a process to deploy and update my code. Later, I also explore adding a versioning mechanism so I can have multiple versions of the same stored procedure.

I’ve kept the stored procedure as simple as possible to just illustrate the pattern, but you could use this same pattern for something like a full-fledged Snowpark data pipeline.

1. Check your file into a GitHub repo

I created a simple GitHub repo to host my code and committed this file, storedproc.py. Now I have version control, so when I make changes to this stored proc they’re tracked in git.
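The file itself can be as simple as a single handler function. Here’s a rough sketch of what such a file can look like (the run handler name and the greeting text are placeholders of mine, not the exact contents of the repo):

# storedproc.py -- a minimal Snowpark Python handler (placeholder logic)
from snowflake.snowpark import Session

def run(session: Session, name: str) -> str:
    # Trivial on purpose: the point is the deployment pattern, not the logic.
    # A real procedure could use `session` to run a full Snowpark pipeline.
    return f"hello world, {name}"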

2. Create a GitHub Action to automatically deploy this code

GitHub Actions is a super powerful CI/CD tool you can add to any GitHub repo. You get 2,000 free build minutes a month, which makes it an attractive choice for wiring up automation for your git repo.

I created a GitHub Action so whenever I check in a change to storedproc.py, it will securely upload the file to Snowflake and create or update my stored procedure.

The YAML for the action below may look like a lot, but it’s relatively simple: it pulls in the code from my repo, downloads the SnowSQL client, uploads the file, and runs a SQL query.

I created a few GitHub secrets to store my connection info. You can see those secrets get automatically and securely loaded into the build environment to authenticate with my Snowflake account. You may also notice that I create the stage for the file using github.ref_name. That’s a special context variable in GitHub Actions that resolves to the name of the branch or tag that triggered the workflow.
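Here’s a sketch of that workflow (the secret names, database/schema, procedure name, and SnowSQL installer version are placeholders of mine; check the SnowSQL docs for the current installer URL):

# .github/workflows/deploy.yml -- a sketch; secret names and versions are placeholders
name: Deploy stored procedure
on: [push]

jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      # SnowSQL reads the password from this environment variable
      SNOWSQL_PWD: ${{ secrets.SNOWFLAKE_PASSWORD }}
    steps:
      - uses: actions/checkout@v3
      - name: Install SnowSQL
        run: |
          curl -O https://sfc-repo.snowflakecomputing.com/snowsql/bootstrap/1.2/linux_x86_64/snowsql-1.2.24-linux_x86_64.bash
          SNOWSQL_DEST=~/bin SNOWSQL_LOGIN_SHELL=~/.profile bash snowsql-1.2.24-linux_x86_64.bash
      - name: Upload file and create procedure
        run: |
          # Note: ref names containing '.', '/' or '-' would need sanitizing (see step 3)
          ~/bin/snowsql -a ${{ secrets.SNOWFLAKE_ACCOUNT }} -u ${{ secrets.SNOWFLAKE_USER }} \
            -d MY_DB -s MY_SCHEMA -q "
            CREATE STAGE IF NOT EXISTS ${{ github.ref_name }};
            PUT file://storedproc.py @${{ github.ref_name }} AUTO_COMPRESS=FALSE OVERWRITE=TRUE;
            CREATE OR REPLACE PROCEDURE MYPROC(NAME STRING)
              RETURNS STRING
              LANGUAGE PYTHON
              RUNTIME_VERSION = '3.8'
              PACKAGES = ('snowflake-snowpark-python')
              IMPORTS = ('@${{ github.ref_name }}/storedproc.py')
              HANDLER = 'storedproc.run';"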

Now whenever I make a git push back to my repo after making a code change, a GitHub action will run and deploy my code!

GitHub showing the result of one of my Action runs (#21 in this case… I’ve been making lots of updates :D)

This is useful because as my team collaborates on this code logic, we can use GitHub as the “source of truth,” with automated deployments kicking off on every change.

3. Allow multiple versions of your code

But let’s make this even better. I now want to introduce the concept of “versions” for my stored procedure. What if I want to make a change to my stored procedure and test it out in Snowflake without updating my production version?

GitHub Actions makes this super easy! I’m just going to update my GitHub Actions workflow with one simple change: I’ll name the stored proc based on the branch or tag that kicks off the workflow.

There are a few strategies for versioning procedures. In hindsight, I think separate databases and schemas may be the cleanest, so you have myproc in MY_DB_PROD.MYSCHEMA and myproc in MY_DB_DEV.MYSCHEMA. The other option is to keep multiple versions within a single schema. For simplicity (read: laziness) I went with a single schema here, but the first method would work just as well.

If I make a change to my main branch (production in this case), it will update the stored proc called MYPROC_MAIN. If I want to test a change, I create a new branch with git and push my pre-prod changes to that branch (maybe I call it the ‘v2’ branch). Now when the action runs for the files in the v2 branch, it won’t touch MYPROC_MAIN, but rather create or update MYPROC_V2! I can now test out MYPROC_V2 without worrying about disrupting any workloads. After I validate everything is working well, I can open a pull request and merge my changes into the main branch, which kicks off another GitHub Action, and only then is MYPROC_MAIN updated.
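In practice, that flow is just a few git commands (branch name here is hypothetical):

git checkout -b v2                  # start a pre-prod branch
# ... edit storedproc.py ...
git commit -am "Try out new logic"
git push -u origin v2               # Action deploys MYPROC_V2; MYPROC_MAIN is untouched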

Note that this also works with git tags. So if I cut a version of my stored procedure or UDF and add a git tag (like v1.0.0), it will create a corresponding stored procedure named MYPROC_V1_0_0 that maps to that release. You can remove this logic and map only to branches if that’s easier.

Here’s what that slightly more advanced GitHub Action looks like (notice the SnowSQL command to create the procedure is now dynamic):
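As a sketch, only the deploy step changes (the sanitization line is my own way of turning ref names like v1.0.0 into valid identifiers):

      - name: Upload file and create versioned procedure
        run: |
          # Turn the branch or tag name into a valid identifier suffix, e.g. v1.0.0 -> V1_0_0
          VERSION=$(echo "${{ github.ref_name }}" | tr './-' '___' | tr '[:lower:]' '[:upper:]')
          ~/bin/snowsql -a ${{ secrets.SNOWFLAKE_ACCOUNT }} -u ${{ secrets.SNOWFLAKE_USER }} \
            -d MY_DB -s MY_SCHEMA -q "
            CREATE STAGE IF NOT EXISTS ${VERSION};
            PUT file://storedproc.py @${VERSION} AUTO_COMPRESS=FALSE OVERWRITE=TRUE;
            CREATE OR REPLACE PROCEDURE MYPROC_${VERSION}(NAME STRING)
              RETURNS STRING
              LANGUAGE PYTHON
              RUNTIME_VERSION = '3.8'
              PACKAGES = ('snowflake-snowpark-python')
              IMPORTS = ('@${VERSION}/storedproc.py')
              HANDLER = 'storedproc.run';"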

So after making a few updates, you can see my account now has my stored procs running and available to my team. As a developer, I’m able to use git as my primary interface when managing this code logic — just what I love!

-- call the latest version of code in the `main` branch
CALL PYPROC_MAIN('hello');
+---------------------+
| PYPROC_MAIN         |
|---------------------|
| hello world, v2.0.0 |
+---------------------+

-- call the latest version of code in the `dev` branch
CALL PYPROC_DEV('hello');
+----------------------------+
| PYPROC_DEV                 |
|----------------------------|
| hello world, in dev branch |
+----------------------------+

-- call the stored proc that was tagged `v1.0`
CALL PYPROC_V1_0('hello');
+---------------------+
| PYPROC_V1_0         |
|---------------------|
| hello world, v1.0   |
+---------------------+

Hopefully this sample is helpful, and you can extend it to any of your Snowflake and Snowpark UDFs, stored procedures, and APIs.

Questions? Suggestions? Feel free to reach out to me on Twitter @jeffhollan.


Jeff Hollan is Director @ Snowflake for the Developer Platform including Snowpark, UDFs, and Drivers. I’d love to chat with you about Snowflake ❄️!
