Running pip packages in Snowflake

UPDATE: Since this article was written, there are far more streamlined ways to achieve this:

https://medium.com/snowflake/introducing-simple-workflow-for-using-python-packages-in-snowpark-928f667ff1aa

Definitely check out snowcli; it will automatically detect which of your Python packages aren’t available on Anaconda, and will download and bundle them in a zip artifact for you.

Recently, Snowflake announced support for Python in public preview, and I wrote a post showcasing how easy it is to use. There’s also a great overview here on some use cases, and how the feature fits in the ecosystem.

Snowflake partnered with Anaconda to simplify package management. They provide a comprehensive set of curated third party packages that you can import by simply listing them in the function definition. Dependencies will automatically be resolved, and you don’t need to worry about loading package files into your Snowflake account.

You can extend this further via the use of Snowflake Stages. The UDF documentation shows some examples of importing your own Python scripts, a text file, and even a zipped NLP model.

In this post, I’ll cover one extra scenario: importing a “wheel” file downloaded from PyPI. This should work as long as the package is platform-independent and doesn’t require OS-native libraries or a specific CPU architecture.

I should also add that if a useful PyPI package is missing from the Snowflake Anaconda channel, it’s worth asking Snowflake to add it; they are eager to make it convenient to use popular, relevant Python packages.

To demonstrate this, I’m going to expand on my previous Flaker 2.0 post, linked above. Faker has various community providers that have pip packages, but are not part of Snowflake’s Anaconda channel.

To use them, we’re going to download each package from PyPI, upload it to a Snowflake stage, then extract and import it from within our Snowflake function.

Step 1: Download the wheel file

Once you’ve found your project on PyPI, go to the “Download files” menu and download the Built Distribution (a .whl file).

In my example, I’ll be loading both faker-biology and faker-music.

Step 2: Download the wrapper script

Save the following code snippet locally, with the file name “wheel_loader.py”.

The script uses the approach described in Snowflake’s documentation to extract the contents of a staged zip file, then adds the extracted package contents to the system path so that the package can be imported.
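The original snippet was embedded as a gist and isn’t reproduced here, but a wrapper along these lines follows the pattern from Snowflake’s documentation. The details (the lock-file path, the `/tmp/wheels` extraction directory, and the `load` function name) are my own illustrative assumptions; the fallback to the current directory is only there so the script can be exercised outside Snowflake.

```python
import fcntl
import os
import sys
import threading
import zipfile

# Sketch of a "wheel_loader.py" wrapper, based on the staged-zip-extraction
# pattern in Snowflake's Python UDF documentation. Paths and names here are
# illustrative assumptions, not the exact original script.

_lock = threading.Lock()  # guard against concurrent extraction by parallel threads

def load(wheel_name, extraction_root="/tmp/wheels"):
    """Extract a staged .whl file and add its contents to sys.path."""
    # Snowflake places files listed in the IMPORTS clause in this directory;
    # fall back to the current directory so the script can be tested locally.
    import_dir = sys._xoptions.get("snowflake_import_directory", ".")
    target_dir = os.path.join(extraction_root, wheel_name)
    with _lock:
        # Cross-process file lock, since UDF executions can run in parallel
        with open(os.path.join("/tmp", wheel_name + ".lock"), "w") as lock_file:
            fcntl.lockf(lock_file, fcntl.LOCK_EX)
            if not os.path.isdir(target_dir):
                # A wheel is just a zip archive whose root holds the package(s)
                with zipfile.ZipFile(os.path.join(import_dir, wheel_name), "r") as whl:
                    whl.extractall(target_dir)
    # Make the extracted package importable
    sys.path.append(target_dir)
```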

Step 3: Upload files to a Snowflake stage

First, create a Snowflake stage:

CREATE STAGE OTHER_PYTHON_PACKAGES;

Then, upload the files using SnowSQL:

snowsql -q "PUT file://wheel_loader.py @OTHER_PYTHON_PACKAGES AUTO_COMPRESS=FALSE OVERWRITE=TRUE"
snowsql -q "PUT file://faker_music-0.4-py2.py3-none-any.whl @OTHER_PYTHON_PACKAGES AUTO_COMPRESS=FALSE OVERWRITE=TRUE"
snowsql -q "PUT file://faker_biology-0.6.0-py3-none-any.whl @OTHER_PYTHON_PACKAGES AUTO_COMPRESS=FALSE OVERWRITE=TRUE"

Step 4: Create the Snowflake function

In the function definition, I include the staged files in the IMPORTS clause, import the wheel_loader script, and use it to extract the wheel files and add them to the system path. Then I can import the libraries as usual.
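The full function was embedded as a gist in the original post; a definition along these lines matches the description above. The handler name `fake`, the exact Faker provider imports, and the runtime version are my own illustrative assumptions:

```sql
create or replace function FAKE_WITH_EXTENSIONS(locale varchar, attribute varchar, parameters variant)
returns variant
language python
runtime_version = '3.8'
packages = ('faker')
imports = ('@OTHER_PYTHON_PACKAGES/wheel_loader.py',
           '@OTHER_PYTHON_PACKAGES/faker_music-0.4-py2.py3-none-any.whl',
           '@OTHER_PYTHON_PACKAGES/faker_biology-0.6.0-py3-none-any.whl')
handler = 'fake'
as
$$
import wheel_loader

-- (Python below) extract the staged wheels and add them to sys.path
wheel_loader.load('faker_music-0.4-py2.py3-none-any.whl')
wheel_loader.load('faker_biology-0.6.0-py3-none-any.whl')

from faker import Faker
from faker_music import MusicProvider
from faker_biology.physiology import CellType

def fake(locale, attribute, parameters):
    f = Faker(locale)
    f.add_provider(MusicProvider)
    f.add_provider(CellType)
    if parameters:
        return getattr(f, attribute)(**parameters)
    return getattr(f, attribute)()
$$;
```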

Step 5: Call the function

select FAKE_WITH_EXTENSIONS('en_US','celltype',null)::varchar as FAKE_CELLTYPE
from table(generator(rowcount => 50));

The results below show cell types from the third party faker-biology provider:

Conclusion

I’ve demonstrated that it’s possible to run pip packages inside Snowflake, which is a useful fallback when a conda package isn’t readily available.
