Introducing: Simple workflow for using Python Packages in Snowpark


Are you struggling to use Python packages that are not available on Snowflake’s Anaconda channel? All pure-Python packages are now seamlessly available for use in your stored procedures and User Defined Functions (UDFs)!

You can add your custom packages in a requirements file, a Conda environment file, or as arguments during UDF/stored procedure registration. The packages you want will be automatically resolved, installed, and uploaded as part of your Snowpark code.

How do I use this feature?

There are several ways you can use an unavailable Python package. All of them require setting the session.custom_package_usage_config parameter and a Snowpark Python client version of 1.6.1 or above.

The code examples below demonstrate four ways to import the Python library scikit-fuzzy, which is unavailable in Snowflake’s Anaconda channel as of July 2023.

from typing import List
from snowflake.snowpark.session import Session
import snowflake.snowpark.functions as F


connection_parameters = {
    "account": SF_ACCOUNT,
    "user": USERNAME,
    "password": PASSWORD,
    "role": SF_ROLE,
    "warehouse": SF_WH,
    "database": SF_DB,
    "schema": SF_SCHEMA
}
session = Session.builder.configs(connection_parameters).create()


# Required
session.custom_package_usage_config = {
    "enabled": True
}


  • Add your packages as arguments during UDF/stored procedure registration.

@F.udf(name="my_udf", packages=["scikit-fuzzy==0.4.2"], replace=True)
def my_udf_function() -> list:
    import skfuzzy as fuzz
    return [fuzz.__name__ + '/' + str(fuzz.__version__)]

session.sql(f"select {my_udf_function.name}()").to_df("col1").show()




@F.sproc(name="my_sproc", packages=["scikit-fuzzy==0.4.2", "snowflake-snowpark-python"], replace=True)
def my_stored_procedure(session: Session) -> List[str]:
    import skfuzzy as fuzz
    return [fuzz.__name__ + '/' + str(fuzz.__version__)]

print(session.call("my_sproc"))
  • Add your package to a requirements.txt file (Docs).
# my_requirements_file.txt
scikit-fuzzy==0.4.2
# add your other requirements here.
import snowflake.snowpark as snowpark
session = snowpark.Session.builder.configs(CONNECTION_PARAMETERS).create()
session.custom_package_usage_config = {"enabled": True}
session.add_requirements("./my_requirements_file.txt")

# Any UDF/stored procedure created using session object can now use the scikit-fuzzy module, as long as the `packages` and `imports` arguments are not mentioned while registering your UDFs/stored procedures.
  • Add your package to a Conda environment file.

# my_environment_file.yml
name: my_environment # Name of the environment

dependencies: # List of packages and versions to include in the environment
  - python=3.9.1 # Python version
  - pip=23.1.2
  - pip:
    - scikit-fuzzy==0.4.2
import snowflake.snowpark as snowpark
session = snowpark.Session.builder.configs(CONNECTION_PARAMETERS).create()
session.custom_package_usage_config = {"enabled": True}
session.add_requirements("./my_environment_file.yml")

# Any UDF/stored procedure created using session object can now use the scikit-fuzzy module, as long as the `packages` and `imports` arguments are not mentioned while registering your UDFs/stored procedures.
  • Add your packages via Session.add_packages (Docs).
import snowflake.snowpark as snowpark
session = snowpark.Session.builder.configs(CONNECTION_PARAMETERS).create()
session.custom_package_usage_config = {"enabled": True}
session.add_packages(["scikit-fuzzy==0.4.2"])

# Any UDF/stored procedure created using session object can now use the scikit-fuzzy module, as long as the `packages` and `imports` arguments are not mentioned while registering your UDFs/stored procedures.

Note - The best practice for using custom Python packages is to specify a stage folder as the cache_path argument, so that your custom packages load faster (see the ‘Environment Persistence’ section for details).

How does it work?

Snowpark will pip install the Python packages you require (and that are not available on Anaconda) locally in a temporary directory, then zip and upload them to a stage for use. Dependency packages that are present in Anaconda are pruned before zipping. Any package that contains OS-native code, i.e. code typically written in C/C++ and compiled into a binary file, MUST originate from Anaconda; therefore, a best-effort attempt will be made to switch such dependencies to versions present in Anaconda.
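The prune-and-zip step can be sketched in plain Python (a minimal illustration, not Snowpark’s actual implementation; the directory layout and the anaconda_packages set are hypothetical):

```python
import os
import zipfile


def zip_custom_packages(install_dir: str, anaconda_packages: set, zip_path: str) -> list:
    """Zip top-level packages from a local pip-install directory,
    skipping (pruning) any package already available on Anaconda,
    since those can be served natively at runtime instead."""
    kept = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(os.listdir(install_dir)):
            if name.split("-")[0] in anaconda_packages:
                continue  # pruned: resolved from the Anaconda channel instead
            path = os.path.join(install_dir, name)
            if os.path.isdir(path):
                for root, _, files in os.walk(path):
                    for f in files:
                        full = os.path.join(root, f)
                        zf.write(full, os.path.relpath(full, install_dir))
            else:
                zf.write(path, name)
            kept.append(name)
    return kept
```

The resulting zip is what gets uploaded to the stage; at runtime your UDF imports the custom packages from it, while the pruned dependencies resolve through Anaconda.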

Limitations

  1. Python packages that you require (which are not present on Snowflake’s Anaconda channel) will be pip installed locally and imported for use via a temporary remote stage directory. To allow this, pip needs to be present in your environment (See the ‘Environment Persistence’ section for a workaround).
  2. A local temporary directory will be used to store packages before they are uploaded. Permission to write to a temporary directory is required (See the ‘Environment Persistence’ section for a workaround).
  3. If you need packages that rely on OS native code, the packages MUST originate from the Anaconda Snowflake channel. If your package relies on dependencies that use native code, Snowpark will make a best-effort attempt to switch to versions present in Anaconda. This might result in versioning incompatibility issues.
  4. If the package you want relies on OS native code and is not present on the Anaconda channel, it is highly likely that your stored procedure/UDF will not work. If you wish to proceed with using the package regardless, you can force push the package by switching on the force_push flag in the session.custom_package_usage_config parameter. (Psst, it would be really helpful if you could vote for the package you want on our incoming requests board, see documentation on how to do this).
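If you do decide to force push, the override is a one-line addition to the config you already set (a configuration sketch; `session` is an existing Snowpark session, and force_push only applies when custom package usage is enabled):

```python
# Force-upload packages containing OS-native code that are missing from
# Anaconda. Your UDF/stored procedure may still fail at runtime if the
# compiled binaries are incompatible with Snowflake's execution environment.
session.custom_package_usage_config = {
    "enabled": True,
    "force_push": True,
}
```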

Note — This package auto-upload feature is currently marked as experimental and works well on UNIX systems only, please do not use it in production!

Environment Persistence

If you wish to frequently use a set of unavailable Python packages, you can cache them by specifying a remote stage directory path as the cache_path argument.

By specifying a stage path, your packages will be imported faster and you will not require pip or file-write permissions in your Snowpark environment.

import snowflake.snowpark as snowpark
session = snowpark.Session.builder.configs(CONNECTION_PARAMETERS).create()

session.custom_package_usage_config = {
    "enabled": True,
    "cache_path": "@my_permanent_stage/my_directory"
}

session.add_requirements("./my_environment_file.yml")
# The remote folder 'my_directory' will now contain your zipped packages as well as a metadata file to organise your environments.

Note — You do need to run your code at least once in an environment which contains pip and allows writing files to a temporary folder. This will create a zip file corresponding to your environment (the zipped file can then be used in other environments).
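Once the cache has been built, a session in a restricted environment can reuse it (a usage sketch; the cache_path must match the stage path used when the cache was created, and CONNECTION_PARAMETERS is your own connection dict):

```python
import snowflake.snowpark as snowpark

# This environment may lack pip and temporary-file write access; packages
# are loaded from the stage cache built earlier instead of being rebuilt.
session = snowpark.Session.builder.configs(CONNECTION_PARAMETERS).create()
session.custom_package_usage_config = {
    "enabled": True,
    "cache_path": "@my_permanent_stage/my_directory",  # same path as before
}
session.add_requirements("./my_environment_file.yml")
```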

Try Snowpark today

If you’re interested in trying out Snowpark Python, be sure to check out some of our great quick starts at quickstarts.snowflake.com.
