Create and Test Snowpark DataFrames Locally (Public Preview)

Snowpark provides a set of libraries and runtimes that developers can use to securely deploy and run Python code in Snowflake. On the client side, the Snowpark DataFrame API enables developers to write queries and data transformations using familiar DataFrame semantics while the computation is pushed down and scaled out in Snowflake.

Conceptual diagram of the Snowpark API and runtime

As our users have adopted Snowpark for their business-critical data workloads, rigorous testing of Snowpark DataFrames has become increasingly important. That's why today we are happy to announce that you can now create and test Snowpark DataFrames locally, with no connection to a Snowflake account necessary! The DataFrames are created and transformed on your local compute, allowing you to accelerate your test suites and DevOps pipelines.

from snowflake.snowpark.session import Session

# Create a Session that runs entirely on local compute -- no Snowflake account needed.
session = Session.builder.config('local_testing', True).create()
df = session.create_dataframe([[1, 2], [3, 4]], ['a', 'b'])
# Add a derived column and print the result.
df.with_column('c', df['a'] + df['b']).show()

This functionality is now in public preview and uses the same Snowpark Python API you're already familiar with. When creating the Snowpark Session, simply set the local_testing configuration to True, and all DataFrames from that Session will be created on your local compute, whether that's your dev machine or your CI/CD pipeline!

Demo of the local testing framework with PyTest

This new functionality really shines when it’s used with PyTest and other testing frameworks. The video above shows an example Snowpark Python project which has a set of DataFrame transformations used within a stored procedure. After adding just three lines of code to the Session fixture, we can easily switch between running the tests locally or against a Snowflake account.

Ready to try it out? You'll find resources to help you get started in the More Information section below.

Turbocharge your Snowpark Tests

Since the local testing functionality uses the same API methods as the rest of Snowpark, you can easily switch between “local” and “live” Sessions and run your test suites in either mode. This means you can use a local Session for quick validation before submitting a pull request, and then run the tests against a Snowflake account in an automated CI/CD pipeline. Switching between the two modes is especially easy with PyTest fixtures, and an example is provided in the documentation.

# test/conftest.py
import pytest
from snowflake.snowpark.session import Session

@pytest.fixture(scope='module')
def session(request) -> Session:
    if request.config.getoption('--snowflake-session') == 'local':
        # Local mode: DataFrames are created and transformed on local compute.
        return Session.builder.config('local_testing', True).create()
    else:
        # Live mode: connect to a Snowflake account using your connection parameters.
        return Session.builder.configs({...}).create()

Once you have the fixture set up, you can switch between local and live modes by passing the option --snowflake-session local to the pytest command. This is just one example of how you can leverage the local testing framework in a DevOps process.
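For PyTest to recognize the custom --snowflake-session option used in the fixture, the option must be registered in a conftest.py hook. Here is a minimal sketch; the default value "live" and the help text are illustrative choices, not official settings:

```python
# test/conftest.py (sketch) -- register the command-line option read by the fixture.
# The default value "live" is an assumption for illustration.
def pytest_addoption(parser):
    parser.addoption(
        "--snowflake-session",
        action="store",
        default="live",
        help="run tests against a 'local' or 'live' Snowpark Session",
    )
```

With this in place, running pytest without the flag uses the live branch of the fixture, while pytest --snowflake-session local keeps everything on your machine.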

If a test case happens to use an API which is not supported by the local testing framework, you can skip that individual test whenever a local Session is used by applying a PyTest mark. You can find an example of this, and many other testing scenarios, in the documentation.

More Information

  • If you run into a problem or have a question, feel free to open an issue on the GitHub repo for Snowpark Python.
  • The limitations of the local testing framework are outlined in the documentation.
