Unit testing with Snowpark Python in Azure DevOps

In a previous blog post, “Snowpark meets Azure DevOps to enable automation of building data pipelines”, I went over the steps to create a simple CI/CD pipeline in Azure DevOps using Snowpark Scala for developing data pipelines. Now that Snowpark Python is available to selected Snowflake accounts (in Public Preview as of June 2022), let’s take a look at another simple example: we will create Snowpark Python code (similar to the Scala example in the previous post), add unit tests using the popular unit testing framework pytest, and run those unit tests as part of the Azure Pipelines build.

(If you’d like to know more about Snowpark Python, please make sure to check out a recent webinar here.)


First, I created a Git repository called demo_snowpark_python in Azure Repos with the following files, starting with functions.py:

functions.py

The get_session() method returns a Snowpark session that connects to a Snowflake account using the specified connection parameters (including a right-sized virtual warehouse). get_unit_test_session() uses the connection parameters for unit testing, including a one-node XS virtual warehouse. I am storing the password in Azure Key Vault so we can programmatically access it using the Azure Key Vault client library for Python. Writing modular code is important for achieving good unit test coverage, so I also created separate functions for some of the transformations as well as for loading the results into a target table.
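The original gist is not reproduced here, so below is a minimal sketch of what functions.py might look like. The Key Vault URL, secret name, connection parameters, warehouse names, and helper function names are all illustrative assumptions, not the original code:

```python
# functions.py -- a minimal sketch of the helpers described above.
# Key Vault URL, secret name, connection parameters, and warehouse
# names are illustrative assumptions.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient
from snowflake.snowpark import DataFrame, Session

KEY_VAULT_URL = "https://<your-key-vault>.vault.azure.net"  # assumption

def _get_password() -> str:
    # Fetch the Snowflake password from Azure Key Vault at runtime
    client = SecretClient(vault_url=KEY_VAULT_URL,
                          credential=DefaultAzureCredential())
    return client.get_secret("snowflake-password").value

def _build_session(warehouse: str) -> Session:
    # Shared builder, so the pipeline and unit tests differ only by warehouse
    return Session.builder.configs({
        "account": "<account_identifier>",
        "user": "<user>",
        "password": _get_password(),
        "role": "SYSADMIN",
        "warehouse": warehouse,
        "database": "DEMO_DB",
        "schema": "PUBLIC",
    }).create()

def get_session() -> Session:
    # Session for the data pipeline, with a right-sized virtual warehouse
    return _build_session("PIPELINE_WH")

def get_unit_test_session() -> Session:
    # Session for unit tests, using a one-node XS virtual warehouse
    return _build_session("UNIT_TEST_XS_WH")

def rename_columns(df: DataFrame) -> DataFrame:
    # Rename the TPC-H style column to a friendlier name
    return df.with_column_renamed("C_MKTSEGMENT", "MKT_SEGMENT")

def count_customers_by_segment(df: DataFrame) -> DataFrame:
    # Aggregate: number of customers per market segment
    return (df.group_by("MKT_SEGMENT")
              .count()
              .with_column_renamed("COUNT", "NUMBER_OF_CUSTOMERS"))

def save_to_table(df: DataFrame, table_name: str) -> None:
    # Materialize the results into the target table
    df.write.mode("overwrite").save_as_table(table_name)
```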

Next, let’s create a pipeline.py that reads the CUSTOMERS table in our Snowflake account into a Snowpark DataFrame, applies some transformations (renaming columns, aggregation, etc.), and creates a new table in Snowflake with the aggregated data, using the functions we defined in the functions.py file.

pipeline.py
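As with functions.py, the original gist is not reproduced here; the following sketch assumes the helper names introduced in the functions.py sketch above:

```python
# pipeline.py -- a hedged sketch of the data pipeline described above;
# function and table names follow the functions.py sketch, not the original post.
from functions import (
    count_customers_by_segment,
    get_session,
    rename_columns,
    save_to_table,
)

def main():
    session = get_session()
    # Read the source table into a Snowpark DataFrame
    customers = session.table("CUSTOMERS")
    # Apply the transformations defined in functions.py
    renamed = rename_columns(customers)
    aggregated = count_customers_by_segment(renamed)
    # Load the aggregated results into the target table
    save_to_table(aggregated, "NUMBER_OF_CUSTOMERS_BY_MKT_SEGMENT")
    session.close()

if __name__ == "__main__":
    main()
```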

After we run pipeline.py, we can query our target table, NUMBER_OF_CUSTOMERS_BY_MKT_SEGMENT, that contains the aggregated results (customer count by market segment) as below:

NUMBER_OF_CUSTOMERS_BY_MKT_SEGMENT table
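For instance, with an open Snowpark session (the session variable below is assumed from the sketches above), the table can be inspected with:

```python
# Display the aggregated results from the target table
session.table("NUMBER_OF_CUSTOMERS_BY_MKT_SEGMENT").show()
```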

Now we can create a new file called test_pipeline.py in a /test directory, import the pytest package, and write some unit tests for the functions defined in the functions.py file. We initialize the Snowpark session as a pytest fixture using the @pytest.fixture decorator.

test_pipeline.py
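A hedged sketch of what those tests might look like, with illustrative test data and assertions (the helper names again come from the functions.py sketch, not the original post):

```python
# test/test_pipeline.py -- illustrative sketch; test data and
# assertions are assumptions, not the original tests.
import pytest

from functions import (
    count_customers_by_segment,
    get_unit_test_session,
    rename_columns,
)

@pytest.fixture(scope="module")
def session():
    # One Snowpark session (XS warehouse) shared by all tests in this module
    session = get_unit_test_session()
    yield session
    session.close()

def test_rename_columns(session):
    df = session.create_dataframe([("AUTOMOBILE",)], schema=["C_MKTSEGMENT"])
    renamed = rename_columns(df)
    assert "MKT_SEGMENT" in renamed.columns

def test_count_customers_by_segment(session):
    df = session.create_dataframe(
        [("AUTOMOBILE",), ("AUTOMOBILE",), ("MACHINERY",)],
        schema=["MKT_SEGMENT"],
    )
    result = {row["MKT_SEGMENT"]: row["NUMBER_OF_CUSTOMERS"]
              for row in count_customers_by_segment(df).collect()}
    assert result == {"AUTOMOBILE": 2, "MACHINERY": 1}
```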

Finally, we can add a task that runs the pytest command as one of the steps in the Azure Pipeline, so the unit tests we defined execute as part of the CI/CD pipeline. If the unit tests fail, the CI/CD pipeline fails as well, and the rest of the steps do not get executed. Below is an example of azure-pipelines.yml with the task to run pytest:

azure-pipelines.yml
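The original file is not reproduced here; below is a minimal sketch in which the trigger branch, pool image, Python version, and file paths are assumptions:

```yaml
# azure-pipelines.yml -- minimal sketch; trigger branch, pool image,
# Python version, and paths are assumptions, not the original file.
trigger:
  - main

pool:
  vmImage: 'ubuntu-latest'

steps:
  - task: UsePythonVersion@0
    inputs:
      versionSpec: '3.8'
    displayName: 'Use Python 3.8'

  # requirements.txt is assumed to include snowflake-snowpark-python,
  # pytest, and the Azure identity/Key Vault packages
  - script: |
      python -m pip install --upgrade pip
      pip install -r requirements.txt
    displayName: 'Install dependencies'

  - script: pytest test/ --junitxml=$(Build.StagingDirectory)/test-results.xml
    displayName: 'Run unit tests with pytest'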

When we run the CI/CD pipeline, we can see that all our unit tests passed in this run:

In subsequent steps, we also publish the pytest results so they can be visualized in Azure DevOps under Test Plans > Runs, as shown below.
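A hedged sketch of that publishing step, continuing the steps list from the YAML sketch above (the JUnit XML path matches the pytest step there):

```yaml
  # Publish pytest results so they appear under Test Plans > Runs;
  # succeededOrFailed() ensures failures are published too
  - task: PublishTestResults@2
    condition: succeededOrFailed()
    inputs:
      testResultsFormat: 'JUnit'
      testResultsFiles: '$(Build.StagingDirectory)/test-results.xml'
      testRunTitle: 'Snowpark Python unit tests'
```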

After publishing the pytest unit test results, the remaining CI steps in the azure-pipelines.yml file run the Snowpark data pipeline (pipeline.py) to transform the customer data and build our target table, and finally package versioned artifacts into a zip file and publish them to an artifact repository.
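A sketch of those remaining steps, again with illustrative task inputs and paths:

```yaml
  # Remaining steps in azure-pipelines.yml -- illustrative sketch only
  - script: python pipeline.py
    displayName: 'Run Snowpark data pipeline'

  - task: ArchiveFiles@2
    inputs:
      rootFolderOrFile: '$(Build.SourcesDirectory)'
      includeRootFolder: false
      archiveType: 'zip'
      archiveFile: '$(Build.ArtifactStagingDirectory)/$(Build.BuildId).zip'
    displayName: 'Create versioned zip artifact'

  - task: PublishBuildArtifacts@1
    inputs:
      pathToPublish: '$(Build.ArtifactStagingDirectory)'
      artifactName: 'drop'
    displayName: 'Publish artifacts'
```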

I hope this simple example of building CI/CD pipelines with Snowpark Python, using pytest for unit testing, helps you get started. Snowpark truly enables analytics teams to add software engineering practices to their development and release processes in the Snowflake Data Cloud.

Thank you for taking the time to read this blog post.


Eda Johnson

NVIDIA | AWS Machine Learning Specialty | Azure | Databricks | GCP | Snowflake Advanced Architect | Terraform certified Principal Product Architect