Writing Robust Procedures with Snowpark

Stored procedures encapsulate logic and are fundamental building blocks of any Snowflake project, whether they perform database operations or support Snowflake native applications.

Crafting robust stored procedures pays off in efficiency, readability, troubleshooting, maintenance, and scalability. While this article focuses on stored procedures and Snowpark for Python, most of the points described here also apply to Snowflake UDFs or any other Snowpark initiative.

Exception handling

Take control of the execution by implementing error handling.

Use try/except/finally blocks to catch specific exceptions, and clean up your process in the finally block when needed. Handling each error type separately is far more effective than relying on a single generic except block.

Snowpark provides various Exception classes for tailored handling. Here is an example:

# The 'try' block runs a block of code.
# The 'except' blocks handle specific errors raised from the 'try' block.
# The 'else' block runs only when no error occurred.
# The 'finally' block always runs, regardless of the result of the previous blocks.

(...)
from snowflake.snowpark.exceptions import SnowparkSessionException, SnowparkSQLException, SnowparkClientException
(...)
try:
    pass  # <do something>
except SnowparkSessionException as e:  # exception handling example
    print('EXCEPTION | Error on session connection: {}'.format(e.message))
except SnowparkSQLException as e:  # exception handling example
    print('EXCEPTION | Error on SQL: {}'.format(e.message))
    print('Query_id: {}'.format(e.sfqid))
except SnowparkClientException as e:  # exception handling example
    print('EXCEPTION | Something did not work in Snowpark: {}'.format(e.message))
except Exception as e:  # catch-all: it can be anything else
    print('EXCEPTION-PYTHON | Something did not work in the stored procedure: {}'.format(e))
else:
    print('Nothing went wrong!')
finally:
    session.close()  # clean up resources whether or not an error occurred
(...)

Additional examples here

Logging

Implement logging to capture relevant information for debugging and monitoring.

With logging we can record events that happen during code execution. Here is a native Python example:

# importing the module
import logging

# Configure a basic handler so messages are actually emitted
# (without a handler, DEBUG messages would be silently dropped)
logging.basicConfig(level=logging.DEBUG)

# Creating the logger object
logger = logging.getLogger("mylog")

# Setting the threshold of the logger to DEBUG
logger.setLevel(logging.DEBUG)

# Log messages at debug level
logger.debug("This is the first debug message")
logger.debug("This is the second debug message")

The Snowflake Event tables feature is designed to help you with this task by capturing log messages and saving them in a table (an event table), which you can query for future needs (e.g., troubleshooting, performance insights).

Be careful when writing log messages to avoid revealing sensitive information. If you need to share an event with colleagues, make sure that Role-Based Access Control (RBAC) is set up properly. To protect personal information, create a view of the event table and apply masking policies to it, so that personally identifiable information (PII) is either masked or removed.
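One way to reduce the risk of leaking PII into logs is to redact sensitive values before the message ever reaches a handler. The sketch below is a minimal illustration in plain Python; the `redact` helper and the e-mail pattern are hypothetical examples, and real deployments would combine this with the event-table masking policies described above.

```python
import logging
import re

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = logging.getLogger("mylog")

# Hypothetical pattern; extend with your own PII definitions (phone numbers, IDs, etc.)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact(message: str) -> str:
    """Mask e-mail addresses before the message reaches any log handler."""
    return EMAIL_RE.sub("***MASKED***", message)

logger.info(redact("User john.doe@example.com failed validation"))
```

Redacting at the application level is defense in depth: even if a masking policy is misconfigured downstream, the raw value never lands in the event table.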

This is how it looks within a stored procedure:

CREATE OR REPLACE PROCEDURE do_something()
RETURNS VARCHAR
LANGUAGE PYTHON
PACKAGES = ('snowflake-snowpark-python')
RUNTIME_VERSION = '3.8'
HANDLER = 'run'
AS $$
import logging

logger = logging.getLogger("mylog")
logger.info("Logging from Python")

def run(session):
    logger.info("function start")
    try:
        # <do_something>
        x = my_private_data
    except Exception:
        logger.error("Logging an error from Python")
        logger.error("my_private_data is: " + my_private_data)  # THIS IS A VERY BAD EXAMPLE
        return "ERROR"
    return "SUCCESS"
$$;

You will find the log messages in the event table, which has a predefined set of columns. Please refer to the documentation for more details about configuration and examples.

Tracing

While logging lets you manually capture point-in-time messages, tracing automatically captures the flow of code execution as spans.

Event tracing can be valuable and contribute to the maintainability and resilience of production stored procedures, but extensive tracing may introduce overhead and degrade performance.

You can find various examples for tracing here by using Event tables.
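To make the span idea concrete, here is a minimal plain-Python analogy: a context manager that records the name and duration of each block of code, much like trace spans recorded to an event table. The `span` helper and the `spans` list are illustrative inventions, not Snowflake APIs; in a real procedure you would rely on Snowflake's tracing support and event tables instead.

```python
import time
from contextlib import contextmanager

spans = []  # collected (name, duration) records, analogous to event-table rows

@contextmanager
def span(name):
    """Record how long the wrapped block of code takes, like a trace span."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append((name, time.perf_counter() - start))

with span("load"):
    data = [i * 2 for i in range(1000)]
with span("transform"):
    total = sum(data)

print([name for name, _ in spans])  # → ['load', 'transform']
```

The overhead note above applies here too: every span recorded is extra work, so instrument the coarse-grained steps of a procedure rather than every statement.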

Parameterization

Ensure flexibility by designing procedures that can easily adapt to new parameters with minimal disruption.

A good option is to leverage Snowflake’s semi-structured data types and Semi-structured and Structured Data Functions.

As an example, the following stored procedure receives its input parameters in JSON format (Snowflake VARIANT data type), so parameter order is not enforced:

CREATE OR REPLACE PROCEDURE MY_DB.MY_SCHEMA.MY_STORED_PROCEDURE(input_configuration_options variant)
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
PACKAGES = ('snowflake-snowpark-python')
HANDLER = 'run'
EXECUTE AS CALLER
AS
$$

def run(session, input_configuration_options):
    # You can access the input variables as follows:
    return input_configuration_options['MY_INPUT1']
    # Or also: input_configuration_options.get('MY_INPUT1')
    # (...)
$$;

One of many ways to call the previous procedure is :

CALL MY_DB.MY_SCHEMA.MY_STORED_PROCEDURE(PARSE_JSON(
$$
{
  "MY_INPUT1" : "STRING_VALUE1",
  "MY_INPUT2" : "STRING_VALUE2",
  "MY_INPUT3" : true
}
$$
));

Input Validation

Validate input parameters against expected criteria to reduce the risk of errors or unexpected results.

To ensure consistency between different languages, review the Snowflake data type mapping. The following example demonstrates how to validate data types and mandatory parameters.

CREATE OR REPLACE PROCEDURE MY_DB.MY_SCHEMA.MY_STORED_PROCEDURE(input_configuration_options variant)
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
PACKAGES = ('snowflake-snowpark-python')
HANDLER = 'run'
EXECUTE AS CALLER
AS
$$

def run(session, input_configuration_options):

    expected_configuration_options = {
        'MY_INPUT1' : {"data_type" : str , "is_mandatory" : True },
        'MY_INPUT2' : {"data_type" : str , "is_mandatory" : True },
        'MY_INPUT3' : {"data_type" : bool , "is_mandatory" : False, "default" : False }
    }

    # Parameter validation
    # NOTE: parameter names are case-sensitive in this example
    for option, option_info in expected_configuration_options.items():
        if option in input_configuration_options:
            # Checking the data type
            if not isinstance(input_configuration_options.get(option), option_info['data_type']):
                output_message = "Error: Wrong data type. {} should be {}".format(option, option_info['data_type'])
                return output_message
        # Checking mandatory parameters
        elif option_info['is_mandatory']:
            output_message = "Error: Mandatory input parameter {} missing".format(option)
            return output_message
        # Set the default value: not mandatory and not in the input
        else:
            input_configuration_options[option] = option_info['default']

    # (...)

$$;
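The validation loop above does not depend on a Snowflake session, so it can be extracted into a pure helper and unit tested locally. The sketch below mirrors the same logic; the `validate_options` name and its `(ok, message)` return shape are illustrative choices, not part of any Snowflake API.

```python
def validate_options(options, expected):
    """Validate a dict of input options against expected types and mandatory
    flags; fills in defaults for optional parameters that are missing.
    Returns (ok, message), mirroring the loop in the stored procedure."""
    for name, info in expected.items():
        if name in options:
            # Checking the data type
            if not isinstance(options[name], info["data_type"]):
                return False, "Error: Wrong data type. {} should be {}".format(name, info["data_type"])
        # Checking mandatory parameters
        elif info["is_mandatory"]:
            return False, "Error: Mandatory input parameter {} missing".format(name)
        # Set the default value: not mandatory and not in the input
        else:
            options[name] = info["default"]
    return True, "OK"

expected = {
    "MY_INPUT1": {"data_type": str, "is_mandatory": True},
    "MY_INPUT3": {"data_type": bool, "is_mandatory": False, "default": False},
}
opts = {"MY_INPUT1": "abc"}
ok, msg = validate_options(opts, expected)
print(ok, opts["MY_INPUT3"])  # → True False
```

Keeping validation in a standalone function like this also pays off in the Testing section below: the rules can be asserted without deploying the procedure.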

Documentation

In addition to the comments you can add to the code, supplement it with object comments that provide a high-level description of the stored procedure.

In the context of stored procedures, Snowflake provides the PROCEDURES View, among many others. Reading the comment field from the PROCEDURES View can be particularly helpful for anyone reviewing or governing company procedures.

Testing

Include comprehensive testing. Refer to the pytest documentation for writing tests. Some useful documentation and examples here.

In addition, consider performing performance and concurrency testing.
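As a sketch of the idea, the example below tests a pure helper extracted from a procedure, with no Snowflake connection required. The `build_merge_predicate` helper is hypothetical, invented here for illustration; the test functions follow pytest's `test_*` naming convention so a pytest runner would collect them, but they also run as plain assertions.

```python
def build_merge_predicate(keys):
    """Hypothetical helper: build the ON clause of a MERGE from key columns."""
    return " AND ".join("target.{0} = source.{0}".format(k) for k in keys)

# pytest collects functions named test_*; calling them directly works too.
def test_single_key():
    assert build_merge_predicate(["ID"]) == "target.ID = source.ID"

def test_multiple_keys():
    assert build_merge_predicate(["ID", "DT"]) == \
        "target.ID = source.ID AND target.DT = source.DT"

test_single_key()
test_multiple_keys()
```

The more of a procedure's logic lives in small pure functions like this, the more of it can be covered by fast local tests, leaving only the session-dependent glue for integration testing.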

Performance and optimization

This is a very broad topic, and choosing the most appropriate implementation approach can be challenging. On the one hand, factors such as time and space complexity apply to every programming language. On the other hand, Snowpark methods and the rest of Snowflake's features have to be used attentively. Here is some advice to help with this task:

  • Verify the query profile of the queries generated from your stored procedure.
  • Avoid redundant or expensive computations in your algorithm (e.g., row-by-row processing loops, exploding joins, unnecessarily complex regexes).
  • Implement lazy evaluation: push processing down to the Snowflake virtual warehouse, and delay Snowpark DataFrame actions that cause data to be computed and materialized on the client side (e.g., collect(), to_pandas()) until needed.
  • Consider a batch processing approach.
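The lazy-evaluation point deserves a concrete illustration. Snowpark DataFrame transformations only describe work; nothing executes until an action runs. Plain Python generators behave the same way, so the sketch below uses them as an analogy (the `transform` function is an invented example, not a Snowpark API):

```python
# Generators defer work until values are actually requested — the same idea
# behind Snowpark's lazy DataFrames: transformations are only described,
# and nothing is computed until an action (collect(), to_pandas()) runs.
def transform(rows):
    for row in rows:          # nothing executes yet when transform() is called
        yield row * 10

pipeline = transform(range(1_000_000))  # instant: no rows processed yet
first_three = [next(pipeline) for _ in range(3)]  # only 3 rows computed
print(first_three)  # → [0, 10, 20]
```

With Snowpark, the payoff is larger than in this toy: deferring the action lets Snowflake fuse the chained transformations into one optimized SQL query executed in the warehouse, instead of shipping intermediate results to the client.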


Concurrency

If applicable, implement mechanisms to handle scenarios where multiple instances of the same procedure run simultaneously.

Concurrent executions of the same stored procedure can interfere with objects or artifacts created by one another, producing unexpected results such as data duplication or unintended removal or alteration of objects and their contents.
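A common mitigation is a claim-before-process pattern: each execution atomically claims a unit of work, and instances that lose the claim back off. The sketch below illustrates the idea in plain Python with threads; the `claimed` set stands in for a shared control table, and `run_procedure` and the batch name are invented for illustration. In Snowflake you would implement the claim with, for example, a control table updated inside a transaction.

```python
import threading

lock = threading.Lock()
claimed = set()   # stands in for a control table shared by concurrent runs
processed = []

def run_procedure(batch_id):
    """Only one concurrent execution may claim a given batch."""
    with lock:                      # analogous to an atomic control-table check
        if batch_id in claimed:
            return                  # another instance already owns this batch
        claimed.add(batch_id)
    processed.append(batch_id)      # safe: this execution owns the batch

# Five "concurrent executions" all target the same batch...
threads = [threading.Thread(target=run_procedure, args=("BATCH_1",)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(processed)  # → ['BATCH_1']  (processed exactly once)
```

Whatever the mechanism, the essential property is the same: the check-and-claim must be atomic, otherwise two executions can both conclude the batch is unclaimed and duplicate the work.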

Summary

This small guide covers various aspects that can contribute to the efficiency, effectiveness, and maintainability of Snowflake stored procedures or UDFs. While some points can be easily implemented and standardized (e.g., Documentation, input validation), others require more attention and expertise (e.g., Performance optimization, testing).

Consider including the following checklist in your development process:

[ ] Exception handling : Snowpark Exceptions classes
[ ] Logging : Snowflake Event tables
[ ] Tracing : Snowflake Event tables
[ ] Parameterization : Snowflake Data types and Semi-structured and Structured Data Functions
[ ] Input Validation : Snowflake Data types and Data type mapping
[ ] Documentation : Snowflake Object Comment and Procedures View
[ ] Testing : Tests for Snowpark Python and any other Python library such as pytest
[ ] Performance and optimization : Snowflake Query Profile. Snowpark Lazy Dataframes support.
[ ] Concurrency : Snowpark Exceptions classes and native Python control flow

Additionally, you can use the Snowflake Alerts and Notification feature to send emails or trigger automatic incidents (e.g., Trigger alerts based on event tables mentioned above, Based on Procedures View, or any other Snowflake metadata, data quality checks, etc.).

Finally, in the era of AI, keep an eye open for the latest Snowflake innovations that can help you in your day-to-day (They are in preview at the moment of writing this article): Copilot, Arctic (use this app!)
