Snowpark Python Advantages

Snowpark Python was announced at Snowday last November, with a great demo presentation that I encourage you to watch. In a previous blog entry, we reviewed the capabilities of Snowpark and how it helps streamline data pipelines and feature engineering.

With Snowpark Python now in Private Preview, I would like to review some of the key capabilities and advantages it brings to customers, in addition to everything previously discussed. This is my personal opinion, shaped by my conversations with Global System Integrators.

Familiar Python Syntax

We are not reinventing the wheel: DataFrames are already the everyday abstraction for data engineering and data science teams, and Snowpark Python exposes that same familiar API, as in the sketch below.
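As a rough illustration (the connection parameters are placeholders and the CUSTOMERS table is hypothetical), a Snowpark DataFrame pipeline reads much like pandas or PySpark, while everything executes inside Snowflake:

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, avg

# Placeholder connection parameters; fill in with your own account details
connection_parameters = {
    "account": "<your_account>",
    "user": "<your_user>",
    "password": "<your_password>",
    "warehouse": "<your_warehouse>",
    "database": "<your_database>",
    "schema": "<your_schema>",
}
session = Session.builder.configs(connection_parameters).create()

# Lazily defined DataFrame over a (hypothetical) CUSTOMERS table
customers = session.table("CUSTOMERS")

# Familiar DataFrame-style transformations; nothing runs until an action is called
result = (
    customers
    .filter(col("COUNTRY") == "ES")
    .group_by(col("SEGMENT"))
    .agg([avg(col("LIFETIME_VALUE")).alias("AVG_LTV")])
)

# show() triggers execution, pushed down as SQL to a Snowflake virtual warehouse
result.show()
```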

Secure Access to Anaconda Open-Source Libraries

Snowflake announced a partnership with Anaconda. This provides easy, secure access to the Python open-source ecosystem with automated dependency management, all within a highly secure sandbox environment that enforces consistent governance and security policies.
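As a sketch of what this looks like in practice (reusing the connection setup from the earlier example, with example package names), dependencies can simply be declared on the session and are resolved from the curated Anaconda channel when the code runs in Snowflake:

```python
# Declare third-party dependencies for this session; Snowflake resolves them
# from its curated Anaconda channel, so there is no manual package download
# or wheel management inside the secure sandbox.
session.add_packages("numpy", "pandas", "scikit-learn")
```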

User Defined Functions

Python code can be pushed down 100% to run as a UDF in a highly secure sandbox inside Snowflake virtual warehouses. These functions are also exposed as SQL functions, so they can be used in any data pipeline built with tasks and streams, for example to provide real-time predictions on an incoming stream of data.
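Here is a minimal sketch of registering a Python UDF and calling it from SQL (the function, stage and table names are hypothetical, and a real use case would wrap a trained model rather than a simple formula):

```python
from snowflake.snowpark.functions import udf
from snowflake.snowpark.types import FloatType

# Register a named, permanent UDF; it runs inside the secure Python sandbox
# on the virtual warehouse. "@my_stage" is a placeholder stage.
@udf(name="fahrenheit_to_celsius", is_permanent=True, replace=True,
     stage_location="@my_stage",
     return_type=FloatType(), input_types=[FloatType()], session=session)
def fahrenheit_to_celsius(temp_f):
    return (temp_f - 32.0) * 5.0 / 9.0

# Because it is now a regular SQL function, it can be used anywhere SQL runs,
# e.g. inside a task that processes a stream of incoming data.
session.sql(
    "SELECT fahrenheit_to_celsius(TEMP_F) FROM WEATHER_READINGS"
).collect()
```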

Performance and Scalability

All transformations are pushed down to Snowflake virtual warehouses. These can be resumed, suspended and resized on demand in seconds, providing the performance needed while you pay only per second, in a true utility model.
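For illustration (COMPUTE_WH is a placeholder warehouse name), a warehouse can be resized around a heavy job directly from the Snowpark session:

```python
# Scale up before a heavy feature-engineering job, then scale back down
# and suspend, so compute is only billed for the seconds it actually runs.
session.sql("ALTER WAREHOUSE COMPUTE_WH SET WAREHOUSE_SIZE = 'LARGE'").collect()

# ... run the Snowpark transformations here ...

session.sql("ALTER WAREHOUSE COMPUTE_WH SET WAREHOUSE_SIZE = 'XSMALL'").collect()
session.sql("ALTER WAREHOUSE COMPUTE_WH SUSPEND").collect()
```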

Use the Rich Machine Learning Tools Ecosystem

ML is continuously evolving, and you do not want to lock yourself into a single tool, which would prevent innovation. Integrate with tools such as AWS SageMaker, Azure ML, Dataiku, DataRobot, H2O.ai, Jupyter, etc.

Enrich your Data with Third-party Datasets

Use the Snowflake Data Marketplace to get immediate access to data that enriches your features, all without having to copy or move anything: the data is live and ready to be queried. The demo at Snowday shows how to enrich your data with the IPinfo dataset.
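A rough sketch of what such an enrichment join could look like (the shared database, table and column names here are hypothetical; the actual IPinfo listing defines the real ones):

```python
from snowflake.snowpark.functions import col

# Your own events table and a dataset shared through the Data Marketplace.
# No copy is made: the shared table is queried live, in place.
events = session.table("ANALYTICS.PUBLIC.WEB_EVENTS")        # hypothetical
ip_geo = session.table("IPINFO_SHARE.PUBLIC.IP_LOCATION")    # hypothetical

enriched = (
    events.join(ip_geo, events["CLIENT_IP"] == ip_geo["IP"])
          .select(events["EVENT_ID"], ip_geo["COUNTRY"], ip_geo["CITY"])
)

enriched.show()
```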

Data as a Hyper-parameter

We routinely track the hyper-parameters used to train our ML models, but one key aspect is knowing exactly which dataset the model was trained on. Snowflake's Zero-Copy Cloning makes an immediate copy of the data that was used, without duplicating storage. This is a very powerful tool for Data Scientists, as they can always trace back the exact data used for training.
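As a quick sketch (the table names are placeholders), the exact training data can be captured at training time with a clone issued straight from the Snowpark session:

```python
# Capture the training data as a zero-copy clone: no storage is duplicated
# until either table diverges, and the clone can be referenced later to
# reproduce or audit exactly what the model was trained on.
session.sql(
    "CREATE TABLE TRAINING_DATA_RUN_42 CLONE TRAINING_DATA"
).collect()
```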

Let’s Snow!

Carlos-
