From Zero to Snowpark in 5 minutes
On June 15, Snowflake made the Snowpark API available in public preview for all customers on AWS.
The Snowpark API is a paradigm shift in data programmability for the Data Cloud. It provides language extensibility and a DataFrame-based API with lazy execution, giving developers a more natural way to express complex constructs without being SQL experts.
Developers can build complex data pipelines that are converted to SQL and pushed down to the Snowflake engine for execution, where they benefit from Snowflake's cloud-native elasticity and unlimited scalability.
In addition, custom logic can be written as functions in the language used for the Snowpark API. These functions can be converted into Snowflake User-Defined Functions (UDFs): the code is pushed down to Snowflake, where it operates on the data. At the time of writing, the Snowpark API is available in Scala.
To explore the capabilities of the Snowpark API, I have built a Snowpark Accelerator Docker image based on the snowtire_v2 project. The image is available in a public repository on Docker Hub to get you up to speed with Snowpark quickly.
The Snowpark Accelerator offers a comprehensive data engineering and data science sandbox for Snowflake, with a Jupyter Notebook environment that you can run directly from your Mac or Windows workstation. It supports Python, Spark 3.1, R and Scala and is pre-configured to connect to a Snowflake environment. It contains Snowflake drivers (ODBC, JDBC) and connectors (Python, Spark) as well as the most popular Python libraries for data analysis (pandas, plotly…).
The Snowpark Accelerator is ready to use for exploring development with the Snowpark API in Scala. It contains a quick-start tutorial that shows step by step how to write simple query structures (projection, filtering, joins, aggregations…) so you can familiarize yourself with the API, and it demonstrates Scala functions containing simple logic being pushed down to Snowflake as UDFs.
Before pulling the image, give Docker enough resources to run it:
- Click the Docker icon at the top right of your Mac menu bar.
- Select Preferences.
- Select Resources.
- Set CPUs to a minimum of 2.
- Set Memory to 4 GB.
- Click Apply & Restart.
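To confirm from a terminal that the new settings took effect, the sketch below reads the CPU count and memory that the Docker engine reports. The function names `bytes_to_gib` and `show_docker_resources` are hypothetical helpers introduced here for illustration; `NCPU` and `MemTotal` are fields exposed by `docker info`. The reported memory may come out slightly below the value set in the preferences.

```shell
# Convert a byte count to GiB with one decimal place (pure helper, no Docker needed).
bytes_to_gib() {
  awk -v b="$1" 'BEGIN { printf "%.1f", b / 1073741824 }'
}

# Print the CPUs and memory visible to the Docker engine.
# MemTotal is reported in bytes, so it is converted for readability.
show_docker_resources() {
  info=$(docker info --format '{{.NCPU}} {{.MemTotal}}') || return 1
  cpus=${info%% *}        # text before the first space
  mem_bytes=${info##* }   # text after the last space
  echo "CPUs: $cpus, Memory: $(bytes_to_gib "$mem_bytes") GiB"
}
```

Run `show_docker_resources` once Docker has restarted; it should report at least 2 CPUs and roughly 4 GiB of memory.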
Step 1 — Pull the Docker image
Open a Terminal session on your Mac and pull the image:
docker pull zhoussen/snowtire-v2:snowpark-accelerator
This may take a few minutes depending on your internet connection.
Step 2 — Run the Docker image
docker run -p 8888:8888 --name snowpark-sandbox zhoussen/snowtire-v2:snowpark-accelerator
Once the container starts, it prints a message containing a URL with an authentication token.
Paste that URL into a web browser to open the Jupyter home page.
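If the startup message has scrolled out of view, the URL can usually be recovered from the container logs. The sketch below assumes the image starts a standard Jupyter server that logs a login URL of the form `http://127.0.0.1:8888/...token=...`; that log format is an assumption about this image, and `extract_jupyter_url` is a hypothetical helper name.

```shell
# Read log text on stdin and print the last Jupyter login URL that carries a token.
# Assumes the server logs a URL starting with http://127.0.0.1:<port>/ (not confirmed for this image).
extract_jupyter_url() {
  grep -oE 'https?://127\.0\.0\.1:[0-9]+/[^[:space:]]*token=[[:alnum:]]+' | tail -n 1
}

# Usage against the running container (name taken from the docker run command):
#   docker logs snowpark-sandbox 2>&1 | extract_jupyter_url
```

`docker logs` replays everything the container has printed so far, so this works from a second terminal while the sandbox keeps running in the first.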
Navigate to the samples folder and open the notebook Snowpark_Quick_Start.ipynb. You can now follow the step-by-step tutorial and explore Snowpark's capabilities.
If you are not familiar with Docker, the snowtire_v2 GitHub project has additional tips on how to run, stop and start the image.
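As a quick reference, here is a minimal sketch of the container lifecycle using standard Docker commands and the `snowpark-sandbox` name from Step 2. The `sandbox_*` function names and the `DOCKER` override are hypothetical conveniences added for illustration, not part of the snowtire_v2 project.

```shell
# Container name from the docker run command in Step 2.
CONTAINER=snowpark-sandbox
# Set DOCKER=echo for a dry run that only prints the arguments.
DOCKER=${DOCKER:-docker}

# Stop the running container; its files are kept.
sandbox_stop() { "$DOCKER" stop "$CONTAINER"; }

# Start the stopped container again in the background.
sandbox_start() { "$DOCKER" start "$CONTAINER"; }

# Print the container output, which includes the login URL.
sandbox_logs() { "$DOCKER" logs "$CONTAINER"; }

# Delete the stopped container; the pulled image stays on disk.
sandbox_remove() { "$DOCKER" rm "$CONTAINER"; }
```

Use `sandbox_start` rather than repeating the `docker run` command: a second `docker run` with the same `--name` fails while the container still exists. After a restart, check `sandbox_logs` for the login URL, since the server may print a new token. Because no volume is mounted in Step 2, `sandbox_remove` also deletes any notebooks saved only inside the container.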