Getting Started with Data Engineering and ML using Snowpark for Python

BONUS: It includes code for building a Streamlit application on top of your ML model.

Reach out to me if you’d like the Snowflake Bear with Python sticker :)

There is no shortage of great content here on Medium, as well as QuickStart Guides, covering various aspects of working with Snowpark for Python, contributed by Snowflake as well as (Snowflake) Data Superheroes. (Learn how you can become a Data Superhero!)

This new technical guide I published recently is all about getting your hands dirty as quickly as possible, so you can learn what data engineering and ML tasks you can accomplish with Snowpark for Python, and how.

Here’s what my awesome colleague Julian had to say, and I couldn’t have said it any better…

“#Snowpark is just translating #Python into SQL.” This is the most popular misconception about Snowpark that I see over and over in forums.

While it is true that DataFrame operations get translated to SQL in order to run queries efficiently inside Snowflake, there is a whole other side to Snowpark: User-Defined Functions (UDFs) and Stored Procedures let you both execute and orchestrate custom Python code.
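To make that concrete, here is a minimal sketch of what a Snowpark Python UDF looks like. The function body is plain Python that Snowpark ships to run inside Snowflake rather than translating it to SQL. The names (`compute_roi`, `COMPUTE_ROI`) are hypothetical, and the registration helper is only defined, not called, since it requires a live Snowflake session.

```python
def compute_roi(revenue: float, spend: float) -> float:
    """Plain Python logic that would execute inside Snowflake once registered."""
    if spend == 0:
        return 0.0
    return (revenue - spend) / spend

def register_roi_udf(session):
    """Register compute_roi as a UDF on an existing Snowpark session.

    Requires a live Snowflake connection, so it is not invoked here.
    """
    from snowflake.snowpark.types import FloatType
    return session.udf.register(
        func=compute_roi,
        name="COMPUTE_ROI",
        input_types=[FloatType(), FloatType()],
        return_type=FloatType(),
        replace=True,
    )
```

Once registered, the UDF can be called from SQL or from Snowpark DataFrame expressions just like a built-in function, with the Python logic running where the data lives.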

Dash Desai has built a great getting-started guide for Snowpark that will help you grasp Snowpark concepts with hands-on experience, through a step-by-step guide that shows you how to use it in both Data Engineering and Data Science scenarios.

To summarize, here’s an overview of what’s covered in the step-by-step guide linked below:

  • Setup Environment: Use stages and tables to ingest and organize raw data from S3 into Snowflake
  • Data Engineering: Leverage Snowpark for Python DataFrames to load data, perform data transformations such as group by, aggregate, pivot, and join to prep the data for downstream applications. Once done, leverage Snowflake Tasks to optionally turn your code into operational pipelines with integrated monitoring.
  • Machine Learning: Prepare data and run ML training in Snowflake using Snowpark ML, and deploy the model as a Snowpark User-Defined Function (UDF).
  • Streamlit Application: Build an interactive web application using Python to help visualize the ROI of different advertising spend budgets, including making predictions on new data points using the deployed model and UDF.
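The train-then-predict flow in the Machine Learning step can be sketched in miniature. This is illustrative only: the guide itself trains the model inside Snowflake with Snowpark ML, while the sketch below uses a plain-Python ordinary least squares fit on made-up spend/revenue numbers to show the shape of the workflow.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: advertising spend vs. revenue
spend = [100.0, 200.0, 300.0, 400.0]
revenue = [250.0, 450.0, 650.0, 850.0]

slope, intercept = fit_line(spend, revenue)

def predict_revenue(budget: float) -> float:
    """A deployed UDF would wrap a call like this to score new data points."""
    return slope * budget + intercept
```

In the actual guide, the trained model is packaged into a Snowpark UDF so that scoring happens inside Snowflake rather than in a local Python process.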

Prerequisites

Ok, so hopefully you’re excited and eager to get started :)

Well, all you need is a Snowflake account and your favorite IDE, like Jupyter Notebook or Visual Studio Code.

Getting Started QuickStart Guide

Then, follow along with this QuickStart Guide. That’s all!

By the end, here’s the Streamlit app you will have built.
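For a rough sense of the app’s shape, here is a hypothetical Streamlit sketch: a slider for the advertising budget and a predicted-revenue readout. The model coefficients are placeholders, not the guide’s trained model, and `main()` is only defined here; a real Streamlit script would call it (or run the same statements at top level) via `streamlit run app.py`.

```python
# Placeholder coefficients standing in for a trained model's parameters.
SLOPE, INTERCEPT = 2.0, 50.0

def predicted_revenue(budget: float) -> float:
    """Score a budget with the stand-in linear model."""
    return SLOPE * budget + INTERCEPT

def main():
    # Imported inside main() so the helper above stays usable without Streamlit.
    import streamlit as st
    budget = st.slider("Advertising budget", min_value=0, max_value=1000, value=250)
    st.metric("Predicted revenue", f"{predicted_revenue(float(budget)):,.0f}")
```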

Once you’ve completed it, please share what you think and any other feedback that you may have.

That’s it for now!

Thanks for your time! Follow me on Twitter and LinkedIn, where I share demo videos, code snippets, quickstart guides, and other educational material specifically around Snowpark.


Dash Desai
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Lead Developer Advocate @ Snowflake | AWS Machine Learning Specialty | #DataScience | #ML | #CloudComputing | #Photog