How to Create a Complex Query with Snowpark DataFrame in Python

A while ago, when Snowpark for Python was still in private preview, I wrote about creating a complex query with a Snowpark DataFrame, but in Scala.

Snowpark is simply a Snowflake library that can be downloaded and used from Scala, Java or Python client applications to push SQL code to the virtual warehouse and execute it there, closer to the data.

Just like in Apache Spark, its main class is DataFrame, which lets you construct SQL queries using a functional-style fluent dot notation. It is no secret that Snowflake is trying to simplify the way Apache Spark uses a data lake for all sorts of advanced data science projects, especially machine learning. Unlike Snowflake, Spark is not a data warehouse, but it can work with many kinds of storage, such as Hadoop’s HDFS or Amazon’s S3. Snowflake has expanded its focus into data lakes and is trying to become a central administration hub for managing a data lakehouse (i.e. a data warehouse, Snowflake in this case, plus a data lake such as S3).
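To make the "fluent dot notation" idea concrete without needing a Snowflake connection, here is a toy sketch in plain Python. It is not Snowpark and not how Snowpark is implemented: the `ToyFrame` class and its `to_sql` method are hypothetical, and the table and column names are only illustrative. It merely mimics the shape of a DataFrame chain (Snowpark's real DataFrame also exposes `filter`, `select`, `sort` and `limit`), where each call returns a new object and the SQL text is only produced at the end.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToyFrame:
    """Hypothetical stand-in for a DataFrame (NOT the Snowpark class)."""
    table: str
    columns: Tuple[str, ...] = ("*",)
    predicates: Tuple[str, ...] = ()
    order_by: Tuple[str, ...] = ()
    max_rows: Optional[int] = None

    # Each method returns a modified copy, so calls can be chained.
    def select(self, *cols: str) -> "ToyFrame":
        return replace(self, columns=cols)

    def filter(self, predicate: str) -> "ToyFrame":
        return replace(self, predicates=self.predicates + (predicate,))

    def sort(self, *cols: str) -> "ToyFrame":
        return replace(self, order_by=cols)

    def limit(self, n: int) -> "ToyFrame":
        return replace(self, max_rows=n)

    def to_sql(self) -> str:
        # Nothing is "executed" until the accumulated query is rendered here.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.predicates:
            sql += " WHERE " + " AND ".join(self.predicates)
        if self.order_by:
            sql += " ORDER BY " + ", ".join(self.order_by)
        if self.max_rows is not None:
            sql += f" LIMIT {self.max_rows}"
        return sql


# Illustrative chain; the table and columns are assumptions for the example.
query = (
    ToyFrame("ITEM")
    .filter("I_CURRENT_PRICE > 10")
    .select("I_ITEM_ID", "I_CURRENT_PRICE")
    .sort("I_CURRENT_PRICE DESC")
    .limit(5)
)
print(query.to_sql())
```

The point of the sketch is the deferred style: the chain only describes a query, and the SQL is generated in one piece at the end. Snowpark DataFrames are lazy in the same spirit, sending SQL to the warehouse only when an action such as `collect()` or `show()` is called.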

I’ll simply translate the old DataFrame-based Scala code to Python here. You can check my GitHub repository for the full project.

My Complex SQL Query

In my previous post, I tried to emulate a rather complex SQL query using the SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL

Cristian Scutaru
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

World-class expert in Snowflake Data Cloud. Former Snowflake "Data Superhero". SnowPro SME (Subject Matter Expert). 5x SnowPro certification exams.