Getting Started with Snowpark for Python and Streamlit

UPDATE: As of September 18, 2023, Streamlit in Snowflake is in Public Preview. This means you can now build this application entirely in Snowflake. For the updated code and step-by-step instructions, please follow this QuickStart Guide.

UPDATE: As of Nov 7, 2022, Snowpark for Python is GA.

Recently, Snowflake announced an agreement to acquire Streamlit in order to democratize writing and sharing data applications. In this blog, we will review how to build a data application using Snowpark for Python and Streamlit.

What is Snowpark?

Snowpark at its core provides an API that developers can use to construct DataFrames that are executed lazily on Snowflake’s platform. It enables data engineers, data scientists, and developers coding in languages other than SQL, such as Scala, Java, and Python, to take advantage of Snowflake’s powerful platform without having to first move data out of Snowflake. Data application developers can thus run complex transformations within Snowflake while taking advantage of its built-in scalability, performance, governance, and security features. Learn more about Snowpark here.

What is Streamlit?

Streamlit is a pure-Python open-source application framework that enables developers to quickly and easily write, share, and deploy data applications. Learn more about Streamlit here.

Let’s Build a Snowpark for Python and Streamlit Application!

Ok, let’s get to the meat of this blog and write a data application using Snowpark for Python and Streamlit.

Prerequisites

  • Snowpark for Python
    - You can install it by running pip install snowflake-snowpark-python
  • Streamlit
    - You can install it by running pip install streamlit
  • Dataset
    - One of the many benefits of using Snowflake is that you can leverage the extensive Snowflake Data Marketplace. It provides hundreds of ready-to-query third-party datasets.
    - For this guide, we’ll use the Environment Data Atlas dataset provided (for free) by Knoema. In the marketplace, click on Get Data and follow the instructions to gain access to KNOEMA_ENVIRONMENT_DATA_ATLAS.
    - In particular, we will analyze data in schema ENVIRONMENT from tables EDGARED2019, WBWDI2019Jan, and UNENVDB2018.

Step-by-Step Guide

1. Import the required libraries
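
For example, a minimal sketch of the imports used throughout this guide:

```python
# Streamlit for the UI, and Snowpark's Session plus a few DataFrame
# functions for querying and transforming data in Snowflake.
import streamlit as st
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sum  # note: shadows Python's built-in sum()
```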

2. Create a Session object to connect to your Snowflake account. Here’s a quick way of doing that, but note that hard-coding credentials directly in code is not recommended in production environments. There, a better approach would be to load credentials from AWS Secrets Manager or Azure Key Vault, for example. (If you’re looking for sample code, look no further: AWS Secrets Manager, Azure Key Vault.)
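
A sketch of that quick way might look like this:

```python
# A quick (NOT production-grade) way to create a session.
# Hard-coded credentials here are for illustration only.
connection_parameters = {
    "account": "<your_account_identifier>",
    "user": "<your_username>",
    "password": "<your_password>",
    "role": "<your_role>",
    "warehouse": "<your_warehouse>",
    "database": "KNOEMA_ENVIRONMENT_DATA_ATLAS",
    "schema": "ENVIRONMENT",
}
session = Session.builder.configs(connection_parameters).create()
```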

In the above code snippet, replace variables enclosed in “<>” with your values.

3. Create three Snowpark DataFrames to load data from tables EDGARED2019, WBWDI2019Jan, and UNENVDB2018 from schema ENVIRONMENT.
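
Here’s a sketch of those DataFrames. Note that the filter conditions and column names (‘Indicator Name’, ‘Location Name’, ‘Value’, and so on) are illustrative assumptions; check the tables in the Marketplace listing for the exact identifiers.

```python
# CO2 emissions by country (column names are assumptions for illustration)
snow_df_co2 = (
    session.table("ENVIRONMENT.EDGARED2019")
    .filter(col("Indicator Name") == "Fossil CO2 Emissions")
    .groupBy("Location Name")
    .agg(sum("Value").alias("Total CO2 Emissions"))
    .sort("Location Name")
)

# Forest area as a share of land by country
# (the mixed-case table name needs double quotes in Snowflake)
snow_df_land = (
    session.table('ENVIRONMENT."WBWDI2019Jan"')
    .filter(col("Series Name") == "Forest area (% of land area)")
    .groupBy("Country Name")
    .agg(sum("Value").alias("Total Share of Forest Land"))
    .sort("Country Name")
)

# Municipal waste by country
snow_df_waste = (
    session.table("ENVIRONMENT.UNENVDB2018")
    .filter(col("Variable Name") == "Municipal waste collected")
    .groupBy("Location Name")
    .agg(sum("Value").alias("Total Municipal Waste"))
    .sort("Location Name")
)
```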

In the above code snippet, we’re leveraging several Snowpark DataFrame functions to load and transform data. For example, filter(), groupBy(), agg(), sum(), alias() and sort().

More importantly, note that at this point nothing is executed on the server because of lazy evaluation, which reduces the amount of data exchanged between Snowflake and the client/application.

4. When working with Streamlit, we need to provide a pandas DataFrame, and luckily for us :) Snowpark for Python exposes a method to convert Snowpark DataFrames to pandas. Awesome!
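
For example:

```python
# toPandas() is an action: it triggers execution of the generated SQL on
# Snowflake and returns the results as pandas DataFrames.
pd_df_co2 = snow_df_co2.toPandas()
pd_df_land = snow_df_land.toPandas()
pd_df_waste = snow_df_waste.toPandas()
```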

As mentioned above, the Snowpark DataFrames are lazily evaluated, which means the SQL statement is not sent to the server for execution until an action is performed on it. An action, for example toPandas() in our case, causes the DataFrame to be evaluated and sends the corresponding generated SQL statement to the server for execution.

5. At this point, you’re technically done with most of the code and all you need to do to render the data in a web application in your browser is to use Streamlit’s dataframe() API. For example, st.dataframe(pd_df_co2).

But let’s add a few more web components to make our data application a bit more presentable and interactive :)

Let’s add a header and sub-header and also use containers and columns to organize our dataframes using Streamlit’s columns() and container().
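
A sketch of that layout (the header and sub-header text is illustrative):

```python
st.header("Knoema: Environment Data Atlas")
st.subheader("Powered by Snowpark for Python and Snowflake Data Marketplace")

# Lay the three dataframes out side by side in one container
with st.container():
    col1, col2, col3 = st.columns(3)
    with col1:
        st.subheader("CO2 Emissions by Country")
        st.dataframe(pd_df_co2)
    with col2:
        st.subheader("Share of Forest Land by Country")
        st.dataframe(pd_df_land)
    with col3:
        st.subheader("Municipal Waste by Country")
        st.dataframe(pd_df_waste)
```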

Let’s also display an interactive bar chart.
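
One way to wire that up, reusing snow_df_co2 from step 3 (the threshold defaults below are arbitrary):

```python
# Let the user pick an emissions threshold, filter the Snowpark DataFrame
# server-side, and chart the result.
emissions_threshold = st.number_input(
    "Total CO2 Emissions Threshold", min_value=1000, value=20000, step=1000
)
pd_df_co2_filtered = snow_df_co2.filter(
    col("Total CO2 Emissions") > emissions_threshold
).toPandas()
st.bar_chart(data=pd_df_co2_filtered.set_index("Location Name"))
```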

In the above code snippet, a bar chart is constructed using Streamlit’s bar_chart(), which takes a dataframe as one of its parameters. In our case, that is a subset of the CO2 Emissions by Country dataframe, filtered by the column Total CO2 Emissions via Snowpark DataFrame’s filter() and a user-defined emissions threshold set via Streamlit’s number_input() component.

To put it all together…

For example, in my_snowpark_streamlit_app.py:
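
Here’s how the full script might look, combining the sketches above. (As before, the “<>” placeholders and the table column names are assumptions to adapt to your account and the actual Marketplace tables.)

```python
# my_snowpark_streamlit_app.py -- a sketch assembling the snippets above
import streamlit as st
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sum

# Connect to Snowflake (don't hard-code credentials in production)
connection_parameters = {
    "account": "<your_account_identifier>",
    "user": "<your_username>",
    "password": "<your_password>",
    "role": "<your_role>",
    "warehouse": "<your_warehouse>",
    "database": "KNOEMA_ENVIRONMENT_DATA_ATLAS",
    "schema": "ENVIRONMENT",
}
session = Session.builder.configs(connection_parameters).create()

# Lazily evaluated Snowpark DataFrames (column names are illustrative)
snow_df_co2 = (
    session.table("ENVIRONMENT.EDGARED2019")
    .filter(col("Indicator Name") == "Fossil CO2 Emissions")
    .groupBy("Location Name")
    .agg(sum("Value").alias("Total CO2 Emissions"))
    .sort("Location Name")
)
snow_df_land = (
    session.table('ENVIRONMENT."WBWDI2019Jan"')
    .filter(col("Series Name") == "Forest area (% of land area)")
    .groupBy("Country Name")
    .agg(sum("Value").alias("Total Share of Forest Land"))
    .sort("Country Name")
)
snow_df_waste = (
    session.table("ENVIRONMENT.UNENVDB2018")
    .filter(col("Variable Name") == "Municipal waste collected")
    .groupBy("Location Name")
    .agg(sum("Value").alias("Total Municipal Waste"))
    .sort("Location Name")
)

# Page layout: header, sub-header, and three columns of dataframes
st.header("Knoema: Environment Data Atlas")
st.subheader("Powered by Snowpark for Python and Snowflake Data Marketplace")
with st.container():
    col1, col2, col3 = st.columns(3)
    with col1:
        st.subheader("CO2 Emissions by Country")
        st.dataframe(snow_df_co2.toPandas())  # toPandas() triggers execution
    with col2:
        st.subheader("Share of Forest Land by Country")
        st.dataframe(snow_df_land.toPandas())
    with col3:
        st.subheader("Municipal Waste by Country")
        st.dataframe(snow_df_waste.toPandas())

# Interactive bar chart driven by a user-defined emissions threshold
emissions_threshold = st.number_input(
    "Total CO2 Emissions Threshold", min_value=1000, value=20000, step=1000
)
pd_df_co2_filtered = snow_df_co2.filter(
    col("Total CO2 Emissions") > emissions_threshold
).toPandas()
st.bar_chart(data=pd_df_co2_filtered.set_index("Location Name"))
```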

The fun part! Assuming your Python script (as shown above) is free of syntax and connection errors, you’re ready to run the application. This can be done by running the following at the command line.

```
streamlit run my_snowpark_streamlit_app.py
```

If all goes well, you will see the following in your browser:

[Screenshot: the completed data application rendered in the browser]

A few cool things to note:

  • You can change the theme (light or dark) by clicking on the hamburger menu on the top right and then clicking on the Settings menu.
  • Making any changes to the source script and saving it will automatically prompt you to Rerun the application in the browser without having to stop and restart the application at the command line.
  • You can either manually enter a different emissions threshold or increase/decrease the value by clicking on -/+ buttons to change the data visualization.

That’s a wrap

Thanks for your time! Connect with me on Twitter and LinkedIn where I share demos, code snippets, QuickStart Guides, and other interesting technical artifacts. Be sure to also check out Snowflake For Developers.


Dash Desai, Lead Developer Advocate @ Snowflake
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science