Magic Data Apps with Snowflake, Streamlit, and DuckDB
This pattern lets you create high-traffic apps available to the world, hosted on Streamlit Cloud. Prepare and explore your data with Snowflake and Streamlit, then serve the results in production out of static Parquet files.
Snowflake acquired Streamlit for a good reason: Streamlit adds magic to the process of creating a data app. Connecting Streamlit to Snowflake is straightforward, and my teammate Dash Desai showed how to play with Snowpark for Python (or you can just use the Streamlit Snowflake connector):
- Getting Started with Snowpark for Python and Streamlit
- Connect Streamlit to Snowflake (Streamlit docs)
Some data apps don’t need a full-fledged database backend, only a way to quickly present analytical results from static, pre-optimized data. If you want to release a data app that expects high traffic, no writes, and low personalization/security requirements, this pattern might be for you.
In summary, the pattern in this post will show you how easy it is to:
- Prepare data in Snowflake, and export it as Parquet files to cloud storage buckets.
- Read these Parquet files in Streamlit Cloud, from your private cloud storage bucket.
- Query these Parquet files in Streamlit, using DuckDB.
Step 1: Create a storage bucket
For this post, I chose Google Cloud Storage as it lives close to the current Streamlit Cloud servers. My Snowflake account lives in AWS, but it can easily export files to GCS. Streamlit can also read files out of S3, but here GCS was my choice.
Once we create a GCS bucket, we need to grant Snowflake write permissions and Streamlit Cloud read permissions on it. Those are the next two steps.
Step 2: Export data from Snowflake to your bucket
Creating a GCS integration in Snowflake is straightforward:
use role accountadmin;

create storage integration fhoffa_lit -- arbitrary name
  type = external_stage
  storage_provider = gcs
  enabled = true
  storage_allowed_locations = ('gcs://fhoffa-lit/');

desc storage integration fhoffa_lit;
desc storage integration will give you the id of a GCP service account that Snowflake creates for you. Now you only need to tell GCP to grant that service account write privileges on your bucket.
Find your bucket and “edit access”:
Add the id of the service account that Snowflake provisioned with “Storage Object Admin” permissions (or less if desired):
With this, we can export tables or the results of any query through that storage integration in Snowflake. First create a stage, and make sure that you can list the files already in this bucket:
create stage fhoffa_lit_stage
  url = 'gcs://fhoffa-lit/'
  storage_integration = fhoffa_lit;

list @fhoffa_lit_stage;
Exporting the results of a query as a parquet file to GCS is straightforward now:
copy into @fhoffa_lit_stage/out/parquet_
from (select 1 a, 2 b, 3 c)
file_format = (type = 'parquet')
header = true;
Step 3: Read in Streamlit from your bucket
Now that we have a Parquet file in GCS, download it within Streamlit Cloud. First, create a service account in GCP, and share it with Streamlit Cloud using its secrets manager:
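Streamlit’s secrets use TOML. Assuming you name the section gcp_service_account (the name used in the Streamlit docs), the entry is roughly the service account’s JSON key file transcribed into TOML; the values below are placeholders:

```toml
# .streamlit/secrets.toml locally, or the secrets editor in Streamlit Cloud
[gcp_service_account]
type = "service_account"
project_id = "your-project-id"
private_key_id = "..."
private_key = "..."
client_email = "your-sa@your-project-id.iam.gserviceaccount.com"
token_uri = "https://oauth2.googleapis.com/token"
```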
I know I’ve used this word several times already, but now reading a CSV out of GCS into Streamlit is… straightforward:
import streamlit as st
from google.oauth2 import service_account
from google.cloud import storage

# Credentials come from Streamlit's secrets manager (see above)
credentials = service_account.Credentials.from_service_account_info(
    st.secrets["gcp_service_account"]
)
client = storage.Client(credentials=credentials)
bucket = "fhoffa-lit"

@st.experimental_memo(ttl=600)
def read_file(bucket_name, file_path):
    bucket = client.bucket(bucket_name)
    return bucket.blob(file_path).download_as_string().decode("utf-8")

content = read_file(bucket, "myfile.csv")
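From there, the downloaded string can go straight into pandas; here's a quick sketch where a hypothetical two-row CSV stands in for myfile.csv:

```python
import io

import pandas as pd

# Stand-in for the string that read_file() returns
content = "a,b,c\n1,2,3\n4,5,6\n"

# Parse the CSV text into a DataFrame; st.dataframe(df) would render it in the app
df = pd.read_csv(io.StringIO(content))
print(df.shape)  # (2, 3)
```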
Step 4: Read Parquet in Streamlit with DuckDB
Notice that in the example above I read a CSV, not a Parquet file. This is because Parquet is a binary format that needs proper parsing.
First, instead of reading the file into memory as a string, download it from GCS into Streamlit Cloud's local storage:
def download_to_local_file(bucket_name, file_path, destination):
    bucket = client.bucket(bucket_name)
    bucket.blob(file_path).download_to_filename(destination)

download_to_local_file(
    bucket, "out/parquet__0_0_0.snappy.parquet", "temp.parquet")
And now you can query that local parquet file with DuckDB:
import duckdb

con = duckdb.connect(database=':memory:')
con.execute("SELECT * FROM read_parquet('temp.parquet')")
rows = con.fetchall()
Ahh, these are the contents of the requirements.txt that Streamlit Cloud will easily install:
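A minimal version, based on the imports used above, would be something like this (pin versions as you see fit):

```
duckdb
google-cloud-storage
```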
That’s it. With this pattern you can:
- Analyze data in Snowflake.
- Tell Snowflake to export Parquet files into GCS.
- Have Streamlit Cloud copy the Parquet files from GCS to its local storage.
- Query Parquet files in Streamlit Cloud with DuckDB (on the server side).
Find the code and minimal sample app here:
If you haven’t tried Streamlit Cloud, follow this tutorial to deploy your first app.
Stay tuned to see what I’m building with this pattern!
- Try this out with a Snowflake free trial account — you only need an email address to get started.
I’m Felipe Hoffa, Data Cloud Advocate for Snowflake. Thanks for joining me on this adventure. You can follow me on Twitter and LinkedIn. And subscribe to reddit.com/r/snowflake for the most interesting Snowflake news.