How To Get Insights on Cool Cars Using Streamlit & Snowflake ❄️ In 9 Steps

As a car enthusiast, car dealer and data scientist, I’m interested in understanding how certain cars depreciate over time. I created this app as a MVP to prove a few things to myself:

  1. How easy is it to use Streamlit for the front end?
  2. How easy is it to connect to a database (Snowflake ❄️) and not just flat files?
  3. How easy is it to deploy the app so that it can be shared with others?

Like most Single Page Applications, the app collects a few inputs from the user like year, trim level, expected ownership tenure (in years) and number of miles driven. Given these inputs, it then shows you the relationship between miles and price in a chart— i.e. "line of best fit" imposed upon a scatterplot of cars of that same year and trim level:

In the backend, the app calculates the depreciation for the total expected miles driven by using a polynomial regression curve (see the get_poly_depreciation() function here). This is a quick and simple way to get a sense of the relationship between the two variables, miles and price:

Currently, the app only displays the depreciation curve for a single car trim. But it’s super easy to extend the functionality to compare multiple trim levels. Just change st.selectbox() to st.multiselect().

In this post, I’ll show you how to deploy your app successfully, along with your connection to Snowflake ❄️. We’ll do it in 9 steps:

  • Steps 1–4: How to build and deploy locally
  • Steps 5–7: How to create a new repo on GitHub to push your local code
  • Step 9: How to deploy to the Streamlit Community Cloud! 🚀

Want to skip reading? Here’s the live app and my public GitHub repo!

How to build and deploy locally

Step 1

To create a virtual environment from the terminal, navigate to the working directory of your newly created project folder and run the following command:

pip install virtualenv

virtualenv stenv

pip install streamlit==1.22.0 pandas numpy seaborn matplotlib plotly snowflake-connector-python ipywidgets statsmodels pyarrow==10.0.1

Step 2

To store your database credentials securely and avoid exposing them in a public repository, create a file named secrets.toml :

Create a folder named “.streamlit” in the root of your working directory:

# .streamlit/secrets.toml

[snowflake]
account = "xxx"
user = "xxx"
password = "xxx"
role = "xxx"
warehouse = "xxx"
database = "xxx"
schema = "xxx"
client_session_keep_alive = true

Note: You will ignore this file when pushing your code to your repo in Step 8:

Step 3

After installing all the required libraries, import the following libraries in the first part of your main app script:

import pandas as pd
import numpy as np
import snowflake.connector
import streamlit as st
import ipywidgets
import plotly.graph_objects as go
import matplotlib.pyplot as plt
import plotly.express as px

In the code snippet below we have our connection to Snowflake ❄️. Streamlit offers a clean way of securing them by storing these values in a separate toml file which we created in step 2 above (read more about secret management here):

# Initialize connection.
# Uses st.cache_resource to only run once.
@st.cache_resource
## your credentials are separately stored in secrets.toml
def init_connection():
return snowflake.connector.connect(**st.secrets["snowflake"])

conn = init_connection()
# Perform query.
# Cache data permanently unless the query changes!
# i.e. lower $$ for data requests from snowflake!
@st.cache_data(persist="disk")
def run_query(query):
with conn.cursor() as cur:
cur.execute(query)
return cur.fetch_pandas_all()

Now that we have our connection to Snowflake ❄️ set up, we can start creating our UI. Below is the code for creating some clean-looking drop-downs:

#Streamlit's power lies in the ability for data scientists to quickly build a 
#beautiful front end using very few lines of code...

## create inputs...
st.markdown('Select a year:')
options_yr = st.selectbox('year',
[2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
label_visibility="hidden")

Here is the code for creating interactive visualizations using Plotly:

## Create a scatter plot using a dataframe named "df_temp"
## I can define the template, size, and even add a trendline
## which I will use in a function later to get the depreciation
## cost per X thousand miles
fig = px.scatter(df_temp,
x="MILES",
y="PRICE",
color="TRIM",
template='plotly',
width=600, height=400,
trendline="ols",
trendline_options=dict(log_x=True))

fig.layout.title = 'Scatterplot With Trendline By Trim Level for a ' + str(options_yr) + ' Porsche 911'
# Plot!
st.plotly_chart(fig, use_container_width=True)

Step 4

Run your application locally using the following terminal command:

streamlit run <<yourfilename>>.py

Congratulations! You just deployed your app locally!

Now let’s push your local code to GitHub and get it published in the Streamlit Community Cloud.

The best way to keep everything in sync is to use source control. This allows you to make changes locally, merge them with your existing online code, and see those changes get updated in your live app. This is a part of what’s called the CI/CD process (learn more here).

How to create a new repo on GitHub to push your local code

Step 5

To ensure a successful deployment, you need to create Dockerfile and requirements.txt files. Streamlit Community Cloud requires a Dockerfile to provide all the commands necessary to gather the dependencies required to run your app (learn more here).

Simply add these files to your local project folder. You can copy my Dockerfile from here and requirements.txt from here.

Step 6

To complete the deployment, you’ll need a GitHub account. If you don’t already have one, you can create one here.

Step 7

Now you’ll need to create a new repository. Be sure to give it a name. You’ll use its URL in the next step to push your code locally to GitHub.

Step 8

To sync your local project directory with your repo, there are a few things to consider. First, you don’t want your secrets.toml file to be uploaded to your public repo. Instruct git to ignore it:

## we want to exclude everything in the .streamlit folder (i.e. secrets.toml)
echo ".streamlit/" >> .gitignore

Next, run the following commands to synchronize your local project folder with the newly created repo (read more about managing GitHub repos here):

## your terminal should be running from your project folder
## set up the project folder for syncing with source control
git init
## add all the files from your local folder which you want to commit
## while at the same time ignoring the files in .gitignore
git add --all --ignore-errors .
## commit the files
git commit -m "initial commit of all files and ignoring the .streamlit folder"
## connect your local repo to your GitHub repo,
## where URL is the full URL to your repo
git remote add origin [URL]
## On this step, you will be required to enter your 
## username and password.
## Its possible the password may not work,
## and you will need to set up a personal access token
## see below for how to create the personal access token
git push -u origin main

Deploy to the Streamlit Community Cloud! 🚀

Step 9

Go to the Streamlit Community Cloud and click on “Get Started”. Use your Gmail or GitHub account to login. I used my GitHub account because it allows me to directly connect to my repository, which is what you will want to do!

Once logged in, you’ll see the following:

To create a new app, click on “New app” in the top right-hand corner. This will bring up a screen that asks for information related to your GitHub repository. In the “repository” field, paste the URL of your repository. The main file path is typically named app.py or whatever you named it during local development:

Once you’ve filled in the necessary fields, click on “Advanced settings”. Here you’ll need to input your “secrets”, which consist of the Snowflake ❄️ DB credentials. You don’t want these items to be public, but your app still needs access to them to connect to Snowflake ❄️. There is a configuration setting that allows you to store them in the Community Cloud:

Once that is complete, you should see this…

If you’ve made it this far, congratulations! You’re 95% done! 🥳

The remaining 5% is contingent on whether or not your deployment fails. If it does, the logs will be your best friend (pay attention to warnings as well).

When I tried deploying my app, it failed because I didn’t specify the correct version of the pyarrow library. Once I updated it, my app worked perfectly. It could return the results of my SQL query from my table in Snowflake and populate the scatterplot with data points.

Here’s how my requirements.txt file looks now after the update to the version of pyarrow:

streamlit==1.22.0
pandas
numpy
seaborn
matplotlib
plotly
snowflake-connector-python
ipywidgets
statsmodels
pyarrow==10.0.1

Wrapping up

I hope this gave you a comprehensive overview of how to get your Streamlit app deployed along with your connection to Snowflake ❄️. If you have any questions, please post them in the comments below or contact me on GitHub or LinkedIn.

--

--