Building Cool Applications with Streamlit and Snowflake

Umesh Patel
Published in CodeX · 7 min read · Apr 21, 2022

Snowflake recently announced its acquisition of Streamlit, which will add capabilities that let data engineers, data scientists, and developers interact with data and build apps that bring the data to life for non-technical and business users. Streamlit is an open-source application development framework in Python. Streamlit's data visualization capabilities tremendously help the Snowflake community and partners build apps and visualize data that lives in Snowflake and the Snowflake Marketplace.

The purpose of this blog is to show you how to build cloud-native applications with data visualization using Streamlit and Snowflake. Streamlit applications can be deployed to the Streamlit Cloud and shared, and they can also run on compute in the cloud, on servers, and on laptops.

Here is the architecture of Snowflake with Streamlit:

Snowflake with Streamlit

Initial Setup

  1. Access to a Snowflake account: make sure you have a Snowflake account and credentials to log in; if not, you can sign up for a free trial.
  2. Use the Snowflake quickstart Zero to Snowflake, a self-paced guide. If you want an instructor-led virtual hands-on lab, you can sign up here. We will be using data created in this lab.
  3. Install SnowSQL: we will use it to upload sample data into a Snowflake table.
  4. Install Conda: Conda is an open-source package management and environment management system.
  5. Create a new Conda environment:
conda create -n snowflake python=3.8
conda activate snowflake
pip install streamlit
pip install snowflake-sqlalchemy
pip install streamlit-aggrid
pip install graphviz
pip install st-annotated-text

6. Create a directory where you will keep all your code, and change into it:

mkdir streamlitapp
cd streamlitapp

7. Create a credential file "creds.json" with the following format. Secure this file, as it contains your Snowflake credentials, and do not copy it into your GitHub account. If you deploy the app to the Streamlit Cloud, use a Streamlit Secret to store the credentials.

{
  "account": "snowflakeaccount",
  "user": "username",
  "password": "password",
  "warehouse": "warehousename",
  "database": "citibike",
  "schema": "public",
  "role": "sysadmin"
}

If you deploy the app on Streamlit Cloud and use a Streamlit Secret, then use the following Python code to connect to Snowflake:

def create_session():
    # st.secrets["snowflake"] holds the credentials configured in Streamlit Secrets
    conn = snowflake.connector.connect(**st.secrets["snowflake"])
    return conn
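For reference, Streamlit Secrets are defined in TOML. A minimal sketch of the [snowflake] entry that the code above expects (the key names mirror the creds.json fields and the snowflake.connector.connect parameters; the values are placeholders):

[snowflake]
account = "snowflakeaccount"
user = "username"
password = "password"
warehouse = "warehousename"
database = "citibike"
schema = "public"
role = "sysadmin"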

Building the Application

  1. Import libraries
import sys
import streamlit as st
import json
import pandas as pd
import snowflake.connector
from datetime import datetime
import datetime as dt
import pytz
# map chart
import pydeck as pdk
# for data frame tables display
from st_aggrid import AgGrid as stwrite
from st_aggrid.grid_options_builder import GridOptionsBuilder
# for role chart
import graphviz
# annotated text
from annotated_text import annotated_text as atext

2. Connecting to Snowflake

def create_session():
    with open('creds.json') as f:
        cp = json.load(f)
    conn = snowflake.connector.connect(
        user=cp["user"],
        password=cp["password"],
        account=cp["account"],
        warehouse=cp["warehouse"],
        database=cp["database"],
        role=cp["role"],
        schema=cp["schema"]
    )
    return conn

curr_sess = create_session()

3. Visualizing the Snowflake Time Travel and Zero-Copy Clone features

Snowflake offers a Time Travel feature that allows you to access data from any point in the past (within your data retention period). The purpose of this module is to show how data scientists can easily go back in time and run a model against that data.

Time Travel Slider
asof_time = st.slider(
    "When do you want to go back in time for table " + curr_table,
    value=end_date,
    max_value=end_date,
    min_value=table_created,
    step=dt.timedelta(minutes=1),
    format="MM/DD/YY - HH:mm")

Here st.slider draws a slider between min_value and max_value based on the date, moving in increments of one minute; once scrolled, the selected value is returned. Use this value to dynamically build a SQL statement and run it to fetch and display the data.
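For example, here is a minimal sketch of feeding the slider value into a Time Travel query (the AT clause is standard Snowflake SQL; curr_table and curr_sess are assumed from the surrounding app):

# Build a Time Travel query from the selected timestamp and display the result
asof_str = asof_time.strftime("%Y-%m-%d %H:%M:%S")
df_asof = pd.read_sql(
    f"select * from {curr_table} "
    f"at(timestamp => '{asof_str}'::timestamp_ltz) limit 100",
    curr_sess)
stwrite(df_asof)  # interactive AgGrid table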

You might have noticed in the above image that the data is displayed using the AgGrid library, which gives end users an interactive way to sort, group, and filter the data on the screen.
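The GridOptionsBuilder imported earlier is one way to turn those interactions on; a short sketch, with option values that are assumptions rather than the app's exact configuration:

# Enable group-by and filtering on every column of the grid
gb = GridOptionsBuilder.from_dataframe(df)
gb.configure_default_column(groupable=True, filterable=True)
stwrite(df, gridOptions=gb.build())  # stwrite is the AgGrid alias from the imports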

You can use sliders for other use cases too; I also used a slider to change the size of a warehouse:

Change Warehouse size
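A rough sketch of that warehouse-size slider (the ALTER WAREHOUSE statement is standard Snowflake SQL; wh_name is a hypothetical variable holding your warehouse name):

wh_name = "mywarehouse"  # hypothetical; e.g. read from creds.json
new_size = st.select_slider(
    "Warehouse size",
    options=["XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE"])
if st.button("Resize warehouse"):
    curr_sess.cursor().execute(
        f"alter warehouse {wh_name} set warehouse_size = '{new_size}'")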

4. Visualizing the Snowflake role hierarchy

Snowflake provides Role-Based Access Control (RBAC) as a mechanism for granting or denying access to database objects and actions to security principals such as users, services, etc. Many customers use a role hierarchy to control granular access to Snowflake objects. This part of the module helps you show the role hierarchy to the security compliance office and understand how access is granted to a user. Blue-shaded shapes are Snowflake system-defined roles, red is the system-defined ACCOUNTADMIN role, and white shapes are custom roles.

Roles in Snowflake Account

Here is the code snippet to build this chart:

rolechart = graphviz.Digraph()
rolechart.attr("node", shape="doublecircle")
rolechart.attr("node", color="#11567f")
rolechart.attr(rankdir="BT")
rolechart.attr("node", fontsize="6pt")
for num, row in rdf.iterrows():
    if row["CHILD"] == 'ACCOUNTADMIN':
        rolechart.edge(row["CHILD"], row['PARENT'], color="red", arrowsize="3", size="double")
    else:
        rolechart.edge(row["CHILD"], row['PARENT'])
rolechart.node("ACCOUNTADMIN", style="filled", fillcolor="red", fontcolor="white")

st.graphviz_chart(rolechart)

The Graphviz library lets you customize charts the way you like. The edge element creates the branches of the chart (which is what the for loop does); the rankdir attribute sets the chart direction (bottom to top, left to right, etc.); the other customizations are self-explanatory.
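The rdf dataframe above holds the role grants as CHILD/PARENT pairs. One possible way to build it (an assumption, not necessarily the original app's query) is from the ACCOUNT_USAGE share, which requires appropriate privileges:

# Role-to-role USAGE grants form the edges of the hierarchy
rdf = pd.read_sql("""
    select name as CHILD, grantee_name as PARENT
    from snowflake.account_usage.grants_to_roles
    where granted_on = 'ROLE' and privilege = 'USAGE'
      and deleted_on is null""", curr_sess)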

5. Visualizing data on a map

Streamlit can visualize data in various charts, including pydeck maps. Often you want to show data on a map. From our example in the "Zero to Snowflake" quickstart, we have Citi Bike data and want to see the most popular routes visually on a map. See below:

Citibike popular trips in New York

The above chart is created by loading the station information, which is publicly available at https://gbfs.citibikenyc.com/gbfs/en/station_information.json in JSON format. You can load this data into Snowflake and join it with the trip data from the workshop. Here is the code you need to run: download the station data from the above link into your local directory, then use snowsql to upload the data and make it ready for the chart.

$ snowsql -a <snowflakeaccount>
## input your username and password and run the following commands
use database citibike;
use schema public;

create stage citibike_stage;
create or replace file format json_format type = 'json';

put file://station_information.json @citibike_stage;
ls @citibike_stage;

create or replace table station_data as
with s as (
    select parse_json($1) payload from @citibike_stage (file_format=>json_format)
)
select value station_v
from s, lateral flatten (input => payload:data.stations);

select * from station_data;

create or replace view stations_vw as
select station_v:station_id::number station_id,
    station_v:name::string station_name,
    station_v:lat::float station_lat,
    station_v:lon::float station_lon,
    station_v:station_type::string station_type,
    station_v:capacity::number station_capacity,
    station_v:rental_methods rental_methods
from station_data;

select * from stations_vw;

create or replace view trips_stations_vw as (
    with
    t as (select * from trips),
    ss as (select * from stations_vw),
    es as (select * from stations_vw)
    select starttime, stoptime, start_station_id,
        ss.station_name start_station,
        ss.station_lat start_lat, ss.station_lon start_lon,
        end_station_id, es.station_name end_station,
        es.station_lat end_lat, es.station_lon end_lon,
        bikeid, usertype, birth_year, gender
    from t left outer join ss on t.start_station_id = ss.station_id
        left outer join es on t.end_station_id = es.station_id
);

In the Python code, create a pydeck chart.

def exec_sql(sess, query):
    try:
        df = pd.read_sql(query, sess)
    except:
        st.error(f"Oops! Error executing {query}: {sys.exc_info()[0]}")
    else:
        return df

df = exec_sql(curr_sess, '''select
    any_value(start_station) from_station_name,
    any_value(end_station) end_station_name,
    start_lat, start_lon, end_lat, end_lon,
    count(*) num_trips,
    avg(datediff("minute", starttime, stoptime))::integer avg_duration_mins
    from citibike.public.trips_stations_weather_vw
    where (start_lat is not null and end_lat is not null)
    and start_lat <> end_lat
    group by start_lat, start_lon, end_lat, end_lon
    order by num_trips desc
    limit 20;''')

lay1 = pdk.Layer(
    "ArcLayer",
    data=df,
    get_source_position=["START_LON", "START_LAT"],
    get_target_position=["END_LON", "END_LAT"],
    get_source_color=[64, 255, 0],
    get_target_color=[0, 128, 200],
    auto_highlight=True,
    width_scale=0.0004,
    get_width="NUM_TRIPS",
    width_min_pixels=3,
    width_max_pixels=30,
)
lay2 = pdk.Layer(
    "TextLayer",
    data=df,
    get_position=["START_LON", "START_LAT"],
    get_text="FROM_STATION_NAME",
    get_color=[0, 0, 0, 200],
    get_size=15,
    get_alignment_baseline="'bottom'",
)
lay3 = pdk.Layer(
    "HexagonLayer",
    data=df,
    get_position=["START_LON", "START_LAT"],
    radius=200,
    elevation_scale=4,
    elevation_range=[0, 1000],
    extruded=True,
)
st.pydeck_chart(pdk.Deck(
    map_style="mapbox://styles/mapbox/light-v9",
    initial_view_state={"latitude": 40.776676, "longitude": -73.971321,
                        "zoom": 11, "pitch": 50},
    layers=[lay1, lay2, lay3],
    tooltip={"text": "{FROM_STATION_NAME} to {END_STATION_NAME}"}
))
stwrite(df)

We have created three layers:

The first layer displays an arc from one map point to another; it uses the START_LON/START_LAT and END_LON/END_LAT columns from the dataframe, and the NUM_TRIPS column drives the width of each arc, based on its value.

The second layer displays the station name; here we display the start station name.

The third layer displays hexagons, which show which stations are popular.

The initial_view_state parameter zooms the map to New York City, based on its longitude and latitude.

Tips:

I have used the annotated-text library to display the Snowflake environment context, like below:

def write_env(sess):
    df = exec_sql(sess, "select current_region() region, current_account() account, current_user() user, current_role() role, current_warehouse() warehouse, current_database() database, current_schema() schema")
    df.fillna("N/A", inplace=True)
    csp = df.at[0, "REGION"]
    cspcolor = "#ff9f36"
    if "AWS" in csp:
        cspcolor = "#FF9900"
    elif "AZURE" in csp:
        cspcolor = "#007FFF"
    elif "GCP" in csp:
        cspcolor = "#4285F4"
    atext((csp, "REGION", cspcolor), " ",
          (df.at[0, "ACCOUNT"], "ACCOUNT", "#2cb5e8"), " ",
          (df.at[0, "USER"], "USER", "#afa"), " ",
          (df.at[0, "ROLE"], "ROLE", "#fea"), " ",
          (df.at[0, "WAREHOUSE"], "WAREHOUSE", "#8ef"), " ",
          (df.at[0, "DATABASE"], "DATABASE"), " ",
          (df.at[0, "SCHEMA"], "SCHEMA"),
          )
....
with st.sidebar:
    write_env(curr_sess)

One more tip: you can add st.snow() to play a snow animation on the page.

Putting it all together:

Use the following code to build your own app.

Click on "view raw" to copy the code, and paste it into a file named snowapp.py.

Running the Streamlit Application

Once your code is ready, you can run the Streamlit application. Make sure the snowapp.py and creds.json files are in the same directory, then run:

streamlit run snowapp.py

Conclusion:

Building applications using Streamlit on Snowflake is extremely easy, which lets teams adapt and learn quickly, manage release timing strategically, and get to market faster.

Additionally, this application can be easily consumed by internal and external audiences and can be shared as well.

Happy Coding!

Disclaimer: The opinions expressed in this post are my own and not necessarily those of my employer.

Umesh Patel is a Principal Sales Engineer @ Snowflake, Trusted Advisor, and SnowPro Advanced Architect.