Building your first Streamlit app for non-Python users
I regularly write about modern data platforms and technology trends. To read my future articles simply join my network here or click ‘Follow’. Also feel free to connect with me via YouTube.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
What is Streamlit?
In March 2022 Snowflake acquired Streamlit after noticing that its customers and employees had started using it to build data-driven applications on top of data in Snowflake. The price? $800 million.
Streamlit is an open-source Python library used for creating web applications in a fast and simple way. It allows data scientists and machine learning engineers to create interactive data applications quickly without requiring expertise in web development.
Streamlit provides an intuitive API for creating interactive web applications with custom UI components such as sliders, dropdowns, and text inputs. It also enables users to easily integrate data visualizations, plots, and charts into their applications.
The library is designed to work seamlessly with popular data science and machine learning libraries such as NumPy, Pandas, and Scikit-learn. Streamlit makes it easy to prototype and share data-driven applications and insights with others without requiring complex infrastructure or deployment procedures.
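To give you a feel for what that looks like in practice, here's a minimal sketch of a standalone Streamlit app. This is my own illustrative example, not part of the tutorial we build below; it renders a title, a slider and a chart of some random data in a handful of lines:
# A minimal, illustrative Streamlit app (not part of the tutorial below).
import streamlit as st
import pandas as pd
import numpy as np

st.title('Hello, Streamlit')

# A slider widget controlling how many random points we chart.
points = st.slider('Number of points', 10, 100, 50)
df = pd.DataFrame(np.random.randn(points, 2), columns=['x', 'y'])
st.line_chart(df)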
But just how difficult is it for non-Python users like myself to create a first application in Streamlit? In this article, we find out.
Getting up and running
The first thing we need to do is get Streamlit up and running. If you head to streamlit.io you can sign up for the community version, at which point you'll need to hook it up to your GitHub repository. If you don't have one, you'll also need to create a GitHub account.
You can also set Streamlit up to run locally in a virtual environment using tools you may be familiar with, such as Anaconda. But this is where I see users who are unfamiliar with Python development and these tools quickly get lost and give up.
We create a new GitHub repository (or use an existing one) and place a Python file in it. In our Streamlit dashboard we click ‘New App’ and point the app to our repository, branch and (empty) Python file. Check out the video below to get a better idea of what these setup steps look like.
Development flow
In this demo I simply edit the source Python file (.py) in GitHub and view the changes in the Streamlit app in another browser tab. As setups go, it doesn’t get more straightforward than that.
Every time you want to update your app, save and commit the source file. When you do that, Streamlit detects if there is a change and, in the setup we’re using, the app will automatically refresh.
This allows you to work in a fast interactive loop: you type some code, save it, try it out live, then type some more code, save it, try it out, and so on until you’re happy with the results. This tight loop between coding and viewing results live is one of the ways Streamlit makes your life easier.
Data Flow
Streamlit apps have a unique data flow: any time something must be updated on the screen, Streamlit reruns the entire Python script from top to bottom.
This can happen in two situations:
- Whenever you modify your app’s source code.
- Whenever a user interacts with widgets in the app. For example, when dragging a slider, entering text in an input box, or clicking a button.
Streamlit handles this for you behind the scenes, and there are some impressively easy ways to make individual steps more efficient. For example, we might want to skip going off to the database and retrieving the data again if it hasn’t changed. We’ll cover that later.
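To make that rerun model concrete, here's a tiny illustrative script of my own (not part of the tutorial): every time you type in the text box, Streamlit reruns the whole file from the top and redraws the output.
import streamlit as st

st.write('This line runs on every rerun')
name = st.text_input('Your name', 'world')  # changing this value triggers a rerun
st.write(f'Hello, {name}!')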
Building our application
Rather than bore you with the key concepts, let’s do what I did and dive into a Streamlit tutorial to build our first application to get a feel for the level of complexity!
To kick things off we open our Python file and import the libraries we need:
import streamlit as st
import pandas as pd
import numpy as np
And we give the app a title:
st.title('Uber pickups in NYC')
When we commit this file to our GitHub repository, our app automatically updates in the other browser window, where we can view the changes.
Loading the data
Next comes a bit of Python. The tutorial provides this, which is handy, as starting from scratch would have taken me a bit of googling to work out. I found the code actually pretty easy to read and to work out what was happening here:
DATE_COLUMN = 'date/time'
DATA_URL = ('https://s3-us-west-2.amazonaws.com/'
            'streamlit-demo-data/uber-raw-data-sep14.csv.gz')

def load_data(nrows):
    data = pd.read_csv(DATA_URL, nrows=nrows)
    lowercase = lambda x: str(x).lower()
    data.rename(lowercase, axis='columns', inplace=True)
    data[DATE_COLUMN] = pd.to_datetime(data[DATE_COLUMN])
    return data
We can see that the data is being loaded from an S3 file and that some formatting is taking place: the column names are lowercased and the date/time column is converted to a proper datetime type. Note that the function takes an nrows parameter to limit the number of records being returned.
Next we load the data by calling the function, wrapping it in a couple of text elements to inform the user of the application about the status of the data load process.
# Create a text element and let the reader know the data is loading.
data_load_state = st.text('Loading data...')
# Load 10,000 rows of data into the dataframe.
data = load_data(10000)
# Notify the reader that the data was successfully loaded.
data_load_state.text('Loading data...done!')
Where does this data go, you might be asking? It’s held in a Pandas dataframe. A dataframe can be thought of as an in-memory table which can be queried, sorted and filtered much like a database table. No doubt better descriptions exist, but that’s all you need to know for now.
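To illustrate what "queried, sorted and filtered" means here, a couple of typical operations on our dataframe might look like the sketch below. This is my own illustrative snippet, assuming the lowercased Uber columns ('date/time', 'lat', 'lon', 'base'):
# Illustrative only: typical dataframe operations on the loaded data.
earliest = data.sort_values(DATE_COLUMN).head(5)   # the five earliest pickups
morning = data[data[DATE_COLUMN].dt.hour < 6]      # pickups before 6am
st.write(earliest)
st.write(f'{len(morning)} pickups before 6am')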
So far we’ve loaded 10,000 rows of data from an S3 file, applied some formatting and placed it into a Pandas dataframe.
Caching
Before we go any further we want to make sure we’re not retrieving the same 10,000 rows of data every time the app refreshes. Ideally we want to cache that data locally if possible. Just above the load_data function definition we can add the following line of code:
@st.cache_data
def load_data(nrows):
Done. That’s it. The ease of applying something as common and effective as caching in Streamlit was a real eye opener for me.
When you mark a function with Streamlit’s cache annotation, it tells Streamlit that whenever the function is called it should check two things:
- The input parameters you used for the function call.
- The code inside the function.
If this is the first time Streamlit has seen both these items, with these exact values, and in this exact combination, it runs the function and stores the result in a local cache. The next time the function is called, if the two values haven’t changed, then Streamlit knows it can skip executing the function altogether. Instead, it reads the output from the local cache and passes it on to the caller — like magic.
To get some visibility of this, replace this line of code:
data_load_state.text('Loading data...done!')
With this:
data_load_state.text("Done! (using st.cache_data)")
There are of course some nuances to this, so it’s worth checking out the documentation to understand more about this feature.
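As one example of those nuances, st.cache_data accepts optional parameters such as ttl, which expires the cached result after a set number of seconds. A hedged sketch of our function with a 10-minute time-to-live might look like this:
@st.cache_data(ttl=600)  # refresh the cached result after 10 minutes
def load_data(nrows):
    data = pd.read_csv(DATA_URL, nrows=nrows)
    lowercase = lambda x: str(x).lower()
    data.rename(lowercase, axis='columns', inplace=True)
    data[DATE_COLUMN] = pd.to_datetime(data[DATE_COLUMN])
    return data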
Writing out the raw data
It might also be convenient to see the raw data we’re working with on the app during the development stages.
We can use the st.write command, which can render almost anything passed to it.
st.subheader('Raw data')
st.write(data)
In this case, we’re passing in a dataframe to the write command to render the data as an interactive table.
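If you want more explicit control over how the table is displayed, Streamlit also offers dedicated elements. As a quick illustration of my own:
st.dataframe(data.head(20))  # interactive, scrollable, sortable table
st.table(data.head(5))       # static table rendered in full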
Drawing a histogram
Next we want to plot the number of pickups by hour to identify the lowest and peak demand hours. First we add a subheader, then use our data to generate a histogram of pickup times binned by hour. Note: this uses the NumPy library we imported at the beginning of our script.
st.subheader('Number of pickups by hour')
hist_values = np.histogram(data[DATE_COLUMN].dt.hour, bins=24, range=(0,24))[0]
st.bar_chart(hist_values)
To draw this diagram we used Streamlit’s native bar_chart() method, but it’s important to know that Streamlit supports more complex charting libraries like Altair, Bokeh, Plotly, Matplotlib and more. For a full list, see supported charting libraries.
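As an illustration, the same histogram could be drawn with Matplotlib and passed to st.pyplot. This is just a sketch of mine, and it needs an extra import that isn't part of the tutorial script:
import matplotlib.pyplot as plt  # extra dependency, not in the tutorial's imports

fig, ax = plt.subplots()
ax.bar(range(24), hist_values)   # one bar per hour of the day
ax.set_xlabel('Hour of day')
ax.set_ylabel('Pickups')
st.pyplot(fig)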
Plotting data on a map
This is something I’ve done over the years in a range of data visualisation tools, most recently Tableau and Power BI. I was pleasantly surprised to see how quick and easy it was to do in Streamlit by comparison. Check this out!
st.subheader('Map of all pickups')
st.map(data)
That’s it. Pass the data to the map function and it automatically works out what data points to use.
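Under the hood, st.map looks for latitude and longitude columns by name (for example 'lat' and 'lon'); in our case the lowercasing step gives us exactly those columns. A tiny standalone example of my own:
# Illustrative only: st.map picks up 'lat'/'lon' (or 'latitude'/'longitude') columns.
demo = pd.DataFrame({
    'lat': [40.7581, 40.7306],   # two sample points in New York City
    'lon': [-73.9855, -73.9352],
})
st.map(demo)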
Filtering data
We can now clearly see that peak demand in our pickup data is at 17:00. But what if we wanted to filter our data to that time?
We replace these lines:
st.subheader('Map of all pickups')
st.map(data)
With this:
hour_to_filter = 17
filtered_data = data[data[DATE_COLUMN].dt.hour == hour_to_filter]
st.subheader(f'Map of all pickups at {hour_to_filter}:00')
st.map(filtered_data)
And the map updates instantly. But the hour is hard-coded in the app. These apps are supposed to be interactive so the data consumer can draw their own insights, right? So let’s add that interactivity with a slider.
To do this we’ll change the hour_to_filter line of code to this:
hour_to_filter = st.slider('hour', 0, 23, 17) # min: 0h, max: 23h, default: 17h
Using the slider we can now watch the map update in real time. How cool and easy is that?
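The slider is just one of many input widgets. As a quick illustration of my own, the same filter could be driven by a dropdown instead:
# Illustrative alternative: a dropdown instead of a slider.
hour_to_filter = st.selectbox('hour', range(24), index=17)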
Toggling Data
Finally, we might not want the raw data cluttering up our app once we’re past the development phase. We’ll hide it using a checkbox so we can still view the raw data within the app, but only when we need to.
To do this we’ll replace these lines:
st.subheader('Raw data')
st.write(data)
With:
if st.checkbox('Show raw data'):
    st.subheader('Raw data')
    st.write(data)
Now we have a checkbox to show or hide the raw data table!
Summary
Coming from a non-Python development background, I didn’t want to spend time getting into the complexities of managing libraries or installing Anaconda on my machine to set up a virtual environment just to experiment with Streamlit.
I wanted something like Snowflake: something I could fire up and start getting value out of without installation and configuration. I’m pleased to say Streamlit absolutely fits this mould, and you can now see why Snowflake found it such an appealing acquisition.
If you’re interested in giving this a go, I’d encourage you to check out the YouTube video below, where I’ve also made available links to the tutorial and the complete code used.
I now plan to create a couple more videos in this series where I start to build Streamlit apps based on data within Snowflake, as well as introducing Snowpark into the mix and looking at some writeback use cases.
Thanks for reading!
To stay up to date with the latest business and tech trends in data and analytics, make sure to subscribe to my newsletter, follow me on LinkedIn, and YouTube, and, if you’re interested in taking a deeper dive into Snowflake check out my books ‘Mastering Snowflake Solutions’ and ‘SnowPro Core Certification Study Guide’.
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
About Adam Morton
Adam Morton is an experienced data leader and author in the field of data and analytics with a passion for delivering tangible business value. Over the past two decades Adam has accumulated a wealth of valuable, real-world experiences designing and implementing enterprise-wide data strategies, advanced data and analytics solutions as well as building high-performing data teams across the UK, Europe, and Australia.
Adam’s continued commitment to the data and analytics community has seen him formally recognised as an international leader in his field when he was awarded a Global Talent Visa by the Australian Government in 2019.
Today, Adam works in partnership with Intelligen Group, a Snowflake pureplay data and analytics consultancy based in Sydney, Australia. He is dedicated to helping his clients to overcome challenges with data while extracting the most value from their data and analytics implementations.
He has also developed a signature training program that includes an intensive online curriculum, weekly live consulting Q&A calls with Adam, and an exclusive mastermind of supportive data and analytics professionals helping you to become an expert in Snowflake. If you’re interested in finding out more, visit www.masteringsnowflake.com.