How to easily create and deploy a data exploration web app using Python?

Using Streamlit, an app framework for data scientists

Anshul
6 min read · Oct 15, 2019

The best and easiest way to showcase a project is to build a web app that can be viewed directly in the browser. Currently, for visualizing a data analysis or ML project and showing it to others, most of us rely on Jupyter notebooks. Until recently, building our own web app required at least some web-development knowledge so that we could use frameworks like Dash.

But now there is Streamlit, an app framework built specifically for Machine Learning and Data Science engineers to explore data and showcase their work.

This post describes how to create a web app using Streamlit and deploy it using Docker.

One can read in detail about the Streamlit architectural concepts here.

Local Development

Install Streamlit

Install Streamlit with pip using the following command:

$ pip install streamlit

Run the following command to check the installation:

$ streamlit hello

After running the above command, open localhost:8501 in your browser and you will see the following screen, where you can play around with the demo app.

Streamlit demo app
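Before loading real data, it can help to sanity-check the setup with a tiny app of our own. Below is a minimal sketch (the filename app.py is just an example), started with streamlit run app.py:

# app.py -- a minimal Streamlit app; start it with: streamlit run app.py
import pandas as pd
import streamlit as st

st.title('Hello Streamlit')
st.write('A tiny dataframe rendered as an interactive table:')
st.dataframe(pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}))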

From here on, we will use data from one of the Alibaba Cloud virtual machines to build a small web app. It contains the following features: CPU utilization percentage (cpu_util_percent), memory utilization percentage (mem_util_percent), normalized memory bandwidth (mem_gps), cache misses per thousand instructions (mkpi), normalized incoming network traffic (net_in), normalized outgoing network traffic (net_out), and disk I/O percentage (disk_io_percent). The header of the data is shown in the figure below:
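If you want to inspect the header yourself, here is a quick sketch; it assumes the data sits in m_1.xlsx, the file used later in this post:

import pandas as pd

# reading the .xlsx file needs the xlrd package (installed in the Dockerfile below)
data = pd.read_excel('m_1.xlsx')
print(data.head())             # the first few rows, i.e. the header of the data
print(data.columns.tolist())   # the feature names described above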

Some concepts used

  1. Caching: Every time something changes, the whole app reruns from top to bottom. This would be too time-consuming for large datasets, so Streamlit provides a caching mechanism.

We can reuse data with st.cache. The code snippet:

data = st.cache(pd.read_excel)('m_1.xlsx')

st.cache is a data store that lets Streamlit apps safely and effortlessly persist information. For complex and time-consuming functions we can use the @st.cache decorator:

@st.cache
def load_data():
    data = pd.read_excel('m_1.xlsx')
    data.drop(['machine_id', 'mem_gps', 'mkpi'], axis=1, inplace=True)
    lowercase = lambda x: str(x).lower()
    data.rename(lowercase, axis='columns', inplace=True)
    data[DATE_COLUMN] = pd.to_datetime(data['timestamp'])
    return data

When we mark a function with Streamlit's cache decorator, whenever the function is called Streamlit checks the function's bytecode, any code, variables or files it depends on, and the input parameters.

If this is the first time Streamlit has seen these items, with these exact values and in this exact combination, it runs the function and stores the result in a local cache. The next time the function is called, if none of these values have changed, Streamlit knows it can skip executing the function altogether. Instead, it reads the output from the local cache and passes it on to the caller. More information here.
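To see this behaviour in isolation, here is a small sketch with a made-up slow_square function; only calls with previously unseen arguments actually execute the function body:

import time
import streamlit as st

@st.cache
def slow_square(x):
    time.sleep(2)   # simulate an expensive computation
    return x * x

st.write(slow_square(4))  # first call: runs for ~2 seconds, result is cached
st.write(slow_square(4))  # same input: function is skipped, cache is read
st.write(slow_square(5))  # new input: the function runs again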

2. Sidebar: A sidebar improves the layout of the visualization. To add a widget to it, just prefix the widget's call with st.sidebar. For example, to display a checkbox in the sidebar:

st.sidebar.checkbox('Show raw data')


Widgets Used

We are using the following widgets as part of this development:

  1. Slider: streamlit.slider(label, min_value=None, max_value=None, value=None, step=None, format=None)

This is used to select an hour (hour_to_filter) with this code snippet:

hour_to_filter = st.slider('hour', 0, 23, 17)  # min: 0, max: 23, default: 17
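The returned value can then be used like any other Python variable, for example to filter the dataframe down to the selected hour (a sketch assuming the data and DATE_COLUMN defined in load_data above):

hour_to_filter = st.slider('hour', 0, 23, 17)
# keep only the rows whose timestamp falls within the selected hour
filtered_data = data[data[DATE_COLUMN].dt.hour == hour_to_filter]
st.write('Rows in hour', hour_to_filter, ':', len(filtered_data))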

2. Checkbox: streamlit.checkbox(label, value=False)

This is used in our app to hide or show the raw data as a table. The code snippet looks like:

if st.sidebar.checkbox('Show raw data'):
    st.subheader('Raw data')
    med_data

Here we check whether the checkbox is ticked; if it is, we display the data as shown below.

3. Selectbox: streamlit.selectbox(label, options, index=0, format_func=<class 'str'>)

This is used to choose a value from a list or a series as a dropdown list. The code snippet looks like:

option = st.sidebar.selectbox(
    'Mean or Median results to get',
    ['mean', 'median'])

Here we give the user two options: whether to compute the results as the mean or the median of the values. The selected value is saved in the option variable and can then be used for further analysis.
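For instance, the choice could drive the aggregation like this (a sketch assuming the data and DATE_COLUMN from load_data above):

option = st.sidebar.selectbox('Mean or Median results to get', ['mean', 'median'])
hourly = data.groupby(data[DATE_COLUMN].dt.hour)
# pick the aggregation that matches the user's choice
agg_data = hourly.median() if option == 'median' else hourly.mean()
st.line_chart(agg_data)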

More widgets can be viewed here.

Text / Dataframe Display

  1. Title: streamlit.title(body)

This is used for displaying text in title format, the code snippet:

st.title('Machine Resources Utilization')

2. Subheader: streamlit.subheader(body)

This is used for displaying text in subheader format, the code snippet:

st.subheader('Median Resources Utilization per hour of the day in the whole week')

3. Markdown: streamlit.markdown(body, unsafe_allow_html=False)

This is used for displaying text in markdown format, the code snippet:

st.sidebar.markdown('Interact with the data here')

4. Dataframe Display: streamlit.dataframe(data=None, width=None, height=None)

This is used for displaying a dataframe as an interactive table, the code snippet:

st.dataframe(df) # or we can directly do df

More text formats can be viewed here.

Visualization

  1. Line chart: streamlit.line_chart(data=None, width=0, height=0)

As part of this app, we use only a line chart. The code snippet looks like this:

st.line_chart(med_data)

Here, med_data is the dataframe containing all the values.
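To try st.line_chart in isolation, a small sketch with random data shows how the dataframe maps onto the chart: each column becomes one line and the index becomes the x-axis.

import numpy as np
import pandas as pd
import streamlit as st

# 24 rows of random values, one column per (made-up) metric
chart_data = pd.DataFrame(
    np.random.randn(24, 3),
    columns=['cpu_util_percent', 'mem_util_percent', 'disk_io_percent'])
st.line_chart(chart_data)   # one line per column, index on the x-axis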

More visualization APIs can be viewed here.

The final (partial) code:

import streamlit as st
import pandas as pd
import numpy as np

st.title('Machine Resources Utilization')
st.sidebar.title('Machine Resources Utilization')

DATE_COLUMN = 'date/time'

@st.cache
def load_data():
    data = pd.read_excel('m_1.xlsx')
    data.drop(['machine_id', 'mem_gps', 'mkpi'], axis=1, inplace=True)
    lowercase = lambda x: str(x).lower()
    data.rename(lowercase, axis='columns', inplace=True)
    data[DATE_COLUMN] = pd.to_datetime(data['timestamp'])
    return data

data = load_data()
hour_data_group = data.groupby(data[DATE_COLUMN].dt.hour)

st.sidebar.markdown('Interact with the data here')
option = st.sidebar.selectbox(
    'Mean or Median results to get',
    ['mean', 'median'])
'You have selected: ', option

# median stats
if option == 'median':
    med_data = pd.DataFrame(data.groupby(data[DATE_COLUMN].dt.hour).median(),
                            columns=['cpu_util_percent', 'mem_util_percent',
                                     'net_in', 'net_out', 'disk_io_percent'])
    med_data.reset_index(inplace=True)
    med_data.drop([DATE_COLUMN], axis=1, inplace=True)

    st.subheader('Median Resources Utilization per hour of the day in the whole week')
    if st.sidebar.checkbox('Show raw data'):
        st.subheader('Raw data')
        med_data
    st.line_chart(med_data)

    st.subheader('Median Resources Utilization per minute for the selected hour of the day in the whole week')
    # Some number in the range 0-23
    hour_to_filter = st.sidebar.slider('hour', 0, 23, 17)
    hour_data = hour_data_group.get_group(hour_to_filter)
    hour_data
    hour_median_data = pd.DataFrame(hour_data.groupby(hour_data[DATE_COLUMN].dt.minute).median(),
                                    columns=['cpu_util_percent', 'mem_util_percent',
                                             'net_in', 'net_out', 'disk_io_percent'])
    hour_median_data.reset_index(inplace=True)
    hour_median_data.drop([DATE_COLUMN], axis=1, inplace=True)
    st.line_chart(hour_median_data)

The final web app:

Creation and deployment of Docker container

  1. Dockerfile creation:

First, we create a Dockerfile with miniconda3 as the base image. Then we create a directory for our code, install the dependencies, copy our code into the directory, and expose the Streamlit port 8501.

The last instruction runs the app using the streamlit command:

streamlit run web.py

Here is the final Dockerfile:

FROM continuumio/miniconda3:latest
MAINTAINER Anshul Jindal "anshul.jindal@tum.de"

RUN mkdir -p /home/streamlit
WORKDIR /home/streamlit

RUN conda install --yes \
    numpy==1.16.3 \
    pandas==0.24.2 \
    && conda clean -afy
RUN pip install streamlit
RUN pip install xlrd

COPY . /home/streamlit

EXPOSE 8501

CMD [ "streamlit", "run", "web.py" ]

2. Docker Image build:

Now we can create the docker image using the command:

docker build -t <tagname> .

3. Docker Image push to docker repository:

Push the Docker image to Docker Hub:

docker login
docker push <tagname>

4. Pull image and run docker container:

Pull and run the web app on a server:

docker run -p 8501:8501 <tagname>

Example:

docker run -p 8501:8501 ansjin/streamlit:latest

Go to http://<SERVER_PUBLIC_IP>:8501 to access the app.


Please post your thoughts, and if something needs to be changed or added, please do let me know. You can reach out to me on LinkedIn.

Check out my personal website for more details.

I look forward to your feedback!
