How to build a data visualization page with Streamlit using Python

Max Lutz
Aug 20, 2021 · 6 min read


Streamlit allows you to build a web application and see the final result in a matter of minutes. It can be used to create interactive apps to display data using a combination of pandas and matplotlib.

In this tutorial, we will create a simple data visualization page from scratch using data from the French National Assembly. You can find the code of my version of this visualization page, as well as the preformatted data I used, on my GitHub.

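Setting up Streamlit and creating your first app

Installing Streamlit couldn’t be simpler. The first command below installs it with pip; the second, streamlit hello, launches a demo app in your browser so you can check that everything works.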

$ pip install streamlit
$ streamlit hello

Create a new file in your app folder, name it app.py, and import the following libraries.

import streamlit as st 
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import os
from matplotlib.backends.backend_agg import RendererAgg

Importing the data

Let’s now import the data. We need two datasets: one for the deputies and one for the political parties (click on the links to download the data). We will write two functions to load them.

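#Loading the data
@st.cache_data
def get_data_deputies():
    return pd.read_csv(os.path.join(os.getcwd(), 'df_dep.csv'))

@st.cache_data
def get_data_political_parties():
    return pd.read_csv(os.path.join(os.getcwd(), 'df_polpar.csv'))

Streamlit reruns the whole script from top to bottom every time the user interacts with the page. The cache decorator makes it read each CSV file only once and reuse the stored result on later reruns, which improves the performance of the app a lot. Older Streamlit versions used @st.cache for this; recent versions deprecate it in favor of @st.cache_data for functions that return data such as dataframes. More info in the Streamlit documentation.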

Displaying the data

We can now load the data and display the two dataframes.

#configuration of the page
st.set_page_config(layout="wide")
#load dataframes
df_dep = get_data_deputies()
df_pol_par = get_data_political_parties()
st.title('French National Assembly visualization tool')
st.markdown("""
This app performs simple visualizations of the open data from the French National Assembly!
""")
st.write(df_dep)
st.write(df_pol_par)

st.write is the easiest way to display pandas dataframes. We also set the page layout to "wide" so that the content uses the full width of the browser window.

Run your app

$ streamlit run app.py

You should get something resembling this.

The first step of the data visualization app

Making the page interactive for the user

One way to make the page interactive is to filter the data so that only a subset of the dataframe is displayed. We can take advantage of the Streamlit layout by adding widgets to the sidebar of the page.

st.sidebar.header('Select what to display')
pol_parties = df_dep['pol party'].unique().tolist()
pol_party_selected = st.sidebar.multiselect('Political parties', pol_parties, pol_parties)
nb_deputies = df_dep['pol party'].value_counts()
nb_mbrs = st.sidebar.slider("Number of members", int(nb_deputies.min()), int(nb_deputies.max()), (int(nb_deputies.min()), int(nb_deputies.max())), 1)

We used two widgets: multiselect, which lets the user choose a subset of a list of options, and slider, which selects a range here because we pass a (min, max) tuple as the default value. The Streamlit API reference documents both functions.

Using the output of the widgets to filter the dataframe

We now have two pieces of information to filter our data:

  • pol_party_selected is a list of the political parties we wish to keep
  • nb_mbrs is a tuple with the minimum and the maximum number of members per political party.

Let’s turn these values into boolean masks for the dataframe:

#creates masks from the sidebar selection widgets
mask_pol_par = df_dep['pol party'].isin(pol_party_selected)
#get the parties with a number of members in the range of nb_mbrs
#(nb_deputies holds the number of members per party, computed for the slider)
parties_in_range = nb_deputies[nb_deputies.between(nb_mbrs[0], nb_mbrs[1])].index
mask_mbrs = df_dep['pol party'].isin(parties_in_range)
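The same filtering logic can be packaged as a plain pandas function, which makes it easy to check outside Streamlit. This is a minimal sketch: the helper name filter_deputies and the toy data are hypothetical, and it only assumes, as in df_dep, that each deputy’s party abbreviation is stored in the 'pol party' column.

```python
import pandas as pd


def filter_deputies(df_dep, parties_selected, nb_mbrs):
    """Hypothetical helper: keep deputies whose party is selected and whose
    party size lies in the inclusive range nb_mbrs = (min, max)."""
    counts = df_dep["pol party"].value_counts()  # number of members per party
    low, high = nb_mbrs
    parties_in_range = counts[counts.between(low, high)].index
    mask_pol_par = df_dep["pol party"].isin(parties_selected)
    mask_mbrs = df_dep["pol party"].isin(parties_in_range)
    return df_dep[mask_pol_par & mask_mbrs]


if __name__ == "__main__":
    # Toy data, not the real National Assembly dataset:
    # P1 has 3 members, P2 has 2, P3 has 1.
    toy = pd.DataFrame({
        "name": ["A", "B", "C", "D", "E", "F"],
        "pol party": ["P1", "P1", "P1", "P2", "P2", "P3"],
    })
    # Keep only parties with 1 or 2 members.
    print(filter_deputies(toy, ["P1", "P2", "P3"], (1, 2))["name"].tolist())
    # prints ['D', 'E', 'F']
```

Selecting parties by their member count first, then testing membership with isin, avoids relying on the name of the Series returned by value_counts, which changed in pandas 2.0.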

Applying the masks

df_dep_filtered = df_dep[mask_pol_par & mask_mbrs]
st.write(df_dep_filtered)

Don’t forget to remove or comment out the two previous st.write calls so that only the filtered dataframe is displayed.

We are now displaying only the deputies that belong to parties with fewer than 50 members

Plot the data

Displaying a dataframe is good, but being able to show plots of the data is even better. To do this, we will use matplotlib and seaborn, two very useful Python libraries for data visualization.

Preparing to display plots in Streamlit

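Plots can be slow to render in Streamlit, especially when you display a lot of data. One solution is to use matplotlib’s non-interactive Agg backend, which draws figures into an in-memory image instead of a GUI window. Matplotlib is also not thread-safe, and Streamlit can serve several sessions at the same time, so we take the lock of the Agg renderer and hold it while drawing. This can save a few seconds and keep the experience smooth for the user.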

matplotlib.use("agg")
_lock = RendererAgg.lock
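To make the role of the lock concrete outside Streamlit, here is a minimal sketch that renders the same kind of chart from several threads. Assumptions: the helper render_pie_png and the module-level _plot_lock are hypothetical, and a plain threading.Lock stands in for RendererAgg.lock; the figure is built with matplotlib.figure.Figure and saved to an in-memory PNG.

```python
import io
import threading

import matplotlib
matplotlib.use("agg")  # non-interactive backend: draws into memory, no GUI window
from matplotlib.figure import Figure

# Hypothetical lock for this sketch; the app above uses RendererAgg.lock instead.
_plot_lock = threading.Lock()


def render_pie_png(counts):
    """Draw a pie chart of a {label: value} dict and return it as PNG bytes."""
    with _plot_lock:  # only one thread draws with matplotlib at a time
        fig = Figure(figsize=(5, 5))
        ax = fig.subplots()
        ax.pie(list(counts.values()), labels=list(counts.keys()))
        buf = io.BytesIO()
        fig.savefig(buf, format="png")
    return buf.getvalue()


if __name__ == "__main__":
    results = []
    threads = [
        threading.Thread(target=lambda: results.append(render_pie_png({"A": 3, "B": 5})))
        for _ in range(4)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(results), all(r.startswith(b"\x89PNG") for r in results))
```

Every thread goes through the same lock, so the drawing code never runs concurrently even if several users trigger a rerun at once.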

Pie plot

We would like to plot a pie chart of the political parties and their number of members. First, we count the members of each political party and retrieve the color associated with each party, so that a party is always displayed with the same color. Without this step, the colors of the parties could change in the pie chart when the user applies the filters, which would be confusing.

pol_par = df_dep_filtered['pol party'].value_counts()
#merge the two dataframe to get a column with the color
df = pd.merge(pd.DataFrame(pol_par), df_pol_par, left_index=True, right_on='abreviated_name')
colors = df['color'].tolist()
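The merge keeps each party tied to its color. An alternative is to build a party-to-color lookup once and map the counted parties through it with Index.map. Here is a sketch with hypothetical toy data (the party names and colors are made up); the helper name pie_data is an assumption, not part of the original app.

```python
import pandas as pd

# Toy stand-ins for df_dep and df_pol_par; the colors are made up.
df_dep = pd.DataFrame({"pol party": ["P1", "P2", "P1", "P3", "P2", "P1"]})
df_pol_par = pd.DataFrame({
    "abreviated_name": ["P1", "P2", "P3"],
    "color": ["gold", "blue", "red"],
})

# Party -> color lookup, built once from the full reference table.
color_by_party = df_pol_par.set_index("abreviated_name")["color"]


def pie_data(df_filtered):
    """Return the number of members per party and the matching list of colors."""
    pol_par = df_filtered["pol party"].value_counts()
    colors = pol_par.index.map(color_by_party).tolist()
    return pol_par, colors


if __name__ == "__main__":
    counts, colors = pie_data(df_dep[df_dep["pol party"] != "P1"])
    print(colors)  # prints ['blue', 'red']
```

Because the lookup comes from the unfiltered table, a party keeps its color no matter which other parties the filters remove.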

We now have everything we need to plot the data:

row0_spacer1, row0_1, row0_spacer2, row0_2, row0_spacer3 = st.columns((0.2, 1, .2, 1, .2))
with row0_1, _lock:
    st.header("Political parties")
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.pie(pol_par, labels=(pol_par.index + ' (' + pol_par.map(str) + ')'),
           wedgeprops={'linewidth': 7, 'edgecolor': 'white'}, colors=colors)
    #display a white circle in the middle of the pie chart
    ax.add_artist(plt.Circle((0, 0), 0.7, color='white'))
    st.pyplot(fig)

Let’s briefly go over the previous block of code:

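  • st.columns() (called st.beta_columns() in older Streamlit versions) creates columns so that we can display things side by side. The tuple (0.2, 1, .2, 1, .2) gives the relative width of each of the five columns: the three narrow ones act as spacers around the two content columns.
  • with row0_1, _lock: displays everything in the block in the first content column while holding the matplotlib lock, so that only one thread draws at a time.
  • wedgeprops passes style properties to every wedge of the pie; here, a thick white edge separates the slices.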
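If you prefer not to draw a circle over the chart, ax.pie also forwards a width entry in wedgeprops to each Wedge patch, which turns the slices into ring segments. A minimal sketch with toy counts (not the real data): with the default radius of 1, a width of 0.3 leaves a hole of radius 0.7, the same size as the white circle in the code above.

```python
import matplotlib
matplotlib.use("agg")
import matplotlib.pyplot as plt

# Toy counts; in the app these would come from pol_par.
counts = {"P1": 3, "P2": 2, "P3": 1}

fig, ax = plt.subplots(figsize=(5, 5))
# 'width' gives each wedge a radial thickness of 0.3 (radius 1 minus 0.7),
# so no white circle has to be drawn over the center.
wedges, texts = ax.pie(
    list(counts.values()),
    labels=[f"{party} ({n})" for party, n in counts.items()],
    wedgeprops={"width": 0.3, "linewidth": 7, "edgecolor": "white"},
)
```

In the Streamlit app, the resulting figure would be passed to st.pyplot(fig) exactly as before.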

The graph can be hard to read because the party names are abbreviated. Let’s add a section in the second column that explains the abbreviations.

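with row0_2:
    df = df.reset_index(drop=True)
    t = ''
    for i in range(len(df)):
        #two trailing spaces force a Markdown line break
        t = t + df.loc[i, 'abreviated_name'] + ' : ' + df.loc[i, 'name'] + '  \n'
    #add some vertical space before the legend
    for i in range(5):
        st.write("")
    st.write(t)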

We should obtain something like this:

Pie chart with the political parties and their number of members

Bar plot

Let’s display the proportion of women in each political party. First, we need to calculate the share of women deputies in every party.

df = df_dep[mask_pol_par & mask_mbrs]
df_sex = pd.concat([df, pd.get_dummies(df['sex'], prefix='sex')], axis=1)
#we group by political parties and sum the male and female
df_sex = df_sex.groupby(['pol party']).agg({'sex_female':'sum','sex_male':'sum'})
#calculate the proportion of women per party
df_sex['pol party'] = df_sex.index
df_sex['total'] = df_sex['sex_female'] + df_sex['sex_male']
df_sex['ratio_f'] = df_sex['sex_female']/df_sex['total']
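The dummy-column approach can also be written as a grouped mean of a boolean Series, since the mean of True/False values is the share of True. Here is a sketch, assuming the sex column holds the strings 'female' and 'male' (as the sex_female and sex_male dummy columns suggest); women_ratio is a hypothetical helper name.

```python
import pandas as pd


def women_ratio(df):
    """Share of deputies with sex == 'female' in each party, from 0 to 1,
    sorted from highest to lowest."""
    is_female = df["sex"].eq("female")  # True for women, False otherwise
    return is_female.groupby(df["pol party"]).mean().sort_values(ascending=False)


if __name__ == "__main__":
    # Toy data, not the real dataset.
    toy = pd.DataFrame({
        "pol party": ["P1", "P1", "P1", "P2", "P2", "P3"],
        "sex": ["female", "male", "female", "male", "male", "female"],
    })
    print(women_ratio(toy).round(2).to_dict())
```

A side benefit is that the function still works when the filters leave no women at all. In that case get_dummies creates no sex_female column, and the agg call above raises an error.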

As with the previous plot, we need to get the color associated with each party:

df_sex = pd.merge(df_sex, df_pol_par, left_index=True, right_on='abreviated_name')
df_sex = df_sex.sort_values(by=['ratio_f'], ascending=False)
colors = df_sex['color'].tolist()

We can now plot the graph:

row2_spacer1, row2_1, row2_spacer2, row2_2, row2_spacer3 = st.columns((0.2, 1, .2, 1, .2))
with row2_1, _lock:
    st.header('Women deputies')
    fig, ax = plt.subplots(figsize=(5, 5))
    sns.barplot(x="ratio_f", y="pol party", data=df_sex, ax=ax, palette=colors)
    ax.set_ylabel('Political party')
    ax.set_xlabel('Percentage of women deputies')
    #percentages as integers; round before casting so 0.29 gives 29, not 28
    text = (df_sex['ratio_f'] * 100).round().astype(int).to_list()
    #write the percentage inside each bar
    for rect, pct in zip(ax.patches, text):
        height = rect.get_height()
        ax.text(rect.get_x() + rect.get_width() / 2., rect.get_y() + height * 3 / 4.,
                str(pct) + '%', ha='center', va='bottom', rotation=0,
                color='white', fontsize=12)
    st.pyplot(fig)

Let’s briefly go over the previous block of code:

  • We created a new set of columns (row2_1 and its spacers) so that this plot is displayed below the previous one.
  • The loop over ax.patches writes the percentage inside each bar of the bar plot. Without going into every detail: it iterates over every rectangle of the bar plot and places the text using that rectangle’s coordinates.
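Since matplotlib 3.4, Axes.bar_label can place these labels without computing coordinates by hand. Here is a minimal sketch with toy values, using plain ax.barh rather than seaborn. With a seaborn bar plot, the bars should be reachable through ax.containers, but it is worth checking this against your seaborn version.

```python
import matplotlib
matplotlib.use("agg")
import matplotlib.pyplot as plt

# Toy ratios; in the app these would come from df_sex["ratio_f"].
parties = ["P3", "P1", "P2"]
ratios = [1.0, 0.67, 0.25]

fig, ax = plt.subplots(figsize=(5, 5))
bars = ax.barh(parties, ratios)  # horizontal bars, one per party
labels = [f"{round(r * 100)}%" for r in ratios]
# label_type="center" puts each label in the middle of its bar.
texts = ax.bar_label(bars, labels=labels, label_type="center",
                     color="white", fontsize=12)
```

bar_label returns the created text annotations, one per bar, in the same order as the bars.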

Conclusion

We just built an interactive app that filters and displays a dataframe and plots the data with seaborn and matplotlib. Along the way, we learned how to use st.write, lay out content in columns, and create widgets that let the user interact with the page.

Here is a GIF showing the filters modifying the dataframe.

And here is a GIF that shows the final app with the plots.

Next steps

There is much more to learn about Streamlit. Don’t hesitate to check out the Streamlit website for the API reference and a gallery of projects made by fellow developers.

You could also keep developing this app with more plots and data analysis, such as a plot of the age distribution of the deputies or a plot of their previous activities.


Max Lutz

Passionate about climate change and data visualization | Convinced that data science can help tackle climate change www.linkedin.com/in/max-lutz