How to build a data visualization page with Streamlit using Python
Streamlit allows you to build a web application and see the final result in a matter of minutes. It can be used to create interactive apps to display data using a combination of pandas and matplotlib.
In this tutorial, we will create a simple data visualization page from scratch using data from the French National Assembly. You can find the code of my version of this visualization page, as well as the preformatted data I used, on my GitHub.
Setting up streamlit and creating your first app
Installing Streamlit is straightforward. Install the package, then run the built-in demo to check that everything works:
$ pip install streamlit
$ streamlit hello
Create a new file in your app folder, name it app.py, and import the following libraries.
import streamlit as st
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import os
from matplotlib.backends.backend_agg import RendererAgg
Importing the data
Let’s now import the data. We need two datasets, one for the deputies and one for the political parties (click on the links to download the data). We will create two functions to load the data.
#Loading the data
@st.cache
def get_data_deputies():
    return pd.read_csv(os.path.join(os.getcwd(), 'df_dep.csv'))

@st.cache
def get_data_political_parties():
    return pd.read_csv(os.path.join(os.getcwd(), 'df_polpar.csv'))
The st.cache decorator lets us load the data only once and keep it in the cache, which noticeably improves the performance of the app. In recent Streamlit releases, st.cache_data replaces st.cache for this purpose. More info in the Streamlit documentation.
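To see why caching helps, here is a minimal sketch of the same idea using functools.lru_cache from the Python standard library rather than Streamlit's own decorator. The function name load_rows, the counter, and the inline CSV text are made up for illustration; Streamlit's cache follows a similar principle but is not implemented this way.

```python
import csv
import io
from functools import lru_cache

# Hypothetical inline data standing in for a CSV file on disk.
CSV_TEXT = "name,pol party\nAlice,A\nBob,B\n"

call_count = 0  # counts how many times the data is actually parsed


@lru_cache(maxsize=None)
def load_rows():
    """Parse the CSV once; later calls return the cached result."""
    global call_count
    call_count += 1
    return tuple(tuple(row) for row in csv.reader(io.StringIO(CSV_TEXT)))


first = load_rows()
second = load_rows()  # served from the cache, the CSV is not parsed again
print(call_count)
```

The second call returns the very same object as the first one, so the expensive loading step runs a single time.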
Displaying the data
We can now import the data and display the two dataframes.
#configuration of the page
st.set_page_config(layout="wide")

#load dataframes
df_dep = get_data_deputies()
df_pol_par = get_data_political_parties()

st.title('French National Assembly visualization tool')
st.markdown("""
This app performs simple visualizations of the open data from the French National Assembly!
""")

st.write(df_dep)
st.write(df_pol_par)
st.write is the easiest way to display pandas dataframes. We also changed the page config to use the wide layout, so the content spans the full width of the page.
Run your app
$ streamlit run app.py
You should get something resembling this.
Making the page interactive for the user
One way to make the page interactive is by using filters on the data to display only a subset of the dataframe. We can make use of the layout of the streamlit app by adding widgets to the sidebar of the page.
st.sidebar.header('Select what to display')
pol_parties = df_dep['pol party'].unique().tolist()
pol_party_selected = st.sidebar.multiselect('Political parties', pol_parties, pol_parties)

nb_deputies = df_dep['pol party'].value_counts()
nb_mbrs = st.sidebar.slider("Number of members", int(nb_deputies.min()), int(nb_deputies.max()), (int(nb_deputies.min()), int(nb_deputies.max())), 1)
The two widgets we used are multiselect, which lets the user choose a subset of a group of options, and slider, which is used here to select a range. Click on the links to check the API of each function.
Using the output of the widgets to filter the dataframe
We now have two pieces of information to filter our data:
- pol_party_selected is a list of the political parties we wish to keep
- nb_mbrs is a tuple with the minimum and the maximum number of members per political party.
Let’s transform these values into masks for the dataframe:
#creates masks from the sidebar selection widgets
mask_pol_par = df_dep['pol party'].isin(pol_party_selected)

#get the parties with a number of members in the range of nb_mbrs
party_sizes = df_dep['pol party'].value_counts()
parties_in_range = party_sizes[party_sizes.between(nb_mbrs[0], nb_mbrs[1])].index.to_list()
mask_mbrs = df_dep['pol party'].isin(parties_in_range)
Applying the masks
df_dep_filtered = df_dep[mask_pol_par & mask_mbrs]
st.write(df_dep_filtered)
Don’t forget to remove or comment out the two previous st.write calls so that only the filtered dataframe is displayed.
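To check the filtering logic outside of Streamlit, here is a small sketch on a toy dataframe. The data and the two "widget outputs" are made up for illustration; in the app they come from the sidebar widgets.

```python
import pandas as pd

# Toy stand-in for df_dep; only the 'pol party' column matters here.
df_dep = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e", "f"],
    "pol party": ["X", "X", "X", "Y", "Y", "Z"],
})

# Hypothetical widget outputs: selected parties and a (min, max) size range.
pol_party_selected = ["X", "Y"]
nb_mbrs = (1, 2)

# Mask 1: rows whose party was selected in the multiselect.
mask_pol_par = df_dep["pol party"].isin(pol_party_selected)

# Mask 2: rows whose party size falls inside the slider range.
party_sizes = df_dep["pol party"].value_counts()
parties_in_range = party_sizes[party_sizes.between(nb_mbrs[0], nb_mbrs[1])].index
mask_mbrs = df_dep["pol party"].isin(parties_in_range)

# A row is kept only if both conditions hold.
df_dep_filtered = df_dep[mask_pol_par & mask_mbrs]
print(df_dep_filtered)
```

Party X is selected but too large for the range, and party Z is in the range but not selected, so only the rows of party Y remain.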
Plot the data
Displaying a dataframe is good, but being able to show plots of the data is even better. To do this, we will use matplotlib and seaborn, two very useful Python libraries for data visualization.
Preparation to display plot in streamlit
Plots can be slow to render on streamlit, especially if you are trying to display a lot of data. One solution is to use a backend renderer from matplotlib when displaying graphs. This can save some seconds and make the experience fluid for the user.
matplotlib.use("agg")
_lock = RendererAgg.lock
Pie plot
We would like to plot a pie chart of the political parties and their number of members. First, we count the number of members per political party and look up the color associated with each party, so that a party is always displayed with the same color. Without this step, the colors of the political parties might change in the pie chart whenever the filters are used, which would confuse the user.
pol_par = df_dep_filtered['pol party'].value_counts()
#merge the two dataframe to get a column with the color
df = pd.merge(pd.DataFrame(pol_par), df_pol_par, left_index=True, right_on='abreviated_name')
colors = df['color'].tolist()
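The following sketch shows, on made-up data, why the merge keeps colors stable: each party gets its color from the party table, whatever subset is being plotted. The helper name party_colors and the toy values are assumptions for illustration only.

```python
import pandas as pd

# Toy party table with made-up abbreviations and colors.
df_pol_par = pd.DataFrame({
    "abreviated_name": ["X", "Y", "Z"],
    "color": ["#ff0000", "#00ff00", "#0000ff"],
})


def party_colors(parties):
    """Map each party present in `parties` to its color via a merge."""
    counts = parties.value_counts()
    merged = pd.merge(counts.to_frame(), df_pol_par,
                      left_index=True, right_on="abreviated_name")
    return dict(zip(merged["abreviated_name"], merged["color"]))


all_parties = pd.Series(["X", "X", "X", "Y", "Y", "Z"])
without_x = all_parties[all_parties != "X"]  # simulates a filter

print(party_colors(all_parties))
print(party_colors(without_x))
```

Party Y keeps the same color before and after the filter, because the color comes from the lookup table and not from the position of the party in the chart.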
We now have everything to plot the data
row0_spacer1, row0_1, row0_spacer2, row0_2, row0_spacer3 = st.beta_columns((0.2, 1, .2, 1, .2))

with row0_1, _lock:
    st.header("Political parties")
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.pie(pol_par, labels=(pol_par.index + ' (' + pol_par.map(str) + ')'),
           wedgeprops={'linewidth': 7, 'edgecolor': 'white'}, colors=colors)
    #display a white circle in the middle of the pie chart
    ax.add_artist(plt.Circle((0, 0), 0.7, color='white'))
    st.pyplot(fig)
Let’s briefly go through the previous block of code:
- st.beta_columns() is used to create columns in Streamlit, so that we can display things side by side (later Streamlit releases renamed it st.columns)
- with row0_1, _lock: displays the content in the first column while holding the matplotlib renderer lock defined earlier
- wedgeprops is a way to modify the style of the wedges of the pie chart
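If you want to experiment with the chart outside of Streamlit, here is a minimal standalone sketch. It uses made-up sizes and colors, and it produces the ring shape with the width entry of wedgeprops instead of the white circle used above; both approaches give a similar result.

```python
import io

import matplotlib
matplotlib.use("agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt

# Made-up party sizes, labels and colors for illustration.
sizes = [3, 2, 1]
labels = ["X (3)", "Y (2)", "Z (1)"]
colors = ["#ff0000", "#00ff00", "#0000ff"]

fig, ax = plt.subplots(figsize=(5, 5))
# width=0.3 keeps only the outer ring of each wedge, giving a donut shape.
wedges, texts = ax.pie(sizes, labels=labels, colors=colors,
                       wedgeprops={"width": 0.3, "edgecolor": "white"})

# Render to an in-memory PNG instead of a Streamlit page.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

In the app, st.pyplot(fig) takes the place of the savefig call.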
The graph can be hard to read due to the abbreviation of the names. Let’s add a section on the side to explain the abbreviations.
with row0_2:
    df = df.reset_index(drop=True)
    t = ''
    for i in range(len(df)):
        t = t + df.loc[i, 'abreviated_name'] + ' : ' + df.loc[i, 'name'] + ' \n'
    for i in range(5):
        st.write("")
    st.write(t)
We should obtain something like this:
Bar plot
Let’s display the proportion of women in the political parties. First, we need to calculate the share of women in every party.
df = df_dep[mask_pol_par & mask_mbrs]
df_sex = pd.concat([df, pd.get_dummies(df['sex'], prefix='sex')], axis=1)

#we group by political parties and sum the male and female
df_sex = df_sex.groupby(['pol party']).agg({'sex_female': 'sum', 'sex_male': 'sum'})

#calculate the proportion of women per parties
df_sex['pol party'] = df_sex.index
df_sex['total'] = df_sex['sex_female'].astype(int) + df_sex['sex_male']
df_sex['ratio_f'] = df_sex['sex_female'] / df_sex['total']
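As a sanity check, the same ratio can be computed in a more compact way on a toy dataframe. This is only an alternative sketch, not the code used in the app; the data is made up, and it assumes the sex column contains the values 'female' and 'male'.

```python
import pandas as pd

# Toy data: party X has 1 woman out of 4 members, party Y has 2 out of 2.
df = pd.DataFrame({
    "pol party": ["X", "X", "X", "X", "Y", "Y"],
    "sex": ["female", "male", "male", "male", "female", "female"],
})

# The mean of a boolean column is the share of True values in each group.
ratio_f = (df["sex"] == "female").groupby(df["pol party"]).mean()
print(ratio_f)
```

The result is a Series indexed by party, with one ratio between 0 and 1 per party.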
As with the previous plot, we need to get the color associated with each party:
df_sex = pd.merge(df_sex, df_pol_par, left_index=True, right_on='abreviated_name')
df_sex = df_sex.sort_values(by=['ratio_f'], ascending=False)
colors = df_sex['color'].tolist()
We can now plot the graph
row2_spacer1, row2_1, row2_spacer2, row2_2, row2_spacer3 = st.beta_columns((0.2, 1, .2, 1, .2))

with row2_1, _lock:
    st.header('Women deputies')
    fig, ax = plt.subplots(figsize=(5, 5))
    sns.barplot(x="ratio_f", y="pol party", data=df_sex,
                ax=ax, palette=colors)
    ax.set_ylabel('Political party')
    ax.set_xlabel('Percentage of women deputies')

    i = 0
    #round after scaling to avoid float truncation (e.g. 28.999... becoming 28)
    text = (df_sex['ratio_f'] * 100).round().astype(int).to_list()
    for rect in ax.patches:
        height = rect.get_height()
        ax.text(rect.get_x() + rect.get_width() / 2.,
                rect.get_y() + height * 3 / 4., str(text[i]) + '%',
                ha='center', va='bottom', rotation=0, color='white', fontsize=12)
        i = i + 1
    st.pyplot(fig)
Let’s briefly go through the previous block of code:
- We created a new row of columns to display things under the previous plot
- for rect in ax.patches: is used to display text inside each rectangle of the bar plot. Without going into every detail, the loop iterates over each rectangle of the bar plot and prints the text at a position computed from the coordinates of the rectangle.
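The labelling loop can be tried on its own with plain matplotlib. The sketch below uses ax.barh instead of seaborn and made-up ratios, so the names parties and ratios are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt

# Made-up parties and shares of women.
parties = ["X", "Y", "Z"]
ratios = [0.5, 0.29, 0.1]

fig, ax = plt.subplots(figsize=(5, 3))
bars = ax.barh(parties, ratios, color="steelblue")

# Scale first, then round, so 0.29 is labelled 29% and not 28%.
labels = [f"{round(r * 100)}%" for r in ratios]

# Each bar is a rectangle: place its label at the center of the rectangle.
for rect, label in zip(bars, labels):
    ax.text(rect.get_x() + rect.get_width() / 2,
            rect.get_y() + rect.get_height() / 2,
            label, ha="center", va="center", color="white")

plt.close(fig)
print(labels)
```

Pairing the rectangles and the labels with zip avoids keeping a separate counter, at the cost of assuming both sequences are in the same order.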
Conclusion
We just built an interactive app to display and apply some filters on a dataframe, as well as displaying plots using seaborn and matplotlib. We learned how to use st.write, use columns and create widgets to add interaction with the user.
Here is a gif showing the filters modifying the dataframe.
And here is a gif that shows the final app with the plots.
Next steps
There is much more to learn on streamlit. Don’t hesitate to check out the streamlit webpage for the API and their selection of projects made by fellow developers.
You could also continue developing this app with more plots and data analysis, such as a plot of the age distribution of the deputies or a plot displaying the previous activities of the deputies.
Links
- The exact code of this tutorial and the csv files can be found here
- My more advanced project can be accessed at https://open-data-national-assembly.herokuapp.com/
- The full code of this advanced project is available on github.