Building an Interactive Exploratory Data Analysis App with Python and Streamlit: A step-by-step tutorial with Data Profiling API

Lamis Ghoualmi, Ph.D.
3 min read · Mar 20, 2023

Exploratory data analysis (EDA) is an essential step in every data science project or problem, as it provides crucial insights about the dataset. Without a thorough understanding of the dataset, it becomes challenging to select an appropriate model for the task at hand.

As someone who has worked with numerous datasets, I’ve often found myself writing similar EDA code again and again. To make this process more efficient, I decided to automate it using a Python data profiling API.

Steps to build an exploratory data analysis app using Streamlit and the data profiling API:

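  1. Install the required packages: You’ll need to have Python installed, as well as the Streamlit, Pandas-Profiling, and streamlit-pandas-profiling packages. (Pandas-Profiling has since been renamed ydata-profiling; this tutorial uses the original package name.) You can install them with pip by running the following commands in your command prompt or terminal: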
pip install streamlit
pip install pandas-profiling
pip install streamlit-pandas-profiling
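Before moving on, it can help to confirm that the installation worked. The snippet below is a minimal sketch using only the standard library; the helper name `missing_modules` and the `REQUIRED_MODULES` list are illustrative, and the list uses import names (with underscores) rather than pip names.

```python
from importlib.util import find_spec

# Import names (not pip names) of the packages installed above.
REQUIRED_MODULES = [
    "streamlit",
    "pandas",
    "pandas_profiling",
    "streamlit_pandas_profiling",
]


def missing_modules(names):
    """Return the module names from `names` that Python cannot locate."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All required packages were found.")
```

If anything is reported as missing, rerun the matching pip command in the same environment that runs the script.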

2. Import the required packages: You’ll need to import Pandas, Pandas-Profiling, Streamlit, and the st_profile_report function from streamlit-pandas-profiling.

import pandas as pd
import pandas_profiling
import streamlit as st
from streamlit_pandas_profiling import st_profile_report

3. Create a Streamlit app and add instructions: Use the st.title() function to give your app a title and the st.write() function to add instructions for your users. For example:

# This creates a title for your app.
st.title("Exploratory Data Analysis App")

st.write("This app will help you perform exploratory data analysis on your dataset.")

4. Allow users to upload their own dataset or use a preloaded dataset: You can use the st.sidebar.file_uploader() function to allow users to upload their own dataset, and the st.sidebar.selectbox() function to allow them to choose from preloaded datasets. For example:

option = st.sidebar.selectbox("Choose a dataset:", ("Load a dataset", "Use example dataset"))

if option == "Load a dataset":
    uploaded_file = st.sidebar.file_uploader("Upload a dataset (CSV file)", type=["csv"])
    if uploaded_file is not None:
        df = pd.read_csv(uploaded_file)
        # perform data profiling using Pandas-Profiling
        pr = df.profile_report()
        st.header("**Dataset:**")
        st.write(df)
        if st.sidebar.button("Generate Report"):
            st.write("---")
            st.header("**Pandas Profiling Report**")
            st_profile_report(pr)

if option == "Use example dataset":
    dataset = st.sidebar.selectbox("Try a preloaded dataset:", ("Diabetes dataset", "Chronic Kidney Disease Dataset"))
    if dataset == "Diabetes dataset":
        df = pd.read_csv("diabetes.csv")
        # perform data profiling using Pandas-Profiling
        pr = df.profile_report()
        st.header("**Dataset:**")
        st.write(df)
        if st.sidebar.button("Generate Report"):
            st.write("---")
            st.header("**Pandas Profiling Report**")
            st_profile_report(pr)

    if dataset == "Chronic Kidney Disease Dataset":
        df = pd.read_csv("kidney_disease.csv")
        # perform data profiling using Pandas-Profiling
        pr = df.profile_report()
        st.header("**Dataset:**")
        st.write(df)
        if st.sidebar.button("Generate Report"):
            st.write("---")
            st.header("**Pandas Profiling Report**")
            st_profile_report(pr)
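The three branches above repeat the same display logic, so one possible refactoring is to move it into a helper and keep the preloaded datasets in a dictionary. This is only a sketch under a few assumptions: the names `EXAMPLE_DATASETS`, `show_dataset`, and `main` are my own, the CSV files sit next to the script, and the third-party imports are placed inside `main()` so the helper can be exercised without them.

```python
# Hypothetical mapping from the sidebar label to the CSV file on disk.
EXAMPLE_DATASETS = {
    "Diabetes dataset": "diabetes.csv",
    "Chronic Kidney Disease Dataset": "kidney_disease.csv",
}


def show_dataset(df, st, st_profile_report):
    """Display the DataFrame and, when the button is pressed, its report."""
    st.header("**Dataset:**")
    st.write(df)
    if st.sidebar.button("Generate Report"):
        st.write("---")
        st.header("**Pandas Profiling Report**")
        # Create the report object only when the user asks for it.
        st_profile_report(df.profile_report())


def main():
    # Imported here so show_dataset can be tested without these packages.
    import pandas as pd
    import pandas_profiling  # noqa: F401  (adds DataFrame.profile_report)
    import streamlit as st
    from streamlit_pandas_profiling import st_profile_report

    option = st.sidebar.selectbox(
        "Choose a dataset:", ("Load a dataset", "Use example dataset")
    )
    if option == "Load a dataset":
        # file_uploader returns None until the user uploads a file.
        source = st.sidebar.file_uploader("Upload a dataset (CSV file)", type=["csv"])
    else:
        label = st.sidebar.selectbox("Try a preloaded dataset:", tuple(EXAMPLE_DATASETS))
        source = EXAMPLE_DATASETS[label]

    if source is not None:
        show_dataset(pd.read_csv(source), st, st_profile_report)
```

To use this as the app script, call `main()` on the last line of the file. Adding another preloaded dataset then only requires one new entry in the dictionary.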

5. Display the data and the Pandas-Profiling report: Use the st.write() function to display the data and the st_profile_report() function to display the Pandas-Profiling report. For example:

st.header("**Dataset:**")
st.write(df)

st.header("**Pandas Profiling Report**")
st_profile_report(pr)

Steps for checking a Streamlit app on localhost and deploying it on the web:

  1. Check Streamlit app on localhost:
  • After writing the Streamlit app code, save it with a .py extension.
  • Open the terminal/command prompt and navigate to the directory where the .py file is saved.
  • Run the command streamlit run filename.py in the terminal/command prompt.
  • If everything is set up correctly, the Streamlit app will open up in a browser at localhost:8501.
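If the browser does not open by itself, a quick way to see whether the app is actually being served is to test the port from a second terminal. The helper below is a small standard-library sketch; `is_port_open` is a name I made up, and 8501 is the default port mentioned above.

```python
import socket


def is_port_open(host="localhost", port=8501, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not reachable.
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Something is listening on localhost:8501.")
    else:
        print("Nothing is listening on localhost:8501 yet.")
```

A positive result only shows that some process is listening on the port, so it is still worth opening the page in the browser to confirm it is your app.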

2. Deploy the Streamlit app on the web:

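There are several ways to deploy a Streamlit app on the web, including platforms such as Streamlit Sharing that deploy directly from a GitHub repository. I personally like keeping the app on GitHub. Note that GitHub Pages only serves static files and cannot run a Streamlit app on its own, so the repository acts as the source that the hosting platform deploys from. Preparing the GitHub repository involves the following steps: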

  1. Create a new repository:
  • Go to your GitHub profile and create a new repository by clicking on the “+” button in the top-right corner of the page and selecting “New repository”.
  • Add your streamlit app script to the GitHub repository.
  • Create a new file called requirements.txt in the root directory of your project and add all the necessary dependencies for your Streamlit app to run.
pandas
streamlit
pandas-profiling
streamlit-pandas-profiling
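The list above leaves the versions open, so the hosting platform installs whatever is current at deploy time. If you would rather deploy the versions you tested locally, one option is to pin them. The following sketch uses only the standard library; `requirement_lines` and `PACKAGES` are illustrative names, and packages that are not installed are left unpinned.

```python
from importlib.metadata import PackageNotFoundError, version

# pip names of the packages listed in requirements.txt above.
PACKAGES = ["pandas", "streamlit", "pandas-profiling", "streamlit-pandas-profiling"]


def requirement_lines(packages):
    """Return 'name==version' for installed packages, or the bare name otherwise."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Not installed in this environment, so leave it unpinned.
            lines.append(name)
    return lines


if __name__ == "__main__":
    # Copy the printed lines into requirements.txt.
    print("\n".join(requirement_lines(PACKAGES)))
```

Run it in the same environment you used to test the app locally, so the pinned versions match what actually worked.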

If you’re interested in seeing what the finished exploratory data analysis app looks like, you can check it out using the link ProfilingApp. I hope you enjoyed this tutorial, and I would love to see the apps you create. If you have any questions, please feel free to leave a comment below. Happy learning, and stay tuned for upcoming tutorials. Let’s connect on LinkedIn!

Lamis Ghoualmi, Ph.D.

I am deeply passionate about data and thoroughly enjoy sharing my expertise in data analysis and data science through tutorials.