
Build Web-Application for Exploratory Data Analysis and Predicting the Cost of Marketing Campaign using Streamlit.

Chesta Dhingra · Published in Data And Beyond · 7 min read · May 2, 2023

Effective exploratory data analysis and building a model for prediction are exciting tasks in their own right. But converting such an analytical project into a web application that can be deployed on any platform gives you a real advantage when starting a career in data science.

The contents discussed in this article are:

· Why Streamlit? An introduction

· Creating widgets like the sidebar, selectbox, submit buttons and text input, and working with media files like images

· Working with Streamlit containers to hold multiple elements using the expander.

· Displaying data visualizations built with seaborn and matplotlib in Streamlit.

· Creating three different applets: one for EDA, one for making predictions, and a main application in which the EDA and prediction applets are combined.

The project itself, along with the nitty-gritty of the analysis and modeling, is discussed in an earlier article.

Here we'll focus only on creating the web application using Streamlit; to follow along, you can find the full code here.

  1. Introduction

One of the major advantages of Streamlit is that it lets you build efficient web-based applications in a short amount of time. Beyond that, it makes it easy to develop highly interactive applications around your data, and it is widely used for data visualization and machine learning apps.
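To give a sense of how little code a working app takes, here is a minimal sketch (not part of this project, purely an illustration with a hypothetical file name hello_app.py):

## hello_app.py -- a minimal, hypothetical example app
import streamlit as st

st.title("Hello, Streamlit")
name = st.text_input("What is your name?")
if name:
    st.write(f"Nice to meet you, {name}!")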

Set up

The process of setting up Streamlit in Python is quite easy.

pip install streamlit
## this installs the streamlit library, just like any other Python library

and to run the Streamlit application, the command below needs to be executed in the terminal of your Python IDE.

streamlit run prediction_eda_app.py
## this will run the application in the browser
Image 1. Command to run the Streamlit application. Image 2. Output of the application we built.

Next, import the other libraries required for building our web application. Each applet needs a different set of libraries.

## for the main application, which we named prediction_eda_app.py
import streamlit as st
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import streamlit.components.v1 as stc
from PIL import Image
from eda_app import run_eda_app
from ml_app import run_ml_app

## for eda_app.py, the applet created for Exploratory Data Analysis and Data
## Visualization
import streamlit as st
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import warnings
warnings.filterwarnings(action = 'ignore')

## for ml_app.py, the applet created for predicting the cost, the libraries below
## are required

import streamlit as st
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings(action = 'ignore')

import pickle
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler,OneHotEncoder

2. Creating widgets like the sidebar, drop-downs, the menu, and loading an image

## to load the image the PIL library has been imported
## provide the path to open the image
## this is the main page of our web application
img = Image.open('C:/Users/Chesta/Pictures/kaggle.png')

## on the main page we'll have the title, the image and a sidebar menu that
## will effectively help us choose what to view.
## on the main page I have written basic information about the web application
## and the areas we try to cover.
def about():

    ## we can use Streamlit's write() function to put text on the web application
    st.write('The motto of this Streamlit application is to create a web app showing the basic EDA')
    st.write('of the dataset and the methodologies applied for creating the model that helps in predicting the media-campaign cost.')

    st.write("1) We'll do EDA and provide our conclusions based on the analysis")
    st.write("2) We'll cover some concepts that we observe when working across datasets, like multicollinearity, the importance of scaling, use of cross-validation and randomized search.")
    st.write("3) We'll look at the basic idea behind using an ensemble learning technique, specifically Random Forest, which uses bagging (bootstrap aggregation), an advanced ensemble technique.")


def main():
    ## this will display the image on our web app
    st.image(img)
    ## to put the title on the application
    st.title("Media-Campaign-Cost-Dataset Kaggle")
    ## create the list of menu items to be displayed on the sidebar
    menu = ['Home', 'EDA', 'ML', 'About']
    ## creating the sidebar selection box to choose from the drop-down list
    choice = st.sidebar.selectbox("Menu", menu)

    if choice == "Home":
        st.subheader('Home')
    elif choice == 'EDA':
        run_eda_app()
    elif choice == 'ML':
        run_ml_app()
    else:
        about()

if __name__ == '__main__':
    main()

Thus, from our main page we can make a selection from the menu.

Image 3. The main menu of the web application and its About page.

3. The third aspect is creating expanders, which help showcase the results in an organized way, while the widgets give us the flexibility to choose from the EDA sub-menu whether we want to display the data description or the plots that showcase our analysis.

## code required for the expanders that help with displaying the results
## this is a portion of the code; you'll find the rest in the notebook

## here we first create the list of sub-menu options, and based on the selection
## the chosen menu opens up

submenu = st.sidebar.selectbox('Submenu', ['Data Description', 'Plots'])
if submenu == "Data Description":
    st.subheader("Data Description")
    st.dataframe(df)  ## this will display the dataset that we are working with
    st.write(f"Shape of DataFrame is:- {df.shape}")
    with st.expander("Data Types"):  ## this creates the expander to showcase the datatypes of all the columns
        st.dataframe(df.dtypes)
## the result of this EDA applet will be displayed below
## an expander is used to hold multiple elements and can be collapsed and expanded
## by the user.
Image 4. The sub-menu of the EDA applet and the expander.
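The snippet above assumes that the DataFrame df has already been loaded inside run_eda_app(); how the data is read in is covered in the notebook. A common pattern (only a sketch, with a hypothetical file name media_campaign.csv) is to cache the load so the file is not re-read on every widget interaction:

## hypothetical loader for the EDA applet; the file name and path are assumptions
@st.cache_data
def load_data(path: str) -> pd.DataFrame:
    return pd.read_csv(path)

def run_eda_app():
    st.subheader("Exploratory Data Analysis")
    df = load_data("media_campaign.csv")  ## cached by Streamlit, so reruns are fast
    ## ... the submenu and expander code shown above goes here ...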

4. Graphical representation is an effective way to communicate our analytical results, and displaying the plots in Streamlit takes just a one-liner: st.pyplot(plt).

## the code for plotting the graphs with seaborn or matplotlib stays the same
## and displaying them in Streamlit requires just a one-line call


elif submenu == 'Plots':
    st.subheader('Plots')
    with st.expander("Boxplot & Countplot"):
        st.subheader("Boxplot and average costing of ordinal variables")
        grouped_data = df.groupby(['unit_sales(in millions)'])['cost'].agg('mean').reset_index()

        fig, axs = plt.subplots(nrows=4, ncols=2, figsize=(10, 7))

        ## boxplot and mean-cost barplot for unit_sales(in millions)
        graph = sns.boxplot(x='unit_sales(in millions)', y='cost', hue='unit_sales(in millions)', data=df, dodge=False, palette='winter', ax=axs[0][0])
        graph.get_legend().remove()

        graph2 = sns.barplot(x='unit_sales(in millions)', y='cost', data=grouped_data, hue='unit_sales(in millions)', dodge=False, ax=axs[0][1], palette='winter')
        # plt.legend(bbox_to_anchor=(1.02, 1), loc='upper right', borderaxespad=0)
        graph2.get_legend().remove()
        plt.tight_layout()

        ## boxplot and mean-cost barplot for total_children
        graph3 = sns.boxplot(x='total_children', y='cost', hue='total_children', data=df, dodge=False, palette='winter', ax=axs[1][0])
        graph3.get_legend().remove()

        grouped_data_cost = df.groupby(['total_children'])['cost'].agg('mean').reset_index()
        graph4 = sns.barplot(x='total_children', y='cost', data=grouped_data_cost, hue='total_children', dodge=False, ax=axs[1][1], palette='winter')
        graph4.get_legend().remove()
        plt.tight_layout()

        ## boxplot and mean-cost barplot for num_children_at_home
        graph5 = sns.boxplot(x='num_children_at_home', y='cost', hue='num_children_at_home', data=df, dodge=False, palette='winter', ax=axs[2][0])
        graph5.get_legend().remove()

        grouped_data_cost = df.groupby(['num_children_at_home'])['cost'].agg('mean').reset_index()
        graph6 = sns.barplot(x='num_children_at_home', y='cost', data=grouped_data_cost, hue='num_children_at_home', dodge=False, ax=axs[2][1], palette='winter')
        graph6.get_legend().remove()
        plt.tight_layout()

        ## boxplot and mean-cost barplot for avg_cars_at home(approx).1
        graph7 = sns.boxplot(x='avg_cars_at home(approx).1', y='cost', hue='avg_cars_at home(approx).1', data=df, dodge=False, palette='winter', ax=axs[3][0])
        graph7.get_legend().remove()

        grouped_data_cost = df.groupby(['avg_cars_at home(approx).1'])['cost'].agg('mean').reset_index()
        graph8 = sns.barplot(x='avg_cars_at home(approx).1', y='cost', data=grouped_data_cost, hue='avg_cars_at home(approx).1', dodge=False, ax=axs[3][1], palette='winter')
        graph8.get_legend().remove()
        plt.tight_layout()
        st.pyplot(plt)  ## this one line renders the figure with all the plots inside the expander
Image 5. Data visualization using Streamlit and seaborn.
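As a side note, newer versions of Streamlit recommend passing the figure object explicitly to st.pyplot() instead of relying on the global pyplot state. A minimal sketch of that pattern, using just one of the plots above:

## sketch only: pass the figure object to st.pyplot rather than the pyplot module
fig, ax = plt.subplots(figsize=(6, 4))
sns.boxplot(x='unit_sales(in millions)', y='cost', data=df, ax=ax)
st.pyplot(fig)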

5. Last, in the earlier article where I built the model using a Random Forest Regressor and created the pipeline for data pre-processing, we convert those pipelines and models into .pkl files. Those .pkl files will be used in this web application to predict the cost of the marketing campaign.

And because the inputs must be pre-processed or transformed before making predictions, it is necessary to save a .pkl file for the data preprocessor as well.

## to convert the data preprocessing pipeline to a .pkl file the following code is required
## for preprocessing the data using a ColumnTransformer
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
import pickle

## numeric_data and cat_data are the lists of numeric and categorical column names defined in the notebook
preprocessor = ColumnTransformer(
    transformers=[
        ('num', MinMaxScaler(), numeric_data),
        ('onehot', OneHotEncoder(), cat_data)
    ])

with open('column_transformer_pipeline.pkl', 'wb') as f:
    pickle.dump(preprocessor, f)

## save the model in .pkl format so it can be used for making the predictions

with open('model1.pkl', 'wb') as f:
    pickle.dump(model, f)

## these .pkl files will be used for making the predictions
## to get the inputs from the users we'll create an interface with the help of Streamlit
## below is the code for creating the organized user interface; the rest of the code is in the notebook
## one submit button will be created that displays the results.

col1, col2 = st.columns(2)
with col1:
    store_sales = float(st.number_input("store_sales"))  ## getting the input from the user and organizing the fields in two columns
    gross_weight = float(st.number_input("gross_weight"))
    units_per_case = float(st.number_input("units_per_case"))
    total_children = int(st.number_input("total_children", 0, 5))
    num_children_at_home = int(st.number_input("num_children_at_home", 0, 5))
    store_sqft = float(st.number_input("store_sqft"))
    unit_sales = int(st.number_input('unit_sales', 1, 6))
with col2:
    avg_cars_at_home = int(st.number_input("average_cars", 0, 4))
    recyclable_package = int(st.number_input("recyclable_pkg", 0, 1))
    low_fat = int(st.number_input("low_fat", 0, 1))
    coffee_bar = int(st.number_input('coffee_bar', 0, 1))
    video_store = st.number_input('video_store', 0, 1)
    salad_bar = st.number_input("salad_bar", 0, 1)
    florist = st.number_input("florist", 0, 1)

if st.button("Analysis Result"):
    analysis = prediction(store_sales, gross_weight, units_per_case, total_children, num_children_at_home, store_sqft, unit_sales, avg_cars_at_home, recyclable_package, low_fat, coffee_bar, video_store, salad_bar, florist)
    st.success(analysis)
else:
    st.write("Click the above button for results")

## the output image below shows the results of our ML prediction application
Image 6. User interface for ML prediction.
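The prediction() helper called above is defined in the notebook. A rough sketch of what it might look like, loading the pickled preprocessor and model saved earlier (the column names and their ordering here are assumptions and must match the training data):

## hypothetical sketch of the prediction helper; the real implementation is in the notebook
def prediction(store_sales, gross_weight, units_per_case, total_children,
               num_children_at_home, store_sqft, unit_sales, avg_cars_at_home,
               recyclable_package, low_fat, coffee_bar, video_store, salad_bar, florist):
    ## load the saved preprocessing pipeline and model
    with open('column_transformer_pipeline.pkl', 'rb') as f:
        preprocessor = pickle.load(f)
    with open('model1.pkl', 'rb') as f:
        model = pickle.load(f)

    ## assemble the user inputs into a single-row DataFrame
    ## (column names assumed; they must match those used at training time)
    row = pd.DataFrame([{
        'store_sales(in millions)': store_sales,
        'gross_weight': gross_weight,
        'units_per_case': units_per_case,
        'total_children': total_children,
        'num_children_at_home': num_children_at_home,
        'store_sqft': store_sqft,
        'unit_sales(in millions)': unit_sales,
        'avg_cars_at home(approx).1': avg_cars_at_home,
        'recyclable_package': recyclable_package,
        'low_fat': low_fat,
        'coffee_bar': coffee_bar,
        'video_store': video_store,
        'salad_bar': salad_bar,
        'florist': florist,
    }])

    ## transform the inputs and predict the campaign cost
    X = preprocessor.transform(row)
    cost = model.predict(X)[0]
    return f"Predicted media campaign cost: {cost:.2f}"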

To conclude the article:

  1. We created a web-application project that displays the results of the EDA and provides a user interface for predicting the cost of a marketing campaign.
  2. We got familiar with Streamlit and its components, and saw how effectively we can build a web app in a short amount of time with limited knowledge of HTML, CSS or JavaScript.

I hope you enjoyed reading this article and gained some knowledge about how to convert your ML projects into web applications. Please do follow me for more such data-science-related articles.
