Getting Started with Streamlit for Machine Learning Deployment

Alidu Abubakari

Visit the GitHub repository for the accompanying code.

1. Introduction

1.1 What is Streamlit?

Streamlit is an open-source Python library that allows you to create interactive, web-based applications for data science and machine learning models with minimal effort. It has become a popular tool for machine learning practitioners due to its simplicity, flexibility, and powerful features. With just a few lines of code, you can transform your machine learning projects into interactive applications that allow users to input data, visualize results, and test models in real-time.

1.2 Why Use Streamlit for Model Deployment?

In the machine learning (ML) lifecycle, deployment plays a crucial role in transforming a well-trained model from code on your local machine into a tool that others can interact with and benefit from. While building models and experimenting with data is essential, the true value of a machine learning model is only realized when it is deployed and made accessible to end-users, stakeholders, or even the general public.

Deploying your model provides a way for non-technical users to leverage your work without needing to understand the underlying code. More importantly, it opens the door for real-time feedback, rapid iterations, and demonstration of your model’s capabilities in various scenarios. This is where Streamlit shines.

Here’s why Streamlit is an ideal tool for deploying machine learning models:

1. Simplifying Deployment

Streamlit eliminates the need for complex back-end infrastructure and web development, allowing you to build an interactive web application in just a few lines of Python code.

Example: Deploy a sentiment analysis model that predicts the sentiment of user reviews with a simple, interactive interface.

2. Showcasing ML MVPs (Minimum Viable Products)

Streamlit helps you present working MVPs, where users can interact with the model and visualize results in real time, demonstrating the value of your solution effectively.

Example: Deploy an app that predicts diabetes risk based on user input, making it easy for stakeholders to test and see results instantly.

3. Speeding Up Hackathons and Competitions

Streamlit’s quick deployment process allows you to focus on solving problems rather than building infrastructure, giving you an edge in time-sensitive environments like hackathons.

Example: Build a healthcare prediction tool during a hackathon that allows judges to input patient data and view predictions immediately.

4. Enabling Rapid Feedback and Iteration

Streamlit makes it easy to update models or UI based on feedback, allowing for rapid iteration and improvements without complex redeployment steps.

Example: Modify a customer churn prediction app based on feedback from sales teams by adding explanations and visualizations.

5. A Rich Ecosystem for Data Visualization

Streamlit’s built-in support for visualizations (e.g., Matplotlib, Plotly) allows users to easily interpret model outputs, fostering better understanding and trust.

Example: Add an interactive scatter plot to a flower classification app, helping users visualize how the model separates different classes.

In this article, we’ll guide you through the process of setting up Streamlit and using it to deploy machine learning models. We’ll cover everything from setting up the environment to building simple classification and regression models, as well as deploying your application on platforms like Hugging Face. Whether you’re an experienced machine learning engineer or just getting started, this tutorial will help you deploy your models quickly and easily.

Contents

  1. Introduction

2. Setting Up the Environment

3. Getting Started with Streamlit

4. Experiment 1: Classification using the Iris Dataset

5. Experiment 2: Simple Regression Model

6. Experiment 3: Sentiment Classification with Huggingface

7. Experiment 4: Image Classification

8. ML-Streamlit Workflow Summary

9. Best Practices for Streamlit Development

10. Deploying Streamlit Apps

11. Conclusion

2. Setting Up the Environment for Streamlit and Machine Learning

In this section, we’ll guide you through the basic setup to get your environment ready for Streamlit and machine learning development. We’ll cover installing Python, setting up virtual environments, and installing the necessary libraries.

2.1 Installing Python

First, ensure that you have Python installed. You can download it from the official Python website. The latest stable version (3.7 or higher) is recommended for compatibility with most libraries.

You can check if Python is installed by running the following command in your terminal:

python --version

If Python is already installed, it will display the installed version number.

2.2 Creating a Virtual Environment

It’s a good practice to create a virtual environment to manage dependencies for your project. There are two common ways to create a virtual environment: using venv or conda.

Using venv

To create a virtual environment using venv, follow these steps:

  • Open a terminal and navigate to your project folder.
  • Run the following commands:
# Create a virtual environment
python3.10 -m venv myenv

# Activate the virtual environment
# On Windows
myenv\Scripts\activate

# On macOS/Linux
source myenv/bin/activate

Once activated, you’ll see (myenv) in your terminal, indicating that the virtual environment is active.
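
Using conda

If you use Anaconda or Miniconda, you can achieve the same with conda. Here is a minimal sketch; the environment name and Python version are only examples:

# Create a conda environment with a specific Python version
conda create -n myenv python=3.10

# Activate the environment
conda activate myenv

# Deactivate it when you are done
conda deactivate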

2.3 Installing Necessary Libraries

Once your virtual environment is activated, you’ll need to install the necessary libraries for your project. Use pip to install the following packages:

# Install Streamlit
pip install streamlit

# Install machine learning and data processing libraries
pip install scikit-learn pandas numpy matplotlib seaborn streamlit-feedback tf_keras transformers torch

# Install TensorFlow and Keras for image classification (optional)
pip install tensorflow
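
If you plan to share the project or deploy it later, it helps to pin these dependencies in a requirements.txt file. A minimal sketch (adjust the package list, and optionally pin versions, to match your project):

# requirements.txt
streamlit
scikit-learn
pandas
numpy
matplotlib
seaborn
streamlit-feedback
transformers
torch
tensorflow

You can then install everything in one step with pip install -r requirements.txt.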

2.4 Verifying Installation

After installing the libraries, you can verify the installation by running:

# Check if Streamlit is installed
streamlit --version

# Import key libraries in Python to verify installation
python -c "import streamlit, sklearn, pandas, numpy, matplotlib, tensorflow; print('Libraries successfully installed')"

3. Getting Started with Streamlit

Streamlit is a powerful Python library that allows you to create web applications for your data science and machine learning projects with minimal effort. In this section, we’ll cover the basic concepts, create a simple app, and explore some widgets and layout options.

Streamlit follows a simple yet powerful paradigm:

  1. Script-based: Your entire app is contained in a single Python script.
  2. Reactive: The app automatically updates when you modify the source code.
  3. Widgetized: Easy-to-use UI components for user interaction.
  4. Data-centric: Seamless integration with popular data science libraries.

3.1 Creating Your First Streamlit App

Let’s create a simple “Hello, World!” app to get started:

  • Create a new Python file named hello_streamlit.py.
  • Add the following code:
import streamlit as st

st.title("Hello, Streamlit!")
st.write("Welcome to your first Streamlit app.")

name = st.text_input("What's your name?")
if name:
    st.write(f"Hello, {name}!")

st.write("This is a simple counter:")
count = st.button("Click me!")
if count:
    st.write("You clicked the button!")
  • Run the app using the command:
streamlit run hello_streamlit.py

This simple app demonstrates some key Streamlit features:

  • st.title() and st.write() for displaying text
  • st.text_input() for user input
  • st.button() for creating an interactive button

3.2 Streamlit Widgets and Layout Options

Streamlit offers a wide range of widgets for user interaction and layout customization. Here are some popular ones:

Input Widgets

import streamlit as st

# Text input
text = st.text_input("Enter some text")
st.write("You entered:", text)

# Number input
number = st.number_input("Enter a number", min_value=0, max_value=100, value=50)
st.write("Number entered:", number)

# Slider
slider_value = st.slider("Choose a value", 0, 100, 50)
st.write("Slider value:", slider_value)

# Checkbox
checkbox = st.checkbox("Check me!")
st.write("Checkbox checked:", checkbox)

# Selectbox
option = st.selectbox("Choose an option", ["Option 1", "Option 2", "Option 3"])
st.write("Selected option:", option)

# Radio buttons
radio = st.radio("Select one", ["A", "B", "C"])
st.write("Radio selection:", radio)

# Date input
date = st.date_input("Select a date")
st.write("Selected date:", date)

# File uploader
uploaded_file = st.file_uploader("Choose a file")
if uploaded_file is not None:
    st.write("Uploaded file name:", uploaded_file.name)
else:
    st.write("No file uploaded yet")

Layout Options

Streamlit provides several ways to organize your app’s layout:

import streamlit as st

# Columns
col1, col2, col3 = st.columns(3)
with col1:
    st.write("This is column 1")
with col2:
    st.write("This is column 2")
with col3:
    st.write("This is column 3")

# Expander
with st.expander("Click to expand"):
    st.write("This content is hidden by default")

# Sidebar
st.sidebar.title("Sidebar Title")
st.sidebar.write("This content appears in the sidebar")

# Tabs
tab1, tab2 = st.tabs(["Tab 1", "Tab 2"])
with tab1:
    st.write("This is tab 1")
with tab2:
    st.write("This is tab 2")

These examples showcase some of Streamlit’s most commonly used widgets and layout options. As you build more complex apps, you’ll find that combining these elements allows for creating rich, interactive user interfaces with minimal effort.

4. Experiment 1: Classification using the Iris Dataset

In this experiment, we will build a simple machine learning classifier using the Iris dataset. We’ll train a Random Forest classifier and create a Streamlit app that allows users to input feature values, display classification results, and add some visualizations.

4.1 Complete code


import streamlit as st
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# ==========================
# Load the Iris Dataset
# ==========================
# The Iris dataset contains measurements of iris flowers for three species: setosa, versicolor, and virginica.
# Each flower has four features: sepal length, sepal width, petal length, and petal width.
# Our task is to classify the species based on these features.

iris = load_iris()

# Creating a Pandas DataFrame from the dataset to easily view and manipulate the data.
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)

# Add the target (species labels) to the DataFrame.
df['target'] = iris.target

# Mapping the target values (0, 1, 2) to species names.
# 0 = setosa, 1 = versicolor, 2 = virginica
df['species'] = df['target'].map({0: 'setosa', 1: 'versicolor', 2: 'virginica'})

# ==========================
# Split Data into Train and Test Sets
# ==========================
# We split the dataset into training and testing sets.
# This allows us to train the model on one part of the data (X_train, y_train) and test it on unseen data (X_test, y_test).
# 'test_size=0.2' means 20% of the data will be used for testing.

X = iris.data # Features (sepal/petal lengths and widths)
y = iris.target # Target (species labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# ==========================
# Train a Random Forest Classifier
# ==========================
# Random Forest is a powerful ensemble learning method that creates multiple decision trees and merges them together to get more accurate predictions.
# n_estimators=100 means we are using 100 decision trees in the forest.

clf = RandomForestClassifier(n_estimators=100, random_state=42)

# Fit the model to the training data
clf.fit(X_train, y_train)

# ==========================
# Streamlit App - User Interface
# ==========================
# Title of the app
st.title("Iris Flower Species Classifier")

# ==========================
# Sidebar - Input Sliders for User Input
# ==========================
# We create sliders in the sidebar that allow users to input values for the flower's features (sepal/petal length and width).
# These values will be used as input to the model for prediction.

st.sidebar.header('Input Features')

# Slider for Sepal Length
sepal_length = st.sidebar.slider('Sepal Length (cm)',
float(df['sepal length (cm)'].min()),
float(df['sepal length (cm)'].max()),
float(df['sepal length (cm)'].mean()))

# Slider for Sepal Width
sepal_width = st.sidebar.slider('Sepal Width (cm)',
float(df['sepal width (cm)'].min()),
float(df['sepal width (cm)'].max()),
float(df['sepal width (cm)'].mean()))

# Slider for Petal Length
petal_length = st.sidebar.slider('Petal Length (cm)',
float(df['petal length (cm)'].min()),
float(df['petal length (cm)'].max()),
float(df['petal length (cm)'].mean()))

# Slider for Petal Width
petal_width = st.sidebar.slider('Petal Width (cm)',
float(df['petal width (cm)'].min()),
float(df['petal width (cm)'].max()),
float(df['petal width (cm)'].mean()))

# ==========================
# Prediction
# ==========================
# Collect the input values from the sliders and prepare them for prediction.
# The model expects the input as a NumPy array of shape (1, 4) because we are making predictions for one flower at a time.

input_data = np.array([[sepal_length, sepal_width, petal_length, petal_width]])

# Use the trained Random Forest model to make a prediction.
# 'clf.predict()' returns the class label (0, 1, or 2) corresponding to setosa, versicolor, or virginica.
prediction = clf.predict(input_data)

# Predict the probabilities for each class using 'clf.predict_proba()'.
# This method returns the probability of the input belonging to each class (species).
prediction_proba = clf.predict_proba(input_data)

# ==========================
# Display Prediction Results
# ==========================
# Subheader to show the prediction results.
st.subheader('Prediction')

# We use 'iris.target_names' to map the predicted class number (0, 1, or 2) to the species name.
# The '[prediction][0]' gives us the predicted species name.
st.write(f"Predicted species: {iris.target_names[prediction][0]}")

# ==========================
# Display Prediction Probabilities
# ==========================
# We also display the predicted probability for each species.
# The higher the probability, the more confident the model is that the input flower belongs to that species.

st.subheader('Prediction Probability')
st.write(f"Setosa: {prediction_proba[0][0]:.2f}, Versicolor: {prediction_proba[0][1]:.2f}, Virginica: {prediction_proba[0][2]:.2f}")

# ==========================
# Scatter Plot - Sepal Length vs Sepal Width
# ==========================
# To visualize the data, we create a scatter plot of Sepal Length vs Sepal Width for all the flowers.
# The flowers are colored by species (Setosa, Versicolor, Virginica).

st.subheader('Scatter Plot of Sepal Length vs Sepal Width')

# Create the scatter plot using matplotlib.
fig, ax = plt.subplots()

# Create a dictionary that maps the target numbers (0, 1, 2) to the species names.
species_map = {0: 'Setosa', 1: 'Versicolor', 2: 'Virginica'}

# Define colors for each species.
colors = ['red', 'green', 'blue']

# Plot each species with its own color.
for i, species in species_map.items():
    subset = df[df['target'] == i]  # Get the subset of data for species i
    ax.scatter(subset['sepal length (cm)'], subset['sepal width (cm)'],
               label=species, color=colors[i])

# Set the axis labels and show the legend.
ax.set_xlabel('Sepal Length (cm)')
ax.set_ylabel('Sepal Width (cm)')
ax.legend()

# Display the plot in the Streamlit app.
st.pyplot(fig)

While this Streamlit app demonstrates how to build an interactive machine learning application, integrating the model training and model prediction into the same script is not the best approach for production environments. Here’s why separating the training process from the Streamlit app is a better practice:

1. Training Overhead:

• Model training can be computationally expensive and time-consuming. Each time the Streamlit app is run, the model would need to be retrained, which can cause delays for the user.

• In real-world applications, you often train the model once and reuse it. Re-training on every app run leads to inefficient use of resources.

2. Consistency and Reproducibility:

• When training the model within the app, the results may vary due to random initialization of parameters, unless random seeds are controlled carefully.

• In a production environment, you want to ensure that the same trained model is used for consistent results.

3. Separation of Concerns:

• In software engineering, it’s a best practice to separate concerns. The training phase (model building, tuning, evaluation) should be handled in a different pipeline or script than the deployment (making predictions).

• This allows you to update or retrain the model independently of the Streamlit app. For example, if you improve your model, you can simply replace the saved model file without modifying the app logic.

4. Scalability:

• Separating the model training and the Streamlit app enables scalability. Once the model is trained and saved (e.g., as a .joblib or .pkl file), it can be loaded quickly in the Streamlit app and predictions can be made in real-time without delays.

• This also allows the app to handle larger traffic as the computational cost of making predictions is minimal compared to training.

5. Model Deployment:

• In production, you would typically deploy the trained model to a cloud service, microservice, or a database for prediction purposes. By decoupling the training process from the app, you make it easier to deploy the app and model separately, ensuring smoother integration and scalability.

Better Approach:

1. Train the Model Separately:

• Use a separate Python script or Jupyter notebook to train the model. After training, save the model to a file using libraries like joblib or pickle.

• This allows you to tune the model, conduct cross-validation, and ensure the model is properly trained before using it in an app.

2. Load the Pretrained Model in Streamlit:

• In the Streamlit app, load the saved model file (e.g., iris_random_forest_model.joblib). This ensures that the app focuses solely on making predictions, thus reducing the computational overhead and improving response time.

3. Update the Model Independently:

• If the model needs to be retrained (e.g., with new data or better algorithms), this can be done without modifying the Streamlit app. Once the new model is trained, you just replace the old model file.

4.2 Using joblib for Model Packaging

To separate the model training and prediction logic from the Streamlit application, you can use joblib to serialize and save your trained model.

This allows you to package the model and load it into the Streamlit app without having to re-train the model every time the app is run. Here’s how you can do it step by step:

  • First, we will train the model and save it to a file using joblib or pickle.

This should be done in a separate script to ensure that the model is trained once and can be loaded when needed.

import os
import joblib
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# ======================
# Exploratory Data Analysis (EDA)
# ======================

# Load the Iris dataset and create a DataFrame
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Summary statistics
print("Dataset Overview:")
print(df.head(), "\n")
print("Summary Statistics:")
print(df.describe(), "\n")
print("Class Distribution:")
print(df['species'].value_counts(), "\n")

# Visualizing feature distributions
plt.figure(figsize=(12, 6))
for i, feature in enumerate(df.columns[:-1]):
    plt.subplot(2, 2, i + 1)
    sns.histplot(df[feature], kde=True, bins=15, color='skyblue')
    plt.title(f'Distribution of {feature}')
plt.tight_layout()
plt.show()

# ======================
# Splitting the Data and Model Training
# ======================

# Define features (X) and labels (y)
X = iris.data
y = iris.target


#Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target)

# Save feature names to a .txt file
features = iris.feature_names
model_dir = "artifacts"
os.makedirs(model_dir, exist_ok=True) # Create the directory if it doesn't exist
feature_file_path = os.path.join(model_dir, 'feature_names.txt')

# Write feature names to a text file
with open(feature_file_path, 'w') as f:
    for feature in features:
        f.write(f"{feature}\n")

print(f"Feature names saved to '{feature_file_path}'")

# Train the Random Forest Classifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Evaluate the model
y_pred = clf.predict(X_test)
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))

# Save the trained model to a file using joblib
model_path = os.path.join(model_dir, 'iris_random_forest_model.joblib')
joblib.dump(clf, model_path)

print(f"Model has been saved to '{model_path}'")
  • In step 2, you build the Streamlit app.

In your Streamlit app, instead of training the model, you load the pre-trained model from the saved joblib file. This separates the training logic from the app and speeds the app up, because it doesn't need to retrain the model each time it runs.

import streamlit as st
import numpy as np
import joblib
import os

# Define the path to the artifacts (where the pre-trained model is stored)
artifact_path = 'artifacts'

# Load the pre-trained Random Forest model for Iris classification
# 'joblib.load()' is used to load the model from a file
model = joblib.load(os.path.join(artifact_path, 'iris_random_forest_model.joblib'))

# Title for the Streamlit app
st.title("Iris Flower Species Classifier")

# Sidebar to accept input from the user
st.sidebar.header('Input Features')

# Create input sliders for the four Iris features: Sepal Length, Sepal Width, Petal Length, and Petal Width.
# Users can slide to input the values for each feature.
# For simplicity, we set default values as well as min and max values.

# Sepal Length slider
sepal_length = st.sidebar.slider('Sepal Length (cm)', min_value=0.0, max_value=10.0, value=5.0)

# Sepal Width slider
sepal_width = st.sidebar.slider('Sepal Width (cm)', min_value=0.0, max_value=10.0, value=3.0)

# Petal Length slider
petal_length = st.sidebar.slider('Petal Length (cm)', min_value=0.0, max_value=10.0, value=4.0)

# Petal Width slider
petal_width = st.sidebar.slider('Petal Width (cm)', min_value=0.0, max_value=10.0, value=1.3)

# Collecting all the feature values entered by the user into a list
feature_values = [sepal_length, sepal_width, petal_length, petal_width]

# Converting the list into a NumPy array for model input.
# The array is reshaped because models expect input in 2D arrays (batch_size, number_of_features).
input_data = np.array([feature_values])

# Model prediction
# 'model.predict()' returns the predicted class (species) for the input data.
# This method outputs the class label as a number (0, 1, or 2), representing setosa, versicolor, or virginica respectively.
prediction = model.predict(input_data)

# Model prediction probabilities
# 'model.predict_proba()' returns the probability of the input data belonging to each of the three species.
# The result is an array of probabilities for each class (setosa, versicolor, virginica).
prediction_proba = model.predict_proba(input_data)

# Define the species names corresponding to the class labels 0, 1, and 2.
species_names = ['setosa', 'versicolor', 'virginica']

# Display the predicted species on the app using Streamlit's 'st.subheader()' and 'st.write()' functions.
st.subheader('Prediction')
st.write(f"Predicted species: {species_names[prediction[0]]}")

# ==========================
# Alternative Method Using Argmax
# ==========================
# The prediction above uses 'species_names[prediction[0]]' to map the model's numeric output to a species name.
# Alternatively, we can use 'argmax()' to find the class with the highest probability.
# Argmax simply finds the index of the highest value in the probability array, which corresponds to the predicted class.

# predicted_class = np.argmax(prediction_proba)
# st.write(f"Predicted species (using argmax): {species_names[predicted_class]}")
# ==========================
# In both cases, the result should be the same.

# Display prediction probabilities for all species.
# We show the probability for each species, and format the probabilities to two decimal places.
st.subheader('Prediction Probability')
st.write(f"Setosa: {prediction_proba[0][0]:.2f}, Versicolor: {prediction_proba[0][1]:.2f}, Virginica: {prediction_proba[0][2]:.2f}")

# ==========================
# Explanation of 'model.predict()' and 'model.predict_proba()'
# ==========================
# 'model.predict()' is used to make predictions by returning the most likely class (species) for the given input.
# The model predicts a class label (0 for setosa, 1 for versicolor, 2 for virginica) based on the input features.

# 'model.predict_proba()' returns the predicted probability of each class (species) instead of just the most likely one.
# It gives a probability score for how likely the input data belongs to each class.
# For example, for a given input, the model may output probabilities like:
# Setosa: 0.60, Versicolor: 0.25, Virginica: 0.15.
# In this case, the model is 60% confident that the input corresponds to Setosa.

# ==========================
# User Explanation of Argmax:
# ==========================
# 'np.argmax()' is a function that returns the index of the maximum value in a NumPy array.
# In our context, using 'argmax()' on the probabilities returned by 'model.predict_proba()' will give us the class
# (0, 1, or 2) with the highest probability. This is a useful alternative to directly using 'model.predict()',
# especially when you want to make a prediction based on probability rather than the most likely class.

Let's add some visualization; this version shows the prediction probabilities for each class as a bar chart.

import streamlit as st
import numpy as np
import pandas as pd
import joblib
import matplotlib.pyplot as plt
import os

# ==============================
# Load the Trained Model
# ==============================
# The pre-trained Random Forest model is saved in a directory called 'artifacts'.
# The model was trained earlier to classify iris species based on flower measurements (sepal and petal dimensions).
# 'joblib.load()' is used to load the saved model from the file.
artifact_path = 'artifacts'
model = joblib.load(os.path.join(artifact_path, 'iris_random_forest_model.joblib'))

# ==============================
# Streamlit App Layout
# ==============================

# Streamlit allows us to create a web interface for interacting with our model.
# 'st.title()' sets the title of the app that will be displayed at the top of the page.
st.title("Iris Flower Species Classifier")

# ==============================
# Sidebar Inputs for Flower Features
# ==============================
# We use the Streamlit sidebar to create input widgets where users can provide values for sepal length,
# sepal width, petal length, and petal width (the features used for classification).

# Sidebar section for user inputs
st.sidebar.header('Input Features')

# Four sliders allow users to input values for the four flower features.
# The default values are chosen to represent typical flower measurements, but users can adjust these as they wish.
# Each slider has a minimum value of 0 and a maximum value of 10, though real iris flowers tend to fall within a narrower range.
sepal_length = st.sidebar.slider('Sepal Length (cm)', min_value=0.0, max_value=10.0, value=5.0)
sepal_width = st.sidebar.slider('Sepal Width (cm)', min_value=0.0, max_value=10.0, value=3.0)
petal_length = st.sidebar.slider('Petal Length (cm)', min_value=0.0, max_value=10.0, value=4.0)
petal_width = st.sidebar.slider('Petal Width (cm)', min_value=0.0, max_value=10.0, value=1.3)

# Collect the input values into a list.
# This list represents the input features (sepal and petal measurements) for a single flower instance.
feature_values = [sepal_length, sepal_width, petal_length, petal_width]

# ==============================
# Prepare the Input for Prediction
# ==============================
# The machine learning model expects the input to be in the form of a 2D array (even if it’s just one flower),
# so we use np.array() to create a NumPy array with the input features.
input_data = np.array([feature_values])

# ==============================
# Make Predictions Using the Model
# ==============================
# The model predicts the species of the iris flower based on the input values.
# 'model.predict()' returns the predicted class (species), and 'model.predict_proba()' gives the probabilities
# of each species (setosa, versicolor, virginica).
prediction = model.predict(input_data)
prediction_proba = model.predict_proba(input_data)

# List of species names corresponding to the prediction classes
# (0 = setosa, 1 = versicolor, 2 = virginica).
species_names = ['setosa', 'versicolor', 'virginica']

# ==============================
# Display the Prediction Result
# ==============================
# 'st.subheader()' is used to create a section title for displaying the prediction result.
# We use 'st.write()' to show which species was predicted by the model.
st.subheader('Prediction')
st.write(f"Predicted species: {species_names[prediction[0]]}")

# ==============================
# Display Prediction Probabilities
# ==============================
# The model not only predicts a species but also gives probabilities for each species.
# These probabilities indicate the model’s confidence in its prediction for each species.
# We'll visualize this using a bar chart.

# Create a new section to show the prediction probabilities
st.subheader('Prediction Probabilities')

# Create a bar chart using Matplotlib to show the probability distribution for each species.
# The 'prediction_proba' array contains the probabilities of the flower belonging to each species.
fig, ax = plt.subplots()

# 'ax.bar()' creates a bar chart where each species name is on the x-axis and the corresponding probability on the y-axis.
ax.bar(species_names, prediction_proba[0])
ax.set_ylabel('Probability')
ax.set_title('Species Prediction Probabilities')

# Customize the chart to improve readability.
# Set the y-axis limit from 0 to 1 because probabilities range between 0 and 1.
plt.ylim(0, 1)

# Annotate each bar with its corresponding probability value.
# 'ax.text()' places text on the bars, displaying the actual probability value above each bar.
for i, v in enumerate(prediction_proba[0]):
    ax.text(i, v + 0.01, f'{v:.2f}', ha='center')

# 'st.pyplot()' renders the Matplotlib figure in the Streamlit app.
st.pyplot(fig)

# ==============================
# Display Numerical Probabilities
# ==============================
# For further clarity, we can also show the numerical values of the probabilities.
# 'st.write()' displays each species and its corresponding probability in a readable format.
st.write("Numerical probabilities:")
for species, prob in zip(species_names, prediction_proba[0]):
    st.write(f"{species}: {prob:.2f}")

Let's add more visualization; this version adds a Predict button and a scatter plot showing where the user's input falls relative to the rest of the dataset.

import streamlit as st
import numpy as np
import pandas as pd
import joblib
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
import os

# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Define the path to the artifacts
artifact_path = 'artifacts'

# Load the trained model
model = joblib.load(os.path.join(artifact_path, 'iris_random_forest_model.joblib'))

# Streamlit app title
st.title("Iris Flower Species Classifier")

# Create input widgets manually
st.sidebar.header('Input Features')

# Sepal Length
sepal_length = st.sidebar.slider('Sepal Length (cm)', min_value=4.0, max_value=8.0, value=5.4)

# Sepal Width
sepal_width = st.sidebar.slider('Sepal Width (cm)', min_value=2.0, max_value=4.5, value=3.4)

# Petal Length
petal_length = st.sidebar.slider('Petal Length (cm)', min_value=1.0, max_value=7.0, value=4.7)

# Petal Width
petal_width = st.sidebar.slider('Petal Width (cm)', min_value=0.1, max_value=2.5, value=1.5)

# Collect feature values in a list
feature_values = [sepal_length, sepal_width, petal_length, petal_width]

# Add a predict button
predict_button = st.sidebar.button('Predict')

# Only make prediction when the button is pressed
if predict_button:
    # Prepare the input data for prediction
    input_data = np.array([feature_values])

    # Make predictions using the loaded model
    prediction = model.predict(input_data)
    prediction_proba = model.predict_proba(input_data)

    # Define species names
    species_names = ['setosa', 'versicolor', 'virginica']

    # Display the prediction result
    st.subheader('Prediction')
    st.write(f"Predicted species: {species_names[prediction[0]]}")

    # Display prediction probabilities
    st.subheader('Prediction Probabilities')

    # Create a bar chart of probabilities
    fig, ax = plt.subplots()
    ax.bar(species_names, prediction_proba[0])
    ax.set_ylabel('Probability')
    ax.set_title('Species Prediction Probabilities')

    # Customize the chart
    plt.ylim(0, 1)
    for i, v in enumerate(prediction_proba[0]):
        ax.text(i, v + 0.01, f'{v:.2f}', ha='center')

    # Display the chart in Streamlit
    st.pyplot(fig)

    # Display numerical probabilities
    st.write("Numerical probabilities:")
    for species, prob in zip(species_names, prediction_proba[0]):
        st.write(f"{species}: {prob:.2f}")

    # Visualize the predicted class along with existing data
    st.subheader('Visualization of Prediction')

    # Create a scatter plot of the existing data
    fig, ax = plt.subplots(figsize=(10, 6))
    scatter = ax.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdYlBu)
    ax.set_xlabel('Sepal Length (cm)')
    ax.set_ylabel('Sepal Width (cm)')
    ax.set_title('Iris Dataset with Prediction')

    # Add a legend
    legend1 = ax.legend(*scatter.legend_elements(), title="Classes")
    ax.add_artist(legend1)

    # Plot the prediction point
    ax.scatter(sepal_length, sepal_width, color='black', s=200, marker='*', label='Prediction')
    ax.legend()

    # Display the plot
    st.pyplot(fig)

    # Add explanation of the visualization
    st.write("""
This scatter plot shows the distribution of Iris flowers based on their sepal length and width.
The colors represent different species, and the black star shows where your input falls on this distribution.
""")

else:
    st.write("Use the sliders in the sidebar to input flower measurements, then click 'Predict' to see the results.")

Let's add some more complexity, such as a feedback section built with the streamlit-feedback component.

import streamlit as st
import numpy as np
import pandas as pd
import joblib
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
import os
from streamlit_feedback import streamlit_feedback

# Function to handle feedback
def handle_feedback(thumbs_feedback, text_feedback):
    feedback_message = f"Thumbs feedback: {thumbs_feedback}" if thumbs_feedback else "No thumbs feedback provided"
    print(feedback_message)
    print(f"Comments: {text_feedback}")
    st.success("Thank you for your feedback!")
    st.session_state.feedback_submitted = True

# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Define the path to the artifacts
artifact_path = 'artifacts'

# Load the trained model
model = joblib.load(os.path.join(artifact_path, 'iris_random_forest_model.joblib'))

# Streamlit app layout
st.set_page_config(layout="wide")

# Streamlit app title
st.title("Iris Flower Species Classifier")

# Initialize session state variables
if 'prediction_made' not in st.session_state:
    st.session_state.prediction_made = False
if 'feedback_submitted' not in st.session_state:
    st.session_state.feedback_submitted = False

# Create two columns
col1, col2 = st.columns([1, 2])

with col1:
    st.header('Input Features')

    # Sepal Length
    sepal_length = st.slider('Sepal Length (cm)', min_value=4.0, max_value=8.0, value=5.4)

    # Sepal Width
    sepal_width = st.slider('Sepal Width (cm)', min_value=2.0, max_value=4.5, value=3.4)

    # Petal Length
    petal_length = st.slider('Petal Length (cm)', min_value=1.0, max_value=7.0, value=4.7)

    # Petal Width
    petal_width = st.slider('Petal Width (cm)', min_value=0.1, max_value=2.5, value=1.5)

    # Collect feature values in a list
    feature_values = [sepal_length, sepal_width, petal_length, petal_width]

    # Add a predict button
    if st.button('Predict'):
        st.session_state.prediction_made = True
        st.session_state.feedback_submitted = False

with col2:
    # Only make prediction when the button is pressed and prediction hasn't been made yet
    if st.session_state.prediction_made:
        # Prepare the input data for prediction
        input_data = np.array([feature_values])

        # Make predictions using the loaded model
        prediction = model.predict(input_data)
        prediction_proba = model.predict_proba(input_data)

        # Define species names
        species_names = ['setosa', 'versicolor', 'virginica']

        # Display the prediction result
        st.subheader('Prediction')
        st.write(f"Predicted species: {species_names[prediction[0]]}")

        # Display prediction probabilities
        st.subheader('Prediction Probabilities')

        # Create a bar chart of probabilities
        fig, ax = plt.subplots()
        ax.bar(species_names, prediction_proba[0])
        ax.set_ylabel('Probability')
        ax.set_title('Species Prediction Probabilities')

        # Customize the chart
        plt.ylim(0, 1)
        for i, v in enumerate(prediction_proba[0]):
            ax.text(i, v + 0.01, f'{v:.2f}', ha='center')

        # Display the chart in Streamlit
        st.pyplot(fig)

        # Visualize the predicted class along with existing data
        st.subheader('Visualization of Prediction')

        # Create a scatter plot of the existing data
        fig, ax = plt.subplots(figsize=(10, 6))
        scatter = ax.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdYlBu)
        ax.set_xlabel('Sepal Length (cm)')
        ax.set_ylabel('Sepal Width (cm)')
        ax.set_title('Iris Dataset with Prediction')

        # Add a legend
        legend1 = ax.legend(*scatter.legend_elements(), title="Classes")
        ax.add_artist(legend1)

        # Plot the prediction point
        ax.scatter(sepal_length, sepal_width, color='black', s=200, marker='*', label='Prediction')
        ax.legend()

        # Display the plot
        st.pyplot(fig)

        # Add explanation of the visualization
        st.write("""
This scatter plot shows the distribution of Iris flowers based on their sepal length and width.
The colors represent different species, and the black star shows where your input falls on this distribution.
""")

        # Add feedback section if feedback hasn't been submitted yet
        if not st.session_state.feedback_submitted:
            st.subheader("Feedback")

            # Thumbs-up/thumbs-down feedback (optional)
            thumbs_feedback = streamlit_feedback(
                feedback_type="thumbs",
                key="feedback",
                align="flex-start"
            )

            # Text feedback section
            text_feedback = st.text_area("Additional comments or suggestions:")

            if st.button('Submit Feedback'):
                if thumbs_feedback or text_feedback:
                    handle_feedback(thumbs_feedback, text_feedback)
                else:
                    st.warning("Please provide either thumbs feedback or text feedback.")
        else:
            st.success("Thank you for your feedback!")

    else:
        st.write("Use the sliders on the left to input flower measurements, then click 'Predict' to see the results.")

4.3 Advantages of Using joblib for Model Packaging:

  • Decoupling Training from Deployment: By separating the training process from the app, you don’t need to retrain the model every time the app runs.
  • Faster App Start Time: Loading a pre-trained model is faster than training a model from scratch.
  • Reusable Models: The same model can be reused across different applications or projects, or even shared with others.

5. Experiment 2: Simple Regression Model

In this experiment, we'll work with a regression problem. The classic Boston housing dataset from sklearn.datasets has been deprecated, so we'll use the California Housing dataset (fetch_california_housing) instead. We'll train a simple Linear Regression model and then develop a Streamlit app to make predictions. Users will be able to input feature values and see the predicted output (housing prices), along with a line plot displaying the regression line.

5.1 Loading the Dataset and Training a Linear Regression Model

We’ll begin by loading the California housing dataset, which is a well-known regression dataset for predicting house prices based on features like average income, house age, and so on.

import os
import joblib
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# ======================
# Exploratory Data Analysis (EDA)
# ======================

# Load the California Housing dataset
california = fetch_california_housing()

# Create a DataFrame for easier analysis
df = pd.DataFrame(data=california.data, columns=california.feature_names)
df['MedHouseVal'] = california.target # Adding the target variable to the DataFrame

# Dataset overview
print("Dataset Overview:")
print(df.head(), "\n")

# Summary statistics
print("Summary Statistics:")
print(df.describe(), "\n")

# Checking for missing values
print("Missing Values:")
print(df.isnull().sum(), "\n")

# Visualizing distributions of features
plt.figure(figsize=(14, 8))
for i, feature in enumerate(df.columns[:-1]):
    plt.subplot(3, 3, i + 1)
    sns.histplot(df[feature], kde=True, bins=20, color='lightgreen')
    plt.title(f'Distribution of {feature}')
plt.tight_layout()
plt.show()

# Visualize correlations with the target variable (Median House Value)
plt.figure(figsize=(10, 6))
correlations = df.corr()
sns.heatmap(correlations[['MedHouseVal']].sort_values(by='MedHouseVal', ascending=False),
annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Feature Correlations with Median House Value')
plt.show()

# ======================
# Splitting the Data and Model Training
# ======================

# Features (X) and target (y)
X = california.data
y = california.target

# Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a simple Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)

# ======================
# Model Evaluation
# ======================

# Predict on the test set
y_pred = model.predict(X_test)

# Calculate performance metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"\nModel Performance:")
print(f"Mean Squared Error (MSE): {mse:.4f}")
print(f"R-squared (R2 Score): {r2:.4f}")

# ======================
# Save the Model and Feature Names
# ======================

# Define the path to save artifacts (model and feature names)
artifact_path = 'artifacts'
os.makedirs(artifact_path, exist_ok=True) # Create the directory if it doesn't exist

# Save the trained model using joblib
model_path = os.path.join(artifact_path, 'california_linear_regression_model.pkl')
joblib.dump(model, model_path)
print(f"Model has been saved to '{model_path}'")

# Save feature names to a .txt file
feature_file_path = os.path.join(artifact_path, 'california_feature_names.txt')
with open(feature_file_path, 'w') as f:
    for feature in california.feature_names:
        f.write(f"{feature}\n")
print(f"Feature names saved to '{feature_file_path}'")

5.2 Building the Streamlit App

Next, let’s build a Streamlit app to allow users to input the independent variables (features of the house) and display the predicted house prices. We will also visualize the regression line on a plot.

import streamlit as st
import numpy as np
import pandas as pd
import joblib
import os
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing

# ==============================
# Loading the Pre-trained Model
# ==============================

# Load the pre-trained Linear Regression model saved earlier using joblib
# The model was trained on the California Housing dataset
# Define the path to the artifacts
artifact_path = 'artifacts'

# Load the trained model
model = joblib.load(os.path.join(artifact_path, 'california_linear_regression_model.pkl'))

#model = joblib.load('california_linear_regression_model.pkl')

# ==============================
# Load Dataset for Reference
# ==============================

# Fetch the California Housing dataset again for reference
# This is just for getting feature ranges and descriptions
california = fetch_california_housing()

# Create a DataFrame for easier data handling
df = pd.DataFrame(data=california.data, columns=california.feature_names)

# ==============================
# Streamlit App Interface
# ==============================

# Set the title for the web app
st.title("California Housing Price Predictor")

# Sidebar for user input
# We use sliders so the user can select different values for each feature of the dataset
st.sidebar.header('Input Features')

# Sliders for the user to input values for each of the features (Median Income, House Age, etc.)
# These values will be used as inputs for the model to predict the house price

# Median Income: Slider allows the user to input the median income in $10,000 increments
median_income = st.sidebar.slider('Median Income (10k USD)',
float(df['MedInc'].min()), # minimum value from dataset
float(df['MedInc'].max()), # maximum value from dataset
float(df['MedInc'].mean())) # default value (mean)

# House Age: Slider allows the user to input the age of the house
house_age = st.sidebar.slider('House Age (years)',
float(df['HouseAge'].min()),
float(df['HouseAge'].max()),
float(df['HouseAge'].mean()))

# Average Rooms per Household: Slider for the average number of rooms per household
avg_rooms = st.sidebar.slider('Average Rooms per Household',
float(df['AveRooms'].min()),
float(df['AveRooms'].max()),
float(df['AveRooms'].mean()))

# Average Bedrooms per Household: Slider for the average number of bedrooms per household
avg_bedrooms = st.sidebar.slider('Average Bedrooms per Household',
float(df['AveBedrms'].min()),
float(df['AveBedrms'].max()),
float(df['AveBedrms'].mean()))

# Population: Slider for the population in the area
population = st.sidebar.slider('Population',
float(df['Population'].min()),
float(df['Population'].max()),
float(df['Population'].mean()))

# Average Occupancy: Slider for the average number of people per household
avg_occupancy = st.sidebar.slider('Average Occupancy',
float(df['AveOccup'].min()),
float(df['AveOccup'].max()),
float(df['AveOccup'].mean()))

# Latitude: Slider for the latitude (geographical location)
latitude = st.sidebar.slider('Latitude',
float(df['Latitude'].min()),
float(df['Latitude'].max()),
float(df['Latitude'].mean()))

# Longitude: Slider for the longitude (geographical location)
longitude = st.sidebar.slider('Longitude',
float(df['Longitude'].min()),
float(df['Longitude'].max()),
float(df['Longitude'].mean()))

# ==============================
# Prepare Input Data for Prediction
# ==============================

# We create an array with the input values provided by the user
# This input will be passed to the model for prediction
input_data = np.array([[median_income, house_age, avg_rooms, avg_bedrooms, population, avg_occupancy, latitude, longitude]])

# ==============================
# Make Prediction
# ==============================

# The trained model takes the input data and predicts the median house value
# The prediction result is returned in terms of 100,000s of USD, so we multiply it by 100,000 to display it in dollars
prediction = model.predict(input_data)

# Display the predicted price to the user
st.subheader('Predicted Median House Value')
st.write(f"Predicted Price: ${prediction[0]*100000:.2f}") # Display in full dollar amounts

# ==============================
# Visualize Regression Line (Median Income vs House Price)
# ==============================

# This section will create a visual plot that shows how the model predicts house prices based on median income
# The red line shows the predictions made by the model, while the scatter points represent actual house prices in the dataset

# Subheader for the regression plot
st.subheader('Regression Plot: Median Income vs House Price')

# We generate a range of values for the median income from its minimum to its maximum
median_income_range = np.linspace(df['MedInc'].min(), df['MedInc'].max(), 100).reshape(-1, 1)

# For each value of median income, we keep the other input features constant (using the user's values for those)
# This allows us to isolate the effect of median income on the predicted house price
other_features = np.tile([house_age, avg_rooms, avg_bedrooms, population, avg_occupancy, latitude, longitude], (100, 1))
plot_input = np.hstack([median_income_range, other_features]) # Combine median income with the other fixed features

# Get the predicted house prices for the median income range using the model
predicted_prices = model.predict(plot_input)

# Plot the actual median house prices vs median income (scatter plot)
# The red line shows the predicted house prices based on median income
fig, ax = plt.subplots()
ax.scatter(df['MedInc'], california.target, label="Actual Prices", alpha=0.3) # Actual data points
ax.plot(median_income_range, predicted_prices, color='red', label="Regression Line") # Regression line (model predictions)

# Labeling the axes
ax.set_xlabel("Median Income (10k USD)")
ax.set_ylabel("Median House Price (in $100k)")
ax.legend() # Add legend to distinguish between actual prices and regression line

# Show the plot in the Streamlit app
st.pyplot(fig)

This example demonstrates how to build a simple regression model and integrate it into a Streamlit app for real-time predictions and visualizations. By using a pre-trained model with joblib, you can streamline the process and focus on providing an interactive experience for users.

6. Experiment 3: Sentiment Classification with Huggingface

Sentiment analysis, a key task in Natural Language Processing (NLP), determines whether a text expresses positive, negative, or neutral sentiment. In this experiment, we focus on sentiment classification using a pre-trained Hugging Face model, specifically DistilBERT, fine-tuned on the SST-2 dataset. Hugging Face’s pre-trained models eliminate the need for time-consuming training and fine-tuning, making it easy to deploy high-performing models for tasks like sentiment analysis.

Building the Sentiment Classifier App with Streamlit

Below is the Python code to build a Streamlit app that uses the pre-trained DistilBERT model from Hugging Face for sentiment analysis.

import streamlit as st
from transformers import pipeline, AutoTokenizer
import torch

# ==============================
# Title and Description
# ==============================
st.title("Sentiment Analysis with Hugging Face")
st.write("This app uses a pre-trained DistilBERT model from Hugging Face for sentiment analysis. "
"Enter some text, and click 'Predict' to analyze the sentiment.")

# ==============================
# Check for GPU Availability
# ==============================
device = 0 if torch.cuda.is_available() else -1
if device == 0:
    st.write("✅ GPU detected! Using GPU for faster processing.")
else:
    st.write("⚠️ No GPU detected. Using CPU for processing.")

# ==============================
# Load the Pre-trained Model
# ==============================
# Using the DistilBERT model fine-tuned for sentiment analysis (SST-2)
sentiment_pipeline = pipeline(
"sentiment-analysis",
model="distilbert-base-uncased-finetuned-sst-2-english",
device=device
)

# Load the tokenizer with clean_up_tokenization_spaces set to True (to avoid warnings)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased", clean_up_tokenization_spaces=True)

# ==============================
# Input for User Text
# ==============================
user_input = st.text_area("Enter some text to analyze the sentiment:")

# ==============================
# Predict Button
# ==============================
if st.button("Predict"):
if user_input:
with st.spinner('Analyzing sentiment...'):
# Perform the sentiment analysis when the button is clicked
result = sentiment_pipeline(user_input)[0]

# Display the result
st.subheader("Sentiment Prediction")
sentiment = result['label']
confidence = result['score']

if sentiment == "POSITIVE":
st.success(f"Predicted Sentiment: {sentiment} 😊")
else:
st.error(f"Predicted Sentiment: {sentiment} 😞")

st.write(f"Confidence: {confidence:.2f}")
else:
st.warning("Please enter some text to analyze the sentiment.")

# ==============================
# GPU Status Display
# ==============================
if device == 0:
    st.write("Model is running on GPU for fast predictions.")
else:
    st.write("Model is running on CPU.")

By leveraging DistilBERT, a lightweight version of BERT, we can classify sentiments efficiently without the need for extensive data or specialized hardware. This approach simplifies building real-world applications, enabling fast prototyping and deployment of machine learning solutions using Streamlit and Hugging Face.

7. Experiment 4: Image Classification

In this experiment, we'll work with handwritten digit classification. For simplicity, we use scikit-learn's digits dataset (load_digits), which contains 8x8 grayscale images of handwritten digits (0–9), rather than the full 28x28 MNIST images. We'll train a small fully connected neural network with Keras to classify these images and then build a Streamlit app where users can upload an image of a digit and the app will classify it, displaying the model's confidence in its prediction.

7.1 Loading the Digits Dataset and Training the Model

We'll begin by loading the digits dataset from sklearn.datasets and training a simple feed-forward neural network with tensorflow.keras.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

# Load the digits dataset (8x8 images of handwritten digits)
digits = load_digits()
X, y = digits.data, digits.target

# Preprocess the data
X = StandardScaler().fit_transform(X)
y = to_categorical(y)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the model
model = Sequential([
Dense(64, activation='relu', input_shape=(64,)),
Dense(32, activation='relu'),
Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2, verbose=1)

# Evaluate the model
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {test_accuracy:.4f}")

# Save the model
model.save('mnist_model.h5')
print("Model saved as mnist_model.h5")

This script trains a simple neural network on the digits dataset and saves the model in HDF5 format (mnist_model.h5). We'll use this pre-trained model in our Streamlit app to classify uploaded images of handwritten digits.

7.2 Building the Streamlit App

Next, we’ll build a Streamlit app that allows users to upload an image and get a classification result. We’ll also display the uploaded image and a confidence bar chart for the model’s top predictions.

import streamlit as st
import numpy as np
from PIL import Image
import tensorflow as tf
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler

# Load the saved model
model = tf.keras.models.load_model('mnist_model.h5')

# Load the digits dataset to fit the same scaler used during training
digits = load_digits()
scaler = StandardScaler().fit(digits.data)

st.title('MNIST Digit Recognizer')

uploaded_file = st.file_uploader("Choose an image...", type="png")

if uploaded_file is not None:
    image = Image.open(uploaded_file).convert('L')  # Convert to grayscale
    image = image.resize((8, 8))  # Resize to 8x8 to match the digits dataset
    img_array = np.array(image).reshape(1, -1)  # Reshape to (1, 64)

    # Rescale pixel values from the 0-255 range of the PIL image to the 0-16 range used by load_digits
    img_array = img_array / 255.0 * 16.0

    # Preprocess the image
    img_scaled = scaler.transform(img_array)

    # Make prediction
    prediction = model.predict(img_scaled)
    predicted_digit = np.argmax(prediction)

    st.image(image, caption='Uploaded Image', width=200)
    st.write(f"Predicted digit: {predicted_digit}")
    st.write(f"Confidence: {prediction[0][predicted_digit]:.2f}")

st.write("Note: For best results, upload a 28x28 pixel PNG image with a white digit on a black background.")

This example shows how to build an image classification app using Streamlit. By allowing users to upload images and get predictions with confidence scores, the app provides an interactive way to showcase the capabilities of a machine learning model for image classification.

8. ML-Streamlit Workflow Summary

The ML-Streamlit workflow consists of two main steps:

1. Train the Model: A machine learning model is developed and trained using libraries like scikit-learn, TensorFlow, or Keras. The trained model is then saved for future use, eliminating the need for retraining every time the app runs.

2. Create the Streamlit App: A Streamlit app is built to load the pre-trained model, allowing users to interact with it by providing inputs such as feature values or image uploads. The app makes real-time predictions and displays results through interactive widgets and visualizations.

This workflow simplifies the deployment of machine learning models, making them accessible, responsive, and easy to interact with in real time.
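
As a minimal sketch of this two-step pattern on the app side (file and function names are illustrative, and the st.cache_resource decorator assumes a reasonably recent Streamlit version), the saved model is loaded once, cached across reruns, and then used only for predictions:

import joblib
import numpy as np
import streamlit as st

@st.cache_resource  # cache the loaded model object across script reruns
def load_model(path):
    return joblib.load(path)

# Load the model trained and saved in a separate script
model = load_model("artifacts/iris_random_forest_model.joblib")

st.title("Iris Classifier (cached model)")
sepal_length = st.slider("Sepal Length (cm)", 4.0, 8.0, 5.4)
sepal_width = st.slider("Sepal Width (cm)", 2.0, 4.5, 3.4)
petal_length = st.slider("Petal Length (cm)", 1.0, 7.0, 4.7)
petal_width = st.slider("Petal Width (cm)", 0.1, 2.5, 1.5)

if st.button("Predict"):
    input_data = np.array([[sepal_length, sepal_width, petal_length, petal_width]])
    species_names = ['setosa', 'versicolor', 'virginica']
    st.write(f"Predicted species: {species_names[model.predict(input_data)[0]]}")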

9. Best Practices for Streamlit Development

When building Streamlit applications, following best practices can lead to more maintainable, efficient, and user-friendly apps. Let’s explore some key areas:

9.1 Code Organization and Structure

  1. Modularize your code: Separate your app into logical components.
# app.py
import streamlit as st
from data_processing import load_data
from visualization import plot_data

def main():
    st.title("My Streamlit App")
    data = load_data()
    st.write("Data Overview:", data.head())
    st.pyplot(plot_data(data))

if __name__ == "__main__":
    main()

# data_processing.py
import pandas as pd

def load_data():
    return pd.read_csv("data.csv")

# visualization.py
import matplotlib.pyplot as plt

def plot_data(data):
    fig, ax = plt.subplots()
    data.plot(ax=ax)
    return fig

2. Use functions for reusable components: Create functions for UI elements you use repeatedly.

def sidebar_filters():
    st.sidebar.header("Filters")
    category = st.sidebar.selectbox("Category", ["A", "B", "C"])
    date_range = st.sidebar.date_input("Date Range")
    return category, date_range

9.2 Performance Optimization Tips

  1. Use caching: Cache computationally expensive operations.
@st.cache_data
def load_large_dataset():
    # This function will only be executed once, and its result cached
    return pd.read_csv("large_dataset.csv")

2. Lazy loading: Load data or perform computations only when necessary.

if st.checkbox("Show data"):
data = load_large_dataset()
st.write(data)

By following these best practices, you’ll create more organized, efficient, and interactive Streamlit applications. Remember to always consider your app’s specific requirements and user needs when implementing these practices.

10. Deploying Streamlit Apps

Deploying your apps to the internet allows users to access them from a web browser, without having to set up a coding environment or install dependencies.

You have two options:

  1. Manually set up a virtual private server for deploying your app.
  2. Host the app in a GitHub repository and deploy it to a cloud platform.
More details are available on the Streamlit blog.
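
For the GitHub route (for example with Streamlit Community Cloud or Hugging Face Spaces), the repository usually only needs the app script, any saved model artifacts, and a requirements file. A minimal, illustrative layout might look like this:

my-streamlit-app/
├── app.py                                # the Streamlit script, e.g. the Iris classifier
├── artifacts/
│   └── iris_random_forest_model.joblib   # pre-trained model saved with joblib
└── requirements.txt                      # streamlit, scikit-learn, joblib, numpy, ...

After pushing the repository, you point the hosting platform at app.py; locally, you can always test the same app with streamlit run app.py.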

11. Conclusion

11.1 Recap of What We've Learned

Throughout this guide, we've explored the integration of machine learning models with Streamlit to build interactive applications. We covered four key experiments:

1. Classification with the Iris Dataset: We trained a Random Forest classifier and created a Streamlit app that allows users to input feature values and get real-time classification results, accompanied by visualizations.

2. Regression with the California Housing Dataset: We trained a Linear Regression model to predict housing prices and built a Streamlit app where users can input housing features and see the predicted prices, along with a regression plot.

3. Sentiment Classification with Hugging Face: We used a pre-trained DistilBERT model to classify the sentiment of user-entered text, displaying the predicted label and its confidence score.

4. Image Classification with the Digits Dataset: We trained a small neural network to classify handwritten digits and developed a Streamlit app that allows users to upload images for classification, displaying both the prediction and the model's confidence.

These experiments demonstrated how to quickly deploy machine learning models using Streamlit, creating engaging and interactive experiences for users without the need for complex web development.

11.2 Potential Next Steps and Advanced Topics

As you continue to explore machine learning deployment with Streamlit, here are a few potential next steps:

  • Advanced Model Deployment: Explore deploying more complex models like deep learning architectures or ensemble models. You could also experiment with deploying NLP models, time-series forecasting, or reinforcement learning applications.
  • Performance Optimization: Implement caching mechanisms in Streamlit to speed up the loading of models and optimize performance for large datasets or real-time applications.
  • Handling App State: Learn how to maintain session state within Streamlit apps, allowing users to save progress or inputs as they interact with the application (see the short sketch after this list).
  • Model Interpretability: Integrate explainability tools like SHAP or LIME into your Streamlit apps to help users understand how the model makes its predictions.
  • Deployment Platforms: Explore cloud deployment options like Streamlit Cloud, Hugging Face Spaces, or using platforms like Heroku or AWS for larger-scale apps.
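
To give a flavor of the app-state point above, here is a minimal sketch that uses st.session_state to keep a value across reruns (the counter is only an illustrative example):

import streamlit as st

# Initialize the value once; st.session_state persists across script reruns
if "count" not in st.session_state:
    st.session_state.count = 0

if st.button("Increment"):
    st.session_state.count += 1

st.write("Button pressed", st.session_state.count, "times")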

11.3 Additional Resources for Learning Streamlit

To dive deeper into Streamlit and continue honing your skills, here are some useful resources:

- Official Streamlit Documentation: [Streamlit Docs] — The official guide and documentation for Streamlit, with examples and API references.

- Streamlit GitHub: [Streamlit GitHub Repo] — Explore the source code and contribute to the Streamlit open-source community.

- Streamlit Discourse Forum: [Streamlit Forum] — A community forum where you can ask questions, find tutorials, and share your Streamlit projects.

- YouTube Tutorials: There are many Streamlit-focused channels on YouTube that offer beginner to advanced tutorials, covering various machine learning and data science projects.

By building upon the foundational experiments covered here, you can unlock the full potential of Streamlit for deploying machine learning models and creating engaging, interactive applications.
