The Easiest Way to Build Your Data Science Web App

Paridhi Parajuli · Published in Analytics Vidhya · 5 min read · Aug 4, 2020

It hasn't even been a month since I discovered this super easy way of developing machine learning web apps. I have built various ML models, but I have always struggled to make them look presentable, more like a product or an MVP.

Streamlit is a Python framework that lets you build an interactive web application with just a few lines of code. With Streamlit, data scientists become web developers!

It is very easy to get started with: just install the package through pip. You can run the sample Streamlit app with streamlit hello to see what it looks like, and run your own script with streamlit run.

pip install streamlit
streamlit hello
streamlit run main.py

In my main.py:

import streamlit as st

st.title("Polycystic Ovarian Syndrome Prediction")
st.sidebar.title("PCOS Prediction")
st.sidebar.markdown("This is for changing the model parameters")

Streamlit watches the Python script, and every time you save it the web app updates live as you code. Run the file with streamlit run main.py and the output is:

There you go!

I'll explain with a demo project on Polycystic Ovarian Syndrome prediction. PCOS is a very common condition in women in which the ovaries develop cysts that may cause infertility. The dataset can be obtained from here. It is divided into two tables:

  1. Without infertility
  2. With infertility

Therefore, we need to join these two tables on a common column and drop the features that are not strongly correlated with our output. We'll also need some data clean-up.

import pandas as pd  # pandas is needed for the data handling below

@st.cache(persist=True)
def load_data():
    no_inf = pd.read_csv("C:/Users/dell/Documents/PCOS Prediction/no_inf.csv")
    inf = pd.read_csv("C:/Users/dell/Documents/PCOS Prediction/inf.csv")
    # Join the two tables on the common patient identifier
    data = pd.merge(no_inf, inf, on='Patient File No.', suffixes=('', '_y'), how='left')
    # Fill missing values and drop duplicated / unused columns
    data['Fast food (Y/N)'].fillna(data['Fast food (Y/N)'].median(), inplace=True)
    data.drop(['PCOS (Y/N)_y', 'AMH(ng/mL)_y', 'Patient File No.', 'Unnamed: 42'], axis=1, inplace=True)
    # Keep only the features most correlated with the target
    corr_features = data.corrwith(data["PCOS (Y/N)"]).abs().sort_values(ascending=False)
    corr_features = corr_features[corr_features > 0.25].index
    data = data[corr_features]
    return data

df = load_data()

The function load_data() handles loading the data, cleaning it, and selecting features. This work only needs to be done once, but every time Streamlit re-runs the script, load_data() would normally be executed again. The @st.cache decorator avoids this by caching the function's result, so expensive repeated computations such as loading huge datasets or training are skipped on subsequent runs. Here, we are going to predict whether a woman has PCOS based on symptoms like weight gain, acne, irregular periods and so on, and we select only those features that are closely correlated with our output feature, PCOS (Y/N).
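To make the caching behaviour concrete, here is a minimal sketch, independent of the PCOS app; the slow_square function is just for illustration and is not part of the project:

import time
import streamlit as st

@st.cache(persist=True)
def slow_square(n):
    # Simulate an expensive computation
    time.sleep(3)
    return n * n

st.write(slow_square(4))  # takes ~3 seconds the first time it runs
st.write(slow_square(4))  # same arguments, so the result comes straight from the cache

On later re-runs of the script the cached result is reused as well, which is exactly what load_data() relies on.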

Now, let's move on to the classifiers. Since we're creating an interactive dashboard, we offer multiple classifiers so that users can find the one that works best.

st.sidebar.subheader("Choose Classifier")
classifier = st.sidebar.selectbox("Classifier", ("SVM", "Random Forest"))
if classifier == 'SVM':
st.sidebar.subheader("Model Hyperparameters")
C = st.sidebar.number_input("Regularization (C)", 0.01, 10.0, step=0.01, key='C_SVM')
kernel = st.sidebar.radio("Kernel", ("rbf", "linear"), key='kernel')
gamma = st.sidebar.radio("Gamma (Kernel Cofficient)", ("scale", "auto"), key='gamma')
metrics = st.sidebar.multiselect("Plot Metrices", ('Confusion Matrix', 'ROC Curve', 'Precision-Recall Curve'))
if st.sidebar.button("Classify", key='classify'):
st.subheader("Results of SVM")
model = SVC(C=C, kernel=kernel, gamma=gamma)
model.fit(x_train, y_train)
accuracy = model.score(x_test, y_test)
y_pred = model.predict(x_test)
st.write("Accuracy: ", accuracy.round(2))
st.write("Precision: ", precision_score(y_test, y_pred, labels=class_names).round(2))
st.write("Recall: ", recall_score(y_test, y_pred, labels=class_names).round(2))
plot_metrics(metrics)
if classifier == 'Random Forest':
st.sidebar.subheader("Model Hyperparameters")
n_estimators = st.sidebar.number_input("The number of trees in the forest", 100, 5000, step=10, key='n_estimators')
max_depth = st.sidebar.number_input("The maximum depth of the tree", 1, 20, step=1, key='n_estimators')
bootstrap = st.sidebar.radio("Bootstrap samples when building trees", ('True', 'False'), key='bootstrap')
metrics = st.sidebar.multiselect("What metrics to plot?", ('Confusion Matrix', 'ROC Curve', 'Precision-Recall Curve'))
if st.sidebar.button("Classify", key='classify'):
st.subheader("Random Forest Results")
model = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth, bootstrap=bootstrap, n_jobs=-1)
model.fit(x_train, y_train)
accuracy = model.score(x_test, y_test)
y_pred = model.predict(x_test)
st.write("Accuracy: ", accuracy.round(2))
st.write("Precision: ", precision_score(y_test, y_pred, labels=class_names).round(2))
st.write("Recall: ", recall_score(y_test, y_pred, labels=class_names).round(2))
plot_metrics(metrics)

st.sidebar.selectbox("Classifier", ("SVM", "Random Forest")) shows a dropdown that lets you choose between an SVM classifier and a Random Forest classifier. For each classifier you can also expose tunable hyperparameters, which the user sets through these lines of code:

C = st.sidebar.number_input("Regularization (C)", 0.01, 10.0, step=0.01, key='C_SVM')
kernel = st.sidebar.radio("Kernel", ("rbf", "linear"), key='kernel')
gamma = st.sidebar.radio("Gamma (Kernel Coefficient)", ("scale", "auto"), key='gamma')

These parameters are then passed in when you train the classifier:

model = SVC(C=C, kernel=kernel, gamma=gamma)
model.fit(x_train, y_train)

In order to measure accuracy, you'll need a train-test split. The split only has to be done once, which is why we cache this computation as well. Let's define the split function:

from sklearn.model_selection import train_test_split

@st.cache(persist=True)
def split(df):
    y = df['PCOS (Y/N)']
    x = df.drop(columns=['PCOS (Y/N)'])
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)
    return x_train, x_test, y_train, y_test

x_train, x_test, y_train, y_test = split(df)

The snippet below computes the accuracy and makes the actual predictions. The metrics argument passed to plot_metrics was defined earlier as the list of evaluation plots the user wants to see.

metrics = st.sidebar.multiselect("Plot Metrics", ('Confusion Matrix', 'ROC Curve', 'Precision-Recall Curve'))

accuracy = model.score(x_test, y_test)
y_pred = model.predict(x_test)
plot_metrics(metrics)

Let's define the plot_metrics() function:

from sklearn.metrics import plot_confusion_matrix, plot_roc_curve, plot_precision_recall_curve

def plot_metrics(metrics_list):
    # model, x_test, y_test and class_names are module-level names in the script
    if 'Confusion Matrix' in metrics_list:
        st.subheader("Confusion Matrix")
        plot_confusion_matrix(model, x_test, y_test, display_labels=class_names)
        st.pyplot()

    if 'ROC Curve' in metrics_list:
        st.subheader("ROC Curve")
        plot_roc_curve(model, x_test, y_test)
        st.pyplot()

    if 'Precision-Recall Curve' in metrics_list:
        st.subheader('Precision-Recall Curve')
        plot_precision_recall_curve(model, x_test, y_test)
        st.pyplot()
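Note that newer Streamlit releases warn when st.pyplot() is called without a figure. If you hit that warning, one option is to draw onto an explicit matplotlib figure; this is a small sketch assuming a scikit-learn version that still ships the plot_* helpers, and show_confusion_matrix is just an illustrative name:

import matplotlib.pyplot as plt
from sklearn.metrics import plot_confusion_matrix

def show_confusion_matrix(model, x_test, y_test, class_names):
    # Create an explicit figure/axes and hand the figure to st.pyplot
    fig, ax = plt.subplots()
    plot_confusion_matrix(model, x_test, y_test, display_labels=class_names, ax=ax)
    st.pyplot(fig)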

With this in place you can choose a classifier, tune its hyperparameters, and visualize the evaluation metrics. Other models can be added in the same way, as sketched below.
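For example, a Logistic Regression option could follow the same pattern. This is only a sketch; the hyperparameter choices here are illustrative and not part of the original app, and you would also need to add "Logistic Regression" to the selectbox options:

from sklearn.linear_model import LogisticRegression

if classifier == 'Logistic Regression':
    st.sidebar.subheader("Model Hyperparameters")
    C_lr = st.sidebar.number_input("Regularization (C)", 0.01, 10.0, step=0.01, key='C_LR')
    max_iter = st.sidebar.slider("Maximum iterations", 100, 500, key='max_iter')
    metrics = st.sidebar.multiselect("Plot Metrics", ('Confusion Matrix', 'ROC Curve', 'Precision-Recall Curve'))
    if st.sidebar.button("Classify", key='classify'):
        st.subheader("Logistic Regression Results")
        model = LogisticRegression(C=C_lr, max_iter=max_iter)
        model.fit(x_train, y_train)
        accuracy = model.score(x_test, y_test)
        y_pred = model.predict(x_test)
        st.write("Accuracy: ", accuracy.round(2))
        plot_metrics(metrics)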

Taking symptoms as input from the user

skin = st.number_input("Have you experienced skin darkening? (0/1)", key='skin')
hair = st.number_input("Have you experienced hair growth? (0/1)", key='hair')
weight = st.number_input("Have you experienced weight gain? (0/1)", key='weight')
ff = st.number_input("Do you eat too much fast food? (0/1)", key='ff')
pimple = st.number_input("Do you get pimples? (0/1)", key='pimple')
lf = st.number_input("No. of left follicles", key='lf')
rf = st.number_input("No. of right follicles", key='rf')
cycle = st.number_input("Cycle (R/I)", key='cycle')

# The order of these values must match the order of the training features
my_data = [[lf, rf, skin, hair, weight, cycle, ff, pimple]]

if st.button("Predict", key='predict'):
    st.subheader("Your Result:")
    model = SVC(C=0.01, kernel="linear", gamma="auto")
    model.fit(x_train, y_train)
    accuracy = model.score(x_test, y_test)
    y_pred = model.predict(my_data)
    if y_pred == 1:
        st.write("Alert!! You are predicted to have PCOS")
    else:
        st.write("Congratulations!! You are predicted negative for PCOS")
    st.write("The prediction made has Accuracy: ", accuracy.round(2))

The final output looks like the screenshots below: the app using the SVM classifier, using the Random Forest classifier, and taking the user's input.

The overall project can be downloaded from my GitHub repo.

Thanks for reading!
