How I built my first web-app project with deployed machine learning model in 48 hours

Patrick Chodowski
11 min read · Jan 5, 2020


Final state of wouldyousurvivetitanic.info

Introduction

(This section answers the “but why?” question, and you can skip it if you are not interested in the background story. There is a tl;dr at the end.)

If you want to visit the website, it’s here. The GitHub repo is here.

It was early Saturday morning, and I remember it very well because I couldn’t sleep. I woke up just before 5 AM after maybe 3 hours of sleep, not feeling too well, but also not feeling too sleepy. Normally on days like this I would just stay in bed and wait, hoping to get a few more hours of sleep, but this time I just didn’t feel like it. In a sudden surge of hyperactivity, I made myself a coffee and sat down at my desk. I decided to make the best of the long day fate had presented me and learn something new.

I have spent the last 5 years working with SQL, R and Python (in this order), and I can proudly say I have plenty of BI and data science projects under my belt. I absolutely love my job and I am passionate about mastering my data science skills. I still think one of the best features of working with data is the fact that you never stop learning. There is always something new. There is always something you don’t know. It keeps you hungry.

This, however, creates a certain trap. Once you commit to only one area of study, vast as it may be, you are likely to be stuck there for a very long time, giving little to no attention to other skills that can also be useful or interesting. In short: you can improve as a data scientist without doing data science projects. I will speak from experience: you can study chess, and it will improve your memory and logical thinking. You can learn a programming language other than R or Python, and maybe an OOP approach will help you write better code for machine learning. Try painting and you will get better at data visualisation. There are just so many ways to improve.

With that in mind I decided to build a simple web app from scratch. I had some experience with setting up personal blogs (mostly powered by R libraries like blogdown) and with some Flask APIs, but the truth is brutal. My heart rate skyrockets every time I have to adjust website elements. Opening CSS files makes me nervous. Trying to add custom HTML tags to my Shiny app feels like playing Battleship. I guess you get the point.

tl;dr: I am a data scientist and I am building a web app because I haven’t done it before

The Goal

Write a website from scratch, i.e. without using any HTML/CSS/Flask templates. It has to be connected to the machine learning world somehow, and it should be finished in one weekend.

tl;dr:

  • no HTML/CSS/Flask templates
  • data scientish theme
  • 48 hours to finish

The Plan

I sat down and summarised what I have and what I don’t have

I do have:

  • Python
  • Bits of Flask
  • I know how Namecheap and Digital Ocean work
  • Internet

I don’t have:

  • html
  • css
  • docker skills
  • nginx skills

Step 1: Do HTML/CSS course for beginners

I followed this:

This course is fantastic. I followed the HTML and CSS parts, got to know Visual Studio Code, and was able to grasp some of the concepts behind a simple website. I also conveniently skipped the JavaScript part.

I reckon it requires around 2–4 hours to follow the course, doing your own example in the meantime. Trying out and googling more information for your concept takes more time than strictly following the course, but at least to me it feels more rewarding.

Before breakfast I had a simple HTML page with a CSS style that didn’t make me feel sad.

It doesn’t make me feel sad, but my standards were not high

At this point I realised that it doesn’t take much time to build a simple, not-bad-looking HTML page. It takes much more time, however, to actually make it look decent.

After a couple of hours of tweaking and learning (I also added some images) I got something that looks very close to the final frontend:

I am still very pleased with the final view

By this time I knew I wanted the website to offer a deployed machine learning model in one tab and an exploratory data analysis tool in the other. There is a third tab called blog, for decorative purposes.

Step 2: pour from flask

Since I wanted to offer more functionality than just displaying a sinking ship in the logo, I needed back-end logic. I had already spent plenty of time tuning and tweaking the front end, so I had to go with something I already knew a bit: Flask.

Logic to implement:

  • A connection between the user and the deployed machine learning model: users fill in the form with their data (I am not storing anything) and the model returns a prediction
Prediction form
Sad but true
  • A link to a simple Dash dashboard that provides exploratory data analysis
Dash dashboard inside flask web-app

Flask’s app.py file looks like this:

import os
os.chdir('./app')
from flask import Flask, render_template, jsonify, request, flash, redirect, url_for
from joblib import load
import pandas as pd
from app.class_form import ModelForm
from sklearn.preprocessing import _data
import plotly.express as px
import dash
import dash_html_components as html
import dash_core_components as dcc
from dash.dependencies import Input, Output
from app.explain_module import titanic_df
import dash_dangerously_set_inner_html

titanic_app = Flask(__name__)
SECRET_KEY = os.urandom(32)
titanic_app.config['SECRET_KEY'] = SECRET_KEY
titanic_app.config["FLASK_DEBUG"] = 1
model = load(open('best_titanic_predictor.pkl', 'rb'))

def pred_row(request_form):
    pdata = {'pclass': int(request_form.getlist('pclass')[0]),
             'sex': request_form.getlist('sex')[0],
             'age': request_form.getlist('age')[0],
             'sibsp': request_form.getlist('sibsp')[0],
             'parch': request_form.getlist('parch')[0],
             'fare': request_form.getlist('fare')[0],
             'cabin': request_form.getlist('cabin')[0],
             'embarked': request_form.getlist('embarked')[0]}
    return pdata

@titanic_app.route('/')
@titanic_app.route('/index', methods=['GET', 'POST'])
def index():
    return render_template('index.html')

@titanic_app.route('/predict_survive', methods=['POST'])
def predict_survive():
    r_data = request.form
    data = pred_row(r_data)
    user_name = r_data.getlist('name')[0]
    data_df = pd.DataFrame(data, index=[0])
    result = int(model.predict(data_df)[0])
    #result = 1
    if result == 1:
        msg = f"{user_name} survived Titanic crash"
    else:
        msg = f"{user_name} did not survive Titanic crash"
    flash(msg)
    return redirect(url_for("predict"))

@titanic_app.route('/predict', methods=['GET', 'POST'])
def predict():
    form = ModelForm()
    return render_template('predict.html', form=form)

#### DASH APP #####
dash_app = dash.Dash(__name__, server=titanic_app, routes_pathname_prefix='/explain/')
col_options = [dict(label=x, value=x) for x in titanic_df.columns]
dimensions = ["x", "y", "color", "facet_col", "facet_row"]
dash_app.layout = html.Div(
    [
        #html.H1("Titanic express analysis"),
        dash_dangerously_set_inner_html.DangerouslySetInnerHTML('''
            <header>
            <nav>
            <ul>
            <li class="logotyp"><a href="/">Home</a></li>
            <!--li><a href="/#about_me">About me</a></li-->
            <!--li><a href="/#about_project">About project</a></li-->
            </ul>
            </nav>
            </header>
        '''),
        html.Div(
            [
                html.P([d + ":", dcc.Dropdown(id=d, options=col_options)])
                for d in dimensions
            ],
            style={"width": "15%", "float": "left", "margin-right": "20px", "margin-left": "20px", "color": "black"},
        ),
        dcc.Graph(id="graph", style={"width": "60%", "display": "inline-block", "textAlign": "center"}),
    ]
)

@dash_app.callback(Output("graph", "figure"), [Input(d, "value") for d in dimensions])
def make_figure(x, y, color, facet_col, facet_row):
    return px.scatter(
        titanic_df,
        x=x,
        y=y,
        color=color,
        facet_col=facet_col,
        facet_row=facet_row,
        height=650,
    )

#### RUN #####
#if __name__ == '__main__':
#    app.run()

Let me go into more detail:

Predictions

@titanic_app.route('/predict_survive', methods=['POST'])
def predict_survive():
    r_data = request.form
    data = pred_row(r_data)
    user_name = r_data.getlist('name')[0]
    data_df = pd.DataFrame(data, index=[0])
    result = int(model.predict(data_df)[0])
    #result = 1
    if result == 1:
        msg = f"{user_name} survived Titanic crash"
    else:
        msg = f"{user_name} did not survive Titanic crash"
    flash(msg)
    return redirect(url_for("predict"))

@titanic_app.route('/predict', methods=['GET', 'POST'])
def predict():
    form = ModelForm()
    return render_template('predict.html', form=form)

The /predict route is simply the page with the form that users fill in to ask for a prediction. Once they click the “did I survive” button, an HTTP POST request is sent to the /predict_survive route, where the data is read and fed to the machine learning model. The model’s output is flashed back to the user.

This was my first time using flask-wtf forms, but it went pretty smoothly thanks to this post. Below is the Python code for the form class:

from flask_wtf import FlaskForm
from wtforms import StringField, PasswordField, BooleanField, SubmitField, SelectField, IntegerField, FloatField
from wtforms.validators import DataRequired, NumberRange

class ModelForm(FlaskForm):
    name = StringField('Name', validators=[DataRequired()])
    pclass = SelectField('Passenger class', choices=[('1', '1st class'), ('2', '2nd class'), ('3', '3rd class')], validators=[DataRequired()])
    age = IntegerField('Age', [NumberRange(min=0, max=80)])
    sex = SelectField('Gender', choices=[('male','M'), ('female','F')], validators=[DataRequired()])
    sibsp = IntegerField('Number of siblings or spouses on board', [NumberRange(min=0, max=8)])
    parch = IntegerField('Number of parents or children on board', [NumberRange(min=0, max=6)])
    fare = FloatField('Ticket fare', [NumberRange(min=0, max=550)])
    cabin = SelectField('Cabin deck', choices=[('A', 'A'), ('B', 'B'), ('C', 'C'), ('D', 'D'), ('E', 'E'), ('F', 'F'), ('G', 'G'), ('T', 'T')], validators=[DataRequired()])
    embarked = SelectField('Embarked in', choices=[('C','Cherbourg'), ('Q','Queenstown'), ('S','Southampton')], validators=[DataRequired()])
    submit = SubmitField('Did I survive???')

And also the HTML template to add the forms to the site:

{% extends "base.html" %}
{% block content %}
<section class="predict_form">
<form action="{{ url_for('predict_survive') }}" method="post" novalidate>
{{ form.hidden_tag() }}
{{ form.csrf_token }}
<p>
{{ form.name.label }}<br>
{{ form.name() }}
{% for error in form.name.errors %}
<span style="color: red;">[{{ error }}]</span>
{% endfor %}
</p>
<p>
{{ form.pclass.label }}<br>
{{ form.pclass() }}
{% for error in form.pclass.errors %}
<span style="color: red;">[{{ error }}]</span>
{% endfor %}
</p>
<p>
{{ form.sex.label }}<br>
{{ form.sex() }}
{% for error in form.sex.errors %}
<span style="color: red;">[{{ error }}]</span>
{% endfor %}
</p>
<p>
{{ form.age.label }}<br>
{{ form.age() }}
{% for error in form.age.errors %}
<span style="color: red;">[{{ error }}]</span>
{% endfor %}
</p>
<p>
{{ form.sibsp.label }}<br>
{{ form.sibsp() }}
{% for error in form.sibsp.errors %}
<span style="color: red;">[{{ error }}]</span>
{% endfor %}
</p>
<p>
{{ form.parch.label }}<br>
{{ form.parch() }}
{% for error in form.parch.errors %}
<span style="color: red;">[{{ error }}]</span>
{% endfor %}
</p>
<p>
{{ form.fare.label }}<br>
{{ form.fare() }}
{% for error in form.fare.errors %}
<span style="color: red;">[{{ error }}]</span>
{% endfor %}
</p>
<p>
{{ form.cabin.label }}<br>
{{ form.cabin() }}
{% for error in form.cabin.errors %}
<span style="color: red;">[{{ error }}]</span>
{% endfor %}
</p>
<p>
{{ form.embarked.label }}<br>
{{ form.embarked() }}
{% for error in form.embarked.errors %}
<span style="color: red;">[{{ error }}]</span>
{% endfor %}
</p>
<p>{{ form.submit() }}</p>
</form>
</section>
{% endblock %}
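One detail in the code above is worth a note: the /predict_survive route reads request.form directly and never runs the form’s validators, and the template’s form tag carries novalidate, so the NumberRange limits are never actually enforced and most values reach the model as strings. Below is a minimal sketch of a plain-Python check that mirrors the limits and choices from ModelForm and converts the numeric fields. The helper name clean_passenger and the two lookup tables are hypothetical (they are not in the repo), and whether the model needs numeric types here is an assumption.

```python
from typing import Dict, List, Tuple

# Limits copied from the NumberRange validators in ModelForm.
NUMERIC_FIELDS = {
    "age": (int, 0, 80),
    "sibsp": (int, 0, 8),
    "parch": (int, 0, 6),
    "fare": (float, 0, 550),
}
# Allowed values copied from the SelectField choices in ModelForm.
CHOICE_FIELDS = {
    "pclass": {"1", "2", "3"},
    "sex": {"male", "female"},
    "cabin": set("ABCDEFGT"),
    "embarked": {"C", "Q", "S"},
}


def clean_passenger(form: Dict[str, str]) -> Tuple[dict, List[str]]:
    """Return (typed_row, errors). Hypothetical helper, not part of the repo."""
    row: dict = {}
    errors: List[str] = []
    for name, (cast, low, high) in NUMERIC_FIELDS.items():
        try:
            value = cast(form.get(name, ""))
        except (TypeError, ValueError):
            errors.append(f"{name}: not a number")
            continue
        if low <= value <= high:
            row[name] = value
        else:
            errors.append(f"{name}: must be between {low} and {high}")
    for name, allowed in CHOICE_FIELDS.items():
        value = form.get(name, "")
        if value in allowed:
            row[name] = value
        else:
            errors.append(f"{name}: unexpected value {value!r}")
    if "pclass" in row:
        row["pclass"] = int(row["pclass"])  # pred_row also casts pclass to int
    return row, errors


# Example input shaped like the submitted form (values arrive as strings).
example = {"pclass": "3", "sex": "male", "age": "22", "sibsp": "1",
           "parch": "0", "fare": "7.25", "cabin": "F", "embarked": "S"}
row, errors = clean_passenger(example)
```

In the route, one option would be to flash the collected errors and redirect back to /predict instead of calling the model whenever the error list is not empty.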

Dash App

For the EDA part of the app, I implemented a very basic Dash app based on the plotly express package.

dash_app = dash.Dash(__name__, server=titanic_app, routes_pathname_prefix='/explain/')
col_options = [dict(label=x, value=x) for x in titanic_df.columns]
dimensions = ["x", "y", "color", "facet_col", "facet_row"]
dash_app.layout = html.Div(
    [
        #html.H1("Titanic express analysis"),
        dash_dangerously_set_inner_html.DangerouslySetInnerHTML('''
            <header>
            <nav>
            <ul>
            <li class="logotyp"><a href="/">Home</a></li>
            <!--li><a href="/#about_me">About me</a></li-->
            <!--li><a href="/#about_project">About project</a></li-->
            </ul>
            </nav>
            </header>
        '''),
        html.Div(
            [
                html.P([d + ":", dcc.Dropdown(id=d, options=col_options)])
                for d in dimensions
            ],
            style={"width": "15%", "float": "left", "margin-right": "20px", "margin-left": "20px", "color": "black"},
        ),
        dcc.Graph(id="graph", style={"width": "60%", "display": "inline-block", "textAlign": "center"}),
    ]
)

@dash_app.callback(Output("graph", "figure"), [Input(d, "value") for d in dimensions])
def make_figure(x, y, color, facet_col, facet_row):
    return px.scatter(
        titanic_df,
        x=x,
        y=y,
        color=color,
        facet_col=facet_col,
        facet_row=facet_row,
        height=650,
    )

After clicking through to the /explain route, the user lands on what is basically a Dash app disguised as part of the website. The only tricky part was adding the right header to match the rest of the site. The only solution I found was to use the dash_dangerously_set_inner_html library and yes, it sounds sketchy. When the author of a module literally calls it dangerous, you have to trust them. Still, it was the only option I found. At least it looks OK.

Step 3: Training Machine Learning Model

This was the easiest and most obvious part for me. The Titanic problem is well documented across the internet, and my intention was not to reinvent the wheel. I used knowledge from here and here to obtain a model that scores 80% on Kaggle. For the sake of this project that’s completely fine. The best estimator was saved as a pickled file and is loaded by the app:

model = load(open('best_titanic_predictor.pkl', 'rb'))

This year I actually started using classes and objects to write machine learning code. I have to say I am pretty happy with how my code looks after the transition from purely functional programming. On GitHub you can find how I used a Python class in this case.

Step 4: Deploying the ML Flask app on Digital Ocean, nginx and docker

This part took me much more time than I would have liked. It’s worth mentioning that I had done something similar before (setting up shiny-server on Digital Ocean, following the greatest tutorial ever). However, the addition of Python, Flask, a machine learning model, gunicorn and Docker made things more complicated, if not intimidating.

What I did, step by step:

  • Getting a Digital Ocean Ubuntu instance and installing Python, Docker, nginx and all dependencies, along with RStudio Server, which is, and always will be, my favourite tool for working on a remote Linux server. If you are a Pythonista for life, I am really sorry, but I enjoy chaos.
  • Getting the domain name from namecheap.com. I chose the cheapest one that doesn’t sound ridiculous: wouldyousurvivetitanic.info. Make sure to set up the DNS to point to your DO server correctly.
  • Trying to understand the Docker stuff. Once you get how Docker works, it becomes simple. So after a longer while I came up with this Dockerfile:
FROM python:3.7
ADD . /app
WORKDIR /app
RUN pip install -r ./app/requirements.txt
EXPOSE 8000
CMD ["gunicorn", "-b", "0.0.0.0:8000", "app.app:titanic_app"]

Which looks funny, considering how much time it took to get there. What’s even more crucial is the command you use to debug your dockerized app:

docker run -p 90:8000 titanic_app

So I found myself rebuilding the Docker image a couple of times, and then running the command above to check whether it worked correctly this time.
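The “check if it works” part of that loop can be scripted. Here is a minimal sketch using only the Python standard library; the function name is_up is made up for this example, and the URL in the usage note assumes the container was started with the debug command above (host port 90).

```python
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if `url` answers with an HTTP status below 400."""
    # Ignore any configured HTTP proxy: the target is a local port.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP 4xx/5xx error response.
        return False
```

After each rebuild, calling is_up("http://127.0.0.1:90/") gives a quick yes/no answer without opening a browser. It only tells you that the app responds, not that the prediction itself is correct.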

Command for building the app:

sudo docker build --tag titanic_app .

I run it from inside the app directory.

After building the app, you can run it with

docker run --detach -p 85:8000 titanic_app

The dockerized app will be available on port 85 (make sure it’s open). The last step is making nginx forward requests to that port any time wouldyousurvivetitanic.info is requested.

  • Setting up the nginx service. If you, like me, don’t have much experience with networks, routes, proxies etc., it will take another while until you find your way around it. This is what my nginx config file (named wouldyousurvivetitanic.info) in /etc/nginx/sites-available looks like:
server {
    listen 80;
    server_name www.wouldyousurvivetitanic.info;
    rewrite ^/(.*) http://wouldyousurvivetitanic.info/$1 permanent;
}
server {
    listen 80;
    server_name wouldyousurvivetitanic.info;
    access_log /var/www/wouldyousurvivetitanic.info/logs/access.log;
    error_log /var/www/wouldyousurvivetitanic.info/logs/error.log;
    location /
    {
        #root /var/www/wouldyousurvivetitanic.info/public/;
        #index index.html;
        proxy_pass http://127.0.0.1:85;
    }
}

The last step is adding a reference to this file in the /etc/nginx/sites-enabled directory. After restarting the nginx server, your site should finally be available online.
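To stay in Python, here is a minimal sketch of that last step, assuming the “reference” is a symbolic link, which is the usual convention for sites-available and sites-enabled on Debian and Ubuntu. The function name enable_site is made up for this example, and on a real server the directories need root permissions.

```python
import os


def enable_site(name: str, available_dir: str, enabled_dir: str) -> str:
    """Symlink available_dir/name into enabled_dir and return the link path."""
    source = os.path.join(available_dir, name)
    if not os.path.isfile(source):
        raise FileNotFoundError(source)
    link = os.path.join(enabled_dir, name)
    if os.path.islink(link):
        os.remove(link)  # replace a stale link so the call is repeatable
    os.symlink(source, link)
    return link
```

On the server this would be called as enable_site("wouldyousurvivetitanic.info", "/etc/nginx/sites-available", "/etc/nginx/sites-enabled"), followed by the nginx restart mentioned above.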

Summary

It’s hard to go through every detail of this project, as it joined together plenty of different areas. I wanted to write this post to summarise what I did and what I learnt in the span of 48 hours. It is possible to learn the basics of something new and add it to your current skill set in a short time. It’s fun, it can be useful, and it’s always good to give your brain something completely new to understand.

What has been done in 48 hours:

  1. Completed an HTML+CSS course for beginners
  2. Built a basic HTML frontend
  3. Built a machine learning model on the Titanic dataset
  4. Built backend logic in Python+Flask: added forms, a basic Dash app and a link to the machine learning model
  5. Deployed the website using gunicorn and Docker. Set up the server with nginx on Digital Ocean.

Please find the website here

And github repo with the code here


Patrick Chodowski

Data scientist with over 5 years of experience. Interested in sports data analysis. Python and R are both great tools. Personal blog: http://per48.co