Bringing Your ML Models to Life With Flask

data4help · Published in The Startup · Nov 13, 2020 · 11 min read

A practical introduction to machine learning model deployment in Python

What is deployment, and why do I need to do it?

Congratulations, you’ve trained a machine learning model! You’ve worked hard for the past few months exploring the data, completing analysis, creating training features, and finally training models. You used cross-validation and optimized hyperparameters, and when you show your boss and other stakeholders from the business, they agree with your choice of performance metric. You excitedly show them charts and visualizations of your model’s test score, and they agree that it is making good predictions for the task at hand.

Then they ask how the model will perform on real data, in real time. How will it contribute to business value? How will the predictions be delivered? At the moment, your newly-trained model lives on your local machine inside the Jupyter Notebook used to train it. As a Data Scientist, your focus has been on tweaking your model and improving its performance. You haven’t thought about how your model will move from the development stage to the operational stage. Enter: deployment.

Deployment is the process of bringing your model to life. When you deploy your model, you free it from the confines of your local machine and historical data. You make it possible for other people to access your model, and for your model to make predictions on new, unseen data, in real time.

Through the process of deployment, the trained model moves into the operational phase and makes predictions on new data. Those predictions are returned to other services, like apps, websites, or databases.

How do I deploy my model?

Model Example

In this section, we’ll walk through the steps needed to deploy a machine learning model. Throughout this tutorial, we’ll use an example of a model for predicting customer churn. Customer churn is a measure of customer loyalty: if a customer “churns”, they leave your company and go to a competitor.

We start by training a logistic regression binary classification model to predict if a customer churns using a dataset from Kaggle. This Kaggle challenge was about predicting customer churn for a telephone service provider.

The table below shows the data included in this dataset, and a description of the columns. The final column, “Churn”, is the target value.

Description of the features used to train our churn prediction model example

In the next steps, we walk through how to update the code that was originally written to train this model in order to deploy it.

Step 1: Saving the model

The first step in deploying a trained machine learning model is to save it. You can think of this step like saving a Word document. Once saved, the document no longer lives in a “purgatory” zone: it has a file name and a format (.doc), and it can be shared and opened by anyone who also has Microsoft Word, or any other program capable of reading .doc files.

We know that .doc is a common file format for saving text documents. But what file format should be used for saving machine learning models? The answer is a pickle file, which has the extension .pkl. Pickling is a serialization format made specifically for Python objects: it saves all of an object’s attributes in a way that can be opened and read again by Python in the future. In the context of your machine learning model, the attributes that are saved when it is pickled include things like the model weights and the values of the various hyperparameters.

To see how pickling works, we’ll use the model we trained to predict churn.

The following code shows how to prepare the data, train the model, and finally to pickle this model:
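The original code snippet is not reproduced here, so the sketch below shows the idea instead. A tiny inline DataFrame stands in for the Kaggle CSV, the column names follow the Telco churn dataset, and scikit-learn's OrdinalEncoder stands in for whichever category encoder you prefer:

```python
import pickle
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OrdinalEncoder

# Stand-in for pd.read_csv("telco_churn.csv"); only a few of the
# Kaggle dataset's columns are shown here
df = pd.DataFrame({
    "tenure":         [1, 34, 2, 45, 8, 22],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 70.70, 99.65],
    "Contract":       ["Month-to-month", "One year", "Month-to-month",
                       "One year", "Month-to-month", "Two year"],
    "Churn":          ["Yes", "No", "Yes", "No", "Yes", "No"],
})

# Fit the stateful category encoder on the training data
cat_cols = ["Contract"]
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
X = df.drop(columns=["Churn"]).copy()
X[cat_cols] = encoder.fit_transform(X[cat_cols])
y = (df["Churn"] == "Yes").astype(int)

# Train the logistic regression churn classifier
model = LogisticRegression(max_iter=1000).fit(X, y)

# Pickle both the model and the fitted encoder for the operational phase
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
with open("encoder.pkl", "wb") as f:
    pickle.dump(encoder, f)
```

Note that the fitted encoder is pickled alongside the model itself, which is exactly the point discussed next.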

Now that we’ve pickled the model, you may think we’re ready to deploy it. Not quite. We need to keep in mind any stateful transformers we used in our pre-processing steps. A stateful transformer is one whose transformation depends on prior data. For example, if we use mean (target) encoding to turn a categorical column into a numeric feature, each category is replaced by the mean target value calculated over all prior datapoints in that category. In order to apply this transformation to new values, we need to save the mean that was calculated for each category using that past data.
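To make the statefulness concrete, here is a minimal sketch of mean (target) encoding on toy data; the column names are illustrative:

```python
import pandas as pd

# Toy training data: "Contract" is the categorical feature, "Churn" the target
train = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year", "One year"],
    "Churn":    [1, 0, 0, 0],
})

# The "state" of the transformer: the mean target value per category,
# calculated from the training data
category_means = train.groupby("Contract")["Churn"].mean()

# Transforming a new datapoint reuses that saved state
new_contract = "Month-to-month"
encoded_value = category_means[new_contract]  # 0.5, the mean churn for this category
```

If `category_means` is not saved along with the model, there is no way to encode new customers consistently with the training data.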

Another way of thinking about stateful transformations is as any transformer that must first be “fit” to the training data, like scaling or normalizing the data, or encoding categorical columns. For this model, the only stateful pre-processing we’ve done is converting the categorical columns to a model-readable format using a category encoder.

Now that we have saved all of the trained and fitted stateful transformers and model itself, we have everything we need in order to make predictions on new, unseen data. The saved stateful transformers will be used to pre-process these new data points and get them into exactly the same format as the training data that was originally used to train the model.

Step 2: Refactoring code for the operational phase

So far, the code we have written completes the task of training a model and saving it as a pickle. These are the tasks required during the training phase. Though much of this code also applies in the operational phase, it is not sufficient on its own, and some changes need to be made. For example, our training code relied on reading in one CSV which contained all of the training data. In operational mode, we won’t have one single CSV containing all the data. Rather, we will need to make many predictions on individual datapoints.

What happens in the operational phase is shown in the image below. A datapoint representing the features for a new customer is fed into our trained, saved model, and a prediction of whether or not they will churn is returned. The prediction comprises two parts: a binary label indicating whether or not the person will churn (0 = no churn, 1 = churn), and the probability that they will churn.

Prediction made on new data in the operational phase, using the pre-trained model.

These predictions, as well as the input features used to make them, will no longer come in a CSV format, but rather as a JSON object. Why a JSON object? JSON objects are easy to pass between different applications, and can also easily be read into a pandas dataframe, just like the data we originally used to train our model.

We will also need to change how predictions from our model are returned. In the code we used to train the model, we returned all the predictions from the training and validation sets and used them to calculate performance metrics like accuracy. In the operational phase, we will need to return each individual prediction in real time. We will also no longer need to calculate scores like accuracy.

The following code shows the .py file for doing the main steps needed in the operational phase:
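The original script is not reproduced here, so the sketch below captures the main steps. The inline training block at the top stands in for the artifacts pickled in Step 1, and the file names and column names are assumptions carried over from the earlier sketch:

```python
import json
import pickle
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OrdinalEncoder

# --- Stand-in for the artifacts pickled in Step 1 ---------------------
_train = pd.DataFrame({
    "tenure":   [1, 34, 2, 45],
    "Contract": ["Month-to-month", "One year", "Month-to-month", "Two year"],
    "Churn":    [1, 0, 1, 0],
})
_encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
_X = _train.drop(columns=["Churn"]).copy()
_X[["Contract"]] = _encoder.fit_transform(_X[["Contract"]])
_model = LogisticRegression(max_iter=1000).fit(_X, _train["Churn"])
with open("model.pkl", "wb") as f:
    pickle.dump(_model, f)
with open("encoder.pkl", "wb") as f:
    pickle.dump(_encoder, f)
# ----------------------------------------------------------------------

# The operational script: load the pickled model and fitted transformer once
with open("model.pkl", "rb") as f:
    model = pickle.load(f)
with open("encoder.pkl", "rb") as f:
    encoder = pickle.load(f)

CAT_COLS = ["Contract"]

def generate_predictions(json_payload: str) -> dict:
    """Predict churn for a single customer described as a JSON string."""
    record = json.loads(json_payload)
    X = pd.DataFrame([record])                    # one-row dataframe
    X[CAT_COLS] = encoder.transform(X[CAT_COLS])  # reuse the saved state
    return {
        "churn": int(model.predict(X)[0]),
        "churn_probability": float(model.predict_proba(X)[0, 1]),
    }

result = generate_predictions('{"tenure": 3, "Contract": "Month-to-month"}')
```

The key differences from the training script: the artifacts are loaded rather than fit, the input is a single JSON record rather than a CSV, and no performance metrics are calculated.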

Note that in this prediction script, we load the pickled models and transformers and apply them to new, unseen data in JSON format.

We now know that we can apply our trained model to a new datapoint in JSON format and get a prediction!

Step 3: Creating a prediction web service with Flask

Now that we have updated the prediction input and return formats to JSON and incorporated loading in the saved model and transformer, the next step in our deployment process is to make our predictor available. This means both available to receive new datapoints with which to make predictions, and available to supply these predictions to some other application that will use them. This is where we will use Flask to transform our prediction script from a local script that can only run on our machine, to a kind of prediction factory. When we use Flask to transform our prediction script from a script to a web service, it’s like opening the factory doors. Now, we no longer simply make predictions on local data inside our closed factory. Rather, we now accept requests for new predictions from outside “customers”, and then provide them with predictions.

The most famous example of a web service is a web server. A web server can be thought of as a webpage-producing factory that takes requests all day long for different web pages (in the form of a URL), and then answers those requests (in the form of HTML code that renders in your browser as a pretty webpage). Our web service will work much the same way, but instead of taking requests in the form of different URLs, it will take requests as a JSON, and instead of returning the HTML needed to show a webpage, it will return a JSON of predictions.

So how do we open the doors to our factory? As the name suggests, transforming our script into a web service involves making it available over the internet. The code below shows the changes that were made to the prediction script to turn it into a web service:
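The original server gist is not shown here, so below is a minimal, self-contained sketch of what server.py might look like. The inline training stands in for loading the pickled artifacts from Step 1, and the endpoint name matches the /predict route used in the cURL test:

```python
import pandas as pd
from flask import Flask, request, jsonify
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OrdinalEncoder

app = Flask(__name__)  # instantiate the Flask app: our "business permit"

# Stand-in for loading model.pkl and encoder.pkl from disk
_train = pd.DataFrame({
    "tenure":   [1, 34, 2, 45],
    "Contract": ["Month-to-month", "One year", "Month-to-month", "Two year"],
    "Churn":    [1, 0, 1, 0],
})
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
_X = _train.drop(columns=["Churn"]).copy()
_X[["Contract"]] = encoder.fit_transform(_X[["Contract"]])
model = LogisticRegression(max_iter=1000).fit(_X, _train["Churn"])

@app.route("/predict", methods=["POST"])   # the "address" orders are sent to
def generate_predictions():
    record = request.get_json()            # parse the JSON out of the request
    X = pd.DataFrame([record])
    X[["Contract"]] = encoder.transform(X[["Contract"]])
    return jsonify({                       # return a JSON on the way out
        "churn": int(model.predict(X)[0]),
        "churn_probability": float(model.predict_proba(X)[0, 1]),
    })

# In the real server.py, the script ends by running the app:
#   if __name__ == "__main__":
#       app.run(port=5000)
```

The `app.run()` lines are shown as comments here so the sketch can be exercised with Flask's built-in test client rather than a live server.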

The first thing we notice is that we instantiated a Flask app, by making an instance of the Flask class. This is the first step for opening up our “factory” — you can think of it like getting the necessary business permits. The next step is to set a URL that will trigger our prediction function. Keeping with our factory example, this is like the address to which our customers will send their requests for predictions. We do this with the app.route() decorator from flask, which is placed just above our prediction function in the code.

If we want our factory to produce different products, we can specify different addresses for the different types of “orders”. The link we specify in the app.route() decorator placed just above the function defines the address where the requests for the product produced by that particular function should go.

Finally, we change how we get the input data, and how we return our predictions. Before, in our prediction script, we just loaded in a JSON file from our local machine. Now, we want to accept JSONs that arrive inside a request. We use Flask’s request object to parse the JSON out of the incoming request, and we return a JSON again on the way out. You can note these changes in the generate_predictions() function.

The final change is to the last part of the script. It now simply runs the app, rather than calling the generate_predictions() function directly. The generate_predictions() function is still called, but it is now triggered whenever a request arrives at the route we defined.

Now that we have our server script, we can test it on our local machine by sending it a test request.

The first step is simply to run our new Python web service script from the command line. When we run it, it starts up the Flask app on a local port, meaning it only runs on our local machine.

Screenshot of server running after running our server script

Now, we can send a request using cURL. cURL is a command line utility used to send requests from the command line to any URL.

In this case, the URL we are using is a local port, with the endpoint ‘/predict’, which we defined in our server script. The ‘-X’ flag specifies which type of request we are going to send. Since we specified in our server script that the endpoint accepts a simple “post” request, that’s what we also specify here. The ‘-d’ flag signals that what follows is the data, or the body of the request. After this ‘-d’ signal, we pass in the JSON of features that we want our model to use to make the prediction. Finally, the last part of our command is the local URL where we send the request.
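Assuming the server is running locally on Flask’s default port 5000 (started with something like `python server.py`), and using illustrative feature names in the body, the request might look like this:

```shell
curl -X POST \
  -d '{"tenure": 3, "Contract": "Month-to-month"}' \
  -H "Content-Type: application/json" \
  http://127.0.0.1:5000/predict
```

The Content-Type header tells Flask to treat the body as JSON so it can be parsed on the server side.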

Test request sent to the server using cURL. Note the prediction delivered, shown at the bottom.

Step 4: Deploying our web service with Heroku

Up until now, our web service only runs locally, through a local port. The final step in our deployment process is to change the URL where the requests are sent. Instead of being a local URL, it will now be one accessible to anyone on the world wide web.

Nowadays, we are lucky to have a multitude of free services that we can use to complete this process and deploy our web service to a publicly-accessible URL. These include PythonAnywhere and Heroku, among others.

When we deploy to one of these cloud-based hosting services, we move our code off of our local machine. This means it is no longer running in our local Python environment, with the packages that we have installed locally. In order for our prediction generator to run in its new home in the cloud, we need to make sure that the prediction generation script has access to all the same Python packages it used locally, like scikit-learn, Flask, and pandas. To do this, we list all of these packages, and the versions of them needed to run the code, in a requirements.txt file.
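For this project, the requirements.txt might look like the following; the version numbers are placeholders, so pin the versions from your own environment (for example via `pip freeze`):

```
flask==1.1.2
pandas==1.1.4
scikit-learn==0.23.2
gunicorn==20.0.4
```

Heroku also expects a Procfile telling it how to start the app (for example, `web: gunicorn server:app`), as described in its Python instructions.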

Once we have this file, we can easily deploy our prediction generator by simply following the Heroku instructions for Python.

Summary

Following these steps, we have freed our model from the confines of our local machine. It now lives in the cloud, where it can make predictions on new data points representing new customers. Let’s review the steps taken, and what changed in each step in our deployment process as we moved to full deployment.

Note that everything in steps 1–3 happens in our local environment. It’s only in the final step that our predictor moves from our local machine to really being accessible by others.

Once this happens in the final step, other applications can also interact with our predictor. It’s not just individuals that can send requests as JSONs. Rather, other applications can automatically send requests. The predictions that are returned by our prediction web service can also be consumed by other applications. For example, they can be saved in a database, or used to send an automatic notification. Such a notification might be an automated email sent to the customer service department if the model predicts that a customer will churn, prompting them to call the customer and offer them a new discount.

Once we have freed our prediction model from the confines of our machine, the possibilities for using the predictions are truly endless. Knowing how to deploy your machine learning models will help you demonstrate their value to the business and make sure that they have an impact.



Helping non-profits and NGOs harness the power of their data. data4help.org