How to Deploy Machine Learning Models using Flask, Docker and Google Cloud Platform (GCP)

Machine learning has become hugely popular lately, and as a result there is a lot of online material that teaches how to do it. One thing I realized after going through a number of these courses is that they all leave you at the point where you have a machine learning model sitting on your local machine. If you are reading this article, I’m guessing you are in a similar situation. So how do you get your models into production? How do you let your intended audience get predictions from the model you have built? If you are looking for answers to these questions, you are in the right place. Keep reading!

What does production mean?

Production means getting your application used by its intended audience in a real-world situation. This is why it is important for us to learn how to deploy our models. What good is a very good model hiding somewhere on your laptop? Our work is only useful if whoever we built it for, whether a business or the general public, is getting value from it.

What are the requirements for a model in production?

Now that we understand what production means, let’s go through some of the requirements for having a model in production. I’m going to discuss five of them.

  1. Easily Accessible. Python has become the de facto programming language for machine learning, and in this article we will use Python to write our deployment code. When we say our model should be easily accessible, we mean that it shouldn’t matter what language the application using our model is built in; it should be able to get predictions from our model easily. Whether your front-end or user-facing application is a mobile app built with Java or a web app built with PHP/JavaScript, it should still be able to get predictions from your model.
  2. High Performance. This simply means that our model should be able to handle requests and return responses as quickly as possible. If we have actual users interacting with our application, we don’t want them to wait too long to get a prediction from our model.
  3. Fault Tolerance. Being able to withstand errors should be a major attribute of our model application. If some part of the application experiences an error, we don’t want the whole system going down as a result.
  4. Scalability. Scalability is a very important topic in software architecture, and we also need to ensure it when deploying machine learning models. This means our model should be able to maintain its high performance even as the number of requests increases. For example, if it takes less than a second to respond to a thousand requests at a time, our model application should perform at a similar rate when we are getting a million requests at a time.
  5. Maintainability. This describes how easy it is to maintain our model in production. If there’s a new version of our model, how easily can we replace the older version without any unnecessary downtime?

How do we achieve this?

Now to the part you’ve been waiting for: how do we get our model into the hands of our users and also satisfy all the requirements listed above? Here, we assume that you’ve already built your model and saved it on your local machine. We can deploy our model by following the steps below:

  1. Wrap your model with a RESTful web service using Flask. This makes it easily accessible.
  2. Containerize your web service with Docker. This ensures fault tolerance by allowing us to spin up new containers very quickly should one break down.
  3. Deploy your model into a Kubernetes cluster to ensure scalability.
  4. Save your model in a Google Cloud Storage bucket so you can easily replace older models with newer versions.

Let’s take them one after the other.

Web Service with Flask

Flask is a microframework for Python web development. It is quite lightweight and allows you to use only what you need for your application. We’ll be using Flask to build our RESTful web service. Below is a simple Flask application that loads our model into memory, makes predictions with it and returns the predictions as a JSON object.

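from flask import Flask, request, jsonify
import utils  # local helper module providing load_model() and preprocessing()

app = Flask(__name__)

# Load the model once at startup, not inside the request handler
model = utils.load_model("<path_to_model>")

@app.route('/predict', methods=['POST'])
def predict():
    # Read the 'input' field of the request (assumed here to be a JSON body)
    data = request.get_json()['input']
    model_input = utils.preprocessing(data)
    results = model.predict(model_input)
    # If results is a NumPy array, convert it with results.tolist() first
    return jsonify({"prediction": results})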

In the sample code above, we create a Flask app instance and use the load_model function defined in our utils module to load our model into memory. Notice that we do not load our model inside the request handler. This helps reduce latency, since we don’t have to load the model every time we get a request. Also note that Flask’s built-in server is not suitable for production, as it doesn’t scale well. You’ll need a WSGI server such as gunicorn or uWSGI to run your Flask application.
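The utils module used by the Flask app is not shown in this article. Here is a minimal sketch of what it could look like, assuming the model was saved with pickle and expects one row of numbers per sample. Only the two function names come from the code above; their bodies are hypothetical and will differ for your own model.

```python
# utils.py: hypothetical helper module; only the function names come from the Flask app
import pickle


def load_model(path):
    """Load a model object from disk (assumes it was saved with pickle)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def preprocessing(data):
    """Turn the raw request payload into a list of rows of floats,
    the shape that scikit-learn-style predict() methods expect."""
    # Accept either a single sample [1, 2, 3] or a batch [[1, 2, 3], [4, 5, 6]]
    if data and not isinstance(data[0], (list, tuple)):
        data = [data]
    return [[float(value) for value in row] for row in data]
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.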

Containerizing web service with Docker

Docker is a tool that lets you package your application in a container together with all its dependencies, so you can run it anywhere. Using Docker here ensures that our application behaves the same way wherever it runs, supports scalability by letting us start new containers as we get more requests, and maintains fault tolerance by letting us replace faulty containers with new ones in no time.

You can dockerize your application by adding a Dockerfile to your app folder. Conventionally it is placed in your app’s root directory, but you can place it anywhere in your app. The contents of a Dockerfile can look like the code below.

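# Pull base image
FROM ubuntu:16.04
# Copy code into image and set as working directory
COPY . /application
WORKDIR /application
# Install pip, then the app's dependencies (no sudo needed: builds run as root)
RUN apt-get -y update && \
    apt-get -y install python3-pip && \
    pip3 install pipenv && \
    pipenv install --system --deploy
# Start the app with gunicorn, listening on the port used when deploying (8080)
ENTRYPOINT ["gunicorn"]
CMD ["--bind", "0.0.0.0:8080", "server:app"]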

In the Dockerfile above, we use Ubuntu 16.04 as our base image, then copy the contents of our current directory into the “/application” directory in the image. If the location doesn’t exist, it will be created automatically. Then we set the “/application” directory as our working directory, meaning all the commands we run in our image will be run from that location. We then use the RUN instruction to install our dependencies, and finally we use gunicorn to start our app, which lives inside a module called server.

Once our Dockerfile is ready, we build our image with the following command, where the last argument is the directory that contains the Dockerfile:

docker build -t <image_name> <path_to_dockerfile>
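Since the image starts the app with gunicorn, its settings can live in a small Python config file that gets copied into the image with the rest of the code. The sketch below is an assumption rather than part of the original setup: recent gunicorn versions read a gunicorn.conf.py from the working directory automatically, and older ones need it passed with -c gunicorn.conf.py. The port and timeout values are illustrative.

```python
# gunicorn.conf.py: illustrative settings, adjust for your own app
import multiprocessing

# Listen on all interfaces so traffic from outside the container can reach the app.
# 8080 is an assumed port, chosen to match the --port value used with kubectl.
bind = "0.0.0.0:8080"

# Rule of thumb from the gunicorn docs: (2 x CPU cores) + 1 worker processes
workers = multiprocessing.cpu_count() * 2 + 1

# Restart any worker that stays silent for more than 30 seconds
timeout = 30
```

Keep in mind that each worker process loads its own copy of the model, so a large model may call for fewer workers than the rule of thumb suggests.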

Deploy using Kubernetes

After building our image, it’s time to deploy it. We are going to use Kubernetes, a tool for deploying and managing containers (running Docker images) that Google Cloud Platform offers as a managed service. It is a really powerful tool that automates parts of our requirements, such as scaling and maintaining fault tolerance. To deploy our image using Kubernetes, we’ll use the following steps:

  1. Push the Docker image to Google Container Registry

gcloud docker -- push gcr.io/<your-project-id>/<image-name>

  2. Create a container cluster on Google Cloud Platform

gcloud container clusters create <cluster-name> --num-nodes=3

  3. Run the app image inside the cluster we just created

kubectl run <deployment_name> --image=<your_image_in_container_registry> --port 8080

  4. Expose the application to the internet

kubectl expose deployment <deployment_name> --type=LoadBalancer --port 80 --target-port 8080
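Once the service has an external IP (you can look it up with kubectl get service), any client that can send an HTTP request can ask the model for a prediction. Below is a minimal Python client sketch using only the standard library. It assumes the /predict endpoint accepts a JSON body with an "input" field and returns a "prediction" field; the function names and the sample features are made up for illustration.

```python
import json
import urllib.request


def build_predict_request(base_url, features):
    """Build a POST request for the /predict endpoint with a JSON body."""
    body = json.dumps({"input": features}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, features, timeout=10):
    """Send the request and return the decoded "prediction" field."""
    request = build_predict_request(base_url, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["prediction"]


# Usage, with <external-ip> replaced by the address of your service:
#   predict("http://<external-ip>", [5.1, 3.5, 1.4, 0.2])
```

Because the contract is just JSON over HTTP, the same request can be sent from a Java mobile app or a PHP/JavaScript web app, which is what makes the model easily accessible.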

At this point you’ve built your model, wrapped a web service around it, containerized it and deployed it with Kubernetes. You should be clapping for yourself.
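One step from the plan near the top of this article, keeping the model in a Google Cloud Storage bucket, has no code in the walkthrough. Here is a minimal sketch of how the app could fetch its model at startup. It assumes the google-cloud-storage package is installed, the model was saved with pickle, and the container has credentials that can read the bucket. The function name and its parameters are hypothetical.

```python
import pickle


def load_model_from_bucket(bucket_name, blob_name, local_path, client=None):
    """Download a pickled model from a Cloud Storage bucket and load it."""
    if client is None:
        # Imported lazily so the function can be tested without the package
        from google.cloud import storage
        client = storage.Client()  # uses the environment's default credentials

    # Fetch the object and write it to a local file
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.download_to_filename(local_path)

    with open(local_path, "rb") as f:
        return pickle.load(f)
```

With this in place, shipping a new model version means uploading a new file to the bucket and restarting the containers, without rebuilding the image.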


Following these steps can help you get your model into production, but in this article I assumed you had already installed Docker, gcloud and the Kubernetes command-line tool (kubectl) on your machine. If you are missing any of these and are wondering how to set them up, don’t worry, I’ve got you covered. Use the resources below for more information:

Google Cloud SDK

This article helped me a lot with using Kubernetes,

or you can use the Kubernetes documentation

Thank you very much for reading up to this point. If you like it please share it and don’t forget to leave a comment or feedback.