Serverless machine learning using Docker

Running containers in Google AI Platform

Matias Aravena Gamboa
Published in spikelab · Feb 26, 2020

For this post, I assume that the reader knows what Docker is. If you don’t know anything about Docker, you can start here.

Why use Docker in Data Science

Docker is a great tool for building, shipping, and running containers. It allows you to run applications on different infrastructure (for example AWS, GCP, Kubernetes, etc.) with very little effort. You can run and test your applications locally and then deploy and scale them with ease.

Usually, a Data Science project follows a general workflow: extract and explore the data, then train and test multiple models until you get the expected results. After that, it’s a good idea to package the model training process so that you can easily retrain the model or scale the process up.

This is where Docker can make the process simpler. The main idea of this post is to show how to package a model’s training process in a container and run it on Google Cloud Platform.

Prepare the data

At Spike, we usually feed our models’ training processes with data hosted in Google BigQuery, a powerful data warehouse that lets you manipulate very large amounts of data quickly and easily.

For this post, I uploaded the wine quality dataset into BigQuery. This dataset includes a series of features that can (hopefully) determine the quality of the wine.
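As a rough sketch, assuming the data sits in a local CSV named winequality.csv, the upload can be done with the bq command-line tool (the dataset and table names below are just examples):

```
# Upload the CSV into a BigQuery table (dataset/table names are illustrative)
bq load --autodetect --source_format=CSV \
    wine.wine_quality winequality.csv
```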

Our goal will be to train a model that predicts the wine quality. For practical reasons we won’t focus on the model’s quality in this post, so I’ll skip that evaluation. The dataset in BigQuery looks as follows:

Wine quality dataset.

Create a service account

The next step is to create a service account and assign the roles required to read BigQuery tables. In production, this lets you restrict the service account’s access to specific datasets within the BigQuery project. For each service account, a JSON key file is created, which we then need to download.
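A sketch of how this could be done with the gcloud CLI (the service account name and project ID are placeholders):

```
# Create the service account (name and project are placeholders)
gcloud iam service-accounts create wine-trainer --project my-project

# Grant read access to BigQuery
# (to run queries, roles/bigquery.jobUser may also be required)
gcloud projects add-iam-policy-binding my-project \
    --member "serviceAccount:wine-trainer@my-project.iam.gserviceaccount.com" \
    --role "roles/bigquery.dataViewer"

# Download the JSON key file
gcloud iam service-accounts keys create credentials.json \
    --iam-account wine-trainer@my-project.iam.gserviceaccount.com
```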

This step is only needed if you extract your data from BigQuery. If you use a different data warehouse, you will need to set up the corresponding connection to load your data.

Training our model

For this example, our training script will load the data from BigQuery, train a GradientBoostingRegressor using parameters defined by the user, and finally log some model metrics.

In a real ML problem we could also run a grid search for hyperparameter tuning, export the trained model to Google Cloud Storage, and so on.
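Here is a minimal sketch of what such a training script could look like. The table name, script arguments, and metric choice are assumptions for illustration, not the exact code from the project:

```
import argparse

from google.cloud import bigquery
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def load_data():
    # Read the wine quality table from BigQuery (table name is illustrative)
    client = bigquery.Client()
    query = "SELECT * FROM `my-project.wine.wine_quality`"
    return client.query(query).to_dataframe()


def main():
    # Training parameters defined by the user
    parser = argparse.ArgumentParser()
    parser.add_argument("--n_estimators", type=int, default=100)
    parser.add_argument("--learning_rate", type=float, default=0.1)
    args = parser.parse_args()

    df = load_data()
    X = df.drop(columns=["quality"])
    y = df["quality"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    model = GradientBoostingRegressor(
        n_estimators=args.n_estimators, learning_rate=args.learning_rate
    )
    model.fit(X_train, y_train)

    # Log a simple metric for the trained model
    rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
    print(f"n_estimators={args.n_estimators} "
          f"learning_rate={args.learning_rate} rmse={rmse:.4f}")


if __name__ == "__main__":
    main()
```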

Building the Docker image

Let’s start by adding a requirements.txt file with the required dependencies.
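Something along these lines (version pins omitted; the exact versions depend on your environment):

```
google-cloud-bigquery
pandas
pyarrow
scikit-learn
```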

Now let’s add a Dockerfile:
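A sketch of the Dockerfile, assuming the training script is named train.py and the service account key is credentials.json:

```
FROM python:3.7-slim

WORKDIR /app

# Install the Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the training code and the service account key
COPY train.py credentials.json ./

# Let the BigQuery client find the credentials
ENV GOOGLE_APPLICATION_CREDENTIALS=/app/credentials.json

ENTRYPOINT ["python", "train.py"]
```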

The file structure should look like this:
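With the file names used above, the project folder would look roughly like this:

```
.
├── Dockerfile
├── credentials.json
├── requirements.txt
└── train.py
```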

Now we can build our Docker image as follows:
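For example, tagging the image directly with its Container Registry name (the project ID and image name are placeholders):

```
docker build -t gcr.io/my-project/wine-quality-training:latest .
```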

After a few minutes, the Docker image will be built, and if everything is OK, Docker will report that the image was successfully built and tagged.

Now we can run it locally:
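Assuming the script takes the parameters sketched above:

```
docker run gcr.io/my-project/wine-quality-training:latest \
    --n_estimators 100 --learning_rate 0.1
```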

If everything works, the output will show the training parameters and the model metrics logged by the script.

Running the container in Google Cloud Platform

We are going to train our model in Google AI Platform, which allows us to train models without worrying about managing servers or clusters. We only need to define the machine type and submit a job to the platform with our training code, and Google does the rest. Perfect!

First, we push our image to Google Container Registry:
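Assuming the image was tagged with its gcr.io name as above:

```
# One-time setup so docker can authenticate against gcr.io
gcloud auth configure-docker

docker push gcr.io/my-project/wine-quality-training:latest
```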

Now we can submit a training job to AI Platform with the following command:
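Something like the following, where the job name, region, and scale tier are examples; everything after the bare `--` is forwarded as arguments to the container:

```
gcloud ai-platform jobs submit training wine_training_job_1 \
    --region us-central1 \
    --scale-tier BASIC \
    --master-image-uri gcr.io/my-project/wine-quality-training:latest \
    -- \
    --n_estimators 100 --learning_rate 0.1
```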

The model’s logs and outputs can be found in Stackdriver Logging, or alternatively we can stream the logs to the console:
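Using the job name from the submit command above:

```
gcloud ai-platform jobs stream-logs wine_training_job_1
```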

Conclusions

Training ML models using Docker allows us to scale them easily and makes our training scripts portable. We can run them locally or on AWS, Azure, GCP, Kubernetes, etc.

In Google Cloud Platform you can easily submit a training job and switch between different machine types. The kind of machine depends on your problem and the size of your data. Additionally, you can run your training script using a GPU or TPU.
