Automating the training of ML models with Google Cloud AI Platform

How we handled the training and the deployment of our FastAI models using AI Platform — Part I

Sacha Lasry
Artefact Engineering and Data Science
8 min read · Mar 17, 2021

TL;DR

Training an ML model can be complicated to set up and reproduce:

  • The code might live in a notebook on a VM that you have to start manually and shut down once training has finished
  • You may have to upload a training dataset every time you want to retrain the model
  • You need to dig into your code whenever you want to change a single parameter
  • etc.

In this article, we’ll see how we automated the training process of FastAI’s text classifiers, using Google Cloud AI Platform.

In a second article, we’ll see how we managed to deploy such models with AI Platform and TorchServe.

Who is this for?

If you’re working on a project that requires training ML models multiple times, and you’re tired of launching every training run by hand, you’ve come to the right place.

If you’re tired of managing VMs for your training and would rather spend your time on something more interesting, like reading Medium articles, you’ve also come to the right place!

This article is for anyone who wants to save time and resources by using AI Platform to train their ML models. We’ll show how we applied it to a project we worked on, using FastAI.

Prerequisites if you want to reproduce what we did

AI Platform is part of Google Cloud Platform (GCP), as are the other services we used to automate our training pipeline. Here are the GCP services we used:

  • AI Platform, to host the training of the model
  • Cloud Storage, to store the files needed for the training, along with the model file exported once the training is done
  • Container Registry, to host the Docker image containing the training code
  • (Optional) Compute Engine, to build and run the…
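
To make the role of each service more concrete, here is a minimal Python sketch of how they could fit together in a single training job request. It is not the exact code we used: the project ID, bucket, image URI, region, job prefix, and the training flags (--train-data, --model-dir, --epochs) are hypothetical placeholders. The job body follows the AI Platform Training REST API (projects.jobs.create) for a custom container, and the submit helper assumes the google-api-python-client package is installed and that you are authenticated against GCP.

```python
from datetime import datetime, timezone


def build_training_job(image_uri, region, train_args, job_prefix="fastai_text_clf"):
    """Build the request body for a custom-container AI Platform training job."""
    # Job IDs must be unique within a project, so we append a UTC timestamp.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    return {
        "jobId": f"{job_prefix}_{stamp}",
        "trainingInput": {
            "scaleTier": "BASIC",  # single worker; an assumption for this sketch
            "region": region,
            # Docker image hosted on Container Registry, holding the training code.
            "masterConfig": {"imageUri": image_uri},
            # Arguments forwarded to the container's entrypoint.
            "args": list(train_args),
        },
    }


def submit_training_job(project_id, job_spec):
    """Send the job to AI Platform Training (requires google-api-python-client)."""
    from googleapiclient import discovery  # imported lazily, only needed to submit

    ml = discovery.build("ml", "v1")
    request = ml.projects().jobs().create(
        parent=f"projects/{project_id}", body=job_spec
    )
    return request.execute()


# Hypothetical values: replace them with your own image, bucket, and region.
job_spec = build_training_job(
    image_uri="gcr.io/my-project/fastai-trainer:latest",
    region="europe-west1",
    train_args=[
        "--train-data", "gs://my-bucket/data/train.csv",  # dataset on Cloud Storage
        "--model-dir", "gs://my-bucket/models/",          # where the model is exported
        "--epochs", "4",
    ],
)
# submit_training_job("my-project", job_spec)  # uncomment to actually launch the job
```

With this layout, changing a parameter or pointing to a new dataset only means editing the argument list, rather than opening a notebook on a VM.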
