Pipelines for production ML systems

Over the last year, my team and I built a whole bunch of ML pipelines for the Cloud. I am writing this article to show you how easy it is to create your own pipelines for production-scale ML systems. You’ll see how to train a model using open-source components, deploy a model for serving, and create a pipeline for this workflow.

Ivelin Angelov
6 min read · Jan 31, 2020

Intro

A typical ML process centers on training a model. The model development workflow iterates through data preprocessing, training, and model evaluation.

Those steps are often executed manually by running scripts and notebooks and applying human judgment to decide whether the model trained successfully.

When model training is done for production, the process is much more likely to be automated with an ML training pipeline.

This way, the model training can be reviewed, reproduced, tuned, and debugged at any point. When triggered, the components in the pipeline run in sequence, and only if all of them succeed is a microservice that serves the trained model created or updated.

This article will show you how to:

  • Use open-source components locally or on the Cloud
  • Deploy your trained model for serving on the Cloud
  • Create ML pipelines and run them on the Cloud

Open-source components

Most of the components in such a pipeline can be reused and have open-source implementations. Data preprocessing, model training, and validation components of all kinds are hosted on AI Hub. You can go very far into production ML training and serving without having to write custom components.

Before we start using any components, let’s go through the basics. All of the components are applications in Docker containers. If you are unfamiliar with Docker, I suggest you scroll through a Docker tutorial for beginners. All components take runtime arguments and produce output files. For example, to train an ML model, we would need the Docker registry URI that points to the trainer container and would pass arguments such as training-data, learning-rate, and output-location. When we run that trainer image, the application reads the runtime arguments and uses them to produce a trained model.

Run ML containers locally

To run any component locally, you need to have Docker installed. The shell snippet below runs XGBoost model training locally. Give it a try. The container will read public Iris dataset files stored on Google Cloud Storage and create a trained XGBoost classification model on your local drive.
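
Something along these lines (a sketch only: the registry URI, argument names, and dataset paths below are illustrative, so take the exact values from the container’s documentation page on AI Hub):

```sh
# Sketch only: the registry URI, argument names, and dataset paths are
# illustrative -- copy the exact values from the container's AI Hub page.
IMAGE_URI=gcr.io/your-registry/xgboost_trainer:latest  # placeholder registry URI
OUTPUT_DIR="$(pwd)/xgboost_output"
mkdir -p "${OUTPUT_DIR}"

# Mount a local folder so the trained model and the Run Report end up on disk.
docker run -v "${OUTPUT_DIR}:/output" "${IMAGE_URI}" \
  --training-data=gs://cloud-samples-data/ai-platform/iris/iris_data.csv \
  --learning-rate=0.1 \
  --output-location=/output
```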

The output of the component contains the trained model and a Run Report HTML file. Open the report with a web browser to check whether the training was successful and to see other useful model-related information. Click on this link to see a sample of a Run Report.

To run that XGBoost container, we use a registry URI that you can get from the container’s documentation page. Every published component on AI Hub has documented runtime arguments and a Docker registry URI that can be found in the pipeline.tar.gz file or in the AI Platform snippets (click the “Download” or “Edit Training Command” buttons). Unless a component requires special hardware, you should be able to test it locally with Docker.

Featured ML containers

Disclaimer: my team and I published these and many others like them on AI Hub.

Try those ML containers locally. All you need to do is modify the XGBoost snippet to use a different registry URI and arguments. Note that you can use your own datasets or get an example dataset from the ML container’s documentation page.

So far, we have read datasets from the Cloud but done the computation locally.

Run ML containers on the Cloud

There are many solutions for running Docker containers on the Cloud. AI Platform Training is a specialized service for running ML training containers.
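
A job submission might look roughly like this, reusing the same placeholder image and arguments as in the local run (substitute your own bucket for the output location):

```sh
# Sketch of an AI Platform Training job running the same trainer container.
# Everything before the bare "--" is consumed by AI Platform; everything
# after it goes to the container, exactly as in the local run.
IMAGE_URI=gcr.io/your-registry/xgboost_trainer:latest  # placeholder registry URI

gcloud ai-platform jobs submit training "xgboost_iris_$(date +%s)" \
  --region=us-central1 \
  --scale-tier=BASIC \
  --master-image-uri="${IMAGE_URI}" \
  -- \
  --training-data=gs://cloud-samples-data/ai-platform/iris/iris_data.csv \
  --learning-rate=0.1 \
  --output-location=gs://your-bucket/xgboost-iris/output
```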

Note that the first portion of arguments is AI Platform specific.

AI Platform Training works better than other solutions because distributed training is easy to set up and TPUs are available. The XGBoost container in the example doesn’t support TPUs, but it does support GPUs and multi-worker training.
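
To attach a GPU, for example, you would switch to a custom scale tier and request an accelerator for the master machine. The flags below are a sketch; check them against the current gcloud reference, and note that whether the GPU is actually used depends on the container’s own options:

```sh
# Same job as above, but on a custom machine with one GPU attached.
gcloud ai-platform jobs submit training "xgboost_iris_gpu_$(date +%s)" \
  --region=us-central1 \
  --scale-tier=CUSTOM \
  --master-machine-type=n1-standard-8 \
  --master-accelerator=count=1,type=nvidia-tesla-k80 \
  --master-image-uri="${IMAGE_URI}" \
  -- \
  --training-data=gs://cloud-samples-data/ai-platform/iris/iris_data.csv \
  --learning-rate=0.1 \
  --output-location=gs://your-bucket/xgboost-iris/output
```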

Deploy a trained ML model for serving

To serve inference from your model, you need to create a serving REST endpoint. That is easy if you let AI Platform serving host your model.

AI Platform serving creates a REST API that returns predictions and scales automatically with demand. This eliminates the need to set up and maintain complex infrastructure. AI Platform also simplifies updating prediction models through model names and versions.

We can use the Deployer component to deploy the trained XGBoost model on AI Platform. Note that the process of training and deployment is the same for the TensorFlow components as well.
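
The Deployer essentially automates creating the model and version on AI Platform. If you want to see roughly what that boils down to, the equivalent gcloud calls look like this (model name, version, framework settings, and the GCS path are placeholders to adapt):

```sh
# Roughly what the deployment step does: register a model, then create a
# version that points at the trained model files on GCS. Names, paths, and
# the runtime/framework settings below are placeholders to adapt.
gcloud ai-platform models create iris_xgboost --regions=us-central1

gcloud ai-platform versions create v1 \
  --model=iris_xgboost \
  --origin=gs://your-bucket/xgboost-iris/output/ \
  --framework=xgboost \
  --runtime-version=1.15 \
  --python-version=3.7
```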

After running the deployment snippet, go to AI Platform models to verify that a model with that name and version is deployed for serving. By default, your model is auto-scaled and can handle production traffic for online or batch predictions.
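
Once the version is live, you can sanity-check online predictions from the command line (instances.json is a hypothetical file with one JSON list of Iris feature values per line):

```sh
# instances.json (hypothetical): one JSON list of feature values per line, e.g.
#   [6.8, 2.8, 4.8, 1.4]
#   [6.0, 3.4, 4.5, 1.6]
gcloud ai-platform predict \
  --model=iris_xgboost \
  --version=v1 \
  --json-instances=instances.json
```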

So far, we trained a model, and we deployed it for serving on the Cloud. Next, we will see how to connect those steps in a pipeline.

Orchestration

Orchestration is the final stage of any pipeline development, and you should be able to execute every component separately before you put them into a pipeline. So how do we chain our Docker ML containers?

Kubeflow Pipelines is a platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers.

To create a pipeline, we need to describe the DAG of components with the Kubeflow Pipelines Python SDK. KFP pipelines are composed of components, and components are Docker containers plus some metadata such as inputs, outputs, and the Docker registry URI. That metadata is normally stored in a YAML file.
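
For illustration, a minimal component.yaml for the placeholder XGBoost trainer used above could look like this; a real component would typically also declare outputs so that downstream steps can consume them:

```yaml
# Illustrative component.yaml for the placeholder XGBoost trainer used above.
name: XGBoost trainer
description: Trains an XGBoost classification model.
inputs:
  - {name: training_data, type: GCSPath}
  - {name: learning_rate, type: Float, default: '0.1'}
  - {name: output_location, type: GCSPath}
implementation:
  container:
    image: gcr.io/your-registry/xgboost_trainer:latest  # placeholder registry URI
    args: [
      --training-data, {inputValue: training_data},
      --learning-rate, {inputValue: learning_rate},
      --output-location, {inputValue: output_location},
    ]
```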

ML Container + component.yaml = Component

Most ML containers are released with such a metadata file and are published under the “Kubeflow pipeline” category. Some ML containers don’t have a component.yaml file; they are just Docker containers that parse command-line arguments. Those are published under the “ML Container” category.
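
Putting this together, the pipeline definition can be sketched as follows. The component.yaml URL, the Deployer’s input names, and the XGBoost image and arguments are illustrative, so take the real ones from each component’s documentation:

```python
import kfp
from kfp import components, dsl

# Load the Deployer component from its component.yaml (URL is illustrative --
# take the real one from the component's documentation page).
deploy_op = components.load_component_from_url(
    'https://raw.githubusercontent.com/kubeflow/pipelines/master/'
    'components/gcp/ml_engine/deploy/component.yaml')


@dsl.pipeline(
    name='XGBoost train and deploy',
    description='Trains an XGBoost model and deploys it to AI Platform.')
def train_and_deploy(
        project_id: str = 'your-gcp-project',
        training_data: str = 'gs://cloud-samples-data/ai-platform/iris/iris_data.csv',
        output_location: str = 'gs://your-bucket/xgboost-iris/output',
        model_id: str = 'iris_xgboost',
        version_id: str = 'v1'):
    # The XGBoost container has no component.yaml, so describe it inline
    # with ContainerOp (image URI and arguments are placeholders).
    train = dsl.ContainerOp(
        name='train-xgboost',
        image='gcr.io/your-registry/xgboost_trainer:latest',
        arguments=[
            '--training-data', training_data,
            '--learning-rate', '0.1',
            '--output-location', output_location,
        ])

    # Deploy only runs if training succeeded. Input names follow the Deployer's
    # component.yaml; see its documentation for the full list (runtime version, etc.).
    deploy_op(
        model_uri=output_location,
        project_id=project_id,
        model_id=model_id,
        version_id=version_id).after(train)


if __name__ == '__main__':
    # Compile the DAG into an archive that Kubeflow Pipelines can run.
    kfp.compiler.Compiler().compile(train_and_deploy, 'pipeline.tar.gz')
```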

Note how the Deployer component can load its component.yaml straight from the GitHub repo. The XGBoost container doesn’t have such a definition, so we create it programmatically with ContainerOp. The alternative would be to put that metadata in a component.yaml file. I hope this shows how easy it is to write your own components.

This Python snippet creates a pipeline.tar.gz file containing a definition of the DAG of components. To run the pipeline, you first need to set up a Kubeflow Pipelines cluster, then upload the archive and start a run.
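
If you already have a Kubeflow Pipelines endpoint, you can also submit the archive straight from Python (the host URL here is a placeholder for your cluster’s endpoint):

```python
import kfp

# Submit the compiled archive to an existing Kubeflow Pipelines deployment.
# The host URL is a placeholder for your cluster's endpoint.
client = kfp.Client(host='https://your-kfp-endpoint.example.com')
client.create_run_from_pipeline_package(
    'pipeline.tar.gz',
    arguments={'project_id': 'your-gcp-project'},
    run_name='xgboost-train-and-deploy')
```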

I enjoyed writing this article. Would you like to see another one describing how to create your own components, build pipelines from Jupyter notebooks, run hyperparameter optimization for components, or set up CI/CD for ML containers?
