Step-by-Step Guide to Creating and Deploying Custom ML Pipelines with GCP Vertex AI (Part 1)

Warda Rahim
10 min read · Mar 5, 2023


A machine learning (ML) model is of little value unless it is put into production, where consumers can use it to solve problems. Model deployment is therefore just as important as model building. Vertex AI Pipelines lets us orchestrate our ML workflow in a serverless manner and automate, monitor, and govern our ML systems. It also stores the workflow's artifacts using Vertex ML Metadata. The very first step of orchestrating an ML workflow on Vertex AI Pipelines is defining the workflow as an ML pipeline.

I have divided this article into two parts. In the first part, we will go through some key concepts and the prerequisites needed for creating custom Vertex AI pipelines. The second part will then deal with defining the workflow as an ML pipeline and running it with Vertex AI.

Key Concepts
- Pipeline Components
- Pipeline
- Inputs and Outputs
- Lineage of ML Artifacts
Workflow
Prerequisites
1- Activate Vertex AI on GCP Project
2- Building Custom Docker Image for Training
3- Building Custom Serving Container Image
4- Create a Google Cloud Storage Bucket
5- Set Up a Virtual Environment

Key Concepts

Pipeline Components:

A pipeline component is a self-contained piece of code that performs a single step in an ML workflow, for example data preprocessing, model training, model evaluation, or model deployment.

A component consists of a set of inputs, a set of outputs, and a container image. The container image is where the component's code runs: it includes the component's executable code and an environment with all the packages needed to run that code.

Pipeline:

A pipeline consists of modular tasks — components — that are chained together via inputs and outputs.

Inputs to ML pipeline steps (component instances) can be set from the pipeline's inputs, or they can depend on the outputs of previous steps in the pipeline. For example, in a pipeline with data ingestion, preprocessing, and training steps: data ingestion does not depend on anything else, so it runs first; preprocessing must occur after data ingestion and depends on its output; and training takes as input the output of data preprocessing.
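As a rough sketch of how this wiring looks with the Kubeflow Pipelines (KFP) v2 SDK (the component and argument names below are placeholders; the real components are defined in part 2):

from kfp.v2 import dsl
from kfp.v2.dsl import Dataset, Input, Model, Output, component

# Stub components, only to illustrate the wiring between steps.
@component
def ingest_data(source_path: str, dataset: Output[Dataset]):
    ...  # read the raw data and write it to dataset.path

@component
def preprocess(raw_dataset: Input[Dataset], processed_dataset: Output[Dataset]):
    ...  # transform raw_dataset.path and write the result to processed_dataset.path

@component
def train(train_dataset: Input[Dataset], model: Output[Model]):
    ...  # fit a model on train_dataset.path and save it to model.path

@dsl.pipeline(name="houseprice-pipeline")
def pipeline(raw_data_path: str):
    # Each step's inputs come either from the pipeline's own inputs
    # or from the outputs of a previous step.
    ingest_task = ingest_data(source_path=raw_data_path)
    preprocess_task = preprocess(raw_dataset=ingest_task.outputs["dataset"])
    train_task = train(train_dataset=preprocess_task.outputs["processed_dataset"])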

Each component in the pipeline executes independently, and data in the form of inputs and outputs is passed between components in a serialised format.

Inputs and Outputs:

We need to annotate a component's inputs and outputs with a data type, which makes each input/output either a parameter or an artifact.

Parameter: An input or output of a simple data type (float, int, str, dict, list, bool). Parameters are passed by value between components and stored in the Vertex ML Metadata service.

Artifact: An input or output that is an object or file produced by a pipeline run, e.g. datasets, models, metrics, and visualisations. Artifacts are defined by a name, a uri, and metadata (stored in the Vertex ML Metadata service). The actual content of an artifact is stored at a path in a Cloud Storage bucket, which is what the uri points to.
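To make the distinction concrete, here is a minimal sketch of a component signature mixing the two (a hypothetical example using the kfp v2 SDK; the function and argument names are not from this project):

from kfp.v2.dsl import Dataset, Input, Model, Output, component

@component(base_image="python:3.8")
def train_model(
    learning_rate: float,          # parameter: simple type, passed by value
    train_data: Input[Dataset],    # input artifact: content read from train_data.path
    model: Output[Model],          # output artifact: content written to model.path
):
    # Hypothetical logic: artifact files live under the pipeline's Cloud Storage root.
    with open(train_data.path) as f:
        n_rows = sum(1 for _ in f)
    with open(model.path, "w") as f:
        f.write(f"trained on {n_rows} rows with learning_rate={learning_rate}")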

Pipeline architecture: components are stitched together via inputs and outputs, which are either parameters or artifacts. Each component consists of inputs, outputs, and a container where our code runs.

Lineage of ML Artifacts:

To understand changes in the performance of an ML workflow, we must be able to analyse the metadata of pipeline runs and the lineage of ML artifacts. Artifact lineage covers all the factors that resulted in an artifact. A model's lineage can include its training data, hyperparameters, metadata such as the model's accuracy score, and artifacts descending from the model, such as batch predictions. When a pipeline is run using Vertex AI Pipelines, the artifacts and metadata are stored in Vertex ML Metadata. We can then use this metadata to answer questions such as which pipeline run produced the most accurate model and which hyperparameters were used.
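For example, once a few pipeline runs have completed, the Vertex AI SDK can pull that metadata into a pandas dataframe for comparison (a small sketch; the project, region, and pipeline name are placeholders):

from google.cloud import aiplatform

aiplatform.init(project="<PROJECT_NAME>", location="europe-west2")

# One row per pipeline run, with the parameters and metrics logged to Vertex ML Metadata,
# which can then be sorted or filtered to find the best-performing run.
runs_df = aiplatform.get_pipeline_df(pipeline="houseprice-pipeline")
print(runs_df.head())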

Workflow

Building and running pipelines on Vertex AI essentially involves the following workflow:

Workflow for building and running custom ML pipelines on GCP Vertex AI

Vertex AI Pipelines can run pipelines built with Kubeflow Pipelines. Although Kubeflow itself lets us define our workflow as a series of Python functions that pass artifacts to one another, it runs the ML workflow on a Kubernetes cluster. This makes adoption difficult for a small company because of the permissions and other problems that come with managing such a deployment. Since Vertex AI provides a managed pipeline runner, we simply need to define a pipeline and Vertex AI will execute it, including provisioning resources and passing artifacts between steps.

That is enough theory; now let's walk through a practical example. We will use the House Prices - Advanced Regression Techniques competition from Kaggle as our use case. You do not need to go too deep into the model-building process; the aim here is to understand how to deploy a machine learning model from scratch using GCP's end-to-end Vertex AI features. But before we jump into our Jupyter notebooks to define pipeline components, let's deal with some prerequisites first.

Prerequisites

1- Activate Vertex AI on GCP Project:

You must have Vertex AI activated on your GCP project, with proper authentication set up on GCP. Contact the engineering team in your company if they handle this; otherwise, you can enable the required APIs using

gcloud services enable compute.googleapis.com \
containerregistry.googleapis.com \
aiplatform.googleapis.com \
cloudbuild.googleapis.com \
cloudfunctions.googleapis.com

Now let's deal with the topic of containers. You can have a separate container image for each of your components to keep things lightweight, but for simplicity we will use a single container image for all our pipeline components here. We will also build a custom container image for serving model predictions.

2- Building Custom Docker Image for Training:

Docker, an open containerisation platform, allows us to package our application/code and its dependencies into a container. Each component has its own associated container, which makes it easy to reuse components from one environment to another.

Below are the steps you need to follow to create custom Docker images and push them to Artifact Registry (a single location for managing packages and Docker container images):

- Create a Dockerfile

FROM mirror.gcr.io/library/python:3.8
WORKDIR /
COPY requirements.txt /requirements.txt
COPY src /src
RUN pip install --upgrade pip && pip install -r requirements.txt
ENTRYPOINT [ "bash" ]

- FROM defines the base image used by the container
- WORKDIR sets the working directory inside the container
- COPY copies files from the host into the Docker container
- RUN executes commands inside the container at build time
- ENTRYPOINT sets the command that runs when the container starts

A Dockerfile consists of the instructions used to build a Docker image. In the case above, we copy requirements.txt and src into the container and then use RUN to install the libraries listed in requirements.txt.
Note that requirements.txt contains all the packages needed to execute the code in your components, and the src folder contains the custom Python modules you will call from within your components.
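For reference, the build context assumed in this Dockerfile looks roughly like this (the module layout under src is just an example):

.
├── Dockerfile
├── requirements.txt
└── src/
    └── modelling/
        └── train.py   # custom modules, e.g. the HousePriceModel class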

- Build the Image

#!/bin/bash     
PROJECT_ID=<PROJECT_NAME>
REGION="europe-west2"
REPOSITORY="houseprice"
IMAGE='training'
IMAGE_TAG='training:latest'

docker build -t $IMAGE .
docker tag $IMAGE $REGION-docker.pkg.dev/$PROJECT_ID/$REPOSITORY/$IMAGE_TAG

The script contains parameters for your GCP environment and the command (docker build) to build the image. Here we are building an image for the training step; you can use the same script to create images for any of your pipeline components by changing the IMAGE and IMAGE_TAG variables and using the corresponding requirements.txt and code files in your Dockerfile.

- Push the Image to Artifact Registry

The final step is pushing the image to Artifact Registry. First, we create a repository in Artifact Registry, specifying the location and the repository format as docker. Then we run gcloud auth configure-docker so that Docker uses the gcloud command-line tool to authenticate with the registry. Finally, we push the image to Artifact Registry.

#!/bin/bash     
PROJECT_ID=<PROJECT_NAME>
REGION="europe-west2"
REPOSITORY="houseprice"
IMAGE_TAG='training:latest'

# Create repository in the artifact registry
gcloud beta artifacts repositories create $REPOSITORY \
--repository-format=docker \
--location=$REGION

# Configure Docker
gcloud auth configure-docker $REGION-docker.pkg.dev

# Push
docker push $REGION-docker.pkg.dev/$PROJECT_ID/$REPOSITORY/$IMAGE_TAG

Once the above steps have been executed, you can go to Artifact Registry on GCP (type Artifact Registry in the search bar of the GCP console) to check that your image is there. If it is, you can go ahead and use it as the base image in your components:

from kfp.v2.dsl import component

@component(
    base_image="europe-west2-docker.pkg.dev/<PROJECT_NAME>/houseprice/training:latest"
)

3- Building Custom Serving Container Image:

Along with the custom container image for our pipeline components, we also need a custom serving container image: a Docker container image for serving predictions and explanations from trained model artifacts. Vertex AI provides many pre-built containers for prediction within which we can run our model. All we need to do is pick one of the default containers and specify the path to our saved model from the training step. For example, since our model was trained using XGBoost version 1.6.2, we can pick the corresponding image from here.

Make sure you pick the image with the same version of XGBoost that you used to train your model.

This makes our serving container image uri

europe-docker.pkg.dev/vertex-ai/prediction/xgboost-cpu.1-6:latest

However, in some cases:

- the pre-built containers might not support your ML framework (e.g. PyTorch),
- you might want more control over the prediction code,
- or you might need to do some pre-/post-processing around the prediction.

For all such scenarios, it is best to create a custom serving container that meets the following requirements:

1- The custom container must provide an HTTP server that listens for requests on 0.0.0.0 on port 8080.

2- We also need to provide an HTTP path for health checks, since Vertex AI periodically checks the health of your server to make sure it can handle prediction requests. An HTTP GET request is sent to a configurable health-check path on the server, and when the server is ready to handle requests it should return status code 200. Therefore, make sure your server returns status code 200 once the model has been loaded.

The default path for the health check is / and for prediction it is /predict, both listening on port 8080. You can change these default values.

3- The third and final requirement concerns the format of requests and responses. Prediction requests sent to the HTTP server must be JSON with an instances field, which is an array containing your inputs. The HTTP server returns a JSON dictionary with a predictions field containing an array of predictions generated by the container for the instances in the corresponding request.

Taking the above three requirements into account, we can build our custom serving container. We can use Flask to implement the HTTP server, and the code for that can be found here.
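The linked code is the reference; below is only a minimal single-file sketch of such a server (the model path is an assumption, and the article's own container splits this logic across app.py and predict.py):

import os
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Assumption: the trained model has been placed at /app/model/model.pkl inside the image.
with open("/app/model/model.pkl", "rb") as f:
    model = pickle.load(f)

# Vertex AI injects AIP_HEALTH_ROUTE, AIP_PREDICT_ROUTE and AIP_HTTP_PORT into the container.
@app.route(os.environ.get("AIP_HEALTH_ROUTE", "/"), methods=["GET"])
def health():
    # Returning 200 tells Vertex AI the server is ready to receive traffic.
    return jsonify({"status": "healthy"}), 200

@app.route(os.environ.get("AIP_PREDICT_ROUTE", "/predict"), methods=["POST"])
def predict():
    # Requests arrive as {"instances": [...]}; responses go back as {"predictions": [...]}.
    instances = request.get_json()["instances"]
    predictions = model.predict(instances).tolist()
    return jsonify({"predictions": predictions}), 200

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("AIP_HTTP_PORT", 8080)))

Before pushing the image, you can sanity-check it locally by running the container and sending a POST request with a JSON body of the form {"instances": [...]} to the prediction route.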

Once we have our server, we need to wrap it in a container image. For that, we first create a Dockerfile and then repeat the steps above to build the image and push it to Google Artifact Registry.

- Dockerfile

FROM mirror.gcr.io/library/python:3.8
RUN mkdir -p /app/model
WORKDIR /app

COPY requirements.txt /app/requirements.txt
COPY app.py /app/app.py
COPY predict.py /app/predict.py
COPY src /app/src

RUN pip install --upgrade pip && pip install --no-cache-dir -r requirements.txt

CMD exec gunicorn --bind :$AIP_HTTP_PORT --log-level info --workers 1 --timeout 90 app:app

The working directory is the app folder. We copy requirements.txt into the app folder so that all the packages needed to run the application can be installed. We also need predict.py inside the serving container; it allows the deployed model's API endpoint to handle JSON requests for predictions. We also copy the src folder because, within predict.py, we load our pickled model, and unpickling will try to import any class it finds in the data. This means it will try to import our custom class using the same module import path (from src.modelling.train import HousePriceModel).

After creating your Dockerfile, you need to build your Docker image and then push it to Artifact Registry.

Once you have pushed your serving container image to Artifact Registry, head over to GCP, search for Artifact Registry, and click on the repository name you used while pushing the image. There you should see both of your images, training and serving_image, as shown in the image below.

We now have a serving container image that launches an HTTP server supporting health checks and inference. The model deployed to a Vertex AI endpoint will be wrapped in this serving container; we just need to specify the container image uri when uploading the model to the Vertex AI Model Registry.
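As a preview of what part 2 does with this image, uploading the model to the registry with the Vertex AI SDK looks roughly like the following (the display name and bucket path are placeholders):

from google.cloud import aiplatform

aiplatform.init(project="<PROJECT_NAME>", location="europe-west2")

model = aiplatform.Model.upload(
    display_name="houseprice-model",            # placeholder name
    artifact_uri="gs://<BUCKET_NAME>/model/",   # where the training step saved the model
    serving_container_image_uri="europe-west2-docker.pkg.dev/<PROJECT_NAME>/houseprice/serving_image:latest",
)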

4- Create a Google Cloud Storage Bucket:

You also need to create a storage bucket on GCP. We will use the bucket name to define one of the variables in our pipeline later on; the bucket is where your pipeline runs and artifacts will be stored.
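If you do not already have a bucket, you can create one from the terminal, for example (the bucket name is a placeholder; keep it in the same region as your pipeline):

gsutil mb -l europe-west2 gs://<BUCKET_NAME>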

5- Set Up a Virtual Environment:

On Vertex AI Workbench (a Jupyter-notebook-based development environment), open a terminal. Then create and activate a virtual environment, which can be done using

virtualenv vertex_env
source vertex_env/bin/activate

Install the following packages in your environment:

pip install google-cloud-aiplatform
pip install kfp

Now you can go to your Jupyter notebook and select the virtual environment as your kernel. You can register your virtual environment as a notebook kernel using ipykernel.
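For example:

pip install ipykernel
python -m ipykernel install --user --name=vertex_env --display-name="Python (vertex_env)"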

In part 2, we will define our pipeline components and stitch them together to create an end-to-end custom Vertex AI pipeline, with the model deployed to a Vertex AI endpoint that we can then call to make predictions.

Hi 👋 if you found this article useful, please support by buying me a coffee here. Thank you 😀
