*image source:* *https://cloud.google.com/vertex-ai*

Getting Started With Google Cloud’s Vertex AI

Jun 29, 2023

Introduction

I’ve been spending a lot of time reading documentation and playing around with Vertex AI for work, and I wanted to capture my learnings here so I can refer back to them, and so others can learn too :). This document covers the high-level steps for building a small `model.py` file and pushing it through the stages needed to get it into production. I am still learning about this technology, so this page will probably see some updates as my work continues.

Step 1: Sample Model Code

We’ll use a convolutional neural network (CNN) for this task. Don’t worry if you’re new to CNNs; this example will guide you through the process. Let’s take a look at the code:

import torch
import torch.nn as nn
import torch.optim as optim

# Defining the Model Architecture
class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.flatten = nn.Flatten()
        # 32 channels * 32 * 32 matches 64x64 inputs after one 2x2 max-pool;
        # for 32x32 inputs (e.g. CIFAR-10) this would be 32 * 16 * 16.
        self.fc = nn.Linear(32 * 32 * 32, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.flatten(x)
        # Return raw logits: nn.CrossEntropyLoss applies log-softmax itself,
        # so an extra softmax layer here would hurt training.
        return self.fc(x)

# Creating an Instance of the Model
model = CNN()

# Defining the Loss Function and Optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training the Model
# `trainloader` is assumed to be a torch.utils.data.DataLoader defined elsewhere.
for epoch in range(10):
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data

        optimizer.zero_grad()

        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

        # Report the average loss every 100 batches.
        if i % 100 == 99:
            print(f'Epoch: {epoch+1}, Batch: {i+1}, Loss: {running_loss/100:.3f}')
            running_loss = 0.0
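The `32 * 32 * 32` passed to `nn.Linear` is the flattened size of the feature map after the convolution and the pooling layer, so it depends on the input image size. Here is a small sketch, in plain Python with no PyTorch needed, that computes it using the standard output-size formula for conv and pooling layers. The helper names are my own, and it assumes square images.

```python
def conv2d_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output height/width of a conv or pooling layer (floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(image_size: int, out_channels: int = 32) -> int:
    """Input size expected by the model's final nn.Linear layer."""
    after_conv = conv2d_out(image_size, kernel=3, stride=1, padding=1)  # conv1
    after_pool = conv2d_out(after_conv, kernel=2, stride=2)             # maxpool
    return out_channels * after_pool * after_pool


for side in (32, 64):
    print(f"{side}x{side} input -> {flattened_features(side)} features")
```

So `32 * 32 * 32` (32768) lines up with 64x64 images, while 32x32 images would need `32 * 16 * 16` (8192).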

Step 2: Saving Model Code

It’s crucial to save your model code in a Python file for easy access and future use. Let’s name our file `model.py`. This file will be utilized for training in Vertex AI.

Step 3: Code Management Options

When it comes to managing your code in Vertex AI, you have multiple options to choose from. Selecting the right option depends on your workflow and preferences. Let’s explore some of these options:

Uploading Code to Cloud Storage: If you prefer maintaining control over your code and managing it independently, uploading your code to a Cloud Storage bucket is an excellent choice. You can use tools like `gsutil` or any other preferred method to upload the `model.py` file to a bucket in your Google Cloud project. This approach provides you with full control over your code and ensures easy access and management.
Docker Image: Containerization with Docker brings tremendous benefits in terms of portability and consistency across different environments. By building a Docker image that includes your model code and its dependencies, you create a self-contained package ready for deployment. You can specify the dependencies in a Dockerfile, copy your code into the image, and build it. Once built, you can push the Docker image to a container registry like Google Container Registry (GCR) or its successor, Artifact Registry. By referencing the Docker image directly in your Vertex AI training job or pipeline, you eliminate the need for separate code uploads.
GitHub Integration: If your code lives in a GitHub repository, you can connect that repository to your Google Cloud project. As far as I can tell this goes through Cloud Build triggers rather than a setting inside Vertex AI itself: a trigger watches the repository and, whenever new changes are pushed, runs a build that can start training jobs or pipeline runs. This simplifies your workflow by using the code straight from GitHub and minimizing manual uploading and synchronization.
Vertex AI Managed Notebooks: If you prefer an interactive development environment, Vertex AI Managed Notebooks can be your go-to solution. Managed Notebooks provide JupyterLab or Jupyter Notebook instances fully integrated with Vertex AI. By creating a notebook instance, cloning your GitHub repository directly into it, and interactively developing your model code, you get a seamless and intuitive experience. Once you’re ready, you can submit training jobs or build pipelines using the code directly from the notebook instance.
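For the Cloud Storage option, here is a minimal sketch of what the upload could look like when scripted. It only builds and prints the `gsutil cp` command by default, and the bucket name and the `code` prefix are placeholders of mine, not anything Vertex AI requires.

```python
import shlex
import subprocess


def gsutil_upload_command(local_path: str, bucket: str, prefix: str = "code") -> list[str]:
    """Build the `gsutil cp` command that copies a local file into a bucket."""
    destination = f"gs://{bucket}/{prefix.strip('/')}/"
    return ["gsutil", "cp", local_path, destination]


def upload(local_path: str, bucket: str, prefix: str = "code", dry_run: bool = True) -> str:
    """Print the command; only execute it when dry_run is False."""
    command = gsutil_upload_command(local_path, bucket, prefix)
    printable = shlex.join(command)
    print(printable)
    if not dry_run:
        # Requires gsutil on PATH and valid Google Cloud credentials.
        subprocess.run(command, check=True)
    return printable


upload("model.py", "your-bucket")
```

Call `upload(..., dry_run=False)` once the printed command looks right.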

Choosing the appropriate option based on your code management strategy and development workflow is key to a smooth and efficient machine learning experience.

Step 4: Writing Vertex AI Training Code

Now, let’s move on to writing the Vertex AI training code. We’ll create a new Python file called `vertex_training.py` and include the following code:

from google.cloud import aiplatform

project_id = 'your-project-id'
location = 'us-central1'
bucket = 'gs://your-bucket'

# Settings shared by both options (the machine type is a placeholder).
machine_spec = {"machine_type": "n1-standard-4"}
base_output_directory = {"output_uri_prefix": f"{bucket}/output"}

# Option 1: Using a Dockerized Model
job_spec_docker = {
    "display_name": "training-job-docker",
    "job_spec": {
        "worker_pool_specs": [{
            "machine_spec": machine_spec,
            "replica_count": 1,
            "container_spec": {
                "image_uri": "gcr.io/your-project-id/your-docker-image:latest",
                "command": ["python", "model.py"],
            },
        }],
        "base_output_directory": base_output_directory,
    },
}

# Option 2: Using GitHub-Linked Model Code
# The container clones the repository at startup. `your-org` is a placeholder,
# and this assumes git is available in the image. The image below is a
# TensorFlow one; use a PyTorch training image for the model.py above.
job_spec_github = {
    "display_name": "training-job-github",
    "job_spec": {
        "worker_pool_specs": [{
            "machine_spec": machine_spec,
            "replica_count": 1,
            "container_spec": {
                "image_uri": "gcr.io/cloud-aiplatform/training/tf-cpu.2-4:latest",
                "command": ["bash", "-c"],
                "args": [
                    "git clone https://$GITHUB_TOKEN@github.com/your-org/your-github-repo.git"
                    " && python -m pip install -r your-github-repo/requirements.txt"
                    " && python your-github-repo/model.py"
                ],
                "env": [{"name": "GITHUB_TOKEN", "value": "your-github-token"}],
            },
        }],
        "base_output_directory": base_output_directory,
    },
}

# Note: dataset splits (`input_data_config`) belong to a TrainingPipeline, not a
# CustomJob, so they are not part of these specs.

# The low-level client needs the regional API endpoint.
client = aiplatform.gapic.JobServiceClient(
    client_options={"api_endpoint": f"{location}-aiplatform.googleapis.com"}
)
parent = f"projects/{project_id}/locations/{location}"

# Option 1: Creating a Job Using a Dockerized Model
client.create_custom_job(parent=parent, custom_job=job_spec_docker)
# Option 2: Creating a Job Using GitHub-Linked Model Code
client.create_custom_job(parent=parent, custom_job=job_spec_github)

In the provided code, we offer two options for specifying the model code:

Option 1 (Dockerized Model): By providing the Docker image URI (`image_uri`) that contains your model code and dependencies, you create a self-contained environment. Before referencing it in the job specification, ensure that you have built and pushed the Docker image to a container registry such as GCR.

Option 2 (GitHub-Linked Model Code): If your code resides in a GitHub repository, the job can run it from a checkout of that repository. Specify the path to the Python file (`model.py`) within the repository, and pass values such as a GitHub token through environment variables if needed.

Calling `create_custom_job` both creates and submits the job: Vertex AI provisions the requested machines and runs the training container.
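One thing I would avoid is pasting a real GitHub token into the job spec file. Below is a small sketch that reads the values from your local environment instead and builds name/value entries in the shape a custom job's container spec expects for its `env` field. The helper names are hypothetical, and the token is still visible in the submitted job configuration, so a secret manager is probably the better long-term home for it.

```python
import os


def container_env(names: list[str], environ=None) -> list[dict]:
    """Build `env` entries ({"name": ..., "value": ...}) from environment variables."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if name not in environ]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return [{"name": name, "value": environ[name]} for name in names]


def redact(env: list[dict]) -> list[dict]:
    """Copy of the env entries that is safe to print or log."""
    return [{"name": item["name"], "value": "***"} for item in env]


# Example with a stand-in mapping instead of the real environment.
env = container_env(["GITHUB_TOKEN"], environ={"GITHUB_TOKEN": "example-token"})
print(redact(env))
```

In real use you would call `container_env(["GITHUB_TOKEN"])` and place the result in the container spec of the job.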

Step 5: Updating Python Code for Pipeline Components

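If you intend to use the model in a Vertex AI pipeline, you need to restructure your Python code to fit pipeline components. In practice that means wrapping each step (preprocessing, training, prediction) as a component with explicit inputs and outputs. Vertex AI Pipelines runs pipelines written with the Kubeflow Pipelines (KFP) SDK, so that is the SDK you write the components and the pipeline with.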
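To make that a bit more concrete, here is a rough sketch of the restructuring in plain Python: each step becomes a function with simple typed parameters that passes its results along as file paths. The function names and the stubbed-out training are my own placeholders. The KFP SDK has a component decorator for turning functions like these into components, which I have left out so the sketch runs without extra dependencies.

```python
import json
import tempfile
from pathlib import Path


def preprocess(raw_path: str, output_dir: str) -> str:
    """Read raw lines, drop blanks, and write a cleaned training file."""
    lines = [line.strip() for line in Path(raw_path).read_text().splitlines()]
    cleaned = Path(output_dir) / "train.txt"
    cleaned.write_text("\n".join(line for line in lines if line))
    return str(cleaned)


def train(train_path: str, output_dir: str, epochs: int = 10, learning_rate: float = 0.001) -> str:
    """Stand-in for the training loop in model.py; writes a metadata file."""
    num_examples = len(Path(train_path).read_text().splitlines())
    artifact = Path(output_dir) / "model.json"
    artifact.write_text(json.dumps(
        {"epochs": epochs, "learning_rate": learning_rate, "num_examples": num_examples}
    ))
    return str(artifact)


# Local dry run: chain the steps the same way a pipeline would.
with tempfile.TemporaryDirectory() as workdir:
    raw = Path(workdir) / "raw.txt"
    raw.write_text("a\n\nb\n")
    model_path = train(preprocess(str(raw), workdir), workdir, epochs=2)
    print(json.loads(Path(model_path).read_text()))
```

Keeping the steps as separate functions with explicit inputs and outputs also makes them easy to test locally before anything runs in the cloud.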

Step 6: Compiling the Python Code

To compile the pipeline code, use the `Compiler().compile()` method from the Kubeflow Pipelines (KFP) SDK, which Vertex AI Pipelines builds on. This step turns your Python pipeline function into a pipeline definition file that Vertex AI can execute. Here’s an example:

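from kfp.v2 import compiler  # in KFP 2.x the import is `from kfp import compiler`

# `pipeline_func` is your pipeline function (decorated with @dsl.pipeline), defined elsewhere.
pipeline_package_path = 'pipeline.json'  # the compiler writes a JSON pipeline definition
compiler.Compiler().compile(pipeline_func=pipeline_func, package_path=pipeline_package_path)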

Step 7: Retrieving JSON Output from Compiling

After compiling the pipeline code, you’ll obtain a JSON description of your pipeline. You can choose to save this JSON to a file or directly utilize it in the subsequent steps.
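If you save the JSON to a file, a quick sanity check before submitting it can save a failed run. This is a minimal sketch using only the standard library; the helper names are mine, and the demo file is a stand-in, since the real keys depend on the KFP version that produced the file.

```python
import json
import tempfile
from pathlib import Path


def load_pipeline_spec(path: str) -> dict:
    """Load a compiled pipeline definition from a JSON file."""
    with open(path, "r", encoding="utf-8") as handle:
        spec = json.load(handle)
    if not isinstance(spec, dict):
        raise ValueError(f"{path} does not contain a JSON object")
    return spec


def summarize(spec: dict) -> list[str]:
    """Sorted top-level keys, useful for a quick look at what was compiled."""
    return sorted(spec)


# Demo with a stand-in file; real compiler output has its own keys.
with tempfile.TemporaryDirectory() as workdir:
    demo_path = Path(workdir) / "pipeline.json"
    demo_path.write_text(json.dumps({"example_key": {}, "another_key": []}))
    print(summarize(load_pipeline_spec(str(demo_path))))
```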

Step 8: Creating a Job in Vertex AI

Using the Vertex AI Python SDK, it’s time to create a pipeline job from the compiled pipeline definition. Let’s illustrate the process with an example:

from google.cloud import aiplatform

project_id = 'your-project-id'
location = 'us-central1'
pipeline_json_path = 'pipeline.json'
pipeline_root = 'gs://your-bucket/pipeline-root'  # where pipeline artifacts are stored

aiplatform.init(project=project_id, location=location)

pipeline_job = aiplatform.PipelineJob(
    display_name='pipeline-job',
    template_path=pipeline_json_path,
    pipeline_root=pipeline_root,
    # Only valid if your pipeline function declares parameters with these names.
    parameter_values={'input': 'input.txt', 'output': 'output.txt'},
)

# Submit the run; use pipeline_job.run() instead to block until it finishes.
pipeline_job.submit()

In the provided example, remember to replace `your-project-id`, `pipeline.json`, and other placeholders with the appropriate values based on your project configuration.

Step 9: Scheduling the Job in Vertex AI

If you aim to schedule the job to run at specific intervals, Vertex AI offers you the flexibility to do so. By utilizing the Vertex AI Python SDK, you can easily create a schedule for the job. Let’s see an example:

from google.cloud import aiplatform

project_id = 'your-project-id'
location = 'us-central1'
schedule = '* * * * *'  # Cron expression: this one fires every minute; '0 2 * * *' would run daily at 02:00

aiplatform.init(project=project_id, location=location)
pipeline_job = aiplatform.PipelineJob(display_name='pipeline-job', template_path='pipeline.json', pipeline_root='gs://your-bucket/pipeline-root')
# `cron` is the parameter name in recent SDK releases; check your installed version.
pipeline_job.create_schedule(display_name='pipeline-schedule', cron=schedule, max_concurrent_run_count=1)

Replace `your-project-id`, the bucket path, and the cron expression with values that match your project and scheduling requirements.
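Cron expressions are easy to get wrong, and `* * * * *` in particular means "every minute", which is rarely what you want for a training or prediction job. Here is a tiny sketch that names the five fields and flags the every-minute case. It only handles plain five-field expressions, and the helper names are mine.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression: str) -> dict:
    """Split a five-field cron expression into named fields."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"Expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}")
    return dict(zip(CRON_FIELDS, parts))


def runs_every_minute(expression: str) -> bool:
    """True when every field is a wildcard, i.e. the schedule fires each minute."""
    return all(value == "*" for value in describe_cron(expression).values())


print(describe_cron("0 2 * * *"))
print(runs_every_minute("* * * * *"))
```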

Setting Up Different Pipelines

In Vertex AI, the power of pipelines lies in their ability to cater to specific tasks within your machine learning workflow. By setting up different pipelines, you can modularize and streamline your workflow, making it more manageable and efficient. Each pipeline can focus on a specific task, allowing you to update and iterate on individual components without disrupting other parts of the workflow.

Consider the following steps when setting up different pipelines in Vertex AI:

Identify Pipeline Objectives: Start by determining the specific objectives or tasks you want to achieve in your machine learning workflow. For instance, you might want a pipeline for data preprocessing and creating a new training set, another pipeline for training models on the updated dataset, and a separate pipeline for making predictions using the trained models.
Define Pipeline Components: For each pipeline, define the specific components or steps required to accomplish the objectives. Each component represents a specific task or operation in the pipeline, such as data preprocessing, model training, or batch predictions.
Configure Input and Output Connections: Configure the input and output connections for each pipeline component. Inputs could include data files, datasets, or other artifacts required for the component’s operation, while outputs could consist of trained models, evaluation metrics, or generated predictions. Establishing these connections ensures smooth data flow between pipeline components.
Build and Compile Pipelines: Leverage the power of the Vertex AI Python SDK or the Vertex Pipelines SDK (Kubeflow Pipelines) to build and compile your pipelines. Define the pipeline structure, connect the components, and specify the input and output connections using code. The compilation process transforms your pipeline code into an executable form that can be deployed and executed in Vertex AI.
Deploy and Run Pipelines: Deploy the compiled pipelines to Vertex AI and initiate their execution. Create pipeline runs or submit jobs to Vertex AI to kickstart the execution process. You have the flexibility to trigger pipeline runs manually, schedule them at specific intervals, or configure triggers based on specific events, such as new data availability.
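The input and output connections are what determine the order in which components can run. As a rough illustration, here is a plain-Python sketch that checks every input has a producer and derives a valid execution order. The `Component` class and the artifact names are made up for this example; in a real pipeline the SDK works this out from how you wire the components together.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Component:
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def execution_order(components: list) -> list:
    """Order components so each runs after the producers of its inputs."""
    producers = {out: c.name for c in components for out in c.outputs}
    graph = {}
    for c in components:
        missing = [i for i in c.inputs if i not in producers]
        if missing:
            raise ValueError(f"{c.name} has unconnected inputs: {missing}")
        graph[c.name] = {producers[i] for i in c.inputs}
    return list(TopologicalSorter(graph).static_order())


components = [
    Component("predict", inputs=["model", "dataset"], outputs=["predictions"]),
    Component("train", inputs=["dataset"], outputs=["model"]),
    Component("preprocess", outputs=["dataset"]),
]
print(execution_order(components))
```

Even though the components are listed out of order, the result puts `preprocess` first, then `train`, then `predict`.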

By adopting a modularized approach with different pipelines, you gain flexibility, scalability, and maintainability in your machine learning workflow. Separating tasks and allocating resources accordingly becomes more manageable, enabling you to optimize your machine learning process efficiently.

Happy coding :D

Adam