An Introduction to Vertex AI: Google Cloud AI/ML Platform

Dhananjay Kimothi, PhD @ QUT/IIITD
Published in CloudTopics · 7 min read · Apr 30, 2023

TL;DR: This article introduces Vertex AI, Google Cloud’s unified AI/ML platform, and walks through the services it offers.

Let’s dive in…

What is Vertex AI?

Vertex AI is a Google Cloud offering that covers all your ML-related work. It is a unified AI/ML platform that lets you track, train and deploy ML models. You can either use Google’s AutoML service to get the best model for your use case, or write your own code and deploy a custom model.

Currently, AutoML supports image, tabular, text and video data. For image data, it covers classification, object detection and segmentation. For tabular data, it supports regression and classification, so you can use it to predict house prices or detect fraud from input features such as the amount, location and type of a financial transaction. For text, it handles classification (single- and multi-label), entity extraction and sentiment analysis. Lastly, if you are working with video and building an application to classify videos or frames, recognize actions, or track an object, you can achieve it with no coding knowledge and in a few clicks using AutoML.

Alternatively, you can write your own code, then train and deploy custom models. Beyond building models and deploying them at endpoints, Vertex AI also lets you store metadata and artifacts, run experiments, and run batch predictions.
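The data-type/task combinations above can be summarized in a small lookup table, handy when checking whether AutoML covers a given use case. This is a plain-Python sketch built from the list above; the names are illustrative, not part of the Vertex AI SDK:

```python
# Data types and tasks that Vertex AI AutoML supports, per the overview above.
AUTOML_TASKS = {
    "image": ["classification", "object detection", "segmentation"],
    "tabular": ["regression", "classification"],
    "text": ["single-label classification", "multi-label classification",
             "entity extraction", "sentiment analysis"],
    "video": ["classification", "action recognition", "object tracking"],
}

def automl_supports(data_type: str, task: str) -> bool:
    """Check whether a (data type, task) pair is covered by AutoML."""
    return task in AUTOML_TASKS.get(data_type, [])

# e.g. fraud detection or house-price prediction on tabular data:
print(automl_supports("tabular", "regression"))  # True
```

If a pair is not covered, the custom-training route described next is the way to go.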

In summary, it provides everything you need to build, operationalize and manage any ML workflow.

Vertex AI Services

As you can observe in the above image, the features/services available in Vertex AI fall into four buckets: TOOLS, DATA, MODEL DEVELOPMENT, and DEPLOY AND USE.

Let’s explore each of them in detail:


  1. Tools

In this category, Vertex AI offers the Dashboard, Workbench and Pipelines services.

  • Dashboard — It provides a summary of your data, models and predictions. You can also start your ML journey directly from here. The sections within the dashboard are intuitive.
Dashboard screenshot taken from the Google Cloud Console
  • Workbench

It is a Jupyter Notebook-based development environment that lets you interact with Google Cloud services, such as Google Cloud Storage (for unstructured data) or BigQuery (for structured data), from within any Workbench instance.

It offers two options:

a. Managed notebooks — These are Google-managed environments that offer integrations and features to make it easy to work in an end-to-end, notebook-based production environment.

When creating a managed notebook, you can configure the hardware: the machine type (standard, high-memory, or high-CPU), GPUs, and the data disk type. Further advanced options cover disk encryption, networking, and security.

Machine Types: (screenshot of available machine types)

GPU Types: (screenshot of available GPU types)

Data Disk Types: (screenshot of available disk types)

b. User-managed notebooks — These are customizable and offer much more control to the user. You can choose the configuration based on your requirements.

User-managed notebook customization options
  • Pipelines
    Using this tool, you can orchestrate your ML workflow. Vertex AI Pipelines supports ML workflows built with the Kubeflow Pipelines SDK or TensorFlow Extended (TFX). You can upload a pipeline, use one saved in a repository, or create a new run. Some templates (shown below) are currently available, which you can use as references and modify as required.
Screenshot taken from the Google Cloud Console
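Vertex AI Pipelines executes a workflow as a graph of steps with dependencies. As a rough conceptual sketch of that orchestration idea (using only Python’s standard library, not the Kubeflow Pipelines SDK, and with made-up step names), a pipeline boils down to running steps in dependency order:

```python
from graphlib import TopologicalSorter

# A toy ML workflow as a dependency graph: each step lists the steps it
# depends on. Real pipelines define these as Kubeflow Pipelines components.
workflow = {
    "ingest": [],
    "preprocess": ["ingest"],
    "train": ["preprocess"],
    "evaluate": ["train"],
    "deploy": ["evaluate"],
}

def run_order(steps):
    """Return an execution order that respects all dependencies."""
    return list(TopologicalSorter(steps).static_order())

print(run_order(workflow))
# ['ingest', 'preprocess', 'train', 'evaluate', 'deploy']
```

The managed service adds what this sketch leaves out: containerized steps, artifact passing, caching, and scheduling.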

The second bucket of services available in Vertex AI relates to DATA:

2. Data

In this set of services, we have Feature Store, Datasets and Labeling Tasks.

  • Feature Store — A centralized repository for organizing, storing and serving ML features (the inputs consumed by ML models). It is a central place accessible to the data engineers, data scientists and ML engineers on your team, which avoids re-engineering the same features across groups or ML projects. It handles both batch and online feature serving, and you can monitor feature drift and look up the latest feature values.
  • Datasets — This service lets you create managed datasets, which are used mainly for training AutoML models and are optional for custom training.
Different Data Types supported
  • Labeling Tasks — Often, the data available to us is unlabeled or sparsely labeled. To train and evaluate ML models we need a well-labeled dataset, so creating one becomes necessary. The labeling service in Vertex AI helps you do exactly that: you can use your own group of labelers, or, if you have no expert labelers in-house or the data is too large, you can outsource the task to Google-managed labelers.
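The Feature Store’s core serving contract can be pictured with a tiny in-memory stand-in: features are keyed by entity ID, writes are timestamped, and online reads return the latest value. This is purely illustrative; the real Feature Store is a managed Vertex AI service with its own API:

```python
from datetime import datetime

# Toy in-memory "feature store": entity ID -> feature name -> (timestamp, value) history.
store = {}

def write_feature(entity_id, feature, value, ts):
    """Record a feature value for an entity at a given time."""
    store.setdefault(entity_id, {}).setdefault(feature, []).append((ts, value))

def read_latest(entity_id, feature):
    """Online serving: return the newest value for an entity's feature."""
    history = store[entity_id][feature]
    return max(history)[1]  # tuples sort by timestamp first

write_feature("user_42", "avg_txn_amount", 31.5, datetime(2023, 4, 1))
write_feature("user_42", "avg_txn_amount", 47.9, datetime(2023, 4, 29))
print(read_latest("user_42", "avg_txn_amount"))  # 47.9
```

Keeping this lookup in one shared place is what lets different teams reuse features instead of re-deriving them.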

The third category of services offered by Vertex AI concerns the model development life cycle.

3. Model Development

In this category, Vertex AI offers Training, Experiments and Metadata services.

  • Training — This service enables the creation of training workflows, allowing users to train AutoML models on tabular, image, text or video data without writing code. Alternatively, users can opt for custom training, which provides greater control over the ML framework and hyperparameter tuning.
  • Experiments — Iterating over models and tuning hyperparameters is an integral part of finding the best ML model for a use case. The Experiments service facilitates this by tracking each experiment run (for example, a PipelineJob), its inputs (algorithms, parameters, datasets) and its outputs (models, checkpoints, metrics). It lets you log parameters such as batch size and number of layers; metrics, including summary metrics such as accuracy, time-series metrics such as loss over time (stored in TensorBoard), and classification metrics such as confusion matrices; and artifacts (dataset, model and generic artifacts) associated with each experiment.
  • Metadata — This service manages the life cycle of metadata consumed and produced by an ML workflow/system. It lets us capture, store and manage metadata about data, models and pipelines, which in turn helps to analyze, debug and audit the performance of an ML system. The Vertex AI Metadata service is built on the concepts of the open-source ML Metadata (MLMD) library developed by Google’s TensorFlow Extended team.
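What experiment tracking buys you is the ability to compare runs after the fact. A minimal sketch of the idea (local dictionaries with invented run names, not the Vertex AI Experiments API, which does this as a managed service with TensorBoard integration):

```python
# Each run records its parameters and metrics so runs can be compared later.
runs = []

def log_run(name, params, metrics):
    """Record one experiment run with its inputs and results."""
    runs.append({"name": name, "params": params, "metrics": metrics})

log_run("run-1", {"batch_size": 32, "layers": 4}, {"accuracy": 0.91})
log_run("run-2", {"batch_size": 64, "layers": 6}, {"accuracy": 0.94})

# Pick the run with the best summary metric.
best = max(runs, key=lambda r: r["metrics"]["accuracy"])
print(best["name"], best["params"])  # run-2 {'batch_size': 64, 'layers': 6}
```

The Metadata service generalizes this record-keeping to full lineage: which dataset and pipeline produced which model.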

The last category of services enables users to manage, deploy and use the trained models.

4. Deploy and Use

The services included in this category are: Model Registry, Endpoints, Batch Predictions and Matching Engine.

  • Model Registry — This service helps users manage the life cycle of ML models. For example, suppose you have two versions of the same model: Version 1 is optimized for high precision, while Version 2 maintains high recall. From the Model Version page you can evaluate, deploy, or set up batch predictions for any model in the registry, so you can deploy whichever version your requirements call for. The service supports custom models, AutoML models, and even models trained with BigQuery ML.
  • Endpoints/Online Prediction — This service lets users create endpoints for online (near real-time) prediction requests. Users can configure an endpoint’s traffic split between model versions, choose the access type (standard or private), and set the encryption.
  • Batch Predictions — We do not always need immediate prediction results; in such cases we can use the batch prediction service to process large amounts of data and generate predictions in bulk. Batch predictions do not require deploying the model: they can be obtained directly from model resources. The predictions can be saved to Google Cloud Storage or BigQuery for further analysis, and we can monitor batch prediction jobs and view their logs for troubleshooting.
  • Matching Engine — Similarity-based comparison is integral to many ML systems, such as search and recommendation systems. The Matching Engine service performs similarity matching on vectors: if your use case involves comparing data such as images, text or audio and computing their similarity, you can leverage it.
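At its heart, vector matching means embedding items as vectors and ranking them by a similarity measure such as cosine similarity. A stdlib-only sketch with made-up toy embeddings (in practice the vectors come from a model, and Matching Engine does this at scale with approximate nearest-neighbor search):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = (math.sqrt(sum(x * x for x in a))
            * math.sqrt(sum(y * y for y in b)))
    return dot / norm

# Toy embeddings; real ones would come from an image/text/audio model.
query = [0.9, 0.1, 0.0]
catalog = {
    "item_a": [0.8, 0.2, 0.1],
    "item_b": [0.0, 0.1, 0.9],
}

# Rank catalog items by similarity to the query and take the best match.
nearest = max(catalog, key=lambda k: cosine_similarity(query, catalog[k]))
print(nearest)  # item_a
```

Brute-force scoring like this scales linearly with catalog size, which is exactly why a managed approximate-matching service matters for large corpora.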

Note: Google Cloud frequently updates and adds services; the Vertex AI services mentioned here are current as of this writing. Additionally, a new service related to generative AI will soon be added to Vertex AI. Stay tuned…

Follow me to get notified about my newest articles. Your questions and suggestions are highly appreciated, and you can show your support by clapping! 👏 Also, feel free to reach out to me via LinkedIn or write to me at dj.kimothi@gmail.com
