Serverless Machine Learning Pipelines with Vertex AI: An Introduction
Google finally did it, and the dream came true: we can now run serverless Kubeflow pipelines on what used to be AI Platform.
Data Scientists and ML Engineers who work at small companies (or on small teams) face a big problem: a tremendous lack of people and time. Because of that, the scope of technologies we can use, or even test, is very limited.
On the MLOps side, that limited scope was a problem. There were no managed MLOps pipeline solutions on GCP, and the only way to adopt experiment pipelining as a practice was to deploy a full-fledged Kubeflow instance. We could rely on AI Platform Pipelines, with its semi-managed deployment, but there was still GKE cluster babysitting to do, which takes time (and is boring).
Vertex AI came from the skies to solve our MLOps problem with a managed (and reasonably priced) alternative. It comes with all the classic AI Platform resources plus an ML metadata store, a fully managed feature store, and a fully managed Kubeflow Pipelines runner.
In this post, we will talk about ML pipelines and Kubeflow Pipelines: how to create them to fit your custom ML needs, how to run them on Vertex AI Pipelines, and how to analyze their results, with a small code example.
If you are just here for the example code, here it is.
A quick recap on MLOps
As Vertex AI docs state:
Machine learning operations (MLOps) is the practice of applying DevOps strategies to machine learning (ML) systems. DevOps strategies let you efficiently build and release code changes, and monitor systems to ensure you meet your reliability goals.
MLOps extends this practice to help you reduce the amount of time that it takes to reliably go from data ingestion to deploying your model in production, in a way that lets you monitor and understand your ML system.
In summary, MLOps is the practice that lets you “organize” your Data Science and ML area to create reliable systems that fulfill the needs of your company.
As a consequence, to apply good MLOps practices you need reliable, easy-to-use tools, so you can govern, monitor, and execute all your data jobs, from ingestion to modelling to monitoring.
Pipelines, Kubeflow and Vertex
To apply MLOps, one of the most important steps is experimentation, so we need a robust experimentation tool. We want to be able to track our experiments, compare them, reproduce them and save all the results and data used.
Kubeflow Pipelines comes to solve this problem. KFP, for short, is a toolkit dedicated to running ML workflows (such as experiments for model training) on Kubernetes, and it does it in a very clever way:
Among other options, Kubeflow lets us define a workflow as a series of Python functions that pass results and artifacts to one another.
For each Python function, we can declare its dependencies (the libraries it uses), and Kubeflow will create a container to run each function in isolation, passing any required object along to the next step of the workflow. We can also set the resources a step needs (such as memory or GPUs), and they will be provisioned for that workflow step. It feels like magic.
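As a rough, illustrative sketch (the component and pipeline names here are made up, not taken from the post’s notebook), declaring per-step dependencies and resources looks roughly like this:

from kfp.v2 import dsl
from kfp.v2.dsl import component

# "pandas" is installed only inside this step's container.
@component(packages_to_install=["pandas"])
def preprocess() -> str:
    import pandas as pd  # available because of packages_to_install above
    return pd.__version__

@dsl.pipeline(name="resource-demo")
def demo_pipeline():
    step = preprocess()
    # Ask the runner to provision specific resources for this single step.
    step.set_cpu_limit("4").set_memory_limit("16G")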
Once you’ve run your pipeline, you will be able to see it in a nice UI, like this:
The only problem is that Kubeflow Pipelines must be deployed on a Kubernetes cluster. If you are in a small company that handles sensitive data, you will struggle with permissions, VPCs, and lots of other issues just to deploy and use it, which makes adoption a bit difficult.
Vertex AI solves this problem with a managed pipeline runner: you define a pipeline, and Vertex AI executes it, taking responsibility for provisioning all resources, storing all the artifacts you want, and passing them through each of the desired steps.
We will now see how to define a Kubeflow Pipeline and run it on Vertex AI, using scikit-learn’s breast cancer dataset.
Defining a Pipeline: a customized example with a toy dataset
We will now define a pipeline with three simple steps, known in KFP as components:
- Ingest data and separate it into train and test splits.
- Train our model, using the train split from step 1.
- Evaluate our model, producing a ROC curve and a confusion matrix, using the model from step 2 and the test split from step 1.
To start, let’s install the Kubeflow Pipelines SDK:
pip3 install kfp --upgrade
You can see the full Jupyter Notebook in the GitHub repo for this post. We will go through each code chunk here to make it more understandable. To start, some imports:
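The notebook’s import cell is not reproduced here, but a minimal sketch of what the following sections need looks like this:

from kfp.v2 import compiler, dsl
from kfp.v2.dsl import (
    Artifact,
    ClassificationMetrics,
    Dataset,
    Input,
    Metrics,
    Model,
    Output,
    component,
)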
We import from kfp.v2 because it is the new Kubeflow Pipelines SDK version, which is compatible with Vertex AI. We import dsl, which stands for “domain-specific language”, as it is the SDK’s main module for pipeline definition.
We import Artifact, Dataset, Input, Model, Output, Metrics and ClassificationMetrics from kfp.v2.dsl because they are how we pass objects between components. When we define a component, we can state, through the argument type hints, the inputs and outputs of that component. From those hints, KFP creates an object with a “path” attribute (for storing and reusing objects between components) and a “metadata” attribute (for storing object metadata), plus some type-specific methods, such as the ClassificationMetrics methods that plot a beautiful ROC curve on the UI.
Let’s see how it works in our first example, the get_data operator:
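The real component is in the notebook; here is a hedged sketch of what a get_data component can look like, using the imports above (the package list, split ratio, and CSV format are assumptions for illustration):

@component(packages_to_install=["scikit-learn", "pandas"])
def get_data(
    dataset_train: Output[Dataset],
    dataset_test: Output[Dataset],
):
    import pandas as pd
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split

    # Load the toy dataset as a DataFrame and split it into train/test.
    df = load_breast_cancer(as_frame=True).frame
    df_train, df_test = train_test_split(df, test_size=0.3, random_state=42)

    # Persist each split at the path Vertex AI assigns to the artifact.
    df_train.to_csv(dataset_train.path, index=False)
    df_test.to_csv(dataset_test.path, index=False)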
Notice that we do not return any value; instead, we define arguments for the component, such as dataset_train: Output[Dataset]. It means that, inside the function, we can access an Output object of the class Dataset (created when the function is used), with path and metadata attributes.
With that, you can save the dataset to a specific path and access it from your next operator if you need to. After you call the component, you can access its outputs attribute; for example, if you want to access the dataset_train object when defining the full pipeline, you can do it with ds_train = get_data().outputs["dataset_train"].
We also used the @component decorator, where we can define the packages needed to create a container capable of running our function.
For the train step, we will use one of our output objects, and also access an object previously created. Let’s see how that’s done:
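Again as a sketch, using the imports above (the model choice here is illustrative; only the metadata keys train_score and framework come from the text):

@component(packages_to_install=["scikit-learn", "pandas"])
def train_model(
    dataset: Input[Dataset],
    model: Output[Model],
):
    import pickle

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    # Read the train split written by get_data through its artifact path.
    df_train = pd.read_csv(dataset.path)
    X, y = df_train.drop(columns=["target"]), df_train["target"]

    clf = RandomForestClassifier(random_state=42)
    clf.fit(X, y)

    # Attach metadata to the Model artifact, as described below.
    model.metadata["train_score"] = float(clf.score(X, y))
    model.metadata["framework"] = "scikit-learn"

    # Save the fitted model at the artifact's path.
    with open(model.path, "wb") as f:
        pickle.dump(clf, f)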
All we have to do is define dataset: Input[Dataset] and, if we pass get_data().outputs["dataset_train"] when calling the component, it will access the dataset_train object and download it using its path attribute.
Notice that we’ve also defined some metadata for our model: train_score for the model’s score at training time, and framework for the framework it was built with.
This logic continues in our last component: the model evaluation on the test dataset:
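A sketch of what that evaluation component can look like (the metric names and label ordering are assumptions; log_roc_curve and log_confusion_matrix are the KFP v2 plotting helpers discussed below):

@component(packages_to_install=["scikit-learn", "pandas"])
def eval_model(
    test_set: Input[Dataset],
    model: Input[Model],
    metrics: Output[Metrics],
    classification_metrics: Output[ClassificationMetrics],
):
    import pickle

    import pandas as pd
    from sklearn.metrics import confusion_matrix, roc_curve

    df_test = pd.read_csv(test_set.path)
    X, y = df_test.drop(columns=["target"]), df_test["target"]

    with open(model.path, "rb") as f:
        clf = pickle.load(f)

    # Scalar metrics show up as key/value pairs in the Vertex AI UI.
    metrics.log_metric("test_score", float(clf.score(X, y)))

    # ROC curve: log_roc_curve takes fpr, tpr and thresholds as lists.
    y_scores = clf.predict_proba(X)[:, 1]
    fpr, tpr, thresholds = roc_curve(y, y_scores)
    classification_metrics.log_roc_curve(
        fpr.tolist(), tpr.tolist(), thresholds.tolist()
    )

    # Confusion matrix rendered as an interactive plot in the UI
    # (in this dataset, target 0 is malignant and 1 is benign).
    preds = clf.predict(X)
    classification_metrics.log_confusion_matrix(
        ["malignant", "benign"], confusion_matrix(y, preds).tolist()
    )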
What is new here is that we pass Output[Metrics] and Output[ClassificationMetrics] objects. These let us save the model’s classification metrics and some beautiful interactive plots for the ROC curve and confusion matrix on the Vertex AI UI.
With everything done, we only have to create our pipeline and compile it into a JSON file:
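A sketch of the pipeline definition and compilation, assuming the component names above and a GCS bucket you own for the pipeline root:

# Placeholder: replace with a GCS path your project can write to.
PIPELINE_ROOT = "gs://your-bucket/pipeline-root"

@dsl.pipeline(
    name="breast-cancer-demo-pipeline",
    pipeline_root=PIPELINE_ROOT,
)
def pipeline():
    data_op = get_data()
    train_op = train_model(dataset=data_op.outputs["dataset_train"])
    eval_model(
        test_set=data_op.outputs["dataset_test"],
        model=train_op.outputs["model"],
    )

# Compile the pipeline into a JSON spec that Vertex AI Pipelines can run.
compiler.Compiler().compile(
    pipeline_func=pipeline,
    package_path="pipeline.json",
)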
Notice that we can access the outputs from previous steps when wiring the pipeline together. We compile it into a JSON file and can then submit it to run on Vertex AI. Be sure you have Vertex AI enabled on your GCP project, and that you are properly authenticated on GCP.
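One way to submit the compiled JSON (the notebook may use a different client; the project, region, and root values here are placeholders) is through the google-cloud-aiplatform SDK:

from google.cloud import aiplatform

aiplatform.init(project="your-gcp-project", location="us-central1")

job = aiplatform.PipelineJob(
    display_name="breast-cancer-demo",
    template_path="pipeline.json",
    pipeline_root=PIPELINE_ROOT,
    enable_caching=True,
)
job.submit()  # logs a link to the run in the Vertex AI Pipelines UI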
Checking on the UI
Congrats! Click the link in your Jupyter notebook and you will be directed to your pipeline run. You will see something like this:
You can also check your metric plots:
Beautiful, right?
Conclusion
That’s it. With Vertex AI, you will have an easier and better life with all things related to MLOps. There is still much more that can be done in Vertex Pipelines, like using TFX, or other Vertex AI tools, such as the Feature Store. Everyone who has suffered trying to set up Feast knows what I’m talking about.
If you liked this post, star the repo on GitHub, ping me on LinkedIn, or do whatever you feel like. If you have any feedback, corrections for the post, or just want to talk, reach out.
I’m a Data Scientist @ Acordo Certo and a Google Developer Expert in Machine Learning. I’m an advocate for MLOps and fell in love with Vertex AI at first sight.
Here are some other links: