Google Vertex AI Batch Predictions
Batch predictions let you run predictions on large amounts of data in parallel. Vertex AI Batch Prediction is built for large datasets that would take too long with an online prediction approach. It provides a scalable, serverless, and efficient service for cases where you don’t need an immediate response and can process requests asynchronously.
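To make this concrete, here is a minimal sketch of what submitting a batch prediction job looks like with the Python SDK. The project, region, bucket paths, and model ID are placeholders, and we will walk through the individual pieces in the rest of the article:

```python
from google.cloud import aiplatform

# Placeholder project and region -- replace with your own values.
aiplatform.init(project="my-project", location="us-central1")

# Reference a model that is already in the Vertex AI Model Registry.
model = aiplatform.Model("my-model-id")

# Submit an asynchronous batch prediction job: instances are read from
# GCS, and predictions are written back to GCS when the job finishes.
batch_job = model.batch_predict(
    job_display_name="my-batch-prediction-job",
    gcs_source="gs://my-bucket/input/instances.jsonl",
    gcs_destination_prefix="gs://my-bucket/output/",
    machine_type="n1-standard-4",
)
```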
Jump to the Notebook and Code
All the code for this article is ready to use in a Google Colab notebook. If you have questions, please reach out to me via LinkedIn or Twitter.
It all starts with a model
To run batch predictions, your model has to be uploaded to the Vertex AI Model Registry. Keep in mind that there is no need to deploy it to an endpoint.
I am using the SDK throughout this article, as it is the recommended way. You could also use the UI or the API.
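If the SDK is new to you, it is a single pip install. Here is a quick sketch, again with placeholder project and region values, to verify your setup and see which models are already in the registry:

```python
# pip install google-cloud-aiplatform
from google.cloud import aiplatform

# Placeholder project and region -- replace with your own values.
aiplatform.init(project="my-project", location="us-central1")

# List the models currently registered in the Vertex AI Model Registry.
for model in aiplatform.Model.list():
    print(model.display_name, model.resource_name)
```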
We can use a pre-built container or a custom container when uploading the model via Model.upload(...). If that’s new to you, head to my video and article on how to deploy your model to Vertex AI for online serving (batch predictions and online predictions are very similar).
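As an illustration, uploading a scikit-learn model with one of Google’s pre-built serving containers could look like the sketch below. The artifact path is a placeholder, and the container image is just one example; pick the image that matches your framework and version from the Vertex AI documentation:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Upload the model artifact to the Model Registry using a pre-built
# serving container (here: the scikit-learn CPU prediction image).
model = aiplatform.Model.upload(
    display_name="my-sklearn-model",
    artifact_uri="gs://my-bucket/model/",  # folder containing model.joblib
    serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest",
)
print(model.resource_name)
```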