
Saving Time and Money using Gemini with Batch Prediction


What is Batch Prediction?

Batch prediction is a technique for saving time and money: instead of sending individual requests to Gemini, you submit a large number of prompts (multimodal or text-only) in a single batch. The responses are generated asynchronously and written to the output location you specify.

Even better, batch requests for Gemini models are discounted 50% from standard requests.

Batch Prediction Use Cases

Batch prediction has three great benefits: it saves time, saves money, and minimizes computational resources.

Some use cases:

  • Translate millions of sentences
  • Create summaries for a large number of documents
  • Automate a workflow that uses Gemini and runs periodically
  • Imagine an online store with thousands of products. Instead of writing descriptions manually, retailers can use Gemini to generate unique and engaging descriptions for each item, saving time and resources.
  • Analyze customer data and purchase history to provide tailored product recommendations.
  • Process large volumes of customer reviews to extract key insights and sentiments.
  • Analyze video content and generate relevant tags for improved searchability and organization.
  • Automate the processing of insurance claims by extracting information from documents and images.
  • Analyze patient data and medical history to suggest personalized treatment options.

There are two ways to use batch prediction on Vertex AI: batch prediction with Cloud Storage and batch prediction with BigQuery.

Batch Prediction for Cloud Storage

Let’s go to the code.

First, prepare your inputs:

  • The file format must be JSON Lines (JSONL).
  • Create a service account with read and write permissions on Cloud Storage; its credentials should be used by the script that runs the batch prediction.
  • Choose a Gemini model that supports batch prediction.

The following is an example of a simple prompt in a JSON line:

{
  "request": {
    "contents": [
      {
        "role": "user",
        "parts": [
          {
            "text": "Translate to spanish the following sentence: I am saving time and money with Batch Prediction"
          }
        ]
      }
    ]
  }
}

The following is an example of a multimodal prompt in a JSON line:

{
  "request": {
    "contents": [
      {
        "role": "user",
        "parts": [
          {
            "text": "What animals can you detect in this video?"
          },
          {
            "fileData": {
              "fileUri": "gs://generative-ai/video/animals.mov",
              "mimeType": "video/mov"
            }
          }
        ]
      }
    ]
  }
}
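
If you want to build such a file programmatically and upload it to Cloud Storage, the following is a minimal sketch; it assumes the google-cloud-storage client library, and the bucket and object names are illustrative:

import json

from google.cloud import storage

# One request per JSONL line, in the format shown above.
prompts = [
    "Translate to spanish the following sentence: I am saving time and money with Batch Prediction",
    "What is batch prediction?",
]
lines = [
    json.dumps({"request": {"contents": [{"role": "user", "parts": [{"text": p}]}]}})
    for p in prompts
]

# Upload the JSONL file to Cloud Storage (assumed bucket and path).
client = storage.Client()
blob = client.bucket("your-bucket").blob("batch-inputs/prompts.jsonl")
blob.upload_from_string("\n".join(lines))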

Second, request a batch prediction job:

Here is an example of using batch prediction with Cloud Storage on Vertex AI.
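
The following is a minimal sketch of the script using the Vertex AI SDK’s BatchPredictionJob; the project ID, bucket paths, and the interest_id value are illustrative assumptions:

import time

import vertexai
from vertexai.batch_prediction import BatchPredictionJob

# Assumed project and region; replace with your own.
vertexai.init(project="your-project-id", location="us-central1")

# Input data preparation: build the JSONL file name from interest_id
# (a hypothetical identifier used to organize the input files) and
# define the input URI in Google Cloud Storage.
interest_id = "animals"
file_name = f"{interest_id}.jsonl"
input_uri = f"gs://your-bucket/batch-inputs/{file_name}"
output_uri_prefix = "gs://your-bucket/batch-outputs"

# Batch prediction job submission.
batch_prediction_job = BatchPredictionJob.submit(
    source_model="gemini-1.5-flash-002",
    input_dataset=input_uri,
    output_uri_prefix=output_uri_prefix,
)

# Job monitoring loop: poll every 10 seconds until the job ends.
while not batch_prediction_job.has_ended:
    time.sleep(10)
    print(f"Job state: {batch_prediction_job.state.name}")
    batch_prediction_job.refresh()

# Job result handling.
if batch_prediction_job.has_succeeded:
    print(f"Results written to: {batch_prediction_job.output_location}")
else:
    print(f"Job failed: {batch_prediction_job.error}")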

Let’s explain this code a little:

Input Data Preparation:

  • Constructs the file name based on the interest_id, with the expected JSONL extension.
  • Defines the input data URI (the file’s location in Google Cloud Storage, GCS).

Batch Prediction Job Submission:

  • Creates a BatchPredictionJob object using the submit method.
  • Specifies the following arguments:
      • source_model: the name of the Gemini model to use (here, "gemini-1.5-flash-002").
      • input_dataset: the URI of the input data set (the previously defined input_uri).
      • output_uri_prefix: the prefix for the output location in GCS (without a specific filename).

Job Monitoring Loop:

  • Loops until the job has ended.
  • Waits 10 seconds using time.sleep.
  • Prints the current job state retrieved from batch_prediction_job.state.name.
  • Refreshes the job status with batch_prediction_job.refresh().

Job Result Handling:

  • Checks whether the job succeeded using batch_prediction_job.has_succeeded.
  • If it failed, prints “Job failed” along with the error message from batch_prediction_job.error.

When the job finishes, it writes one or more JSONL result files to the output location; each output line pairs the original request with the model’s response.
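
Here is a sketch of reading the results back (the bucket name, output prefix, and the exact layout of the response records are assumptions here):

import json

from google.cloud import storage

client = storage.Client()

# List the result files under the output prefix and parse each line.
for blob in client.list_blobs("your-bucket", prefix="batch-outputs/"):
    if not blob.name.endswith(".jsonl"):
        continue
    for line in blob.download_as_text().splitlines():
        record = json.loads(line)
        # Assumed layout: "response" mirrors a generateContent response.
        reply = record["response"]["candidates"][0]["content"]["parts"][0]["text"]
        print(reply)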

Additionally, you can see the job and its details in Vertex AI under the Batch Predictions option, something like this:

Batch predictions in Vertex AI
Detail of a batch prediction job in Vertex AI

Elapsed Times Using Batch Prediction

We have two examples of elapsed times using batch prediction.

  • Approximately 1.5K requests were processed; the elapsed time was 2 minutes and 55 seconds.
  • Approximately 5.5K requests were processed; the elapsed time was 3 minutes and 29 seconds.

I have used Gemini online with similar quantities, and the elapsed times were far longer than in these examples. For this reason, if you need to process a large number of prompts, I advise you to use batch prediction.

Conclusions

Batch prediction is a technique for saving time and money compared with processing individual requests to Gemini. There are many use cases where you can apply it, and there are two ways to use it on Vertex AI: with Cloud Storage or with BigQuery. We have presented an example with Cloud Storage; the code is simple and the elapsed times are excellent.


Written by Juan Guillermo Gómez Torres

Tech Lead in Wordbox. @GDGCali Founder. @DevHackCali Founder. Firebase & GCP & Kotlin & AI @GoogleDevExpert
