Saving Time and Money Using Gemini with Batch Prediction
What is Batch Prediction?
Batch prediction is a technique for saving time and money: instead of processing individual requests to Gemini, you send a large number of prompts (multimodal or not) in a single batch. Responses are generated asynchronously and written to the output location you choose.
Besides, there is great news… Batch requests for Gemini models are discounted 50% from standard requests.
Batch Prediction Use Cases
Batch prediction has three great benefits: it saves time, it saves money, and it minimizes computational resources.
Some use cases:
- Translate millions of sentences.
- Summarize a large collection of documents.
- Automate a workflow that uses Gemini and runs periodically.
- Imagine an online store with thousands of products. Instead of writing descriptions manually, retailers can use Gemini to generate unique and engaging descriptions for each item, saving time and resources.
- Analyze customer data and purchase history to provide tailored product recommendations.
- Process large volumes of customer reviews to extract key insights and sentiments.
- Analyze video content and generate relevant tags for improved searchability and organization.
- Automate the processing of insurance claims by extracting information from documents and images.
- Analyze patient data and medical history to suggest personalized treatment options.
There are two ways to use batch prediction on Vertex AI: batch prediction for Cloud Storage and batch prediction for BigQuery.
Batch Prediction for Cloud Storage
Let’s get to the code.
First, prepare your inputs:
- The file format must be JSON Lines (JSONL).
- Create a service account with read and write permissions on Cloud Storage; the script that runs the batch prediction should use this credential.
- Choose a Gemini model that supports batch prediction.
The following is an example of a simple prompt in a JSON line:
```json
{
  "request": {
    "contents": [
      {
        "role": "user",
        "parts": [
          {
            "text": "Translate to spanish the following sentence: I am saving time and money with Batch Prediction"
          }
        ]
      }
    ]
  }
}
```
The following is an example of a multimodal prompt in a JSON line:
```json
{
  "request": {
    "contents": [
      {
        "role": "user",
        "parts": [
          {
            "text": "What animals can you detect in this video?"
          },
          {
            "fileData": {
              "fileUri": "gs://generative-ai/video/animals.mov",
              "mimeType": "video/mov"
            }
          }
        ]
      }
    ]
  }
}
```
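As a complement, here is a small helper that writes prompts in this format to a JSONL file and uploads it to Cloud Storage. This is a minimal sketch: the bucket name and file paths are placeholders, and it assumes the `google-cloud-storage` client library.

```python
import json

from google.cloud import storage

prompts = [
    "Translate to spanish the following sentence: I am saving time and money with Batch Prediction",
    # ... one entry per request
]

# Write one JSON object per line, matching the request format shown above.
with open("input.jsonl", "w") as f:
    for prompt in prompts:
        request = {
            "request": {
                "contents": [
                    {"role": "user", "parts": [{"text": prompt}]}
                ]
            }
        }
        f.write(json.dumps(request) + "\n")

# Upload the file to Cloud Storage (the service account needs write access).
client = storage.Client()
client.bucket("my-bucket").blob("batch-inputs/input.jsonl").upload_from_filename("input.jsonl")
```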
Second, request a batch prediction job:
Here is one example using Vertex AI. The snippet below is a minimal sketch: the project ID, bucket name, and `interest_id` value are placeholders.
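```python
import time

import vertexai
from vertexai.batch_prediction import BatchPredictionJob

# Placeholder project and location.
vertexai.init(project="my-project", location="us-central1")

# Input data preparation: build the JSONL file name from interest_id
# and point to its location in Google Cloud Storage (GCS).
interest_id = "animals"  # placeholder
input_uri = f"gs://my-bucket/batch-inputs/{interest_id}.jsonl"
output_uri = "gs://my-bucket/batch-outputs"

# Batch prediction job submission.
batch_prediction_job = BatchPredictionJob.submit(
    source_model="gemini-1.5-flash-002",
    input_dataset=input_uri,
    output_uri_prefix=output_uri,
)

# Job monitoring loop: poll every 10 seconds until the job ends.
while not batch_prediction_job.has_ended:
    time.sleep(10)
    print(f"Job state: {batch_prediction_job.state.name}")
    batch_prediction_job.refresh()

# Job result handling.
if batch_prediction_job.has_succeeded:
    print(f"Job succeeded! Results at: {batch_prediction_job.output_location}")
else:
    print(f"Job failed: {batch_prediction_job.error}")
```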
Let’s explain this code a little:
Input Data Preparation:
- Constructs the file name based on the `interest_id`, following the expected JSONL format.
- Defines the input data URI (its location in Google Cloud Storage, GCS).
Batch Prediction Job Submission:
- Creates a `BatchPredictionJob` object using the `submit` method.
- Specifies the following arguments:
  - `source_model`: name of the LLM to be used (here, "gemini-1.5-flash-002").
  - `input_dataset`: URI of the input dataset (the previously defined `input_uri`).
  - `output_uri_prefix`: prefix for the output location in GCS (without a specific filename).
Job Monitoring Loop:
- Loops until the job is completed.
- Waits 10 seconds between polls using `time.sleep`.
- Prints the current job state retrieved with `batch_prediction_job.state.name`.
- Refreshes the job object's status with `batch_prediction_job.refresh`.
Job Result Handling:
- Checks whether the job succeeded using `batch_prediction_job.has_succeeded`.
- If it failed, prints "Job failed" along with the error message from `batch_prediction_job.error`.
This is an example of the console output while the job runs (illustrative, based on the sketch above; the state names come from Vertex AI's JobState enum):
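```
Job state: JOB_STATE_RUNNING
Job state: JOB_STATE_RUNNING
...
Job state: JOB_STATE_SUCCEEDED
Job succeeded! Results at: gs://my-bucket/batch-outputs/prediction-model-...
```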
Additionally, you can see the job and its details in the Vertex AI console, under the Batch Predictions option.
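Once the job succeeds, the predictions are written as JSONL files under the output prefix, one result per line alongside its original request. Here is a sketch for reading them back, again assuming the `google-cloud-storage` client; the exact response field layout may vary slightly across API versions.

```python
import json

from google.cloud import storage

# output_location looks like gs://<bucket>/<prefix>/prediction-model-<timestamp>
bucket_name, _, prefix = (
    batch_prediction_job.output_location.removeprefix("gs://").partition("/")
)

client = storage.Client()
for blob in client.list_blobs(bucket_name, prefix=prefix):
    if blob.name.endswith(".jsonl"):
        for line in blob.download_as_text().splitlines():
            result = json.loads(line)
            # Each line pairs the original request with the model's response.
            text = result["response"]["candidates"][0]["content"]["parts"][0]["text"]
            print(text)
```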
Elapsed Times Using Batch Prediction
We have two examples with elapsed times using batch prediction:
- Approximately 1.5K requests were processed; the elapsed time was 2 minutes and 55 seconds.
- Approximately 5.5K requests were processed; the elapsed time was 3 minutes and 29 seconds.
I have used Gemini online with similar quantities, and the elapsed times were far longer than in these examples. For this reason, if you need to process a large number of prompts, I advise you to use batch prediction.
Conclusions
Batch prediction is a technique for saving time and money compared with processing individual requests to Gemini. There are many use cases where you can apply this technique, and there are two ways to use it on Vertex AI: with Cloud Storage or with BigQuery. We have presented an example; the code is simple and the elapsed times are excellent.
I hope this information is useful to you. Remember to share this blog post; your comments are always welcome.
Visit my social networks: