When to use Gemini or purpose-built AI models in BigQuery

Shane Glass
Google Cloud - Community
7 min read · May 1, 2024

The integration of large language models (LLMs), such as Gemini 1.0 Pro, in BigQuery enables data practitioners to use SQL to unlock new analysis patterns (e.g., content generation, retrieval-augmented generation) and simplify existing ones (e.g., text translation, sentiment analysis). One example is BigQuery’s ML.GENERATE_TEXT function, which you can use to prompt LLMs over your data directly within your data warehouse, eliminating the need for data movement. Even before it could call Gemini models, BigQuery could call numerous purpose-built models for text translation, natural language processing, optical character recognition, entity detection, and more to help you scalably integrate these tools into your data analytics workflow with SQL.
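As a minimal sketch of this pattern, the query below prompts a Gemini model over a table of reviews with ML.GENERATE_TEXT. The model and table names (`mydataset.gemini_model`, `mydataset.reviews`) are hypothetical placeholders — the model must be a remote model you have already created over a Gemini endpoint:

```sql
-- Prompt Gemini over each row of a table; no data leaves BigQuery.
-- `mydataset.gemini_model` and `mydataset.reviews` are hypothetical names.
SELECT ml_generate_text_result
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.gemini_model`,
  (
    SELECT CONCAT('Summarize this review in one sentence: ', review_text) AS prompt
    FROM `mydataset.reviews`
  ),
  STRUCT(0.2 AS temperature, 256 AS max_output_tokens)
);
```

The function expects the query to produce a column named `prompt`, and the optional STRUCT carries model parameters such as temperature.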

Now that LLMs can perform many of the same tasks purpose-built AI models were designed to handle and are integrated into BigQuery, should you migrate all of your BigQuery workloads to use Gemini? Or should you instead use only purpose-built AI models? The answer is likely some combination of the two, based on your workload’s requirements.

This is the first in a series of posts that will take a look at several data preparation and analysis tasks you can perform in BigQuery using its native integration with Gemini and purpose-built AI models. Throughout this series, you’ll see a comparison of these considerations and learn how to perform each task with both Gemini models and purpose-built AI models:

  • Translation
  • Sentiment analysis
  • Optical character recognition (OCR)
  • Document processing
  • Image annotation

In this post, I’ll provide some high-level guidance to consider when determining whether Gemini foundation models or purpose-built AI models are a better fit for your workload’s requirements.

A breakdown of which tools integrated with BigQuery support a given use case

A word of caution: These shouldn’t be thought of as hard and fast rules or an exhaustive list of all considerations. This is intended to give you some context to help you figure out which tool makes the most sense for your workload. As always, your experience may vary!

Factors to consider across all workloads

Regional considerations

Using models from Vertex AI in BigQuery first requires you to create a remote connection and a remote model within the BigQuery region where your data is stored and your queries run. This is true both for Gemini models, such as Gemini 1.0 Pro and 1.0 Pro Vision, and for purpose-built AI models. BigQuery requires your data storage, model endpoint, and query execution to all reside within the same Google Cloud region.
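The two setup steps might look like the following sketch, assuming a connection named `gemini_conn` in the US multi-region and a hypothetical dataset `mydataset` in the same location:

```sql
-- Step 1: create a Cloud resource connection (shown here with the bq CLI):
--   bq mk --connection --location=US --connection_type=CLOUD_RESOURCE gemini_conn
-- Grant the connection's service account the Vertex AI User role before proceeding.

-- Step 2: create a remote model over that connection, in the same region as your data.
CREATE OR REPLACE MODEL `mydataset.gemini_model`
  REMOTE WITH CONNECTION `US.gemini_conn`
  OPTIONS (ENDPOINT = 'gemini-1.0-pro');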

Thankfully, Gemini models and BigQuery both have broad regional support. As of this post’s publish date, BigQuery and Gemini models are both available in more than 25 shared regions. Additionally, functions that call Gemini foundation models are supported in BigQuery’s US and EU multi-regions. Check the full list of supported regions for Gemini models to find the latest options that fit your needs, and be sure they match the region where your data is stored.

The purpose-built AI models, however, offer more limited regional flexibility for analyzing data in BigQuery. For example, the Cloud Vision, Translation, and Natural Language APIs can only be used in BigQuery’s US and EU multi-regions¹. You can see the full list of supported regions for all integrated purpose-built AI models in the BigQuery documentation.

Cost sensitivity

Cost sensitivity and impacts are highly dependent on your model usage. Use the Google Cloud pricing calculator to estimate your workload costs.

Gemini models typically have a lower total cost for workloads that have a large number of text character inputs, while purpose-built AI models tend to have a lower cost for image-heavy workloads. For example²: Using the Cloud Natural Language API’s sentiment analysis and Gemini 1.0 Pro to analyze the sentiment of roughly 3.8 million characters has the following costs:

Using the Cloud Vision API’s image annotation and Gemini 1.0 Pro Vision to analyze roughly 3,000 images has the following costs:

Input context

When choosing which model is best for your analysis, consider whether your data can be analyzed in isolation, or requires additional context to support the analysis. If your data can be analyzed without other context being considered by the model — like translating a string of text without knowing why it was written — then a purpose-built model might be a better choice. This could be because you can provide any necessary context in your analysis through other means (e.g., dashboards, pre-processing, other data columns) or additional context is not necessary for this specific use case. Additionally, keep in mind that purpose-built models have a predefined set of tasks they support. If your workload requires support outside these (e.g., image annotation that requires custom labels), purpose-built models likely are not a fit for your use case.
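For a context-free task like the translation example above, a purpose-built model call can be as simple as the sketch below, which uses ML.TRANSLATE over a hypothetical remote model created against the Cloud Translation API (`mydataset.translate_model` and `mydataset.reviews` are placeholder names):

```sql
-- Translate each row to Spanish with the Cloud Translation API.
-- ML.TRANSLATE expects the input text in a column named text_content.
SELECT *
FROM ML.TRANSLATE(
  MODEL `mydataset.translate_model`,
  (SELECT review_text AS text_content FROM `mydataset.reviews`),
  STRUCT('TRANSLATE_TEXT' AS translation_mode, 'es' AS target_language_code)
);
```

No prompt is needed: the task, input column, and output schema are all fixed by the function.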

Translation input context

A comparison of supported input context for purpose-built AI models (left) and Gemini models (right)

Though it is not required, Gemini models are often best suited for use cases where your data requires additional context. By providing other data inputs, an explanation of the data, or context around certain aspects of your data, Gemini models can consider that context in performing the task. This is useful where purpose-built AI models might struggle, such as detecting sarcasm in written sentiment or translating specific idioms between languages when no exact translation is available. Keep in mind that providing additional context as input to Gemini increases token count and cost.
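A hedged sketch of supplying that extra context: the prompt below tells a Gemini remote model (hypothetical name `mydataset.gemini_model`) where the text comes from before asking for a sentiment label, which a purpose-built sentiment model has no way to receive:

```sql
-- Sentiment analysis with domain context baked into the prompt.
-- Extra context adds input tokens, which adds cost.
SELECT ml_generate_text_result
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.gemini_model`,
  (
    SELECT CONCAT(
      'These reviews come from a gaming forum where sarcasm is common. ',
      'Classify the sentiment of the following review as positive, ',
      'negative, or mixed, accounting for sarcasm: ',
      review_text) AS prompt
    FROM `mydataset.reviews`
  ),
  STRUCT(0.0 AS temperature)
);
```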

Given that purpose-built AI models typically have a lower price point for image-heavy use cases, it might make sense to start such workloads with purpose-built AI models over a representative sample of your data and determine if your use case would benefit from providing additional context around the data during analysis. If so, you can then run the same analysis using Gemini models, compare the outputs, and determine if any resulting improvements are worth the added costs.

Output structure

BigQuery returns a consistent output structure for queries that call Gemini, which is extremely helpful if you want to work with the response’s safety attributes or its response status. However, it cannot guarantee that the text generated by Gemini (the ml_generate_text_result column in your query results) will have a precise, consistent structure for each input, because LLMs like Gemini are inherently non-deterministic. You can mitigate much of this variation with skilled prompt engineering (e.g., including few-shot examples) and model parameter tuning (e.g., setting your temperature parameter to 0), which can turn this into an advantage: you can use your prompt to define the output structure you want from Gemini. For instance, you can instruct the model to return its result as JSON with custom parent/child fields and provide an example of how to produce this result. This can mean less post-processing of your results, though it may lead to unexpected errors if the response structure still varies. Additionally, keep in mind that LLMs like Gemini may hallucinate results, which presents different risks depending on the task. We’ll dig into this more in the task-specific breakdowns.
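Putting those mitigations together might look like the sketch below: temperature 0, a one-shot example in the prompt, and a requested JSON shape (the model and table names, and the JSON schema itself, are illustrative assumptions):

```sql
-- Ask for a fixed JSON shape; flatten_json_output returns the generated
-- text directly in the ml_generate_text_llm_result column.
SELECT ml_generate_text_llm_result
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.gemini_model`,
  (
    SELECT CONCAT(
      'Return only JSON shaped like {"sentiment": {"label": "", "reason": ""}}. ',
      'Example: "Battery lasts all day" -> ',
      '{"sentiment": {"label": "positive", "reason": "praises battery life"}}. ',
      'Review: ', review_text) AS prompt
    FROM `mydataset.reviews`
  ),
  STRUCT(0.0 AS temperature, TRUE AS flatten_json_output)
);
```

Even with temperature 0 and a few-shot example, validate the returned JSON before parsing it downstream.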

Translation output structure

A comparison of output structures from purpose-built AI models (left) and Gemini models (right)

Purpose-built AI models produce the same query response structure each time a query successfully calls them, and that structure can be easily parsed using BigQuery’s JSON functions. While an advantage in many cases, this can be a drawback if the predefined response structure does not suit your needs. The query output structure for each model is defined in the documentation for its corresponding SQL function³. Additionally, many pretrained models include extra metadata about their results. For example, the Translate model output includes information about the input language, and the Natural Language model output includes information about the magnitude of the sentiment. Generating this metadata is possible with Gemini, but it requires significant prompt engineering and likely won’t be available at the same granularity as the purpose-built AI models.
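As a sketch of that parsing, the query below pulls the translated text and the detected input language out of ML.TRANSLATE’s JSON result column with JSON_VALUE. The model and table names are hypothetical, and the exact field names in the result JSON are assumptions — check the function’s documented output schema before relying on these paths:

```sql
-- Parse fixed-structure output from a purpose-built model.
SELECT
  text_content,
  JSON_VALUE(ml_translate_result,
             '$.translations[0].translated_text') AS translated_text,
  JSON_VALUE(ml_translate_result,
             '$.translations[0].detected_language_code') AS detected_language
FROM ML.TRANSLATE(
  MODEL `mydataset.translate_model`,
  (SELECT review_text AS text_content FROM `mydataset.reviews`),
  STRUCT('TRANSLATE_TEXT' AS translation_mode, 'es' AS target_language_code)
);
```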

Quota limitations

All of BigQuery’s features and functions have quota limits in place. Though most practitioners are rarely impacted by them, these quotas should still inform how you build your data analytics workloads with BigQuery. This is also true for functions that call Vertex AI and Cloud AI services, including functions that call Gemini models and functions like ML.TRANSLATE and ML.UNDERSTAND_TEXT that call purpose-built AI models. BigQuery’s documentation spells out the limits for each function. Generally, functions that call purpose-built AI models support larger workloads in terms of both requests per minute and rows per job, though you should check the documentation for each tool to confirm.

Summary

Overall, calling Gemini models in BigQuery offers more flexibility: a single model can perform multiple tasks, consider additional context, and return a custom output structure. Purpose-built AI models, on the other hand, typically provide extra metadata and a consistent output structure that can be advantageous for downstream applications. Purpose-built AI models also tend to be cheaper for image-intensive workloads, while Gemini models tend to be more cost-effective for text-intensive workloads. When choosing a model, consider the specific requirements of your task and the resources you have available.

A high-level summary of each factor to consider

Be sure to check back for upcoming posts too! I’ll dive deeper into a comparison of each of these factors for Gemini and purpose-built AI models, plus show how to use these tools to build analytics workloads that can be used for text translation, sentiment analysis, optical character recognition (OCR), document processing, and image annotation.

Footnotes:

  1. The APIs for the purpose-built AI models offer broad regional availability and can be called to analyze data in any BigQuery region through a remote function. I’ll provide an example of this in a future blog post.
  2. These costs do not include BigQuery analysis costs, though they would be the same for each workload. This is meant as a comparative example of the cost differences between the two, not a complete cost estimate for a given workload. Prices include the free tier for each product.
  3. You can see all of these functions in the BigQuery ML SQL reference docs under “AI Functions”
