Bringing Gen AI to the data in BigQuery

Pooja Kelgaonkar
Google Cloud - Community
4 min readMar 14, 2024

GCP BigQuery is a serverless and cost-efficient data service of GCP. You can use GCP BQ to implement a data platform — data warehouse, or data lake on GCP. GCP BQ also offers ML implementation to integrate ML use cases in an SQLish way. BQ ML allows users to create and maintain models using BQ SQL. There are pre-trained models available to be used in BQ ML. The following models are built into BigQuery ML:

  • Linear Regression: This is used for forecasting.
  • Logistic Regression: This is used for the classification implementation.
  • Clustering: This is used for data segmentation.
  • Time series: This is used for performing time series forecasts. The model automatically handles anomalies, seasonality, and holidays.

You can create a model using the CREATE model statement. You might have used some of these pre-trained models to implement ML use cases with BQ.

With the latest releases and updates, you can integrate and implement Generative AI models available in GCP Vertex AI. You can create a connection to Vertex AI-trained models such as PaLM 2 or Gemini Pro 1.0, or import custom models based on TensorFlow, TensorFlow Lite, and XGBoost.

P.C: GCP Blog post

You can also share your trained models through BigQuery, ensuring that data is used in a governed manner and that datasets are easily discoverable.

Benefits of bringing Gen AI to the data:

  1. This helps to eliminate the need to build and manage data pipelines between BigQuery and generative AI model APIs
  2. Streamlines governance and helps reduce the risk of data loss by avoiding data movement
  3. I also help reduce the need to write and manage custom Python code to call AI models
  4. Enables you to analyze data at petabyte-scale without compromising on performance
  5. This helps to lower total cost of ownership with a simplified architecture

Integrating BigQuery ML and Gemini 1.0 Pro

You can create a model using Gemini from Vertex AI model garden. You can implement the model using SQL DDL and SELECT commands. Follow the below steps to create model and start using it :

  1. Step 1: Create an External connection

Go to BQ -> Explorer -> Click +Add

Select the third options — connections to external data sources

select the vertex ai remote model and functions option

Provide the connection name in the connection ID. This will be the connection, you need to use to integrate the Gemini model.

2. Step 2: Go to the external connections and open the connection created in step 1. Get the service account id. The ID will look like — bqcx-xxxxx-xxx@gcp-sa-bigquery-condel.iam.gserviceaccount.com

3. Step 3: Add permissions to the service account listed in Step 2. go to IAM -> Grant access -> Mention the service account copied in step 2 as principal and select -> Vertex AI User role to be granted. Save the changes.

4. Step 4: Create MODEL using the connection created in step 1. The remote with connection is the string to be taken from the external connection created in step 1. Open the external connection and copy the connection ID to be replaced in REMOTE WITH CONNECTION in below create DDL

CREATE OR REPLACE MODEL
`project_id.genai_bq_poc.gen_ai_poc`
REMOTE WITH CONNECTION `projects/projectname/locations/us/connections/genai_bq_poc`
OPTIONS (ENDPOINT = 'gemini-pro');

Here, OPTIONS offers you to mention the endpoint hence the model you would like to use from Vertex AI model Garden. In the given example, Gemini model is used which is generally available. you can also use textbison and mention the same name as it appears in the model garden.

5. Step 5: Using the model created in step 4. Now, you can apply the model on any table from the public dataset or you can create your own prompt table. Here, I am creating one sample prompt table

/* CREATE PROMPTS TABLE */
CREATE TABLE DATASETID.genai_demo_prompts
(
PROMPTS STRING,
DATE STRING
);

/* INSERT SAMPLE PROMPTS */
INSERT INTO DATASETID.genai_demo_prompts VALUES ('WHAT IS GEMINI PRO IN GCP','2024-03-14);

6. Step 6: Run the SELECT statement to get answers to the prompts using the Gemini model connection setup in step 1.

SELECT ml_generate_text_llm_result,prompt,date
FROM
ML.GENERATE_TEXT(
MODEL `projectid.genai_bq_poc.gen_ai_poc`,
TABLE `projectid.genai_bq_poc.genai_demo_prompts`,
STRUCT(
0.4 AS temperature, 100 AS max_output_tokens, 0.5 AS top_p,
40 AS top_k, TRUE AS flatten_json_output));

You get sample results like this:

Your model is ready to serve you for any use case using table data. You can use it to summarize, generate sentiments, generate text, etc.

Hope this blog helps you to get started and create a model using Gemini from Vertex AI model garden in SQL way. Follow me to read more about data use cases and implementations using Snowflake and Google Cloud Platform.

About Me :

I am one of the Snowflake Data Superheroes 2023 & 2024. I am a DWBI and Cloud Architect! I am currently working as Senior Data Architect — GCP, Snowflake. I have been working with various Legacy data warehouses, data implementations, and Cloud platforms/Migrations. I am a SnowPro Core certified Data Architect as well as a Google-certified Google Professional Cloud Architect. You can reach out to me on LinkedIn if you need any further help on certification, Data Solutions, and Implementations!

--

--

Pooja Kelgaonkar
Google Cloud - Community

My words keep me going, Keep me motivating to reach out to more and more!