Gemini in BigQuery: Unlocking Multimodal Search with Vector Embeddings

Charu Shelar
Google Cloud - Community
6 min read · Dec 4, 2024

In a previous blog, we explored how Gemini in BigQuery acts as a powerful AI assistant, providing data insights and access to Vertex AI models. This post will guide you through using vector search in BigQuery to perform semantic searches with text, image, and other multimodal inputs.

Beyond Keywords: The Power of Embeddings for Search

Traditional search relies on keyword matching, which often fails to capture the nuances of human language and the complexities of multimodal data. Imagine searching for “happy Tom and Jerry pictures”: a keyword-based search might miss images where the characters’ happiness is conveyed through posture or expression rather than explicit labels.

Embeddings revolutionise search by representing data semantically. Embedding models transform unstructured data like text and images into numerical vectors called embeddings. These vectors capture the underlying meaning of the data, enabling searches based on semantic relationships, not just surface-level keywords.

How Embeddings Power Multimodal Search

Think of embeddings as points in a multi-dimensional space, where the distance between points reflects semantic similarity. Words with similar meanings cluster together, and images with similar visual content occupy nearby regions.
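To make “distance between points” concrete, here is a toy Python sketch using cosine similarity over tiny hand-made vectors. Note these four-dimensional vectors are purely illustrative; real embedding models emit hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: close to 1.0 means the vectors point the same way
    (semantically similar); near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, hand-made for illustration only.
emb_kitten = [0.80, 0.10, 0.05, 0.05]
emb_cat    = [0.75, 0.15, 0.05, 0.05]
emb_truck  = [0.05, 0.05, 0.10, 0.80]

print(cosine_similarity(emb_kitten, emb_cat))    # close to 1: similar meaning
print(cosine_similarity(emb_kitten, emb_truck))  # much lower: dissimilar
```

Because similar items end up close together, search reduces to finding the nearest vectors to a query vector.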

This spatial representation unlocks the power of cross-modal search, allowing us to:

  • Search for images using text: Input a text query like “sunset over the ocean,” and multimodal search will retrieve images that match the query’s semantic meaning, even if they lack specific tags.
  • Search for images using other images: Input an image of a cat, and multimodal search will find visually similar images, regardless of breed or pose.

The Multimodal Search Process

  1. Conversion: The input query (text or image) is converted into an embedding using a machine learning model.
  2. Comparison: This embedding is compared to other embeddings in a vector database, calculating distances to find the closest matches.
  3. Ranking: Results are returned, ranked by their semantic similarity to the input query.
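The three steps above can be sketched in a few lines of Python. The embed function below is a hypothetical stand-in for a real embedding model (a hand-built lookup table), and the vectors are made up for illustration:

```python
import math

def embed(text):
    """Step 1 (conversion): stand-in for a real embedding model."""
    fake_model = {
        "sunset over the ocean": [0.9, 0.1, 0.0],
        "beach at dusk":         [0.8, 0.2, 0.1],
        "city skyline at night": [0.1, 0.9, 0.2],
        "mountain trail":        [0.1, 0.2, 0.9],
    }
    return fake_model[text]

def euclidean(a, b):
    """Step 2 (comparison): distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def search(query, corpus, top_k=2):
    """Step 3 (ranking): return the top_k items closest to the query."""
    q = embed(query)
    scored = [(euclidean(q, embed(doc)), doc) for doc in corpus]
    return [doc for _, doc in sorted(scored)[:top_k]]

corpus = ["beach at dusk", "city skyline at night", "mountain trail"]
print(search("sunset over the ocean", corpus, top_k=2))
# "beach at dusk" ranks first: its embedding is nearest to the query's
```

In production, the embedding model and the distance computation are far more sophisticated, but the convert-compare-rank pipeline is the same.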

Vector Search in BigQuery: Efficient and Scalable Exploration

BigQuery’s vector search capabilities make multimodal exploration efficient and scalable, handling millions or even billions of records with ease.

  1. Generate embeddings within BigQuery: Create embeddings directly within your BigQuery environment using the ML.GENERATE_EMBEDDING function. This eliminates the need for external tools and complex workflows.

The general syntax is:

ML.GENERATE_EMBEDDING(
  MODEL `project_id.dataset.model_name`,
  { TABLE table_name | (query_statement) },
  STRUCT(
    [flatten_json_output AS flatten_json_output]
    [, task_type AS task_type]
    [, output_dimensionality AS output_dimensionality]
  )
)

For example, to embed every row of a table:

SELECT *
FROM ML.GENERATE_EMBEDDING(
  MODEL `PROJECT_ID.DATASET_ID.MODEL_NAME`,
  TABLE `PROJECT_ID.DATASET_ID.TABLE_NAME`,
  STRUCT(
    FLATTEN_JSON AS flatten_json_output,
    OUTPUT_DIMENSIONALITY AS output_dimensionality
  )
);

For video content, additional arguments control which segments are embedded:

SELECT *
FROM ML.GENERATE_EMBEDDING(
  MODEL `PROJECT_ID.DATASET_ID.MODEL_NAME`,
  TABLE `PROJECT_ID.DATASET_ID.TABLE_NAME`,
  STRUCT(
    FLATTEN_JSON AS flatten_json_output,
    START_SECOND AS start_second,
    END_SECOND AS end_second,
    INTERVAL_SECONDS AS interval_seconds
  )
);

2. Create and manage vector indexes: BigQuery allows you to create vector indexes on top of your embedding tables, optimising search performance and ensuring efficient retrieval. These indexes are automatically maintained and updated by BigQuery, simplifying your workflow.

CREATE [ OR REPLACE ] VECTOR INDEX [ IF NOT EXISTS ] index_name
ON table_name(column_name)
[STORING(stored_column_name [, ...])]
OPTIONS(index_option_list);

3. Run vector searches using SQL: Perform multimodal searches at BigQuery scale using familiar SQL syntax, making the process intuitive and accessible.

VECTOR_SEARCH(
  {TABLE base_table | base_table_query_statement},
  column_to_search,
  {TABLE query_table | query_statement}
  [, query_column_to_search => query_column_to_search_value]
  [, top_k => top_k_value]
  [, distance_type => distance_type_value]
  [, options => options_value]
)
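The distance_type argument controls how closeness is measured. As a rough illustration of the two common options: cosine distance compares only the direction of two vectors, while Euclidean distance is also sensitive to their magnitude. The snippet below is plain Python for intuition only, not how BigQuery computes these internally:

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance; sensitive to vector magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity; ignores magnitude, compares direction only."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, twice the magnitude

print(cosine_distance(a, b))     # 0.0: identical direction
print(euclidean_distance(a, b))  # 1.0: magnitudes differ
```

Which option works best depends on how the embedding model was trained; cosine distance is a common default for text and multimodal embeddings.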

Multimodal Search in Action: Examples and Applications

Let’s explore the potential of multimodal search with real-world examples:

Image-to-Image Search: Finding Visually Similar Content (Example 1)

Imagine an e-commerce platform where users can search for products using images. A customer takes a picture of a stylish pair of shoes and uploads it to the platform. Using Gemini’s multimodal embeddings and BigQuery’s vector search, the platform instantly retrieves visually similar shoes from its catalog, even if the customer’s image doesn’t perfectly match any existing product photos.

Text-to-Image Search: Bridging Words and Visuals (Example 2)

A journalist is writing an article about “sustainable urban farming” and needs a compelling image to accompany the text. Instead of manually browsing through stock photo libraries, they use a text-to-image search powered by Gemini. By inputting the article’s key phrases, the journalist quickly finds a visually striking image depicting a rooftop garden in a bustling city.

Image-to-Text Search: Describing Images with Precision

A museum curator is digitising their collection of historical photographs. They use Gemini to generate detailed captions for each image, enriching the museum’s online catalog and making it more accessible to researchers and visitors. The captions capture not only the objects depicted but also the historical context and artistic style, offering a deeper understanding of each photograph.

The two code examples that follow share some setup: an object table over the product images, a remote embedding model, a table of image embeddings, and a vector index.

  1. Create the object table

CREATE OR REPLACE EXTERNAL TABLE `bqml_mm_search.product_images`
WITH CONNECTION `us.bqml_vertex_ai_connection`
OPTIONS (
  object_metadata = 'SIMPLE',
  uris = ['gs://mm_product_search/product_images/*']
);

Display the product images:

inspect_obj_table_query = """
SELECT uri, content_type
FROM bqml_mm_search.product_images
WHERE content_type = 'image/jpeg'
ORDER BY uri
LIMIT 5;
"""
# printImages is a helper defined in the accompanying notebook
printImages(client.query(inspect_obj_table_query))

2. Create the remote model

CREATE OR REPLACE MODEL `bqml_mm_search.multimodal_embedding_model`
REMOTE WITH CONNECTION `us.bqml_vertex_ai_connection`
OPTIONS (ENDPOINT = 'multimodalembedding@001');

3. Generate image embeddings

CREATE OR REPLACE TABLE `bqml_mm_search.product_image_embeddings` AS
SELECT *
FROM ML.GENERATE_EMBEDDING(
  MODEL `bqml_mm_search.multimodal_embedding_model`,
  (SELECT * FROM `bqml_mm_search.product_images`
   WHERE content_type = 'image/jpeg'
   LIMIT 10000)
);

See if there were any embedding generation failures

SELECT ml_generate_embedding_status,
  COUNT(uri) AS num_rows
FROM bqml_mm_search.product_image_embeddings
GROUP BY 1;

4. Create a vector index

CREATE OR REPLACE VECTOR INDEX `met_images_index`
ON bqml_mm_search.product_image_embeddings(ml_generate_embedding_result)
OPTIONS (
  index_type = 'IVF',
  distance_type = 'COSINE');
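The IVF (inverted file) index type used above speeds up search by grouping embeddings into clusters and scanning only the clusters nearest the query, instead of every row. Here is a toy illustration of that idea in Python; it is a sketch of the concept, not BigQuery's actual implementation:

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy "index": embeddings pre-assigned to two cluster centroids.
clusters = {
    (0.0, 0.0): [("item_a", (0.1, 0.2)), ("item_b", (0.2, 0.1))],
    (5.0, 5.0): [("item_c", (5.1, 4.9)), ("item_d", (4.8, 5.2))],
}

def ivf_search(query, clusters, top_k=1):
    # 1. Find the nearest centroid (cheap: one comparison per cluster).
    nearest = min(clusters, key=lambda c: dist(query, c))
    # 2. Scan only that cluster's members, not the whole table.
    scored = sorted(clusters[nearest], key=lambda item: dist(query, item[1]))
    return [name for name, _ in scored[:top_k]]

print(ivf_search((5.0, 4.8), clusters))  # scans only the (5, 5) cluster
```

Because only a fraction of the rows are scanned, results are approximate but retrieval stays fast even over millions of embeddings.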

Check if the vector index has been created

SELECT table_name, index_name, index_status,
coverage_percentage, last_refresh_time, disable_reason
FROM bqml_mm_search.INFORMATION_SCHEMA.VECTOR_INDEXES
WHERE index_name = 'met_images_index';

Code Example 1: Image-to-Image Search

1. Create the object table for the search image(s)

CREATE OR REPLACE EXTERNAL TABLE `bqml_mm_search.search_product_images`
WITH CONNECTION `us.bqml_vertex_ai_connection`
OPTIONS (
  object_metadata = 'SIMPLE',
  uris = ['gs://mm_product_search/test_images/*']
);

2. Generate image embeddings for the search image(s)

CREATE OR REPLACE TABLE `bqml_mm_search.search_product_image_embeddings` AS
SELECT *
FROM ML.GENERATE_EMBEDDING(
  MODEL `bqml_mm_search.multimodal_embedding_model`,
  (SELECT * FROM `bqml_mm_search.search_product_images`
   WHERE content_type = 'image/jpeg'
   LIMIT 1000)
);

See if there were any embedding generation failures

SELECT ml_generate_embedding_status,
  COUNT(uri) AS num_rows
FROM bqml_mm_search.search_product_image_embeddings
GROUP BY 1;

3. Perform a vector search (image-to-image search)

CREATE OR REPLACE TABLE `bqml_mm_search.vector_image_search_results` AS
SELECT query.uri, base.uri AS gcs_uri, distance
FROM VECTOR_SEARCH(
  TABLE `bqml_mm_search.product_image_embeddings`,
  'ml_generate_embedding_result',
  -- To search with all test images at once, pass the whole table instead:
  -- TABLE `bqml_mm_search.search_product_image_embeddings`,
  (SELECT * FROM `bqml_mm_search.search_product_image_embeddings`
   WHERE uri LIKE '%gs://mm_product_search/test_images/test 1.jpg%'),
  'ml_generate_embedding_result',
  top_k => 5
);
Input image (test set: source dataset)

4. Visualise the vector search results

Results (product set: source dataset)

Code Example 2: Text-to-Image Search

1. Generate an embedding for the search text

CREATE OR REPLACE TABLE `bqml_mm_search.search_embedding` AS
SELECT *
FROM ML.GENERATE_EMBEDDING(
  MODEL `bqml_mm_search.multimodal_embedding_model`,
  (SELECT 'black t-shirt' AS content)
);

2. Perform a cross-modality text-to-image search

CREATE OR REPLACE TABLE `bqml_mm_search.vector_search_results` AS
SELECT base.uri AS gcs_uri, distance
FROM VECTOR_SEARCH(
  TABLE `bqml_mm_search.product_image_embeddings`,
  'ml_generate_embedding_result',
  TABLE `bqml_mm_search.search_embedding`,
  'ml_generate_embedding_result',
  top_k => 5);

Visualise the vector search results

query = """
SELECT * FROM `bqml_mm_search.vector_search_results`
ORDER BY distance;
"""

printImages(client.query(query))
Results (product set: source dataset)

This GitHub repository contains the complete SQL script and notebook.

The dataset used in the demonstration can be found on Kaggle.
