BigQuery-Powered Natural Language Image Search: Multimodal Embeddings Within the SQL Environment
In the evolving landscape of data analytics and AI, BigQuery is expanding its capabilities beyond traditional data warehousing. Now functioning as a vector database, BigQuery supports generating and searching multimodal embeddings directly within its SQL environment. This advancement enables powerful multimedia search capabilities, particularly for processing and retrieving visual content through natural language queries.
1. Introduction: BigQuery as a Vector Database for Media Search
The advent of multimodal embeddings and advanced natural language processing (NLP) techniques is changing how we search multimedia content. These technologies enable users to search for images or videos — and even specific information within them — as intuitively as they would with text. At the core of this innovation is the concept of multimodal embedding, which transforms diverse data types into a unified, searchable format.
BigQuery, renowned for its data analytics capabilities, is now emerging as a powerful tool for media search. By functioning as a vector database, BigQuery offers a novel approach to storing, embedding, and retrieving multimedia content. This application of BigQuery opens up new possibilities for cross-modal semantic search, such as finding images based on text descriptions or extracting textual information from visual content.
In this blog post, we’ll explore how to implement a fashion image search system using BigQuery. We’ll demonstrate the process of embedding images into high-dimensional vectors and performing efficient similarity searches, all within the BigQuery environment. This approach not only streamlines media search architecture but also leverages BigQuery’s scalability and integration with other Google Cloud Platform services, providing a robust solution for handling diverse multimedia datasets.
2. Implementation Steps and Technical Details
2.1 Data Preparation: From Images to BigQuery
Our first step is to prepare our image data for use in BigQuery. For this demonstration, we’re using the Fashion Product Images (Small) dataset from Kaggle.
Dataset Overview: The Fashion Product Images (Small) dataset is a valuable resource for e-commerce and computer vision research. It contains:
- Professionally shot high-resolution product images
- Multiple label attributes describing each product
- Descriptive text commenting on product characteristics
This dataset represents the rich, multi-faceted nature of e-commerce data, making it an excellent choice for demonstrating multimodal search capabilities.
Data Upload: We uploaded 7,000 images from this dataset to a Google Cloud Storage bucket. This step prepares our visual data for processing in BigQuery.
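If you are following along, the upload can be scripted with the google-cloud-storage client. The sketch below is a minimal example, not the exact commands we ran: the bucket name, local folder, and destination prefix are placeholders to adapt to your own environment (a bulk copy with the gsutil or gcloud storage CLI works just as well).
import os
from google.cloud import storage

# Upload the locally downloaded Kaggle images to a Cloud Storage bucket.
# "your-bucket-name", the local folder, and the destination prefix are placeholders.
storage_client = storage.Client()
bucket = storage_client.bucket("your-bucket-name")

local_dir = "fashion-dataset/images"            # local copy of the Kaggle images
destination_prefix = "fashion-product-images"   # folder inside the bucket, matching the object table below

for filename in os.listdir(local_dir):
    if filename.lower().endswith(".jpg"):
        blob = bucket.blob(f"{destination_prefix}/{filename}")
        blob.upload_from_filename(os.path.join(local_dir, filename))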
Setting up the External Connection:
Before we can create our BigQuery object table, we need to establish an external connection to Vertex AI. This connection allows BigQuery to interact with the Vertex AI services, including the multimodal embedding model. Here’s how to set it up:
- In the BigQuery Explorer pane, click the “+ Add” button and select “Connections to External data sources”.
- If prompted, enable the BigQuery API.
- In the “Connection type” dropdown, select “Vertex AI remote models, remote functions and BigLake (Cloud Resource)”.
- Set the Connection ID (e.g., vertex-ai-connection-id) and ensure the location is set to the US multi-region.
- Click “Create Connection”.
Important: Granting Necessary Permissions
After creating the connection, it’s crucial to grant the correct permissions to the service account associated with this connection. The service account will have a name starting with bqcx-.
For this setup to work correctly, you need to grant this service account the following roles:
- BigQuery Data Editor
- BigQuery Job User
- Vertex AI User
To grant these roles, navigate to the IAM & Admin section in your Google Cloud Console, find the bqcx- service account, and add these roles.
Note: Proper permission setup is critical for the successful operation of your multimodal search system. Make sure to carefully assign these roles to avoid any access issues in later steps. The same service account also needs read access to the Cloud Storage bucket holding the images (for example, the Storage Object Viewer role) so that the object table we create next can read them.
Creating a BigQuery Object Table: Next, we create a BigQuery object table that references our uploaded images. This step is crucial as it allows BigQuery to access and process the image data stored in Cloud Storage.
To facilitate this, we set up a BigLake table. BigLake extends BigQuery’s capabilities to data lakes, enabling us to query data stored in Cloud Storage as if it were native to BigQuery. Here’s how we created the BigLake table:
CREATE OR REPLACE EXTERNAL TABLE `your-project.your-dataset.fashion_images`
WITH CONNECTION `your-project.us.vertex-ai-connection-id`
OPTIONS
( object_metadata = 'SIMPLE',
uris = ['gs://your-bucket-name/fashion-product-images/*']
);
This SQL command creates a BigLake object table in BigQuery that points to our images in Cloud Storage. The WITH CONNECTION clause uses the connection we just created, establishing the link between BigQuery and Cloud Storage.
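As an optional sanity check, you can query the new object table and confirm that it lists your image URIs. Below is a minimal sketch using the BigQuery Python client, with the same placeholder project, dataset, and table names as above.
from google.cloud import bigquery

# List a few object URIs from the new BigLake object table.
bq_client = bigquery.Client(project="your-project")

rows = bq_client.query(
    "SELECT uri FROM `your-project.your-dataset.fashion_images` LIMIT 5"
).result()
for row in rows:
    print(row.uri)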
2.2 Setting Up Multimodal Embedding in BigQuery
Now that we have our data prepared and our connection established, we can set up the multimodal embedding model in BigQuery.
Creating the Multimodal Embedding Model
To create the embedding model, we use the following SQL command:
CREATE OR REPLACE MODEL `your-project.your-dataset.multimodal_embedding_model`
REMOTE WITH CONNECTION `your-project.us.vertex-ai-connection-id`
OPTIONS (ENDPOINT = 'multimodalembedding@001');
This command does the following:
- Creates a new model (or replaces an existing one) in your specified dataset.
- Uses the REMOTE WITH CONNECTION clause to link to the Vertex AI connection we set up earlier.
- Specifies the multimodalembedding@001 endpoint, which is Google’s pre-trained multimodal embedding model.
The multimodalembedding@001 model generates 1408-dimensional vectors from the input provided, which can include a combination of image and text data. Because image and text embeddings share the same vector space, they can be used interchangeably for use cases like searching images by text or vice versa.
2.3 Generating Image Embeddings
With our model set up, we can now generate embeddings for our fashion images.
Using BigQuery ML to Generate Embeddings
We’ll use the following SQL query to generate embeddings for all images in our fashion_images table:
CREATE OR REPLACE TABLE `your-project.your-dataset.fashion_images_embeddings`
AS
SELECT *
FROM ML.GENERATE_EMBEDDING(
MODEL `your-project.your-dataset.multimodal_embedding_model`,
(SELECT * FROM `your-project.your-dataset.fashion_images`)
);
This query:
- Creates a new table fashion_images_embeddings to store our embeddings.
- Uses the ML.GENERATE_EMBEDDING function with our multimodal embedding model.
- Applies the model to all rows in our fashion_images table.
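Before moving on, it can be worth verifying the output table. The sketch below (again using the placeholder names from above) checks the row count and confirms that every embedding has the expected 1408 dimensions.
from google.cloud import bigquery

# Sanity check: row count and embedding dimensionality of the new table.
bq_client = bigquery.Client(project="your-project")

check_query = """
SELECT
  COUNT(*) AS num_rows,
  MIN(ARRAY_LENGTH(ml_generate_embedding_result)) AS min_dims,
  MAX(ARRAY_LENGTH(ml_generate_embedding_result)) AS max_dims
FROM `your-project.your-dataset.fashion_images_embeddings`
"""
for row in bq_client.query(check_query).result():
    print(f"rows: {row.num_rows}, dims: {row.min_dims}-{row.max_dims}")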
Embedding Generation Performance
We embedded over 7,000 images, and the process yielded the following performance metrics:
- Total processing time: 22 minutes 20 seconds
- Data processed: 936.37 KB
- Slot time consumed: 29 minutes 56 seconds
This query ran 686% longer than the average prior execution, primarily due to the larger amount of data processed.
2.4 Implementing the Search Functionality
With our image embeddings generated and stored in BigQuery, we can now implement the search functionality. This involves two main steps: generating a query embedding from a text description, and then using this embedding to find similar images.
Generating Query Embeddings
First, we need to create an embedding for our text query. This is done using the same multimodal embedding model we used for our images. Here’s how we can do this:
CREATE OR REPLACE TABLE `your-project.your-dataset.query_embedding`
AS
SELECT * FROM ML.GENERATE_EMBEDDING(
MODEL `your-project.your-dataset.multimodal_embedding_model`,
(
SELECT 'a red dress with floral pattern' AS content
)
);
This query creates a new table query_embedding containing the embedding vector for our text description "a red dress with floral pattern". You can replace this text with any query you want to search for.
Performing Similarity Search
Now that we have our query embedding, we can perform a similarity search against our image embeddings. We’ll use the VECTOR_SEARCH function in BigQuery to do this:
WITH query_embedding AS (
SELECT ml_generate_embedding_result AS embedding
FROM `your-project.your-dataset.query_embedding`
)
SELECT
base.uri,
(1 - distance) AS similarity_score
FROM
VECTOR_SEARCH(
TABLE `your-project.your-dataset.fashion_images_embeddings`,
'ml_generate_embedding_result',
(SELECT embedding FROM query_embedding),
top_k => 5,
distance_type => 'COSINE'
);
This query performs several key operations:
- Selects the embedding for our text query.
- Uses VECTOR_SEARCH to find similar images in our fashion_images_embeddings table.
- Searches based on the ml_generate_embedding_result column (our image embeddings).
- Returns the top 5 most similar results using cosine distance.
- Provides the image URI and a similarity score for each result.
The similarity score is calculated as (1 - distance), converting the cosine distance into a similarity measure where higher scores indicate greater similarity.
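To make the conversion concrete: with distance_type => 'COSINE', the distance returned by VECTOR_SEARCH is 1 - cos(θ), where cos(θ) = (a · b) / (|a| |b|) is the cosine similarity between the query vector a and an image vector b. Taking (1 - distance) therefore recovers the cosine similarity itself, so a score close to 1 means the two embeddings point in nearly the same direction, while a score near 0 indicates little semantic relationship.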
This BigQuery implementation leverages efficient vector search capabilities, allowing for quick and scalable similarity searches across large image datasets. You can easily adjust the top_k value to modify the number of results returned.
3. Demonstration: Image Search in Action
To showcase our search functionality in a more intuitive way, we’ve created an interactive demo in a Jupyter Notebook environment. This demo allows users to input queries and see the search results visually.
3.1 User Input Interface
First, we set up a simple user interface for query input:
import ipywidgets as widgets
from IPython.display import display
from google.cloud import bigquery
from google.cloud import storage
from google.oauth2 import service_account
# Set up credentials and clients
credentials = service_account.Credentials.from_service_account_file('yourcredential.json')
bq_client = bigquery.Client(credentials=credentials, project=credentials.project_id)
storage_client = storage.Client(credentials=credentials)
# Create input widgets
query_input = widgets.Text(
value='a red dress with floral pattern',
description='Query:',
style={'description_width': 'initial'}
)
search_button = widgets.Button(description="Search")
output = widgets.Output()
# Define button click event
def on_button_clicked(b):
    with output:
        print("Searching...")
        global user_query
        user_query = query_input.value
        print(f"Query recorded: {user_query}")
search_button.on_click(on_button_clicked)
# Display interface
display(widgets.VBox([query_input, search_button, output]))
This code creates a text input field for the query, a search button, and an output area to display the search status.
3.2 Search Results Visualization
Next, we implement the search functionality and visualize the results:
import io
import matplotlib.pyplot as plt
def display_image_from_gs(uri):
    # Parse "gs://bucket-name/path/to/image.jpg" into bucket and blob names
    bucket_name = uri.split('/')[2]
    blob_name = '/'.join(uri.split('/')[3:])
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(blob_name)
    # Download the image bytes and decode them for display with matplotlib
    image_bytes = blob.download_as_bytes()
    image = plt.imread(io.BytesIO(image_bytes), format='jpg')
    return image
# Generate query embedding
embedding_query = f"""
CREATE OR REPLACE TABLE `multi_embedding.query_embedding`
AS
SELECT * FROM ML.GENERATE_EMBEDDING(
MODEL `multi_embedding.multiembedding`,
(
SELECT '{user_query}' AS content
)
);
"""
bq_client.query(embedding_query).result()
# Perform similarity search
search_query = """
WITH query_embedding AS (
SELECT ml_generate_embedding_result AS embedding
FROM `multi_embedding.query_embedding`
)
SELECT
base.uri,
base.ml_generate_embedding_status,
(1 - distance) AS similarity_score
FROM
VECTOR_SEARCH(
TABLE `multi_embedding.fashion_images_embeddings`,
'ml_generate_embedding_result',
(SELECT embedding FROM query_embedding),
top_k => 5,
distance_type => 'COSINE'
)
"""
results = list(bq_client.query(search_query).result())
# Display results
fig, axes = plt.subplots(1, 5, figsize=(20, 4))
fig.suptitle(f'Top 5 Similar Fashion Images for: "{user_query}"', fontsize=16)
for i, row in enumerate(results):
    uri = row['uri']
    similarity_score = row['similarity_score']
    image = display_image_from_gs(uri)
    axes[i].imshow(image)
    axes[i].axis('off')
    axes[i].set_title(f"Score: {similarity_score:.4f}")
plt.tight_layout()
plt.show()
# Print detailed results
for row in results:
    print(f"URI: {row['uri']}")
    print(f"Embedding Status: {row['ml_generate_embedding_status']}")
    print(f"Similarity Score: {row['similarity_score']:.4f}")
    print()
This code performs the similarity search based on the user’s query, retrieves the top 5 matching images, and displays them along with their similarity scores.
This interactive demo brings our multimodal search system to life, allowing users to experience firsthand how text queries can be used to find relevant fashion images in our database.
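One refinement worth noting: the demo above interpolates user_query into the SQL string with an f-string, which is fragile if the query text contains quotes. Below is a sketch of a safer variant using BigQuery query parameters; it assumes bq_client, credentials, and user_query from the earlier cells, and writes the result to the same query_embedding table via the job configuration rather than a CREATE OR REPLACE TABLE statement.
from google.cloud import bigquery

# Same embedding step as above, but the user's text is passed as a query
# parameter, and the destination table is set in the job configuration.
embedding_sql = """
SELECT * FROM ML.GENERATE_EMBEDDING(
  MODEL `multi_embedding.multiembedding`,
  (SELECT @user_query AS content)
)
"""
job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter("user_query", "STRING", user_query)
    ],
    destination=f"{credentials.project_id}.multi_embedding.query_embedding",
    write_disposition="WRITE_TRUNCATE",
)
bq_client.query(embedding_sql, job_config=job_config).result()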
4. Conclusion
In this project, we successfully implemented natural language image search by leveraging multimodal embeddings directly within BigQuery. This approach demonstrates that BigQuery is more than a data warehouse: it can serve as a comprehensive platform for advanced analytics and machine learning tasks.
Key achievements:
- Utilized BigQuery’s multimodal embedding capabilities to process and index a large dataset of fashion images.
- Implemented vector search functionality, enabling efficient similarity searches based on text queries.
- Created an interactive demo that showcases the seamless integration of text-based queries with image retrieval.
By executing multimodal embeddings in BigQuery, we’ve eliminated the need for separate vector databases or external services, streamlining the entire process from data storage to search execution. This not only simplifies the architecture but also leverages BigQuery’s scalability and performance optimizations.
This implementation opens up new possibilities for businesses looking to enhance their search capabilities, particularly in e-commerce and content management systems. It demonstrates how advanced AI techniques can be integrated directly into existing data infrastructure, providing powerful, user-friendly search experiences.
As we continue to explore the capabilities of BigQuery and multimodal embeddings, we can envision further applications and refinements to this system, potentially extending to video search, multi-language support, or even more complex query understanding.