Making sense of Vector Search and Embeddings across GCP products

Steve Loh
Google Cloud - Community
8 min read · Apr 2, 2024

Intro

Many of you have already used Large Language Models (LLMs) from generative AI. These models are great at certain creative tasks like content generation, text summarization, and entity extraction, but that alone is not sufficient for enterprises that need to:

  • provide accurate and up-to-date information (reducing hallucination)
  • offer contextual user experiences
  • offer secure and governed access to the data

Hence comes the Retrieval-Augmented Generation (RAG) technique to fulfill those requirements. It combines the power of LLMs with the ability to reference external knowledge sources, by incorporating the following two systems:

  • Retrieval: When a user asks a question, RAG first searches through a database of documents or text to find relevant passages.
  • Generation: The retrieved information is then sent along as context in the LLM prompt, grounding the LLM’s language understanding with specific knowledge so it can generate a more informed and accurate answer.

So how does the RAG retrieval system find the relevant knowledge? Welcome to the world of embeddings and vector search.

  • Vector embeddings are numerical representations of text that capture the semantic meaning and relationships between words and concepts. You use a pre-trained model to generate embeddings. For example, the Google Vertex AI textembedding-gecko model generates a 768-dimensional embedding, while the multimodal embedding model generates a 128-, 256-, 512-, or 1408-dimensional embedding.
  • Vector search comes into play by comparing the user’s query embedding to the vectors representing documents or passages in the knowledge base. This comparison uses similarity metrics to find the most relevant pieces of information based on their semantic closeness to the query, as sketched below.
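
To make “semantic closeness” concrete, here is a minimal sketch of one common similarity metric, cosine similarity. The vectors below are made-up 4-dimensional toys purely for illustration; real models such as textembedding-gecko return hundreds of dimensions:

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity measures the angle between two vectors:
    # 1.0 = same direction (semantically closest), near 0 = unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings for illustration only.
query_vec = np.array([0.1, 0.8, 0.3, 0.2])
docs = {
    "phone with a great camera": np.array([0.2, 0.7, 0.3, 0.1]),
    "budget laptop for school": np.array([0.9, 0.1, 0.0, 0.4]),
}
best = max(docs, key=lambda d: cosine_similarity(query_vec, docs[d]))
print(best)  # -> "phone with a great camera"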

Now with these concepts explained, you can implement RAG with the following steps (a Python sketch follows the list):

  • Break down large documents or text corpus using a suitable chunking strategy
  • Generate embeddings for each chunk using a selected embedding model
  • Store the chunked data and vector embeddings together in a vector database
  • The user posts a query
  • Use the same pre-trained embedding model to generate a vector embedding for the user query
  • Use the query embedding to search for the most similar embeddings in the vector database, then retrieve the corresponding data chunks
  • Create a new prompt for the LLM by incorporating the retrieved chunked text alongside the original user query
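
Below is a minimal sketch of those steps using the Vertex AI SDK. The vector_db object is a hypothetical client with upsert and nearest methods standing in for whichever vector database you choose, and the chunking is deliberately naive:

from vertexai.language_models import TextEmbeddingModel, TextGenerationModel

emb_model = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")

def chunk(text: str, size: int = 1000) -> list[str]:
    # Naive fixed-size chunking; real strategies split on sentence or section boundaries.
    return [text[i:i + size] for i in range(0, len(text), size)]

def index_documents(docs: list[str], vector_db) -> None:
    # Chunk, embed, then store each chunk and its embedding together.
    pieces = [c for d in docs for c in chunk(d)]
    for i, piece in enumerate(pieces):
        vec = emb_model.get_embeddings([piece])[0].values
        vector_db.upsert(id=str(i), embedding=vec, text=piece)

def answer(query: str, vector_db, llm: TextGenerationModel) -> str:
    # Embed the query with the SAME model, retrieve the closest chunks,
    # then ground the LLM prompt with the retrieved text.
    query_vec = emb_model.get_embeddings([query])[0].values
    context = "\n".join(hit.text for hit in vector_db.nearest(query_vec, k=5))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return llm.predict(prompt).text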

Vector embeddings need to be stored in a vector database before you can search them. But adding a dedicated vector database to your software stack increases complexity, cost and the learning curve. The great news is that most GCP data products already support vectors out of the box, which means users no longer need to choose between vector queries and other critical database functionality. For example, all GCP transactional databases aim to fully support vector features in the near future:

  • AlloyDB (GA)
  • Cloud SQL for PostgreSQL (GA)
  • Cloud SQL for MySQL (Preview)
  • Spanner (Preview)
  • Memorystore for Redis (Preview)
  • Firestore (Preview)
  • Bigtable (Preview)

Here I will showcase vector implementation across 3 main data product families on GCP:

  • AlloyDB — Transactional database
  • BigQuery — Enterprise data warehouse
  • Vertex AI Vector Search — Machine learning platform

Disclaimer

I work as a Data Analytics practice lead at Google Cloud. This article is my own opinion and does not reflect the views of my employer.

Please note that by the time you read this article, some information may already be obsolete, as GenAI is a fast-developing domain and Google Cloud is actively releasing new product features in this space.

AlloyDB

AlloyDB is a fully managed, PostgreSQL-compatible, cloud-native database service built to deliver superior performance, scalability, and high availability for the most demanding enterprise workloads. It now comes with the AlloyDB AI feature suite, which provides the semantic and predictive power of ML models for your data out of the box.

Setup

psql "host=$INSTANCE_IP user=alloydb_user dbname=vector_db" -c "CREATE EXTENSION IF NOT EXISTS google_ml_integration CASCADE"

psql "host=$INSTANCE_IP user=alloydb_user dbname=vector_db" -c "CREATE EXTENSION IF NOT EXISTS vector"

Embeddings generation

  • Create a new column with the type vector to store the embeddings:
    - The vector dimension should match the model that you use. For example, the textembedding-gecko model has 768 dimensions.
    - AlloyDB implements embeddings as arrays of real values, but it can automatically cast from real array to a vector value.
ALTER TABLE my_products ADD COLUMN embedding_column VECTOR(768);
  • To generate an embedding, use the embedding() function:
    - To use textembedding-gecko model, the AlloyDB cluster must reside in region us-central1 to match the region of the model.
    - You can invoke predictions to get around the region restriction.
    - 003 is the latest version of the textembedding-gecko model. It’s always advisable to specify the version tag to avoid mistakes, as a newly published model may return different embeddings.
SELECT embedding('textembedding-gecko@003', 'Google Pixel 8 Pro redefines smartphone photography with its advanced AI-powered camera system');
  • To generate embedding values based on another column:
UPDATE my_products SET embedding_column = embedding('textembedding-gecko@003', product_description);
  • Alternatively, you can also create an embedding column with default value generated from another column:
ALTER TABLE my_products ADD COLUMN embedding_column vector GENERATED ALWAYS AS (embedding('textembedding-gecko@003', product_description)) STORED;

Vector index

  • By default pgvector performs exact nearest-neighbor search, which provides perfect recall. It also supports approximate nearest-neighbor search through HNSW or IVFFlat indexes. AlloyDB provides built-in optimizations for pgvector by adding a scalar quantization feature (SQ8) to IVF index creation that can significantly speed up queries.
    - SQ8 supports vectors with up to 8000 dimensions.
    - You can choose among 3 distance functions: vector_l2_ops (L2 distance), vector_ip_ops (Inner product) or vector_cosine_ops (Cosine distance).
CREATE INDEX embedding_column_idx ON my_products
USING ivf (embedding_column vector_l2_ops)
WITH (lists = 20, quantizer = 'SQ8');

Vector search

  • Perform a vector search using the pgvector nearest-neighbor operator <-> to find the database rows with the most semantically similar embeddings (an application-side version follows the query):
SELECT product_name FROM my_products
ORDER BY embedding_column
<-> embedding('textembedding-gecko@003', 'I need a phone that provides the best photography quality')::vector
LIMIT 10;
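
The same search can be issued from application code. Here is a minimal sketch using psycopg2, assuming the INSTANCE_IP, alloydb_user and vector_db values from the Setup section above:

import psycopg2

# Connection values assumed from the Setup section above.
conn = psycopg2.connect(host=INSTANCE_IP, user="alloydb_user", dbname="vector_db")
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT product_name FROM my_products
        ORDER BY embedding_column
            <-> embedding('textembedding-gecko@003', %s)::vector
        LIMIT 10
        """,
        ("I need a phone that provides the best photography quality",),
    )
    print([row[0] for row in cur.fetchall()])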

Check the AlloyDB documentation for more information.

BigQuery

Setup

  • BigQuery is serverless, so no resource setup is needed.
  • Create a remote connection to Vertex AI remote models:
bq mk --connection --location=US --project_id={PROJECT_ID} --connection_type=CLOUD_RESOURCE vertex_embeddings
  • Grant the ‘Vertex AI User’ role to the service account of the created connection:
gcloud projects add-iam-policy-binding {PROJECT_ID} \
--member='serviceAccount:{CONNECTION_SERVICE_ACCOUNT}' \
--role='roles/aiplatform.user'

Embeddings generation

  • Create a remote model to represent the hosted textembedding-gecko model:
CREATE OR REPLACE MODEL test_embeddings.llm_embedding_model
REMOTE WITH CONNECTION `us.vertex_embeddings`
OPTIONS(ENDPOINT='textembedding-gecko@003');
  • You can now generate text embeddings using the ML.GENERATE_TEXT_EMBEDDING function:
    - We use data from a public dataset table called imdb.reviews in this example.
    - The text_embedding column is of type ARRAY<FLOAT64> with 768 dimensions.
CREATE OR REPLACE TABLE test_embeddings.embedded_reviews
AS SELECT content AS review, text_embedding
FROM ML.GENERATE_TEXT_EMBEDDING(
  MODEL `test_embeddings.llm_embedding_model`,
  (SELECT review AS content
   FROM `bigquery-public-data.imdb.reviews` LIMIT 8000),
  STRUCT(TRUE AS flatten_json_output)
);

Vector index

  • Create a vector index on the embeddings column. A vector index enables Approximate Nearest Neighbor (ANN) search, which improves vector search performance.
    - Currently supported distance types are EUCLIDEAN (L2) and COSINE.
    - Currently only IVF is supported for index type.
    - The created index is fully managed by BigQuery; it is refreshed automatically as the data changes.
    - The metadata of the vector index is available via the INFORMATION_SCHEMA.VECTOR_INDEXES view (see the sketch after the CREATE VECTOR INDEX statement below).
CREATE VECTOR INDEX embedded_reviews_idx ON test_embeddings.embedded_reviews(text_embedding) OPTIONS(distance_type = 'EUCLIDEAN', index_type='IVF');
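
As noted above, you can check the index build status and coverage through the INFORMATION_SCHEMA.VECTOR_INDEXES view. A quick sketch with the BigQuery Python client:

from google.cloud import bigquery

# Inspect the vector index metadata for the test_embeddings dataset.
client = bigquery.Client()
rows = client.query(
    """
    SELECT index_name, index_status, coverage_percentage, last_refresh_time
    FROM `test_embeddings.INFORMATION_SCHEMA.VECTOR_INDEXES`
    """
).result()
for row in rows:
    print(row.index_name, row.index_status, row.coverage_percentage, row.last_refresh_time)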

Vector search

  • Use the VECTOR_SEARCH function to perform text similarity search:
    - It first generates an embedding from the text query, then compares it to the column `test_embeddings.embedded_reviews.text_embedding`.
SELECT *
FROM VECTOR_SEARCH(
  TABLE `test_embeddings.embedded_reviews`, 'text_embedding',
  (
    SELECT text_embedding, content AS query
    FROM ML.GENERATE_TEXT_EMBEDDING(
      MODEL `test_embeddings.llm_embedding_model`,
      (SELECT 'Our family enjoyed this movie, especially the kids were so fascinated by the magical world' AS content),
      STRUCT(TRUE AS flatten_json_output))
  ),
  top_k => 5);

Check the BigQuery documentation for more information.

Vertex AI Vector Search

Vertex AI is a unified machine learning platform that simplifies and accelerates the end-to-end process of building, deploying, and managing ML models at scale. Vector Search (previously known as Matching Engine) provides highly scalable and performant vector similarity search.

The following code snippets are in Python.

Setup

  • Import the aiplatform package and initialize it:
from google.cloud import aiplatform
aiplatform.init(project=PROJECT_ID, location=LOCATION)
  • Vector Search does not provide a service to generate embeddings. You can, for example, generate embeddings via BigQuery, export them to files in a Cloud Storage bucket, and then import them into Vector Search, as sketched below.
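
As an example of that flow, here is a sketch that exports the embeddings generated in the BigQuery section as newline-delimited JSON records with the id and embedding fields Vector Search expects. The bucket name and the row_id column are illustrative, not part of the earlier examples:

from google.cloud import bigquery

# Export embeddings from BigQuery to Cloud Storage in the JSON format
# expected by Vector Search; names below are illustrative.
client = bigquery.Client()
client.query(
    """
    EXPORT DATA OPTIONS(
      uri='gs://my-embedding-bucket/embeddings-*.json',
      format='JSON',
      overwrite=true
    ) AS
    SELECT CAST(row_id AS STRING) AS id, text_embedding AS embedding
    FROM test_embeddings.embedded_reviews
    """
).result()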

Vector index

  • Create a vector index endpoint, which is a server instance that accepts query requests for your index.
my_index_endpoint = aiplatform.MatchingEngineIndexEndpoint.create(
    display_name=f"index-endpoint-{PROJECT_ID}",
    public_endpoint_enabled=True,
)
  • Create a vector search index:
    - EMBEDDING_BUCKET_URI is where you store the files with embeddings; see the documentation for the required input data format and structure
    - approximate_neighbors_count specifies the number of neighbors to find through approximate search before exact reordering is performed.
    - See the documentation for the available distance measure types.

my_index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
    display_name=DISPLAY_NAME,
    contents_delta_uri=EMBEDDING_BUCKET_URI,
    dimensions=768,
    approximate_neighbors_count=10,
    distance_measure_type="DOT_PRODUCT_DISTANCE",
)
  • Deploy the index to the index endpoint:
my_index_endpoint = my_index_endpoint.deploy_index(
    index=my_index, deployed_index_id=DEPLOYED_INDEX_ID
)

Vector search

  • Now you can search the vector index using a query embedding:
from vertexai.language_models import TextEmbeddingModel

# Get the query embedding.
model = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")
query = "Our family enjoyed this movie, especially the kids were so fascinated by the magical world"
query_embeddings = model.get_embeddings([query])[0]

# Query the index endpoint to find the 3 nearest neighbors.
response = my_index_endpoint.find_neighbors(
    deployed_index_id=my_index_endpoint.deployed_indexes[0].id,
    queries=[query_embeddings.values],
    num_neighbors=3,
)

I have created a notebook to demonstrate how to do vector search in Vertex AI.

Summary

2023 was the boom year for GenAI foundation models; this year, organizations will focus on building applications that harness value from these models. This may include accelerating access to insights, improving productivity, streamlining operations and business processes, and building innovative products and services. Vector storage and vector search are the backbone for storing and organizing the rich semantic information needed to ground generative AI models. Their ability to handle varied data structures, power meaningful search, scale efficiently, and support rapid development makes them an ideal engine for the next generation of AI innovation.


Steve Loh
Google Cloud - Community

I lead the data analytics customer engineering team at Google Cloud in the Benelux region. I enjoy learning and helping customers, drawing on my 22 years of experience.