Implementing a Recommendation Feature Using Firestore’s Vector Search
During the opening keynote at Google Cloud Next '24, numerous use cases and examples from the retail industry were showcased. These included a demo of an online clothing store that presents related products based on natural language input and video URLs, as well as a demo that supports product purchases through chat. These demonstrations strongly conveyed the message that vector search can now be implemented much more easily.
In this article, I will walk through the implementation of a recommendation feature for a product database that suggests related products based on natural language input, using the newly preview-released vector search functionality in Firestore.
What is Vector Search?
Vector search is a technique that represents data as vectors (arrays of numbers) and performs searches based on similarity. Unlike traditional keyword-based search, vector search can capture semantic similarities, which is one of its key features.
In this project, I implemented a related product recommendation feature that utilizes natural language input.
By embedding the product descriptions as vectors and calculating the similarity between those vectors and the user's natural language query, it becomes possible to recommend products from the product data that match the user's intent.
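To illustrate the idea, here is a minimal sketch of cosine similarity between two embedding vectors. This is only for intuition; in the implementation below, Firestore computes the distance on the server side.

```typescript
// Cosine similarity: vectors pointing in similar directions score close to 1.
const cosineSimilarity = (a: number[], b: number[]): number => {
  const dot = a.reduce((sum, v, i) => sum + v * b[i], 0);
  const normA = Math.sqrt(a.reduce((sum, v) => sum + v * v, 0));
  const normB = Math.sqrt(b.reduce((sum, v) => sum + v * v, 0));
  return dot / (normA * normB);
};

// e.g. the embedding of "warm down jacket" should score higher against
// "winter coat for commuting" than against an unrelated description.
```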
For more detailed information, the session from Google Cloud Next Tokyo '23 linked below was highly informative. Note that the presentation video is in Japanese, as the session was targeted at Japanese engineers.
https://www.youtube.com/watch?v=7XI45ll8fqQ
Demo
I developed a web application that accepts user input, searches for five related products, and displays them.
I created a local tool to prepare the product data and register it in Firestore. The product search web application was implemented with Remix and deployed to Cloud Run.
I also asked Claude 3 to propose the frontend design, which resulted in a user-friendly interface.
Architecture
What I Did
1. Creating product descriptions for vectorization
2. Embedding the product descriptions
3. Registering data to Firestore
4. Creating indexes for vector search
5. Performing vector search using natural language
I implemented the vector search in Firestore based on the official documentation available [here](https://firebase.google.com/docs/firestore/vector-search).
During the process, I realized that Firestore's vector search only supports Node.js and Python client libraries (as of 2024/04/25). Although I initially prepared the product data (steps 1 and 2) in Go, I eventually wrote the Firestore registration and vector search processing in Node.js (TypeScript).
1. Creating product descriptions for vectorization
Since I didn’t have real product data on hand, I used a fake dataset for e-commerce sites available on BigQuery. [This dataset](https://github.com/GoogleCloudPlatform/public-datasets-pipelines/blob/main/datasets/thelook_ecommerce/pipelines/_images/run_thelook_kub/fake.py), sourced from GoogleCloudPlatform/public-datasets-pipelines, contains fictional product data generated for demonstration purposes.
I exported the data to CSV and created demo data for local registration to Firestore.
This dataset includes product names and costs, but it doesn't contain meaningful text that can be matched against natural-language user input. In vector search, capturing the semantic similarity between the search query and the product description is crucial, so product descriptions that express each product's features and use cases are essential. To address this, I decided to generate "pseudo" product descriptions from the product names for each item.
These “pseudo” product descriptions were generated using the Gemini Pro 1.0 model.
```go
// model is a *genai.GenerativeModel for Gemini 1.0 Pro, initialized elsewhere.
func generateProductInfo(ctx context.Context, products string) (string, error) {
	prompt := fmt.Sprintf(`
You are an AI assistant that generates dummy data for an online shop. The product data is as follows:
'''csv
id,cost,category,name,brand,retail_price,department,sku,distribution_center_id
%s
'''
Please create a product description for each item in a single line of about 100-150 characters, considering the product's features, materials, season, target audience, size, etc.`, products)
	resp, err := model.GenerateContent(ctx, genai.Text(prompt))
	if err != nil {
		return "", err
	}
	content := resp.Candidates[0].Content.Parts[0].(genai.Text)
	return string(content), nil
}
```
2. Embedding the product descriptions
I embedded the generated “pseudo” descriptions using the embedding-001 model.
The embedding models provided by Google are listed [here](https://ai.google.dev/gemini-api/docs/models/gemini?hl=en#embedding).
```go
func generateEmbeddingValue(ctx context.Context, text string) ([]float32, error) {
	client, err := genai.NewClient(ctx, option.WithAPIKey(os.Getenv("API_KEY")))
	if err != nil {
		return nil, err
	}
	embedder := client.EmbeddingModel("embedding-001") // specify the embedding model
	resp, err := embedder.EmbedContent(ctx, genai.Text(text))
	if err != nil {
		return nil, err
	}
	return resp.Embedding.Values, nil
}
```
3. Registering data to Firestore
I used the Firestore SDK to register the data.
```typescript
import { Firestore, FieldValue } from "@google-cloud/firestore";
import fs from "fs";

const db = new Firestore({
  projectId: process.env.PJ_ID,
  databaseId: process.env.DB_ID,
});

// parseToJson / parseVector are local helpers for the CSV format (omitted here).
const addDocuments = async () => {
  // Read product data from the CSV file, which includes the generated vectors
  const records = parseToJson(fs.readFileSync("products.csv"));
  for (const record of records) {
    // Parse the stringified vector (assumed here to be in the `embedding` column)
    // from the CSV row into a number array
    const vector = parseVector(record.embedding);
    // Create a new document with the product data and the vector field
    const doc = { ...record, embedding_field: FieldValue.vector(vector) };
    await db.collection(process.env.COLLECTION_NAME!).add(doc);
  }
};
```
4. Creating indexes for vector search
To enable vector search, I created a composite index. Note that the dimension specified here (768) must match the output dimension of the embedding model (embedding-001 returns 768-dimensional vectors).
```bash
gcloud alpha firestore indexes composite create \
  --collection-group={collection-group} \
  --query-scope=COLLECTION \
  --field-config field-path=embedding_field,vector-config='{"dimension":"768", "flat": "{}"}' \
  --database={database-id}
```
5. Performing vector search using natural language
The user’s input text is directly vectorized and used as input for the vector search query.
```typescript
import { Firestore, FieldValue, VectorQuery, VectorQuerySnapshot } from "@google-cloud/firestore";
import { GoogleGenerativeAI } from "@google/generative-ai";

const db = new Firestore({
  projectId: process.env.PJ_ID,
  databaseId: process.env.DB_ID,
});
const genAI = new GoogleGenerativeAI(process.env.API_KEY ?? "");
const model = genAI.getGenerativeModel({ model: "embedding-001" });

const searchDoc = async (message: string) => {
  // Vectorize the user's input with the same model used for the product descriptions
  const result = await model.embedContent(message);
  // Find the 5 nearest products by cosine distance on embedding_field
  const vectorQuery: VectorQuery = db.collection("products").findNearest(
    "embedding_field",
    FieldValue.vector(result.embedding.values),
    { limit: 5, distanceMeasure: "COSINE" }
  );
  const vectorQuerySnapshot: VectorQuerySnapshot = await vectorQuery.get();
  const res: any[] = [];
  vectorQuerySnapshot.forEach((doc) => res.push(doc.data()));
  return res;
};
```
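As a usage sketch (the query text here is hypothetical; in the demo the user's input arrives from the Remix web app), the search function can then be called like this:

```typescript
// Hypothetical call inside an async handler (e.g. a Remix loader/action).
const related = await searchDoc("a warm down jacket for winter commuting");
// Each result is a product document, so fields from the original dataset
// such as `name` and `retail_price` are available for rendering.
related.forEach((p) => console.log(p.name, p.retail_price));
```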
In Firestore’s vector search, you can choose from Euclidean distance, cosine similarity, or dot product as the method for calculating similarity between vectors.
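For example (reusing the `db` and query embedding from the snippet above), switching the metric only requires changing the `distanceMeasure` option:

```typescript
// Same query, but ranked by Euclidean distance instead of cosine distance.
// distanceMeasure accepts "EUCLIDEAN", "COSINE", or "DOT_PRODUCT".
const euclideanQuery: VectorQuery = db.collection("products").findNearest(
  "embedding_field",
  FieldValue.vector(result.embedding.values),
  { limit: 5, distanceMeasure: "EUCLIDEAN" }
);
const snapshot = await euclideanQuery.get();
```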
Summary
- I used the Gemini Pro 1.0 model to generate dummy product descriptions, which served as the embedding targets.
- The generated descriptions were vectorized using the text embedding model.
- The resulting vector values were registered in Firestore together with the product data.
- Using the same model that vectorized the product data, I vectorized the user's natural language input and performed the vector search in Firestore.
By combining Gemini and Firestore's vector search capabilities, I was able to implement a demo of a related product recommendation feature driven by natural language.
With the availability of vector search in managed databases like Firestore, it has become easier to incorporate vector search into existing applications.
Although there is still plenty of work involved in improving the accuracy of vector search, such as organizing the data to be vectorized, evaluating the embedding model, and tuning the input values at query time, taking the first step into vector search has become much more accessible.
However, using Firestore's vector search requires creating composite indexes.
In Firestore's pricing model, the cost of read operations that use composite indexes is calculated based on the number of documents covered by the index. For regular indexes, the cost is calculated as 1 read operation per 1,000 documents, but for vector search it is 1 read operation per 100 documents.
In other words, vector search generates 10 times more read operations than regular indexes (under these ratios, an index spanning 100,000 documents would be billed as roughly 1,000 reads for a vector search scan versus about 100 reads for a regular index scan). With large-scale datasets, the impact of index size on search costs therefore becomes significant.
Appropriate index design, tailored to the application's requirements, seems to be necessary.
References
- YouTube Video: https://www.youtube.com/watch?v=7XI45ll8fqQ (Google Cloud Next Tokyo ’23 session about vector search)
- Firestore Vector Search Documentation: https://firebase.google.com/docs/firestore/vector-search
- Google’s Embedding Models: https://ai.google.dev/gemini-api/docs/models/gemini?hl=en#embedding
- Dataset: https://github.com/GoogleCloudPlatform/public-datasets-pipelines/blob/main/datasets/thelook_ecommerce/pipelines/_images/run_thelook_kub/fake.py