Building a Smart Retail Shopping Assistant PART 1

Abirami Sukumaran
Google Cloud - Community
10 min read · Jun 3, 2024

A Deep Dive into Knowledge-Driven Conversational AI

Smart Retail Assistant Generated by AI

Introduction

In today’s fast-paced retail landscape, delivering exceptional customer service while enabling personalized shopping experiences is paramount. We’ll take you on a technical journey through the creation of a knowledge-driven chat application designed to answer customer questions, guide product discovery, and tailor search results. This innovative solution combines the power of AlloyDB for data storage, an in-house analytics engine for contextual understanding, Gemini (Large Language Model) for relevance validation, and Google’s Agent Builder for quickly bootstrapping an intelligent conversational assistant.

The Big Picture — A Knowledge-Driven Approach

The Challenge: Modern retail customers expect instant answers and product recommendations that align with their unique preferences. Traditional search methods often fall short of providing this level of personalization.

The Solution: Our knowledge-driven chat application tackles this challenge head-on. It leverages a rich knowledge base derived from your retail data to understand customer intent, respond intelligently, and deliver hyper-relevant results.

Core Components:

  • AlloyDB: This fully managed PostgreSQL-compatible database serves as our primary data store. It’s optimized for transactional workloads and offers the scalability and performance needed to handle our retail data efficiently.
  • Analytics Engine: This custom-built engine performs the heavy lifting of transforming raw data into actionable insights. It extracts context, creates embeddings, and drives the search for relevant information.
  • Generative AI Model: We use a generative AI model as a quality-control filter for the match results. It evaluates the potential responses generated by our analytics engine, ensuring that they align with the user’s intent and filtering out any irrelevant or potentially harmful content.
  • Agent Builder: This tool simplifies the process of creating conversational agents. We integrate our analytics engine as an API source, enabling natural language interactions between customers and our intelligent assistant.

Architecture

Data Flow: Let’s take a closer look at how data moves through our system:

  1. Ingestion: Retail data (inventory, product descriptions, customer interactions) is continuously loaded into AlloyDB.
  2. Analytics Engine Processing:
    We will use AlloyDB as the analytics engine to perform the following:
  • Context Extraction: The engine analyzes the data stored within AlloyDB to understand relationships between products, categories, customer behavior, and so on, as applicable.
  • Embedding Creation: Embeddings (mathematical representations of text) are generated for both the user’s query and the information stored in AlloyDB.
  • Vector Search: The engine performs a similarity search, comparing the query embedding to the embeddings of product descriptions, reviews, and other relevant data. This identifies the 25 most relevant “nearest neighbors.”

3. Gemini Validation: These potential responses are sent to Gemini for assessment. Gemini determines if they are truly relevant and safe to share with the user.

4. Response Generation: The validated responses are structured into a JSON array and the whole engine is packaged into a serverless Cloud Function that is invoked from the Agent Builder.

5. Conversational Interaction: Agent Builder presents the responses to the user in a natural language format, facilitating a back-and-forth dialogue. This part will be covered in a follow-up blog.

Deep Dive — Under the Hood of the Shopping Assistant

Ingest Data / Data Storage:

For this demo, we will use publicly available retail information to create the context and knowledge required for the shopping assistant. AlloyDB offers a number of features that make it a good choice for demanding transactional and analytical workloads, including high performance, availability, scale, security and compliance.

Pgvector Extension:

The pgvector extension adds powerful vector manipulation capabilities to AlloyDB, making it ideal for storing and searching text embeddings.
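As a quick, minimal sketch (the demo table below is made up for illustration and is not part of the app schema), enabling the extension gives you a vector column type and distance operators:

-- Minimal sketch: "vector" becomes a column type once the extension is enabled.
CREATE EXTENSION IF NOT EXISTS vector;
-- A toy 3-dimensional column just to show the syntax; the app itself uses vector(768).
CREATE TABLE demo_embeddings (id BIGINT, embedding vector(3));
INSERT INTO demo_embeddings VALUES (1, '[1, 0, 0]'), (2, '[0.8, 0.6, 0]');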

Embeddings:

We convert the retail knowledge (the context we build from the data) and the user’s search text into numerical representations called embeddings. This enables us to perform semantic search, finding relevant matches and responses based on meaning and context rather than just keyword overlap.
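As a minimal sketch, assuming the google_ml_integration extension (enabled in the hands-on section below) is available, this is how a piece of text becomes an embedding directly in SQL; the sample text is made up for illustration:

-- embedding() calls a Vertex AI text embedding model and returns an array of floats,
-- which is cast to the vector type when compared against stored embeddings.
SELECT embedding('textembedding-gecko@003', 'black block-heel sandals, synthetic upper') AS query_vector;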

Similarity Metrics:

Real-time Matching with Cosine Similarity: Cosine similarity, a mathematical way to measure how similar two vectors (and thus, two pieces of text) are, is used to rank search results by relevance. When a user enters a search query, we convert it to embeddings and then calculate the cosine similarity between these embeddings and the stored retail data embeddings. This quickly identifies the most relevant responses, even if the user’s phrasing doesn’t perfectly match the stored text.
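Concretely, cosine similarity between two vectors is their dot product divided by the product of their lengths, and pgvector’s <=> operator returns the cosine distance (1 minus that similarity). A tiny self-contained example with made-up vectors:

-- (1,0,0) vs (0.8,0.6,0): dot product 0.8 and both lengths are 1.0, so similarity is 0.8.
SELECT 1 - ('[1, 0, 0]'::vector <=> '[0.8, 0.6, 0]'::vector) AS cosine_similarity;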

Cloud Function Deployment:

The entire analytics engine step is wrapped up in a Cloud Function, a serverless solution that scales effortlessly and keeps costs down. It is a lightweight compute solution for developers to create single-purpose, stand-alone functions that respond to Cloud events without needing to manage a server or runtime environment.

Scalability Considerations:

Ensuring our knowledge-driven chat application remains responsive and efficient under varying loads is crucial. We’ve leveraged the scalability of Google Cloud Functions to achieve this:

Cloud Functions Auto-Scaling: Cloud Functions automatically scales the number of instances based on incoming traffic. This means we can handle spikes in user requests without manual intervention, ensuring consistently fast response times.

Vertical Scaling: AlloyDB allows us to easily adjust CPU and memory resources to match the demands of our application. This ensures that the database can handle increased query volumes from the Cloud Functions instances.

Hands-on

Setup

  1. In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.
  2. Make sure that billing is enabled for your Cloud project. Learn how to check if billing is enabled on a project.
  3. You will use Cloud Shell, a command-line environment running in Google Cloud. From the Cloud Console, click Activate Cloud Shell on the top right corner.
  4. Enable the necessary APIs for this implementation if you haven’t already. To begin with, enable the APIs for AlloyDB, Vertex AI, Cloud Functions, Compute and Cloud Run:
gcloud services enable alloydb.googleapis.com functions.googleapis.com run.googleapis.com compute.googleapis.com aiplatform.googleapis.com

If any API is missed, you can always enable it during the course of the implementation.

Schema and Data Ingestion

  1. Create a cluster and instance with cluster id “patent-cluster”, password “alloydb”, PostgreSQL 15 compatibility, region “us-central1”, and networking set to “default”. Set the instance id to “patent-instance” and click CREATE CLUSTER.
  2. You can create a table using the DDL statement below in the AlloyDB Studio:
CREATE TABLE apparels (
  id BIGINT,
  category VARCHAR(100),
  sub_category VARCHAR(50),
  uri VARCHAR(200),
  image VARCHAR(100),
  content VARCHAR(2000),
  pdt_desc VARCHAR(5000),
  embedding vector(768)
);

3. Now that the schema is defined, let’s go ahead and ingest data into the table:

Copy the INSERT DML scripts from this repo and run them from the AlloyDB Studio Query Editor. This should insert 2906 records. If you want to test with fewer records, feel free to cut the list down.
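For reference, each INSERT in the repo follows the shape of the DDL above. The row below is a hypothetical example with invented values, just to show the structure:

-- Hypothetical row for illustration; the real data comes from the repo scripts.
INSERT INTO apparels (id, category, sub_category, uri, image, content, pdt_desc, embedding)
VALUES (9999, 'Footwear', 'Heels', 'gs://your-bucket/heels_9999.jpg', 'heels_9999.jpg',
        'Black block-heel sandals with a synthetic, non-leather upper.', NULL, NULL);
-- pdt_desc and embedding stay NULL here; they are populated in the Context and Embeddings steps below.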

4. Enable Extensions:

For building this app, we will use the extensions pgvector and google_ml_integration. The pgvector extension allows you to store and search vector embeddings. The google_ml_integration extension provides functions you use to access Vertex AI prediction endpoints to get predictions in SQL. Enable these extensions by running the following DDLs:

CREATE EXTENSION vector;
CREATE EXTENSION google_ml_integration;

5. Grant Permission:

Run the below statement to grant execute on the “embedding” function:

GRANT EXECUTE ON FUNCTION embedding TO postgres;

Context

We will now update the pdt_desc field with context data:

UPDATE apparels
SET pdt_desc = CONCAT(
  'This product category is: ', category,
  ' and sub_category is: ', sub_category,
  '. The description of the product is as follows: ', content,
  '. The product image is stored at: ', uri)
WHERE id IS NOT NULL;

This DML creates a simple context summary using the information from all the fields available in the table and other dependencies (if any in your use case). For a more precise assortment of information and context creation, feel free to engineer the data in any way that you find meaningful for your business.
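You can spot-check a few of the generated descriptions before moving on to embeddings:

-- Quick sanity check on the generated context.
SELECT id, pdt_desc FROM apparels WHERE pdt_desc IS NOT NULL LIMIT 3;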

Embeddings

Now that the data and context are ready, we will run the DML below to convert the knowledge base into embeddings for the conversational assistant:

UPDATE apparels
SET embedding = embedding('textembedding-gecko@003', pdt_desc)
WHERE id > 2500 AND id <= 3000;

We have used textembedding-gecko@003 from Vertex AI to convert the data into embeddings. If you have a custom embedding model, feel free to use that instead. Refer to this to register and call remote AI models in AlloyDB.
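Note that the WHERE clause above only embeds a slice of the table. Running the update in smaller id ranges helps avoid hitting request quotas or statement timeouts; the batch boundaries below are arbitrary, so adjust them to your data:

-- Repeat with different id ranges until every row has an embedding.
UPDATE apparels
SET embedding = embedding('textembedding-gecko@003', pdt_desc)
WHERE id > 0 AND id <= 500 AND embedding IS NULL;

-- Confirm nothing was missed.
SELECT count(*) FROM apparels WHERE embedding IS NULL;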

Just take a look at the Embeddings stored in AlloyDB after this update:

SQL Editor Image: Query and Result after Embeddings Update

Vector Search with Cosine Similarity

Now that the table, data, and embeddings are all ready, let’s perform a real-time vector search on the user’s search text.

Let’s say I am the user and I want to ask this:

“I want women’s footwear, black color, with heels and strictly no leather.”

You can find matches for this by running the query below:

SELECT id, category, sub_category, content, pdt_desc AS description
FROM apparels
ORDER BY embedding <=> embedding('textembedding-gecko@003', 'I want womens footwear, black color, with heels and strictly no leather.')::vector
LIMIT 25;

In this query,

  1. The user search text is: “I want women’s footwear, black color, with heels and strictly no leather.”.
  2. We are converting it to embeddings in the embedding() method using the model: textembedding-gecko@003.
  3. "<=>" represents the COSINE SIMILARITY distance method (pgvector’s cosine distance operator); the smaller the distance, the closer the match.
  4. We are converting the embedding method’s result to vector type to make it compatible with the vectors stored in the database.
  5. LIMIT 25 represents that we want to extract 25 nearest neighbors for the search text.

Result looks like this:

Cosine Similarity Vector Search Query Result Screenshot
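If you want to inspect how close each match actually is, a small variation of the same query surfaces the cosine distance itself:

-- The smaller the distance, the closer the product is to the search text.
SELECT id, sub_category,
       embedding <=> embedding('textembedding-gecko@003', 'I want womens footwear, black color, with heels and strictly no leather.')::vector AS cosine_distance
FROM apparels
ORDER BY cosine_distance
LIMIT 5;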

Match Validation with LLM

Let’s use our generative AI model to validate whether these potential responses are truly relevant and safe to share with the user. Since this is a demo app running on a trial billing account with limited credits, I have used TEXT-BISON-32K. Feel free to use Gemini 1.0 Pro or other models for your production use case.

I am going to show the full SQL query first (remember to replace the project id in the model path with your own) and then break it down for explanation:

SELECT id, description, LLM_RESPONSE FROM (
  SELECT id, user_text, description, category, sub_category, uri,
    ML_PREDICT_ROW(
      'projects/abis-345004/locations/us-central1/publishers/google/models/text-bison',
      json_build_object(
        'instances', json_build_array(
          json_build_object(
            'prompt', 'Read this user search text: ' || user_text || ' Compare it against the product record: ' || content || ' Return a response with 3 values: 1) MATCH: if the 2 products are at least 85% matching or not: YES or NO 2) PERCENTAGE: percentage of match, make sure that this percentage is accurate 3) DIFFERENCE: A clear short easy description of the difference between the 2 products. Remember if the user search text says that some attribute should not be there, and the product record has it, it should be a NO match.'
          )
        ),
        'parameters', json_build_object(
          'maxOutputTokens', 1024,
          'topK', 40,
          'topP', 0.8,
          'temperature', 0.2
        )
      )
    ) -> 'predictions' -> 0 -> 'content' AS LLM_RESPONSE
  FROM (
    SELECT 'I want womens footwear, black color, with heels and strictly no leather.' user_text, id, category, sub_category, uri, content, pdt_desc AS description
    FROM apparels
    ORDER BY embedding <=> embedding('textembedding-gecko@003', 'I want womens footwear, black color, with heels and strictly no leather.')::vector
    LIMIT 25
  ) AS xyz
) AS X
WHERE CAST(LLM_RESPONSE AS VARCHAR(500)) LIKE '%MATCH:%YES%' LIMIT 10;

First, the innermost query converts the user’s search text into an embedding and performs a vector search to find the 25 nearest matches from the retail inventory:

SELECT 'I want womens footwear, black color, with heels and strictly no leather.' user_text, id, category, sub_category, uri, content, pdt_desc AS description
FROM apparels
ORDER BY embedding <=> embedding('textembedding-gecko@003', 'I want womens footwear, black color, with heels and strictly no leather.')::vector
LIMIT 25

Next, I have invoked the Large Language Model (Text-Bison-32K) to validate the matches.

My prompt reads like this:

Compare the USER SEARCH TEXT against the product record: CONTENT and return a response with 3 values: 1) MATCH: whether the 2 products are at least an 85% match: YES or NO 2) PERCENTAGE: percentage of match, make sure this percentage is accurate 3) DIFFERENCE: A clear, short, easy description of the difference between the 2 products. Remember, if the user search text requires that some attribute should not be present, but the product record has it, it should be a NO match.

This is followed by the generation parameters: the maximum number of output tokens, top-K (the number of highest-probability tokens the model considers for the next token), top-P (the cumulative probability cutoff applied to that candidate pool), and temperature:

'parameters', json_build_object(
'maxOutputTokens', 1024,
'topK', 40,
'topP', 0.8,
'temperature', 0.2
)

The -> 'predictions' -> 0 -> 'content' part extracts the "content" value from the JSON response.
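ML_PREDICT_ROW returns the Vertex AI prediction response as JSON, so standard PostgreSQL JSON operators can walk it. Here is a self-contained sketch with an illustrative, made-up payload:

-- -> 'predictions' picks the array, -> 0 the first element, -> 'content' the model's text.
SELECT '{"predictions": [{"content": "MATCH: YES, PERCENTAGE: 90%, DIFFERENCE: none"}]}'::json
       -> 'predictions' -> 0 -> 'content' AS llm_response;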

The response to the above validation looks like this:

LLM Validation Query Result Screenshot

Now, let’s build the serverless Cloud Function to expose this analytics engine as a REST API.

Making the AI Engine Serverless with Cloud Functions

  1. Go to Cloud Functions in Google Cloud Console to CREATE a new Cloud Function or use the link: https://console.cloud.google.com/functions/add.
  2. Provide Function Name “retail-engine” and choose Region as “us-central1”. Set Authentication to “Allow unauthenticated invocations” and click NEXT. Choose Java 11 as runtime and Inline Editor for the source code.
  3. By default, the Entry Point is set to “gcfv2.HelloHttpFunction”. Replace the placeholder code in HelloWorld.java and pom.xml of your Cloud Function with the code from the Java file and the XML file in the repo, respectively.
  4. Remember to change the <<PROJECT_ID>> placeholder and the AlloyDB connection credentials with your values.
  5. Deploy the Cloud Function and test it from the TESTING tab of your newly deployed Cloud Functions (If there are deployment errors you can see them in the LOGS tab).

Alternatively, you can test the Cloud Function from your Cloud Shell Terminal:

curl -X POST \
YOUR_CLOUD_FUNCTIONS_ENDPOINT \
-H 'Content-Type: application/json' \
-d '{"search":"I want womens footwear, black color, with heels and strictly no leather"}' \
| jq .

And the result:

Result Screenshot of the Cloud Function invoked with CURL

Building the Conversational Agent!

To be continued in the next blog. Co-authored by Romin Irani. Watch out for it!

Conclusion

Building a knowledge-driven chat application is a complex endeavor that demands a combination of data engineering, machine learning, and conversational AI expertise. By leveraging AlloyDB, our custom analytics engine, and Vertex AI, we’ve created a powerful tool that enhances the retail shopping experience. Now, as an assignment, why don’t you try building the same engine in BigQuery instead? Remember that in BigQuery you have to create a REMOTE MODEL before invoking the embedding and generative AI models. Refer to this lab for assistance. See you in part 2 with your conversational bot.
