Developing a Product Chatbot with AIR MILES in Snowflake

An LLM-powered chatbot with RAG using all open-source technology in Snowpark Container Services


UPDATE: Since publishing this article, Snowflake has released Cortex Search, vector embeddings support, and Cortex LLM Functions, which provide an easier and more cost-effective way to construct LLM chatbots with RAG.

Introduction

Prior to the ubiquity of Large Language Models (LLMs), personalizing outgoing content and messaging to consumers was a common north star for enterprises. With the democratization of LLM-powered chatbots, consumers can now engage with enterprises in a way that feels personal and natural at an unprecedented scale. Fueled by robust search mechanisms over proprietary data, these chatbots present a huge opportunity for enterprise marketing. In this article, we walk through the architectural decisions and designs considered in AIR MILES’ journey to capitalize on this opportunity by developing a loyalty rewards chatbot in Snowflake’s Snowpark Container Services (SPCS)*. A fully reproducible implementation with your product data in Snowflake can be found here.

AIR MILES

The AIR MILES Reward Program, one of Canada’s most recognized loyalty programs, serves nearly 10 million active collector accounts, representing more than half of all Canadian households. Collectors earn Reward Miles at more than 300 leading Canadian, global, and online brands and at thousands of retail and service locations across Canada. Collectors have the flexibility to use Reward Miles on merchandise, travel, events, attractions, everyday essentials, and more, in-store or online.

With the advent of LLMs and Generative AI, AIR MILES sought to create an alternative interface for internal marketers and collectors to browse rewards. AIR MILES hoped LLMs could enable end users to find prospective offers in a natural, conversational fashion.

“GenAI and LLM have come a long way and are now more readily available for companies to revolutionize their customer engagement and increase operational efficiencies. But this technology still requires a big learning curve and a lot of industry specific skillsets. With the help of Snowflake, we were able to remove this gap and expedite our ability to deliver a new product in this space.”

Emmanuel Menard
Director Data Science & Data Engineering
AIR MILES® Reward Program

Why Snowpark Container Services?

Today’s LLM-powered solutions demand graphics processing unit (GPU) compute for scalable applications. A number of vendors provide LLM services on GPUs. However, AIR MILES’ offer metadata is securely managed in the Snowflake Data Cloud, and it was paramount that the security perimeter be maintained and no data be extracted. With ready-to-use NVIDIA GPUs, Snowpark Container Services (SPCS) offered the ideal infrastructure for AIR MILES’ secure chatbot experience.

Snowpark Container Services (SPCS) is a fully managed container offering designed to facilitate the deployment, management, and scaling of containerized applications within the Snowflake ecosystem. SPCS enabled AIR MILES to deploy all the underlying ingredients of the chatbot chain (data retrieval, LLM, front-end chat, etc.) entirely in Snowflake as manageable services. In addition, each service is outfitted with appropriately sized compute, ranging from basic CPU to medium-sized GPU.
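As a rough illustration of this setup, the sketch below provisions a GPU compute pool and a containerized service from Python. The pool and service names, credentials, image path, and specification are illustrative assumptions, not AIR MILES’ actual configuration.

```python
# A hedged sketch of provisioning an SPCS compute pool and service from Python.
# All names, credentials, and the image path below are illustrative assumptions.
import snowflake.connector

conn = snowflake.connector.connect(
    account="...", user="...", password="...", role="...",  # placeholder credentials
)
cur = conn.cursor()

# A medium-sized GPU pool for the LLM service; smaller CPU pools can back the
# front-end and retrieval services.
cur.execute("""
CREATE COMPUTE POOL IF NOT EXISTS chatbot_gpu_pool
  MIN_NODES = 1 MAX_NODES = 1
  INSTANCE_FAMILY = GPU_NV_M
""")

# The service runs a container image previously pushed to a Snowflake image repository.
cur.execute("""
CREATE SERVICE llm_service
  IN COMPUTE POOL chatbot_gpu_pool
  FROM SPECIFICATION $$
spec:
  containers:
  - name: llm
    image: /mydb/myschema/myrepo/llm-serving:latest
  endpoints:
  - name: api
    port: 8000
$$
""")
```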

RAG Architecture

In crafting the ideal chat experience, offer authenticity became the top priority. An architecture using Retrieval Augmented Generation (RAG) was constructed to mitigate LLM hallucinations by ensuring reported offers were accurate and relevant to the given conversation. RAG is the process by which a user’s question and conversation history are used to search for and retrieve relevant data, which is then passed to the LLM as additional context. Any time a question is posed to the AIR MILES chatbot, the RAG architecture searches offer metadata in Snowflake, selects the 10 most relevant offers, and passes those offers to the LLM to interpret and incorporate into the conversation (if appropriate). An LLM’s permitted prompt length is not infinite, so the dynamic filtering performed by RAG ensures prompts do not exceed the LLM’s context window.
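Conceptually, the retrieve-then-prompt step looks something like the sketch below. The search_offers helper is a hypothetical stand-in for the vector search described in the following sections.

```python
# Conceptual sketch of the RAG step; search_offers is a hypothetical stand-in
# for the vector search over offer metadata described later in this article.
def search_offers(question: str, limit: int = 10) -> list[str]:
    # Placeholder: the real architecture queries a vector database in SPCS.
    return ["Example Partner: earn 50 Reward Miles when you spend $100"][:limit]

def build_prompt(question: str, history: list[str]) -> str:
    offers = search_offers(question, limit=10)  # cap context to fit the LLM's window
    context = "\n".join(f"- {o}" for o in offers)
    past = "\n".join(history)
    return (f"Use only these offers as context:\n{context}\n\n"
            f"Conversation so far:\n{past}\n\nQuestion: {question}")

print(build_prompt("Any grocery offers?", ["user: hi", "assistant: hello!"]))
```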

Natural Language Offer Descriptions

For the RAG architecture to determine which offers are most relevant, it primarily relies on embeddings: long arrays (or vectors) of numbers that capture the semantic meaning and relationships of text. For AIR MILES, the text consisted of natural language descriptions generated from tabular offer metadata in Snowflake. The embeddings were generated by passing these descriptions through a large embedding model running on a GPU-powered Snowpark container service. The resulting embeddings were imported into an open-source vector database (specifically, Weaviate), which was running as a separate but connected Snowpark container service.
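A minimal sketch of that pipeline follows, using the weaviate-client v3 API. The embedding model name, Weaviate URL, Offer class, and sample row are illustrative assumptions, not AIR MILES’ actual choices.

```python
# A minimal sketch: flatten tabular offer metadata into natural language,
# embed it, and load it into Weaviate (weaviate-client v3 API). The model
# name, URL, "Offer" class, and sample row are illustrative assumptions.
import weaviate
from sentence_transformers import SentenceTransformer

def describe(offer: dict) -> str:
    # Turn a tabular row into a sentence the embedding model can represent well.
    return (f"{offer['partner']} offer: earn {offer['miles']} Reward Miles "
            f"when you {offer['qualification']}.")

offers = [{"partner": "Example Partner", "miles": 50,
           "qualification": "spend $100 in store"}]  # stand-in for Snowflake rows

model = SentenceTransformer("BAAI/bge-large-en")  # assumed large embedding model
client = weaviate.Client("http://weaviate:8080")  # assumed service DNS name in SPCS

with client.batch as batch:
    for offer in offers:
        text = describe(offer)
        batch.add_data_object(
            data_object={"description": text, **offer},
            class_name="Offer",
            vector=model.encode(text).tolist(),  # store the precomputed embedding
        )
```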

Open-Source LLM Selection

The ideal LLM for AIR MILES would balance the need to understand natural language (a given for LLMs), perform moderate reasoning, and hold a conversation, all within the inference latency expected of typical end-user applications. The LMSYS Chatbot Arena Leaderboard was closely monitored to identify the top-performing open-source chat LLMs.

vLLM was used to serve the LLM behind an API endpoint in SPCS, providing very fast LLM inference and compatibility with downstream chat mechanisms such as the langchain and openai Python packages. The balancing act was selecting the smallest vLLM-compatible model that could support the comprehension needs of the use case, which at the time of this development was openchat/openchat-3.5-1210.
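Because vLLM exposes an OpenAI-compatible endpoint, querying the served model is straightforward. A sketch follows; the service URL is an assumption.

```python
# Querying the vLLM server through its OpenAI-compatible API. The server is
# assumed to have been launched inside its SPCS container with something like:
#   python -m vllm.entrypoints.openai.api_server --model openchat/openchat-3.5-1210
# The endpoint URL below is an illustrative assumption.
from openai import OpenAI

client = OpenAI(base_url="http://llm-service:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="openchat/openchat-3.5-1210",
    messages=[{"role": "user", "content": "What travel offers are available?"}],
    temperature=0.2,
)
print(resp.choices[0].message.content)
```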

Chat Interface

Finally, the RAG architecture and the vLLM API were brought together in an LLM chain mechanism, provided by langchain, that mirrors engaging with openai’s API. The chain incorporates tooling to retrieve relevant offers from Weaviate using the RAG architecture, which combines primarily vector-based semantic search with sparse keyword search. This hybrid search performed well in balancing very specific searches, such as those for a certain product or company, with more ambiguous exploration, such as searches for product categories.

The LLM chain retrieves relevant offers upon receiving a prompt. The offers are passed to the LLM as context, along with the conversation history and question, for the LLM to generate a response.
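Wiring this together might look like the sketch below, assuming langchain’s Weaviate hybrid search retriever and the vLLM endpoint shown earlier; the service URLs and Offer class are illustrative.

```python
# A minimal sketch of the chain wiring, assuming langchain's Weaviate hybrid
# search retriever and the OpenAI-compatible vLLM endpoint shown earlier.
import weaviate
from langchain_community.retrievers import WeaviateHybridSearchRetriever
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain

wv = weaviate.Client("http://weaviate:8080")  # assumed Weaviate service URL
retriever = WeaviateHybridSearchRetriever(
    client=wv, index_name="Offer", text_key="description", attributes=[],
    k=10,       # pass 10 offers to the LLM, per the RAG design above
    alpha=0.5,  # blend of vector (semantic) and keyword (sparse) scores
)
llm = ChatOpenAI(base_url="http://llm-service:8000/v1", api_key="EMPTY",
                 model="openchat/openchat-3.5-1210")
chain = ConversationalRetrievalChain.from_llm(llm, retriever=retriever)

result = chain.invoke({"question": "Any travel offers?", "chat_history": []})
print(result["answer"])
```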

Perhaps most critical are the explicit instructions, referred to as the system prompt, given to the LLM. These instructions define how the LLM should behave, including which tools to use, required formatting, and additional guidance. The AIR MILES LLM was given explicit instructions not to fabricate content, to adhere to a very specific format when listing offers, and to ensure any offers reported included links, qualification requirements, and reward details.
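An illustrative system prompt in this spirit (an assumption, not AIR MILES’ actual prompt) might read:

```python
# An illustrative system prompt in the spirit described above; this is an
# assumption, not AIR MILES' actual prompt.
SYSTEM_PROMPT = """You are the AIR MILES offers assistant.
Rules:
- Only mention offers found in the provided context; never fabricate offers.
- List offers as bullets with partner, reward Miles, qualification requirements, and a link.
- If no relevant offer appears in the context, say so rather than guessing."""

# Prepended to every conversation sent to the LLM:
messages = [{"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "Any offers on gas?"}]
```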

End users can currently engage with the chatbot through a Streamlit chat interface, served from SPCS and accessible at a public-facing URL. All user input and corresponding chatbot responses are logged to a Snowflake table to enable additional analysis.
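A stripped-down sketch of such an interface follows; the chat_logs table, secrets configuration, and answer_question helper are assumptions.

```python
# A stripped-down Streamlit chat sketch with Snowflake logging. The chat_logs
# table, secrets configuration, and answer_question helper are assumptions.
import streamlit as st
import snowflake.connector

def answer_question(prompt: str) -> str:
    return "placeholder response"  # stand-in for the LLM chain invocation

st.title("AIR MILES Offer Chatbot")
if prompt := st.chat_input("Ask about offers"):
    st.chat_message("user").write(prompt)
    response = answer_question(prompt)
    st.chat_message("assistant").write(response)
    # Log the exchange to a Snowflake table for downstream analysis.
    with snowflake.connector.connect(**st.secrets["snowflake"]) as conn:
        conn.cursor().execute(
            "INSERT INTO chat_logs (user_input, bot_response) VALUES (%s, %s)",
            (prompt, response),
        )
```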

What’s Next?

AIR MILES plans to iterate on the current build and conduct further evaluations with internal marketers. With the entire solution in SPCS, AIR MILES intends to incorporate additional elements of personalization maintained in Snowflake, such as their existing reward recommendation engines. Their long-term vision is to shape multiple chat-based use cases that complement the browsing experience throughout AIR MILES’ digital offerings.

“Although we are still in the experimental phase, we feel that once we launch this product it will set us apart from our competitors.”

Emmanuel Menard
Director Data Science & Data Engineering
AIR MILES® Reward Program

* Snowpark Container Services (SPCS) is now in Public Preview and available for customers to use.


Jason Summer
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

Solution Innovation Architect - (Gen)AI/ML @ Snowflake. Developing and scaling data engineering, ML modeling, AI, and LLMs in the data cloud