Pelago’s Travel Assistant — Part II

Aman Srivastava
Published in Pelago Tech Blog
12 min read · Jan 2, 2024

Architecture of the AI-powered chatbot — By Aman Srivastava

Pelago, the tour and activities booking platform backed by Singapore Airlines, offers a wide array of experiences, including theme parks, museums, guided tours, transportation, and travel essentials, across 1000+ destinations.

In the last couple of years, our platform witnessed a remarkable surge in the number of users looking to book travel activities. This unprecedented growth put significant pressure on our customer support team, who faced a flood of inquiries ranging from product-related questions to payment and refund concerns, often in different languages. Comprehending and promptly responding to this volume became a major challenge.

It’s crucial to note that many of the answers to these queries are already available on the platform. Even so, responding to every inquiry manually was infeasible for the support team given the volume and diversity of questions. To tackle this challenge, we introduced our intelligent AI travel assistant, designed to provide accurate and swift responses to customer queries across multiple languages. This would both improve the customer experience, driving higher conversion, and alleviate the burden on our support team.

This blog is a continuation of our earlier post detailing our journey of building this AI travel assistant, where we discussed the vision, team dynamics, high-level system design, and the challenges encountered from prototype to production. In this blog, we’ll take a deep dive into the assistant’s core architecture, Retrieval Augmented Generation (RAG). You’ll explore its key pillars, the use cases it handles, and the lessons we learned along the way.

Table of Contents

1. Understanding RAG & Its benefits

2. Chatbot Architecture at Pelago: The Four Pillars

3. PelagoKnowledge: Data behind the Scene

4. Smart Rephrase: Converging conversation to single query

5. Intent Detection: Use cases handled

6. Prompt Library: Custom prompt for each use case

7. Evaluation: How to evaluate your chatbot?

8. FAQs: Curious Queries from Internal and External Teams

9. Conclusion & Future directions

Understanding RAG & its benefits

Retrieval Augmented Generation (RAG) is a powerful approach in generative AI that significantly enhances the quality of model responses by addressing a key limitation of most Large Language Models (LLMs): their training data stops at a fixed cutoff (September 2021 for the models we use). By integrating an external data store at inference time, RAG builds comprehensive prompts that combine context, conversation history, and recent knowledge. The pipeline brings together textual documents, an embedding model, a vector database, and a prompt sent to an LLM, effectively turning a general-purpose LLM into a domain expert that produces accurate, contextually relevant outputs.
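For a concrete picture of this pipeline, here is a minimal Python sketch of the retrieve-then-generate flow. The embed, vector_db, and llm objects are placeholders standing in for an embedding model, a vector store, and an LLM client; they are illustrative, not our production interfaces.

def answer_with_rag(user_query, vector_db, embed, llm, top_k=5):
    # 1. Embed the user query into the same vector space as the documents.
    query_vector = embed(user_query)

    # 2. Retrieve the most semantically similar chunks from the vector store.
    context_chunks = vector_db.search(query_vector, top_k=top_k)
    context = "\n".join(context_chunks)

    # 3. Build a prompt that grounds the LLM in the retrieved context.
    prompt = (
        "Answer the user's question using only the context below.\n\n"
        f"CONTEXT:\n{context}\n\n"
        f"QUESTION: {user_query}"
    )

    # 4. Generate the final, context-aware response.
    return llm(prompt)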

Benefits of using RAGs with LLMs

  1. Eliminates Fine-Tuning: Avoid resource-intensive fine-tuning by relying on a pre-existing knowledge base for quick deployment, saving costs, and providing precise control over responses.
  2. Adaptable to Changes: Handle real-time updates and new data efficiently, which is crucial for dynamic industries like travel, ensuring the chatbot remains current and accurate without requiring model retraining.
  3. Guarding Against Hallucinations: Reduce hallucinated responses by grounding generation in selectively retrieved data, ensuring accurate and contextually relevant chatbot responses for an enhanced user experience.

Chatbot RAG Architecture: The Four Pillars

Pelago’s chatbot is powered by the robust RAG architecture, leveraging LLMs guided by the information available on the platform. It comprises four fundamental pillars, each playing a crucial role in ensuring users receive precise and timely assistance. These pillars, Pelago Knowledge, Smart Rephrase, Intent Detection, and the Prompt Library, collaborate to deliver the best customer service experience.

Pelago Knowledge serves as a digital encyclopedia, constantly updated with the latest information. Smart Rephrase acts as a language wizard, making the user’s query context-aware. Intent Detection decodes the user’s intent from the query and also functions as a crucial guardrail, preventing hallucination and prompt injection and keeping the chatbot within the travel domain. The Prompt Library provides the chatbot with tailored templates for accurate and domain-specific responses. Together, these pillars form the backbone of a chatbot that is not only helpful but also intuitive in understanding and addressing user needs, while maintaining its integrity and focus within the travel domain.

Let’s dive deeper into each of these components to understand how they contribute to the chatbot’s efficacy.

1st Pillar: Pelago Knowledge

Pelago Knowledge serves as the foundation of our chatbot’s intelligence, containing the essential information that guides the final response. It consolidates documents from diverse sources, including PDF files and databases, covering product meta-information such as name, description, pricing, and package details. It also holds factual knowledge about the platform and the general inquiries answered on the Pelago FAQs page. Furthermore, to enrich our data sources, we incorporate information from Zendesk customer support tickets, so that recurring user questions can be answered directly.

After collecting all the necessary data from sources, our next step involves utilizing our distinctive method known as Smart Chunking. This method is a precise approach to breaking down extensive documents into smaller, more manageable pieces, often referred to as “chunks.” The objective is to carefully craft each of these smaller pieces to contain sufficient information capable of standing alone and providing comprehensive responses to user queries. Additionally, each chunk is enriched with metadata, aiding retrieval in identifying the correct information from the abundance of data.
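The exact rules behind Smart Chunking are internal to Pelago, but the simplified sketch below illustrates the idea: split each product document along section and paragraph boundaries so every chunk stands on its own, and attach metadata to each one. The section names and metadata fields are illustrative.

def _make_chunk(product, section, text):
    # Attach metadata so retrieval can later filter by product, destination, or category.
    return {
        "text": f"{product['name']} - {section}: {text}",
        "metadata": {
            "product_id": product["id"],
            "destination": product.get("destination"),
            "categories": product.get("categories", []),
            "section": section,
        },
    }

def chunk_product_document(product, max_chars=1200):
    chunks = []
    for section in ("description", "packages", "pricing", "faqs"):
        # Split long sections on paragraph boundaries so each chunk stays coherent.
        current = ""
        for para in product.get(section, "").split("\n\n"):
            if current and len(current) + len(para) > max_chars:
                chunks.append(_make_chunk(product, section, current))
                current = ""
            current = (current + "\n\n" + para).strip()
        if current:
            chunks.append(_make_chunk(product, section, current))
    return chunks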

The next step is to transform these textual chunks into numerical vectors that capture the content’s semantics, using a sentence embedding model. These vectors are stored in a dedicated database, forming a robust foundation for real-time retrieval. This vector-based approach boosts the speed and accuracy of our chatbot’s responses, delivering precise and contextually relevant information from Pelago Knowledge during interactions.
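As a rough illustration of this step, the sketch below embeds each chunk with a sentence-transformers model and indexes it into Elasticsearch, the vector store we discuss in the FAQs. The model name, index name, and field mapping here are assumptions, not our exact configuration.

from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative embedding model
es = Elasticsearch("http://localhost:9200")

def index_chunks(chunks, index_name="pelago-knowledge"):
    for i, chunk in enumerate(chunks):
        # Encode the chunk text into a dense vector capturing its semantics.
        vector = model.encode(chunk["text"]).tolist()
        es.index(
            index=index_name,
            id=i,
            document={
                "text": chunk["text"],
                "embedding": vector,          # mapped as a dense_vector field
                **chunk["metadata"],          # product_id, destination, categories, ...
            },
        )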

Lastly, to ensure the freshness of our information, we utilize Airflow, a workflow engine that streamlines the process of updating our documents daily in the vector database. This proves crucial in the dynamic travel domain where seasonal changes impact content, prices, and features.
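A minimal Airflow DAG for this kind of daily refresh might look like the sketch below; the DAG id, schedule, and refresh function are illustrative rather than our actual pipeline definition.

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def refresh_knowledge_base():
    # Pull the latest product data, FAQs, and Zendesk tickets, then re-chunk,
    # re-embed, and re-index them into the vector database.
    ...

with DAG(
    dag_id="pelago_knowledge_refresh",
    schedule_interval="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    PythonOperator(
        task_id="refresh_knowledge_base",
        python_callable=refresh_knowledge_base,
    )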

2nd Pillar: Smart Rephrase

Pelago’s Smart Rephrase acts like a language expert for our chatbot. It takes the user’s question and makes it clearer by considering the entire chat history. It creates a new question that contains all the necessary details, even replacing pronouns with specific names or entities mentioned earlier. This new, clear question can then be used for figuring out what the user wants (intent detection) and searching for the right information (vector search).

Beyond mere comprehension, Smart Rephrase structures information by identifying key elements like destination names, country references, or category tags such as “adventure” or “kid-friendly” from rephrased queries. This structured approach significantly enhances the efficiency of document retrieval during vector search, ensuring that the chatbot can accurately retrieve relevant information.

Importantly, Smart Rephrase understands when the context changes in the conversation and avoids unnecessary changes to the message. It also expands short forms, like turning “USS” into “Universal Studios Singapore,” to help find information more efficiently.

Here is a sample prompt for Smart Rephrase.


PROMPT = """
Rephrase the upcoming user message delimited by <ms></ms> below
to create a standalone query that captures the entire
context effectively, considering the provided guidelines.
The rephrasing should only be done if the context remains the same,
replacing references with actual nouns and expanding short-form abbreviations.
The chat history within <cs></cs> tags provides the context.

GUIDELINES:
1. Do NOT rephrase if the context has switched.
2. Replace references with actual nouns.
3. Expand short-form abbreviations.

CHAT HISTORY:
<cs>{chat_history}</cs>

USER MESSAGE:
<ms>{message}</ms>

OUTPUT REQUIREMENT:
Provide the rephrased standalone user message based on the context and guidelines.
"""

Additionally, you can include examples as few-shot demonstrations within the prompt; these serve as a guide for the language model, helping it better understand the guidelines and the expected format of the response.
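For illustration, here is how the rephrase prompt above could be invoked with the OpenAI Python client (we use gpt-3.5-turbo, as noted in the FAQs below); the message structure and temperature are illustrative choices rather than our production code.

from openai import OpenAI

client = OpenAI()   # reads the API key from the environment

def smart_rephrase(message, chat_history):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,   # keep the rephrasing deterministic
        messages=[{
            "role": "user",
            "content": PROMPT.format(chat_history=chat_history, message=message),
        }],
    )
    return response.choices[0].message.content.strip()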

3rd Pillar: Intent Detection

Intent detection is the pivotal third pillar in Pelago’s chatbot architecture, functioning as a comprehensive classifier that identifies user queries’ underlying intent. This sophisticated layer plays a crucial role in shaping the subsequent stages of the chatbot’s response generation process.

Pelago’s Intent Detection model is designed to categorize the context-aware rephrased queries from Smart Rephrase into 20+ distinct intent categories. These categories span a wide spectrum; some examples of the intents and use cases this layer caters to are:

  1. Product Inquiry: Queries related to specific product details.
  2. Helpdesk FAQs: Concerns about payments, refunds, or other assistance-related issues.
  3. Promotions and Offers: User inquiries about available promotions and offers.
  4. Travel recommendation: Queries seeking travel advice or recommendations to plan an itinerary for a specific destination.
  5. Prompt Injection Guardrail: Detection of attempts to manipulate or inject prompts.

Some major elements of this layer are:

Hierarchical Intent Detection: Recognizing the dynamic nature of user queries, Pelago’s system adopts a hierarchical approach, identifying broader labels before delving into subcategories, enhancing intent recognition accuracy.

Guardrail for Prompt Injection: A protective barrier ensures the chatbot responds only to valid queries within the travel domain, actively denying responses to out-of-scope requests.

Example-driven Intent Detection: To enhance accuracy, each intent category includes illustrative examples, providing explicit guidance for faster and more precise categorization using few-shot learning.

Here is a sample template:

INTENT_DETECTION_SAMPLE_PROMPT = """
As an AI-driven assistant, categorize the message delimited by <ms></ms> into
intents from the provided categories:

Intent Categories:

1. [Intent A]: <description>
<example 1>
<example 2>

2. [Intent B]: <description>
<example 1>
<example 2>

3. [Intent C]: <description>
<example 1>
<example 2>

4. [Intent D]: <description>
<example 1>
<example 2>

5. [Intent E]: <description>
<example 1>
<example 2>

Message: <ms>{message}</ms>
Output Requirement: Provide the detected intent name only.
"""

4th Pillar: The Prompt Library

The Prompt Library, the fourth pillar of Pelago’s chatbot framework, addresses the challenge of serving the diverse landscape of user queries in the travel domain. A single one-size-fits-all prompt that tries to cover every possible intent quickly becomes confusing and runs up against the token limits of Large Language Models (LLMs). The Prompt Library is our solution: a dynamic, adaptive collection of prompts that brings clarity to this diverse spectrum of user inquiries while staying within token restrictions.

Picture the Prompt Library as a tailored set of rules, ensuring the chatbot precisely understands how to navigate the intricacies of responding to distinct inquiries, whether it’s a product-related question or a payment or refund-related concern.

Building Blocks of the Prompt Library

  1. Custom Prompt: Each intent comes with its own set of prompt instructions, directing the LLM to use the respective template in crafting a response.
  2. Step-by-Step Guide: Breaking down complex tasks into simple steps helps the LLM focus on one straightforward instruction at a time, leading to improved responses.
  3. Few-shot Examples: The prompt additionally provides some examples to help the LLM learn and comprehend expectations.
  4. Use What You Know: The prompt guides the LLM on how to use information from the retrieved context, and what to do if nothing relevant is found.

The picture below outlines the response generator LLM layer’s structure, using the custom prompt from the Prompt Library for the detected intent along with relevant information from Pelago Knowledge.

Together, these components work seamlessly to form fast and accurate responses, with the set of prompts playing a crucial role in obtaining the correct information. These prompts are Pelago’s intellectual property, and we are consistently working to enhance them based on analytics and user feedback.
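To show how the pillars fit together, here is a simplified sketch of the routing step: the detected intent selects a template from the library, which is filled with the rephrased query and the retrieved context before the final LLM call. The template texts and intent names are illustrative, not our actual prompts.

from openai import OpenAI

client = OpenAI()

# Each intent gets its own instructions; real templates also include steps and few-shot examples.
PROMPT_LIBRARY = {
    "product_inquiry": (
        "Answer the product question using only the CONTEXT below. If the context is "
        "not relevant, say you could not find the information.\n"
        "CONTEXT:\n{context}\n\nQUESTION: {query}"
    ),
    "helpdesk_faq": (
        "Answer the payment or refund question using only the CONTEXT below.\n"
        "CONTEXT:\n{context}\n\nQUESTION: {query}"
    ),
    "out_of_scope": "Politely explain that you can only help with travel-related questions.\n\nQUESTION: {query}",
}

def generate_response(intent, rephrased_query, context=""):
    template = PROMPT_LIBRARY.get(intent, PROMPT_LIBRARY["out_of_scope"])
    prompt = template.format(context=context, query=rephrased_query)
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content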

Evaluation: How to evaluate the Pipeline?

At the core of our commitment to precision is a robust evaluation process powered by an inbuilt automation tool, orchestrated by Large Language Models (LLMs). This systematic approach involves testing 1,000+ user queries spanning diverse use cases, including both relevant and unrelated inquiries. This comprehensive dataset allows us to assess the system’s discernment and accuracy across various scenarios.

Our automation tool runs the entire query-response pipeline, comparing each generated response to its ground-truth value. Facilitated through a purpose-built prompt measuring dissimilarity, this evaluation ensures close alignment with expected results. To ensure consistency, we maintain a standardized context from Pelago’s knowledge base, mitigating fluctuations in responses caused by differing embedding results. Importantly, we routinely use this evaluation tool for any minor prompt tuning or code change. Before release, the Quality Assurance (QA) team gives its approval, ensuring stable performance and persistent accuracy across diverse user queries.
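As a simplified illustration of this comparison step, the sketch below scores a generated answer against its ground truth with an LLM-as-judge prompt; the prompt wording and scoring scale are illustrative, not our internal tooling.

from openai import OpenAI

client = OpenAI()

EVAL_PROMPT = """Compare the CANDIDATE answer with the GROUND TRUTH answer.
Return a dissimilarity score from 0 (same meaning) to 10 (completely different),
followed by a one-line justification.

GROUND TRUTH: {ground_truth}
CANDIDATE: {candidate}
"""

def evaluate_response(candidate, ground_truth):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,   # keep scoring as deterministic as possible
        messages=[{
            "role": "user",
            "content": EVAL_PROMPT.format(ground_truth=ground_truth, candidate=candidate),
        }],
    )
    return response.choices[0].message.content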

Stay tuned for more insights on this evaluation process in our upcoming blog series!

FAQs: Curious Queries from Internal and External Teams

As we developed Pelago’s AI-driven chatbot, we received lots of questions — from our team and external partners curious about the details. Here, we share not only the answers to those FAQs but also the valuable lessons we learned along the way.

Q1. What LLM model powers your chatbot, and have you conducted evaluations on different LLMs?

A1. We are currently using OpenAI’s gpt-3.5-turbo model for our chatbot. We did conduct evaluations on various models, including open-source ones like Llama 2, by testing them against our queries. Ultimately, we found that gpt-3.5-turbo performed optimally for our needs. Importantly, it also eliminates the additional burden of deploying and maintaining infrastructure, streamlining our operational efficiency.

Q2. Why did you choose gpt-3.5 over gpt-4 for your chatbot?

A2. We chose gpt-3.5 over gpt-4 primarily due to cost considerations. While gpt-4 demonstrated better performance, the associated cost was approximately 20 times higher. Given that gpt-3.5 meets our performance requirements satisfactorily, we made a strategic decision to prioritize cost-effectiveness in our implementation, ensuring an optimal balance between performance and expenditure.

Q3. Which embedding models do you use, and why did you choose them? Also, what about your choice of vector database?

A3. We use a sentence-transformers model for embedding. Although we evaluated several alternatives, including OpenAI’s text-embedding-ada-002 and other open-source models, the sentence-transformers model performed sufficiently well. While it may not be the absolute best, its performance meets our requirements, and it offers cost savings since we host it on our internal servers. For our vector database, we've adopted Elasticsearch. Our decision was influenced by our existing Elasticsearch infrastructure, and it provides a good balance of performance and speed, allowing us to implement quickly and efficiently.

Q4. What was the most impactful strategy that made your chatbot smart and efficient in deployment?

A4. There isn’t a single strategy, but smart chunking certainly plays a crucial role. Initially, we used a prebuilt tokenizer with a token limit to split documents, but some documents ended up with incomplete information being treated as context, leading to incorrect results and hallucinations. Adjusting our approach with smart chunking resolved this challenge effectively. Additionally, incorporating metadata alongside documents in our vector database proved highly beneficial: it not only enabled semantic search but also facilitated document retrieval based on entities in the user query, allowing us to apply filters and enhance the accuracy of information retrieval.
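As a simplified illustration of such metadata-filtered retrieval, the sketch below runs a semantic kNN search in Elasticsearch restricted to chunks tagged with the destination extracted by Smart Rephrase; the field names, index name, and the 8.x kNN syntax are assumptions rather than our exact setup.

from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

es = Elasticsearch("http://localhost:9200")
model = SentenceTransformer("all-MiniLM-L6-v2")

def retrieve_chunks(query, destination=None, k=5):
    knn = {
        "field": "embedding",
        "query_vector": model.encode(query).tolist(),
        "k": k,
        "num_candidates": 50,
    }
    if destination:
        # Restrict semantic search to chunks tagged with the extracted destination.
        knn["filter"] = {"term": {"destination": destination}}
    result = es.search(index="pelago-knowledge", knn=knn)
    return [hit["_source"]["text"] for hit in result["hits"]["hits"]]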

Q5. Based on your experience, what is your recipe for making prompts more effective and impactful?

A5. In our view, effective prompting is crucial for the success of any project using Large Language Models (LLMs). From our experience, breaking down tasks into a series of steps accompanied by relevant examples significantly helps LLMs understand and follow instructions. This approach proves more effective than presenting tasks in a paragraph format with task descriptions alone. Based on our observations, including few-shot examples has been particularly beneficial, enhancing the LLM’s ability to comprehend and respond accurately.

Q6. Does your bot experience hallucinations, and how do you address them?

A6. Yes, occasionally, but we actively address this through regular monitoring via our analytics dashboard and fine-tune prompts as needed. Given our focus on the travel domain, we have some flexibility in accommodating minor hallucinations. However, in critical domains like healthcare, where even slight inaccuracies can have significant consequences, building an exceptionally robust system is imperative. As an added precaution for accuracy and reliability, we recommend adding an additional LLM-based layer that verifies the correctness of the generated response against the user query and the retrieved documents.
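A simplified sketch of such a verification layer is shown below; the prompt wording and the SUPPORTED/UNSUPPORTED convention are illustrative.

from openai import OpenAI

client = OpenAI()

VERIFY_PROMPT = """Given the USER QUERY, the RETRIEVED DOCUMENTS, and the DRAFT ANSWER,
reply "SUPPORTED" if every claim in the draft is backed by the documents,
otherwise reply "UNSUPPORTED" and list the unsupported claims.

USER QUERY: {query}
RETRIEVED DOCUMENTS: {documents}
DRAFT ANSWER: {draft}
"""

def is_supported(query, documents, draft):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[{
            "role": "user",
            "content": VERIFY_PROMPT.format(query=query, documents=documents, draft=draft),
        }],
    )
    return response.choices[0].message.content.strip().startswith("SUPPORTED")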

We hope this FAQ section provides valuable insights. Feel free to drop more questions in the comments, and we look forward to assisting you further!

Conclusion and Future Directions

In conclusion, Pelago’s AI-driven chatbot journey, anchored by the RAG architecture, has led to continuous improvement and enhanced efficiency. RAG has helped us address post-COVID challenges, offering benefits like eliminating fine-tuning, adapting to changes in real time, and reducing hallucinated responses. Our chatbot understands diverse queries and provides contextually relevant responses in multiple languages.

As we advance, Pelago is committed to refining and expanding our chatbot capabilities. Ongoing evaluations, strategic enhancements, and user feedback are vital for maintaining accuracy and reliability. We invite you to leave your comments and suggestions, contributing to the ongoing evolution of our AI travel assistant. Stay tuned for more insights and upcoming improvements in our blog series.

Thank You!
