Pelago’s Travel Assistant — Part I

Pelago Engineering · Published in Pelago Tech Blog · Dec 11, 2023

Getting AI-powered in four months — By Akshay Bhonde

Capabilities of the AI-powered travel assistant at Pelago

Introduction

Pelago is a travel experiences platform by Singapore Airlines. We sell tickets for attractions, events, guided tours, and travel essentials online, enabling travellers to experience new destinations to their fullest potential.

Post-COVID, our business saw tremendous growth, which led us to expand our offerings globally. However, this growth brought challenges; our customer support team was inundated with questions about everything from Disneyland’s operating hours to the delivery of JR pass vouchers.

In December 2022, we took our first step towards self-service by developing a traditional workflow-builder chatbot. This chatbot enabled customers to download their booking vouchers and cancel bookings with free cancellation options. Shortly after, OpenAI announced API access to their GPT-3 and GPT-4 models, and we were instantly drawn to the potential of Large Language Models (LLMs). Our initial experiments with LLMs were a revelation: we categorized customer reviews and gauged sentiment with just a few lines of code and no previous experience in machine learning. The possibilities seemed limitless, and our team embraced LLMs enthusiastically.

Despite our early successes, we knew LLMs had much more to offer. We were determined to push our boundaries and not use LLMs just for the hype. After careful consideration, we realized the most impactful application for our business was to create an LLM-powered AI assistant. This assistant wouldn’t just handle simple queries; it would be capable of addressing a wide range of customer needs, marking a significant step forward in our journey with AI and customer service.

The Vision

  1. Reduce customer service queries by 80%
  2. Augment the existing platform search for product discovery
  3. Create an additional top-of-the-funnel channel aiding customers in destination discovery and trip planning
  4. Be the go-to channel for travel inspiration

Proof of Concept — A lot of reading + experimentation

April 2023

In developing our AI assistant at Pelago, we set two primary objectives: first, the assistant should be able to interact with internal systems to accomplish tasks, which we call workflows; and second, it should answer open-ended questions using existing or augmented knowledge.

Workflows were handled quite efficiently with our existing chatbot. For example, if a user wanted to cancel a booking, the system would process it based on the details provided, and the response would be straightforward — either successful cancellation or failure with reasons. We aimed to replicate this using LLMs but in a more conversational manner.

Agent equipped with tools

To achieve this, we turned to the open-source community, particularly Langchain and LlamaIndex, which provided frameworks to make this possible using agents equipped with tools. An agent is an abstraction that lets an LLM decide the sequence of actions to take, and it can use tools to complete each action. A tool can be anything from a simple Python function to a call to another LLM that answers specific types of questions. We were looking to use tools to call internal APIs for booking cancellations and vouchers. We experimented quite a bit with Langchain to make the LLM behave (avoid hallucinating), with little luck. Our breakthrough came with OpenAI's release of function calling (OpenAI functions), which significantly improved performance.
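
To make the workflow side concrete, here is a minimal sketch of function calling as it looked with the 2023-era OpenAI Python SDK (pre-1.0). The cancel_booking function, its parameters, and the booking reference format are hypothetical stand-ins for our internal booking API, not the actual implementation.

```python
import json
import openai  # the pre-1.0 SDK available when we built the POC

# Hypothetical wrapper around an internal booking API (illustrative only).
def cancel_booking(booking_id: str) -> dict:
    return {"status": "cancelled", "booking_id": booking_id}

functions = [
    {
        "name": "cancel_booking",
        "description": "Cancel a booking that has a free-cancellation option.",
        "parameters": {
            "type": "object",
            "properties": {
                "booking_id": {
                    "type": "string",
                    "description": "The booking reference shared with the customer",
                }
            },
            "required": ["booking_id"],
        },
    }
]

messages = [{"role": "user", "content": "Please cancel my booking PEL-12345"}]
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=messages,
    functions=functions,
    function_call="auto",
)

message = response["choices"][0]["message"]
if message.get("function_call"):
    # The model decided to call the tool; execute it and let the model
    # turn the structured result into a conversational reply.
    args = json.loads(message["function_call"]["arguments"])
    result = cancel_booking(**args)
    messages += [
        message,
        {"role": "function", "name": "cancel_booking", "content": json.dumps(result)},
    ]
    final = openai.ChatCompletion.create(model="gpt-3.5-turbo-0613", messages=messages)
    print(final["choices"][0]["message"]["content"])
```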

Regarding open-ended questions, we categorized them into two types.

  1. Questions for which answers exist in our knowledge base and should be answered from there alone. Examples include “What is the price of a Universal Studios Singapore child ticket?”, “How can I change the date of my activity?”, etc. This was a classic use case for Retrieval-Augmented Generation (RAG): we used SentenceTransformer embeddings to retrieve documents relevant to the question and passed them as context in the prompt (a minimal retrieval sketch follows this list).
  2. Questions for which the LLM should answer from the general-purpose knowledge it was trained on. Examples include “Can you build me an itinerary for Bali for 4 days?”, “What is the best time to visit Switzerland?”, etc. Here, OpenAI’s GPT-3.5 model proved highly capable.
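
For the first category, the retrieval step looked roughly like the sketch below, assuming the sentence-transformers library; the documents and the model name are illustrative placeholders rather than our production setup.

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative knowledge-base entries; the real documents are structured
# Pelago product and help-centre content.
documents = [
    "Universal Studios Singapore tickets are available on Pelago; prices vary by date and ticket type.",
    "To change the date of an activity, cancel the booking (if free cancellation applies) and rebook.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model
doc_embeddings = model.encode(documents, convert_to_tensor=True)

question = "How can I change the date of my activity?"
query_embedding = model.encode(question, convert_to_tensor=True)

# Rank documents by cosine similarity and keep the best match as context.
scores = util.cos_sim(query_embedding, doc_embeddings)[0]
best_doc = documents[int(scores.argmax())]

# The retrieved document is then passed as context in the LLM prompt.
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{best_doc}\n\n"
    f"Question: {question}"
)
print(prompt)
```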

Our proof of concept was swiftly completed using Gradio, an open-source Python library for quickly building UI components. We needed a chat interface that could take user input and maintain the conversation. After successfully testing the prototype with our employees, we were ready to take our AI assistant to the next level.
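
A chat UI of this kind can be stood up in a few lines with Gradio; the sketch below uses gr.ChatInterface with a stubbed reply function in place of our actual LLM pipeline.

```python
import gradio as gr

def assistant_reply(message, history):
    # In the real POC this invoked the full pipeline (intent routing, RAG,
    # function calling); here we simply echo to show the plumbing.
    return f"(assistant reply would go here) You asked: {message}"

# ChatInterface provides the input box, message history, and layout for us.
demo = gr.ChatInterface(fn=assistant_reply, title="Pelago Travel Assistant (POC)")

if __name__ == "__main__":
    demo.launch()
```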

The A-Team

May 2023

Building the prototype showed us which skills, and therefore which team, we needed for the actual build. We started with a lean team comprising a product manager, a product designer, software engineers, and data scientists who would design the system end-to-end. Towards the latter part of the project, we added a quality engineer to test the system thoroughly.

Keeping the team completely focused on this project, with clearly assigned roles, was key to our success. The product manager was responsible for defining the success metrics of the project and, in collaboration with the product designer, for the overall user experience. Data scientists were responsible for prompt engineering, context retrieval, and building and optimizing the information provided to the LLM. Our software engineers were responsible for the overall design, stitching together the multiple pieces of the system while ensuring optimal performance; they also maintained the interaction between the system and all APIs. The quality engineer spent considerable time on manual testing and found that testing LLM-based systems is quite different from the traditional approach of testing deterministic code outputs.

System Design

June 2023

Within a month of starting the POC, we had showcased the prototype internally and received a sign-off from the leadership. However, making a production-ready version was not going to be easy. Moreover, in the early stages of developing this technology, frequent code changes were expected. While making adjustments during the prototype phase was straightforward, it would be more challenging after production. The goal was to create a technically robust product while maintaining flexibility to accommodate future learning and adjustments.

We realized the importance of listing some design principles for this project:

  • Separation of concerns — By breaking the system down into components or modules, we wanted to ensure each one had a specific role. This would make it simpler for individuals to get started and work independently, and would also be useful if and when the system grew in size and complexity.
  • Encapsulation and Abstraction — Each component (in this case the LLM) would only be given the information required to respond to the specific question rather than have access to the entire knowledge base. This resulted in more relevant responses and reduced hallucinations.
  • Loose coupling, high cohesion — Modules were meant to work independently of each other, which helped individuals focus and expedited development. At the same time, all modules needed to work towards generating the most accurate response to the query.
  • Scalable — able to scale to numerous intents and tasks in the future without degradation in performance.

Architecture of the system

The system was broken down into three main components.

  1. LLM layer — a hierarchical structure of system prompts for identifying the intent (question type) and generating a response for that specific intent, together with the OpenAI function definitions and RAG, a system to retrieve documents from a database and provide contextual answers. The hierarchical structure was important for the system to scale and handle numerous tasks efficiently. For each intent type, we used a specific prompt with few-shot examples to optimize the accuracy of the response (a simplified routing sketch follows this list).
  2. Knowledge Base — to enable the LLM to answer questions in the Pelago context where required. The document structure is carefully designed to ensure optimal retrieval and answer-generation performance. Information is retrieved from the knowledge base through RAG, a two-step process: first, retrieve the documents most relevant to the question being asked (many techniques exist; we implemented vector-embedding-based retrieval using SentenceTransformer embeddings), and second, pass the retrieved documents as context to the LLM along with the system prompt so that the response is generated from the provided context.
  3. Internal APIs — to handle system tasks, with each function capable of handling missing and invalid parameters. Error and success responses were formatted to be readable so the LLM could understand them. These APIs were accessed through OpenAI functions: clear and concise documentation of each function is provided to the LLM, which uses it to decide when to call the function and with which parameters.
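
As an illustration of the intent routing in the LLM layer, here is a simplified sketch; the intent names, prompts, and few-shot examples are hypothetical and far smaller than the real set.

```python
import openai

# Hypothetical intent-specific system prompts (the real system has many more).
INTENT_PROMPTS = {
    "cancel_booking": "You help users cancel bookings. Ask for the booking ID if it is missing.",
    "product_question": "Answer questions about Pelago products using only the provided context.",
    "trip_planning": "You are a travel planner. Suggest itineraries and activities.",
}

# Few-shot router prompt: the first LLM call only identifies the intent.
ROUTER_PROMPT = """Classify the user's message into one of:
cancel_booking, product_question, trip_planning. Reply with the label only.

Examples:
User: I want to cancel my Universal Studios booking -> cancel_booking
User: What time does Gardens by the Bay open? -> product_question
User: Plan 3 days in Bali for me -> trip_planning"""

def route(user_message: str) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[
            {"role": "system", "content": ROUTER_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response["choices"][0]["message"]["content"].strip()

intent = route("Can you build me an itinerary for Bali for 4 days?")
system_prompt = INTENT_PROMPTS.get(intent, INTENT_PROMPTS["product_question"])
# A second call then uses `system_prompt`, plus retrieved context or function
# definitions depending on the intent, to generate the actual reply.
```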

Platform

Aug 2023

When selecting a platform for our travel assistant, we considered two critical factors: the ability to maintain full control over conversations through our own LLM, and omnichannel support. Initially leaning towards an in-house solution, we ultimately opted for an external platform to avoid redundancy and maintain a consistent user experience across channels. Yellow.AI stood out, allowing us to host the communication layer while directing all conversations to our LLM, and offering omnichannel capabilities, making it an ideal choice.

It was quite easy to set up a few workflows on the Yellow.AI platform, and within a couple of weeks our assistant was ready to test. We tested it for latency, authentication, and text formatting and were quite happy with all the functionality.

Evaluation and testing

Aug 2023

As alluded to earlier, testing this system took work. Although we followed an iterative development process, the general-purpose ability (for lack of a better word) of LLMs meant we could not isolate each module and test it independently. More specifically, a slight change in the system prompt resulted in variations in the responses of other modules. This meant the overall system needed to be tested regularly before signing off on any feature. Overall, it required a lot of manual testing and some creative ideas to keep the process manageable.

Unit tests

We created a corpus of questions to be tested for each module. The tests were designed so that we checked not the actual response from the LLM but whether:

  1. the correct process was followed to generate the response — for example, whether the answer was retrieved from the knowledge base or not;
  2. for workflows specifically, the correct parameters were gathered and passed to the API.

This was achieved through detailed logging and by testing for specific attributes in the log and the response.
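
For example, a process-oriented unit test looked roughly like the sketch below. run_assistant and the trace fields are hypothetical stand-ins for our pipeline and its structured logs; the stub returns a canned trace so the shape of the assertions is visible.

```python
def run_assistant(question: str) -> dict:
    # Stub standing in for the real pipeline; it returns the kind of trace
    # our logging produced for a knowledge-base question.
    return {
        "response": "You can change the date by cancelling and rebooking.",
        "trace": {
            "intent": "product_question",
            "retrieval_used": True,
            "retrieved_docs": ["how-to-change-activity-date"],
        },
    }

def test_answer_comes_from_knowledge_base():
    result = run_assistant("How can I change the date of my activity?")
    # Assert on the process (RAG was used), not on the exact LLM wording.
    assert result["trace"]["intent"] == "product_question"
    assert result["trace"]["retrieval_used"] is True
    assert len(result["trace"]["retrieved_docs"]) > 0
```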

Integration tests

The most important aspect of the integration tests was checking that the intent was identified correctly. The entire system needed to be retested whenever an intent was added or removed. We also tested the response time: the platform timed out after 40 seconds, which turned out to be a blessing in disguise, as it forced us to ensure a response was generated within that limit. Lastly, as an added precaution, we also ensured all unit tests passed on integration.
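
An integration-style check for intent routing and the 40-second budget might look like the sketch below; classify_intent is a canned stub here, whereas our actual tests exercised the deployed assistant end to end.

```python
import time

def classify_intent(question: str) -> str:
    # Stub router; the real test drove the full system rather than a rule.
    return "cancel_booking" if "cancel" in question.lower() else "trip_planning"

INTENT_CASES = [
    ("Cancel my booking PEL-12345", "cancel_booking"),
    ("What is the best time to visit Switzerland?", "trip_planning"),
]

def test_intent_routing_and_latency():
    for question, expected in INTENT_CASES:
        start = time.monotonic()
        intent = classify_intent(question)
        elapsed = time.monotonic() - start
        assert intent == expected
        assert elapsed < 40  # stay well under the platform's 40-second timeout
```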

Data and Analytics

A robust data strategy was key to ensuring a complete feedback loop for evaluating and improving the assistant. Broadly, we wanted to capture the conversation between the user and the assistant, as well as any feedback provided by the user.

  1. Conversation — Each message from the user and the assistant’s response needed to be stored and retrieved almost instantly at load time. We needed an in-memory store to achieve this and chose Redis. Further, the conversations needed to be backed up in a persistent store, which was made possible via DynamoDB (a sketch of this two-tier store follows this list).
  2. User feedback — We conducted surveys at the end of each conversation; the rating is used to identify issues and improve our system. We also allow users to rate each response from the chatbot, which provides more detailed insights. These two data points are mapped to the session and the message respectively.
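
A rough sketch of the two-tier conversation store is shown below; the key names, table name, region, and TTL are illustrative, not our production configuration.

```python
import json
import time
import boto3
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
table = boto3.resource("dynamodb", region_name="ap-southeast-1").Table("assistant_conversations")

def append_message(session_id: str, role: str, content: str) -> None:
    message = {"role": role, "content": content, "ts": int(time.time() * 1000)}
    # Fast path: append to the in-memory conversation list for instant reload.
    r.rpush(f"conversation:{session_id}", json.dumps(message))
    r.expire(f"conversation:{session_id}", 24 * 3600)
    # Durable path: write the same message to the persistent store.
    table.put_item(Item={"session_id": session_id, **message})

def load_conversation(session_id: str) -> list:
    # Read the full conversation from Redis at load time.
    return [json.loads(m) for m in r.lrange(f"conversation:{session_id}", 0, -1)]
```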

Further, we created ETL pipelines to move this data into our data warehouse in Redshift, where it can be analyzed. Today, we have a dashboard that tracks our success metrics from this data.

Launch 🚀 and Learnings

Sep 2023

The journey from workflow-builder chatbot to AI-powered assistant

Four months after the inception of the idea, we launched our newly designed travel assistant, fully powered by a large language model. A lot of conversations originated organically on WhatsApp, where we proactively send booking vouchers after a booking is completed. Customers realized this was an open channel for communication and started asking follow-up questions about their bookings.

An interesting example came up when a user was looking for a ferry product from Hong Kong to Macau. They booked the one-way ferry from Hong Kong to Macau, then asked the assistant how to book the return trip, and were guided to the correct product. This was a massive win for us and ticked another one of our Vision boxes.

Although we were more than happy with the performance of the assistant, there is always more to be desired. It is not always possible for the LLM to provide the exact URL of a product, especially when the conversation is not about a specific product. The team is still working on this, as solving it will lead to higher conversions.

We noticed that users ask a lot of questions about details that are already present in the booking voucher. As a next step, instead of directing them to look at the voucher, we will provide the assistant with the voucher information as well.

While we were very pleased with the launch and relieved that our customers were not agitated by the new experience, we still felt that we could have done some things better as a team on this project:

  1. We did not have a clear timeline at the start of the project. This was hard to identify as this was not a deterministic problem. The POC helped scope things, but there was huge ambiguity about the technology (especially prompt engineering LLMs). In hindsight, we could have challenged ourselves more with a strict deadline.
  2. We spent quite a bit of time contemplating an in-house build versus a third-party platform. Had we sat down together for a focused discussion followed by a quick evaluation, we would have saved at least a couple of weeks.
  3. Despite trying hard to avoid it, there were times when team members worked in isolation. This resulted in multiple iterations in both development and testing.

Conclusion

We are completely sold on LLMs and are doubling down on our efforts in this direction for the assistant. The feedback and results have been overwhelmingly positive. Despite some issues that we have already highlighted, we are marching along to add more capabilities to our assistant.

Feel free to reach out to us if you are keen to learn more. Better still, if you are interested and would like to work with us on this journey, please see our careers page for open roles.
