8 Weeks to Innovation: Lessons from Building Booking.com’s AI Trip Planner
Introduction and Background
Conversational AI is transforming the tech industry, offering new ways to interact with products. At Booking.com, I had the opportunity to be part of our first steps with Generative AI. In April 2023, working with a team of highly skilled and talented individuals, we developed and launched our first AI chat assistant within 8 weeks, designed to help customers plan their trips.
Over a year has passed since then, and with the rapid advancements in technology and shifts in the industry, now seems like a good time to reflect on the lessons learned. Working with an AI chat assistant, I’ve developed a deep appreciation for this exciting technology and believe it has the potential to completely transform how we interact with products.
In this article, I reflect on our journey and share some of the lessons we learned.
What is the AI Trip Planner?
To start, let me introduce the AI Trip Planner. Booking.com is an online travel agency that offers a wide range of travel products, from accommodation and transportation to attractions. The AI Trip Planner is a chat interface that allows customers to start their trip exploration journey from the broadest point — seeking inspiration for places to visit — to specific queries like finding accommodation that suits their unique needs. Within the product, customers can share their travel wishes and receive recommendations for places to visit, things to do, and suggested itineraries. Recently, we’ve also started sharing relevant flight offers, and we plan to integrate our taxi, public transport, and car rental options in the future.
How to Train Your Dragon (LLM)
In the early days of working with Generative AI and large language models (LLMs), it felt a bit like training a mighty dragon. The technology was unpredictable — complex and mysterious, almost like a mythical creature. Working on the AI Trip Planner reminded me of my son’s favorite TV show, How to Train Your Dragon. In the show, the main character, Hiccup, learns to train dragons by understanding their unique behaviors, strengths, and weaknesses, ultimately building a trusting, symbiotic relationship.
Similarly, training an LLM involves fine-tuning the model to understand language patterns, contextual nuances, and user needs, creating a “bond” between the model and its users. Both processes require patience, ongoing adjustments, and a deep understanding of underlying complexities to achieve a harmonious interaction — whether between human and dragon or user and AI.
Much like Hiccup, we initially proceeded cautiously, putting many rules in place. This led to our first version of the AI Trip Planner feeling more like a traditional chat assistant built on top of a decision tree, rather than a naturally flowing conversation. But we soon realized two important lessons:
- UX skills are essential: While our machine learning scientists (MLSs) were highly skilled in working with AI, they lacked the expertise to craft intentional user experiences. At the same time, as good as our UX writers were, they didn’t have the knowledge required to work with an LLM. The solution was to work together. In the early days of building the AI Trip Planner, this happened in a single meeting, where UX, Product, and MLS sat together, changing things and testing on the MLS’s local machine. Later on, we built tools to streamline this process and make it scalable.
- It’s a fine line between predictability and determinism: To minimize hallucinations, we initially added more restrictions and examples to guide the conversation. While this made interactions more predictable and accurate, it also made them repetitive and, frankly, a bit boring. You have to have some faith in the LLM. A good practice is to run the same queries on generic LLMs such as ChatGPT or Gemini and compare the experience. If the result in your product is worse, iterate until you get it right.
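One lightweight way to sanity-check this kind of regression in code is to run the same queries through your product assistant and a generic LLM, then measure how templated the product’s answers have become. This is only a sketch: `ask_product` and `ask_generic` are hypothetical stubs standing in for real model endpoints, with canned answers that illustrate the over-constrained failure mode.

```python
# Sketch of a "repetitiveness" check: run the same queries through our
# product assistant and a generic LLM, then compare how templated the
# answers are. ask_product/ask_generic are hypothetical stubs; in a real
# setup they would call the actual model endpoints.
from difflib import SequenceMatcher

def ask_product(query: str) -> str:
    # Over-constrained assistant: every answer follows the same template.
    return f"I'd be happy to help! For '{query}', I recommend checking our top destinations."

def ask_generic(query: str) -> str:
    # Generic LLM stub: varied, query-specific answers.
    return {
        "beach trip in May": "Consider the Algarve: warm, quiet before peak season.",
        "family city break": "Copenhagen works well: Tivoli Gardens, safe biking, short distances.",
    }[query]

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()

def repetitiveness(answers: list[str]) -> float:
    """Average pairwise similarity: values near 1.0 mean templated, boring replies."""
    pairs = [(a, b) for i, a in enumerate(answers) for b in answers[i + 1:]]
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)

queries = ["beach trip in May", "family city break"]
product_score = repetitiveness([ask_product(q) for q in queries])
generic_score = repetitiveness([ask_generic(q) for q in queries])
print(product_score > generic_score)  # the over-constrained bot repeats itself more
```

If the product consistently scores higher on a measure like this than a generic model, that is a signal to loosen the restrictions rather than add more.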
Thankfully, we ran internal tests with almost 200 testers before launching the AI Trip Planner. This allowed us to refine the model and ensure the version that reached our customers offered a more natural, engaging conversation.
Launching within 8 weeks
When we started developing the AI Trip Planner, very few companies had a similar live product. The technology was still in its infancy, and although OpenAI has since improved its B2B support, at that time they were overwhelmed with inquiries and were essentially unresponsive. This meant that we had to navigate this journey on our own.
But not only did we make it — we did it in 8 weeks, which was unprecedented for a product of this scale at Booking.com. I remember the immense pride we felt and the euphoria that enveloped everyone involved. However, there was no time to rest post-launch; it marked just the beginning of an incredible journey.
How did we manage it? Several key factors contributed to our success, from the people to the technology involved.
Lessons still learned
While I feel much more confident in this space now, I wouldn’t say I’ve “cracked the code” for the perfect Conversational AI product. But I’d like to share what I’ve learned along the way, both from our successes and the challenges we faced.
1: Investing in AI early on
Booking.com has been investing in AI for over 20 years, with machine learning (ML) models already powering many of our live features. This foundation gave us a head start, letting us build on a solid infrastructure and leverage a workforce well-versed in AI. Being able to combine Generative AI with our own ML capabilities played a big role in helping us launch a robust product in such a short timeframe.
2: The right people with a shared mission
When it came to assembling a team for this project, our leadership made the bold decision not to assign an existing team, but instead to pull in top talent from different departments to form a temporary “Tiger team.” That meant that we had 8 weeks to learn to work together, as well as launch a new product in an area that was new to all of us. This new team was given full autonomy, priority access to resources, and a clear mandate: deliver the product in 8 weeks and do it well.
For me personally it felt at the time like orchestrating a symphony, with many different talented players, all needing to come together in harmony. However, reaching that harmony took deliberate effort. Every member brought their unique expertise and ideas, and it was up to us to make sure all of it came together smoothly.
The Tiger Team
Our Tiger team was divided into two workstreams: one focused on creating the front-end experience, the other on building the complex “engine” that powered it.
- The “Experience” Team: This team, which included UX writers, designers and researchers, a product marketer, front- and back-end developers, and myself as Product Manager, focused on defining the user experience and the front-end logic for the product.
- The “Engine” Team: Comprising MLSs, back-end developers, data scientists, and other experts with years of experience in AI, this team built the model and data science components that made the AI Trip Planner work.
We also had a Legal and Compliance team to ensure the product aligned with Booking.com’s values around responsible AI. Leadership gave us a surprising level of freedom to innovate, though they were definitely on the edge of their seats, hoping this ambitious bet would pay off.
The key factors that helped pull this all together were:
- Defining a clear, unified objective — We aligned early on a single goal: to build a product that would both be helpful to customers and teach us something along the way. Anything that didn’t serve that goal was out of scope for the minimum viable product (MVP) version of the AI Trip Planner.
- Building trust among the product leads — Working with my talented colleagues Hadas Harush, who managed the ‘Engine’ side, and Diana Almeida, who made sure we applied ‘Responsible AI’ every step of the way, we established mutual trust early on and worked to keep it strong throughout the project and beyond the launch.
- Frequent check-ins — Beyond daily standups, we used a shared Slack channel for ongoing updates and met with leadership twice weekly. This kept everyone up to speed and gave us a steady flow of feedback, allowing us to adjust quickly.
- Small, testable milestones — With an 8-week timeline, every week needed to count. We set small but meaningful goals each week and tested rigorously, keeping us on track and ensuring our leadership team was in the loop.
- Eliminating noise for the “builders” — We kept distractions at a minimum for the engineers and designers. Feedback filtered through Product Managers first, allowing us to prioritize what really needed attention and keeping the development environment as focused as possible.
3: Happy Pathways — a way to plan intentional user flows
One of the challenges we encountered was that, as tech people, we’re often wired to think in rigid “if-this-then-that” structures when training machines, which aren’t always ideal for conversational AI design. To avoid falling into this trap, I developed a framework I called “Happy Pathways.” Inspired by how neural pathways work, the core idea is that customers travel along specific pathways when certain conditions are met, but can also transition between pathways when relevant. In practice, I share with the UX and MLS teams a visualization that maps out ideal user flows, guiding them to craft smoother, more intuitive experiences. These pathways highlight the objective of each interaction, allowing us to iterate on specific segments and refine the user journey step by step. We don’t always get it right the first time, but it sets us on a defined path we can optimize segment by segment over time, while maintaining the feeling of a naturally flowing conversation.
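To make the idea concrete, the pathways-with-transitions structure can be sketched as a small graph. This is a minimal illustration: the pathway names, goals, and condition signals below are invented for the example, not our actual AI Trip Planner flows.

```python
# Minimal "Happy Pathways" sketch: user flows modeled as a graph of
# pathways, with conditional transitions between them. Names, goals,
# and conditions are illustrative, not the real product flows.
PATHWAYS = {
    "inspiration": {"goal": "suggest destinations",
                    "transitions": {"destination_chosen": "itinerary"}},
    "itinerary": {"goal": "plan things to do",
                  "transitions": {"dates_known": "accommodation",
                                  "change_destination": "inspiration"}},
    "accommodation": {"goal": "find a place to stay", "transitions": {}},
}

def next_pathway(current: str, signals: set[str]) -> str:
    """Move to the first pathway whose condition is met; otherwise stay put."""
    for condition, target in PATHWAYS[current]["transitions"].items():
        if condition in signals:
            return target
    return current

# A user starts broad, picks a destination, then shares travel dates.
state = "inspiration"
state = next_pathway(state, {"destination_chosen"})
state = next_pathway(state, {"dates_known"})
print(state)
```

The point of the structure is exactly what the framework aims for: each pathway has a clear objective you can optimize in isolation, while transitions keep the conversation free to move between them.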
4: The target keeps moving, and you need to move with it
In a fast-evolving field like Conversational AI, keeping up with constant change is a given. Since launching, we’ve updated our LLM several times to take advantage of new capabilities. Rapid change also comes from the user side, with new behavior patterns emerging as users get used to this technology. Initially, most customers used the AI Trip Planner like a search box, entering short, specific queries. But as we rolled out educational user experiences, users began providing broader, more conversational prompts, which matched our initial vision for the product. For a product like this, it’s crucial to revisit assumptions frequently and ensure that technology and user expectations stay aligned.
5: Returning to basics
While our team often leans toward solving novel challenges, some of the best solutions still come from traditional UI principles. One example is the suggestion bubbles we added — these are simple UI elements that display prompt ideas to help guide users in interacting with the AI. This small addition worked similarly to placeholders in traditional search boxes and in the AI Trip Planner it led to a significant increase in engagement. It was a reminder that even in cutting-edge projects, tried-and-true UX practices can still work wonders.
What is the Way Forward?
I started by comparing working with this technology to training a dragon. If we continue with this analogy, we might wonder: are dragons here to take over? Or, in other words, is Conversational AI here to replace traditional UI?
Especially with the multimodal capabilities of Conversational AI — where entire user experiences might happen via voice or video interactions — one might wonder where conversational ends and traditional UI begins. Since applied experience with this technology is still in its early stages, it’s hard to answer this question. But I wanted to explore it through analogical reasoning, a critical thinking method that uses past examples to analyze current topics. For this, the evolution of touchscreens came to mind.
Touchscreens vs. Conversational AI
Both touchscreens and AI were applied in practical settings long before they reached mainstream, industry-leading products:
- Touchscreens: The first practical applications of touchscreens date back to the early 1970s, primarily for industrial and scientific purposes. They were later introduced in point-of-sale devices, commonly used in kiosks, in the early 1980s. However, touchscreens became a groundbreaking technology only when they were introduced on mobile devices.
- Generative AI: Similarly, practical applications of AI date back to the 1960s. Natural Language Processing (NLP) was used around the same time for tasks such as translation and simulating natural conversation. However, the groundbreaking usage of NLP and AI came with Generative AI, through the introduction of ChatGPT, followed by other powerful models such as Gemini and LLaMA.
Touchscreens changed industries and created new ones, much like Conversational AI today. Yet as we saw with devices like the iPad and the Blackberry, technology alone doesn’t dictate success — user preferences and adoption patterns play a pivotal role.
Holding on to old ways
To expand a bit more on this, let us consider the BlackBerry. The first model launched in 1999 and was known for its QWERTY keyboard. While others in the mobile industry gradually adopted touchscreens and eliminated keyboards, BlackBerry devices, even when they did implement touchscreens, continued to ship with a physical keyboard until the very last model. In 2014, John Chen, then Executive Chairman and CEO of BlackBerry, said: “We listened closely to our customers’ feedback to ensure we are delivering the technologies to power them through their day…It’s the secure device that feels familiar in their hands, with the added performance and agility they need to be competitive in today’s busy world.”
From many buttons, to a single one, to none
In 2007, the first iPhone was introduced, marking a revolutionary shift in mobile phone technology by incorporating a touchscreen and eliminating all buttons except one. The launch sparked intense debate in both the tech industry and mainstream media. I still remember the passionate public discussion: was Apple taking too big a risk by removing features users were accustomed to? Would the lack of physical buttons hinder the user experience? Would a touchscreen be a viable replacement for traditional input methods?
Samsung, Apple’s main competitor at the time, adopted touchscreen technology but chose not to fully eliminate physical buttons. Instead, they continued to produce various models that combined touchscreens and buttons, though with fewer buttons than their early models.
Fast-forward to 2024: BlackBerry phones were discontinued two years ago, and iPhone and Samsung devices have become nearly indistinguishable from each other.
So, can we safely say based on this that Conversational AI is here to replace traditional UI completely?
Let’s consider another popular touchscreen device from the same era: the tablet. When Apple launched the iPad in 2010, it sparked enthusiasm and speculation that tablets would eventually replace laptops.
Despite improvements in processing power and cloud services easing storage constraints, user preferences ultimately determined the outcome. As a tech person, you likely still use a laptop, keyboard, and mouse or trackpad rather than a tablet operated solely via a touchscreen.
Considering both examples, it seems we are not yet at a point where we can definitively answer the question of Conversational AI vs. traditional UI. It could be that a brave single player will shift the entire industry, or it could be that users will dictate the way, or both.
Personally, and at least for now, I view Conversational AI as a complement to traditional UI, while I continue keeping an eye on emerging technologies and user behavior trends that might shift this paradigm and force me to reconsider this point of view.
Closing words
Whether you are just starting to create your own conversational experiences or are deeply involved in optimizing your existing AI chat product, I hope this article has helped alleviate some of the uncertainties and challenges you may face, if not validate your current ways of working. I firmly believe that sharing our knowledge accelerates our collective improvement and advancement. I look forward to learning from others who have encountered and perhaps even solved similar challenges, and I invite you to share your comments and thoughts.