Introducing Voice Search Experience at Booking.com
Communication is a natural part of our everyday lives.
People communicate using voice and text, forming sentences to express what they want. And yet, most search and discovery interfaces still rely on menu items and filter facets.
Building on our mission at Booking.com, “Making it easier for everyone to experience the world”, the ML & AI Product teams based in Tel Aviv decided to challenge conventional search patterns by enabling the most natural way for everyone to communicate: their voice.
This is the story of how we built a native in-app voice assistant at Booking.com and, as far as I know, the first voice search offered by a global online travel company.
I’ll walk through how we made it possible by shipping a zero-to-one MVP, how we made it easier to use by incorporating what we learned, and our journey to make it a magical conversational voice experience.
1) Making it Possible
Industry studies of online marketplaces indicate that voice assistants have become a mode of communication between customers and companies. The most appealing aspect of this feature is replacing touch and typing with spoken commands, which lets people express requests in free-form language and makes actions easier to perform.
Free-form speech input lets customers issue unstructured queries, which makes the input to the search and recommendation systems more complex. At the same time, the unstructured nature of natural language lets people explore different options in the app or perform actions that would otherwise be hidden to keep the UI simple.
The studies mentioned above guided us in forming hypotheses for a zero-to-one product idea, which we validated through qualitative studies and quantitative controlled experiments, as described next.
Design for what people say they need
Our first step in designing the conversational interaction and prioritizing MVP features was product discovery into users’ needs.
We used qualitative and quantitative methods:
- Fake door testing: A fake door experiment is a relatively easy way to measure interest in a feature or product before fully building it. In our case, we showed a voice entry-point button on the Booking.com app homepage to selected users as part of an experiment, and collected their feedback and engagement metrics, such as click-through rate.
- User survey and moderated user interviews: a questionnaire to gauge how people use voice commands to accomplish tasks.
The results of the user survey and the fake door test indicated that users would rely on a voice assistant for efficiency.
Travelers said they could imagine using voice interaction for search, but most likely for post-booking activities (changing a reservation, cancelling, getting directions).
Ship fast — Architecture and bootstrap technology
Building an MVP means it has to be lean and well prioritized; it doesn’t mean it has to be scrappy.
The product can, and often should, be bootstrapped by relying on available, ready-made technology. We listed the required software and algorithmic components and mapped each one either to an existing capability or to one that had to be built.
Early on, we decided to abstract the interaction with the frontend mobile client, which would allow faster development cycles and modularity for future capabilities (additional user intents and interactions).
This blog post doesn’t cover the Speech-to-Text machine learning models or how we selected that technology; you can read more about them in this arXiv paper: “With One Voice: Composing a Travel Voice Assistant from Repurposed Models”.
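One way to picture that abstraction is a backend that answers every voice request with a generic, client-agnostic instruction (which screen to open, with which inputs), so that supporting a new intent only requires registering a new handler on the server. The sketch below assumes this design; `ClientAction`, `resolve_action`, the intent names, and the screen names are all hypothetical and not taken from the actual system.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class ClientAction:
    """Hypothetical client-agnostic instruction: which screen to open, with what inputs."""
    screen: str
    params: Dict[str, Any] = field(default_factory=dict)


# Registry of intent handlers. Adding an intent means registering a handler here;
# the mobile client only needs to render whatever ClientAction it receives.
HANDLERS: Dict[str, Callable[[Dict[str, Any]], ClientAction]] = {}


def register(intent: str):
    def decorator(fn: Callable[[Dict[str, Any]], ClientAction]):
        HANDLERS[intent] = fn
        return fn
    return decorator


@register("search_accommodation")
def search_accommodation(slots: Dict[str, Any]) -> ClientAction:
    return ClientAction("search_results", {"destination": slots.get("destination")})


@register("cancel_booking")
def cancel_booking(slots: Dict[str, Any]) -> ClientAction:
    return ClientAction("booking_details", {"focus": "cancel"})


def resolve_action(intent: str, slots: Dict[str, Any]) -> ClientAction:
    handler = HANDLERS.get(intent)
    if handler is None:
        # Unknown intent: fall back to a generic suggestions screen instead of failing.
        return ClientAction("voice_suggestions")
    return handler(slots)
```

For example, `resolve_action("search_accommodation", {"destination": "Dubai"})` yields a `search_results` action carrying the destination, and the client never needs to know which classifier produced it.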
Being as actionable as possible
People often engage in friendly conversation to share, to inspire each other, or just to alleviate boredom or loneliness. That’s not the case in eCommerce.
Interactions in eCommerce are search-, discovery-, or support-oriented, which is why the product is measured by successful actions and how effectively they help people achieve their goals.
Using the user survey and data sets, we defined the taxonomy of intents and the relevant user journeys (the actions we should support, each mapped to the appropriate app screens and inputs).
To act on the transcribed text, we relied on classification machine learning models to understand intents; some of those models started as heuristics and evolved into complex natural language processing models.
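Since some of the intent models started as heuristics, here is a minimal sketch of what such a rule-based baseline could look like, assuming a small hypothetical taxonomy built from the journeys mentioned earlier (search, cancel, change, directions). The intent names and keyword lists are illustrative, not Booking.com’s actual taxonomy; a trained NLP classifier would replace the scoring while keeping the same input (a transcript) and output (an intent label).

```python
import re
from typing import Dict, List

# Hypothetical keyword lists per intent; a real taxonomy would be much larger.
INTENT_KEYWORDS: Dict[str, List[str]] = {
    "cancel_booking": ["cancel"],
    "change_booking": ["change", "modify", "reschedule"],
    "get_directions": ["directions", "how do i get", "navigate"],
    "search_flight": ["flight", "fly"],
    "search_accommodation": ["hotel", "apartment", "place to stay", "stay"],
}


def classify_intent(transcript: str) -> str:
    """Return the intent whose keywords match most often, or 'unknown' if none match."""
    text = transcript.lower()
    scores = {}
    for intent, keywords in INTENT_KEYWORDS.items():
        # Count whole-word (or whole-phrase) matches of each keyword in the transcript.
        scores[intent] = sum(
            len(re.findall(r"\b" + re.escape(kw) + r"\b", text)) for kw in keywords
        )
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"
```

Heuristics like this are cheap to ship and easy to debug, but they break on ties and paraphrases (a query asking for both a flight and a hotel matches two intents equally), which is one reason such rules tend to evolve into learned models.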
Design for voice — VUI
The last step was to design the initial interaction on the app home screen (originally a floating action button), allowing users to perform any post-booking or search action using their voice.
Leveraging context also has its limits, and one decision we had to make early on was whether to keep the dialog context or start a fresh dialog for each interaction.
We decided to treat each interaction as a separate conversation and validated this choice through A/B tests.
2) Making it Easy
Launching voice interaction in early 2020 turned out to be timely: it helped many customers make necessary changes to their reservations when the outbreak of the pandemic disrupted travel plans.
Learnings from Data, adjusting to how people use the product at scale
Over time, the data patterns surprised us: we received significantly more search queries (for destinations and places to stay) than we had originally expected.
Recall that, in the survey, people imagined themselves using voice commands for on-the-go post-booking activities. The data, however, showed that people were using voice as an easy way to search.
Using those findings, we later focused our development efforts on making it easier for people to search for their trip using voice.
Some of the voice search commands we received in October 2020:
“Search for five star hotel in Dubai 26th February”
“Find me a flight and hotel to Tulsa Oklahoma”
“Can you find me a hotel in the Houston area that allows pets”
Onboarding users to voice interaction
Another interesting data point was that a significant share of the voice commands were empty or inaudible, so we had to investigate further.
In follow-up user research, we found that many people didn’t know what to say or how the voice button could help them. They pressed the microphone button but said nothing.
“This looks awesome, what else can I use it for?”
That led us to build ‘suggestions’: guided examples of how to use the voice assistant.
3) Making it magical
In follow-up user research, people were amazed by how easy the voice search was to use. But we still had, and still have, a long way to go.
We use Machine Learning to make it magical by better understanding intents, creating a smooth and personalized experience.
Using data science, we found correlations between people’s search queries and their context, and we ran experiments to validate the impact of adjusting the suggestions we show users.
We use machine learning classification models to predict the user intent, and the sample query among the options the voice assistant supports, that would fit best, matching the displayed suggestions to the current moment in the user journey.
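As a rough illustration of matching suggestions to the moment in the user journey, the sketch below scores each candidate sample query against a few context features and returns the top ones. It assumes hypothetical features (such as `has_upcoming_booking`) and hand-set weights; in the approach described above, a trained classification model would supply these scores instead.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Suggestion:
    intent: str
    sample_query: str
    # Hypothetical per-feature weights; a trained model would learn these.
    weights: Dict[str, float]


SUGGESTIONS = [
    Suggestion("search_accommodation", "Find me a hotel in Paris for next weekend",
               {"no_upcoming_booking": 1.0, "on_home_screen": 0.5}),
    Suggestion("get_directions", "How do I get to my hotel?",
               {"has_upcoming_booking": 0.8, "trip_starts_within_2_days": 1.2}),
    Suggestion("change_booking", "Change the dates of my reservation",
               {"has_upcoming_booking": 1.0}),
]


def rank_suggestions(context: Dict[str, float], k: int = 2) -> List[str]:
    """Score each suggestion as a weighted sum of context features; return the top-k queries."""
    def score(s: Suggestion) -> float:
        return sum(w * context.get(feature, 0.0) for feature, w in s.weights.items())

    ranked = sorted(SUGGESTIONS, key=score, reverse=True)
    return [s.sample_query for s in ranked[:k]]
```

A traveler whose trip starts tomorrow would see directions and booking changes first, while someone browsing the home screen without a booking would see a search example first.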
Focus on improving performance & quality
Two areas we focus on are noise recognition and adjustable silence detection, which answers the question: has the person finished their sentence, or are they just thinking?
We continue to apply Machine Learning and AI models for adaptive noise recognition, smart pause detection, and other ways to make sure we capture the search query in full.
We invested not only in artificial intelligence and research, but also in user experience patterns that support longer conversation formats and interactions.
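Adjustable silence detection is often approached with energy-based endpointing: measure the energy of each short audio frame and end the utterance once the speaker has stayed quiet for longer than a configurable pause. The sketch below is a simple baseline of that technique, not the ML-based pause detection described here; the 20 ms frame size, `energy_threshold`, and `max_pause_ms` defaults are assumptions.

```python
import math
from typing import List, Optional, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1, 1])."""
    return math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0


def find_end_of_utterance(
    frames: List[Sequence[float]],
    frame_ms: int = 20,
    energy_threshold: float = 0.02,
    max_pause_ms: int = 800,
) -> Optional[int]:
    """Return the index of the frame at which the utterance is considered finished,
    or None if the speaker has not (yet) paused long enough.

    Silence only ends the utterance after speech has started, so a quiet
    start (the user is still thinking) does not cut the recording short.
    """
    pause_frames_needed = max(1, max_pause_ms // frame_ms)
    speech_started = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= energy_threshold:
            speech_started = True
            silent_run = 0  # any speech resets the pause counter
        elif speech_started:
            silent_run += 1
            if silent_run >= pause_frames_needed:
                return i
    return None
```

The pause length is the adjustable knob: a longer `max_pause_ms` gives people more time to think mid-sentence, while a shorter one makes the assistant respond faster. A fixed energy threshold is also exactly what breaks in noisy environments, which is where adaptive, learned models help.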
As natural as speaking to an agent
Conversational patterns are highly complex: we might not be aware of the user’s context, but users expect us to be.
By sampling unsuccessful interactions, we noticed a few patterns; handling them could make the experience magical and human-like:
- People open the conversation by greeting the assistant (“Hello”, “Hi”) and wait for a reply. We made sure to respond to greetings with suggestions.
- People tend to assume the voice assistant understands their context, so phrases like “hotels near me” or “hotels nearby” didn’t yield results, because our named entity recognition didn’t recognize “near me” as a place. We added a dedicated search intent to handle this.
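Both fixes amount to routing certain utterances before they reach intent classification and named entity recognition. A minimal sketch, assuming a hypothetical `route_utterance` function, a small greeting list, and access to the device location; empty transcripts, which the onboarding data showed were common, are routed to suggestions the same way greetings are.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical greeting list and "near me" phrasings.
GREETINGS = {"hello", "hi", "hey", "good morning", "good evening"}
NEARBY_PATTERN = re.compile(r"\b(near me|nearby|around here|close to me)\b")


def route_utterance(
    transcript: str, device_location: Optional[Tuple[float, float]]
) -> Dict[str, object]:
    text = transcript.strip().lower().rstrip("!.?")
    if not text or text in GREETINGS:
        # Empty input or a greeting: reply with guided suggestions instead of "no results".
        return {"intent": "show_suggestions"}
    if NEARBY_PATTERN.search(text):
        if device_location is None:
            # "Near me" cannot be resolved without a location, so ask for one.
            return {"intent": "request_location_permission"}
        return {"intent": "search_nearby", "location": device_location}
    # Everything else continues to the regular intent classifier and NER.
    return {"intent": "classify", "text": text}
```

Handling these cases up front keeps the NER model focused on real place names, instead of training it to treat “near me” as a location.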
Conclusion and Acknowledgements
Shipping a new product from zero to one can be challenging. Add Machine Learning, a new UX and interaction model, and a large scale. Sound impossible? With the right people around you, everything is possible.
I hope this article offers a view into how we make Machine Learning and Artificial Intelligence products possible and magical at Booking.com.
If you are interested in hearing more, we are always hiring for various product, engineering, machine learning, and UX roles!
This article was written together with Chalom Asseraf, former Design Manager at Booking.com Tel Aviv.
We would like to thank Gil Amsalem, Amit Beka, Shachaf Poran, Giora Shcherbakov, Dor Samet, Teri Hason, Jessica Jaffe, Adam Horowitz, Sophie Galante and the rest of the contributors to the Voice Assistant product at Booking.com.