How we speak to others and convey our thoughts and views influences our quality of life to a high degree. Across cultures and geographies, voice-based communication has been the most effective way of expressing oneself. Your voice is a part of you, and it’s the most intimate form of contact with the world. No other form of communication conveys emotion and personality as well as our voice does. One could argue that non-verbal inputs such as touch, head movements, and hand gestures are equally effective. But the scope of voice communication is much more extensive than that of touch-based communication.
When it comes to interaction with external systems, things have evolved, as with everything else. Early humans used everything from feather quills and pigeons to fire and smoke. The advent of printing was perhaps the greatest achievement of humankind as a whole. We then graduated from type systems and print sets to modern electronic devices, and on to the present generation of hardware-driven input/output devices such as monitors, keyboards, trackpads, and mice. With mobility, the focus has shifted to a more personal form of input: touch and gestures.
Voice-based interaction with systems is still in its infancy, but it is catching on. Many applications show that voice UI (VUI) is not a fad but the technology of tomorrow. Take the case of Amazon’s Alexa, Google Assistant, or Microsoft Cortana, all of which are services that take voice-based input. The tech titans known as the Big 5 (Apple, Facebook, Google, Microsoft, and Amazon) have invested heavily in this sphere. That should give business leaders food for thought about where this technology is headed and whether it is the next big thing. The potential benefits for users and businesses are enormous. It is projected that chatbots will save companies billions of dollars in the coming decade. According to a Gartner report, chatbots will soon handle no less than 25% of all customer service interactions.
Let’s compare the same task with and without voice-based features. Consider a service company that offers a whole range of services around a particular product, such as a bank that offers financial services. Without voice, a customer logs into the bank’s website and searches for a loan product. They drill down through the list of services (instant, short-term, personal, business loans), zoom in on one (a short-term personal loan), check its features, and so on. The customer may then go back and make a decision about the product by weighing its pros and cons. Now imagine the same scenario where the bank has a VUI. The customer logs in and asks the assistant, “Show me the best short-term loans you have.” Asking the question takes less time than typing the same sentence.
Consider another example: the all too popular ride-sharing service, Uber. To book a cab, a rider has to share their current location, destination, and pick-up point, specify a solo or shared trip, check the options, and confirm. With voice-based input, the user could simply say, “Book me a cab from this street to that street with an Uber regular on a sharing basis.”
The benefits of such a system are clear. VUI is more than a mere “cool to have” feature. It’s direct, easy to use, fast, and relevant. VUI also has the added advantage of often being the best form of interaction for people with impairments who want to use complex systems. But is building a VUI simple, and is it all roses? And why hasn’t the technology boomed yet?
Voice, context, and the challenge they pose
Voice-based communication is context-driven. Human interaction has its idiosyncrasies, which vary with gender, culture, age, and so on.
Try comprehending the context in the following statements:
“The numbers are low”
“Times are changing”
“What we know is a drop, what we don’t is an ocean”
Now consider a way of rephrasing the first statement.
“Associate, I am going to make a statement about our sales numbers. The sales numbers are low!”
Unless you have a robotic disposition, it’s hard to imagine someone conversing like that. This context-driven conversation is an expression of natural language. Add general cultural references and slang terms, and it becomes tough for even a person to comprehend what the conversation is about, let alone a machine.
“Muhammad Ali is the G.O.A.T.”
A non-boxing fan can be forgiven for thinking that the speaker is making fun of the legendary boxer. An avid sports or boxing fan, however, would know that the speaker is acknowledging Muhammad Ali as the Greatest of All Time (G.O.A.T.)!
Brands, Businesses and Voice
Businesses are always on the lookout for the next big thing to enhance their brand and make it irresistible. Customer service is never just “a part” of a leader’s plan. Instead, it is the pivot around which all services and products are offered. Part of what endears a brand to its customers is the experience it provides them. VUI and VUI bots can be a game-changer here.
Chatbots are backed by AI that is still nascent today and that engages, or tries to engage, in a natural-language conversation with users and customers through a digital medium such as a website. Since an AI-based backend drives the bot, the more it learns, the better it becomes. Chatbots extend interaction between humans to interaction between humans and machines. This is the key: giving the customer a different, personalised experience.
Chatbots and voice user interfaces reply to a customer’s questions in natural language. For example, imagine an insurance company with a VUI bot that helps customers search for products, check their claim status, raise claims, enquire about products, and so on. Customers no longer have to traverse a large number of sections and subsections of a product catalog. Simplifying the customer experience in this way can improve conversion rates and make purchasing more accessible.
How do they work
The most important thing while building a chatbot is to understand the intent of the customer. This can be made simpler by defining the sample space of services available, which enables the bot to understand the input quickly. The clearer the user’s intent, the better the bot’s response. With clear intents, a bot’s response can be sourced from multiple places. It can be:
- A fixed response from a database of predefined responses, e.g., in reply to “What time do you close?” or “What are your working hours?”
- A reply based on what the AI has learned
- A reply framed as a question to clarify the user’s original query
- A response retrieved by interacting with a third-party service, e.g., in reply to “What’s the weather outside?”
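The four response sources above can be sketched as a small dispatcher. This is a minimal illustration, not how any particular chatbot platform works: the intent names, the `PREDEFINED` table, and the `fetch_weather` and `learned_reply` helpers are all hypothetical stand-ins, and no real weather API or trained model is involved.

```python
from typing import Optional

# Hypothetical fixed answers, keyed by intent name.
PREDEFINED = {
    "opening_hours": "We are open from 9 am to 5 pm, Monday to Friday.",
}


def fetch_weather(city: str) -> str:
    """Stand-in for a call to a third-party weather service."""
    return f"It is sunny in {city}."  # canned value, no real API is called


def learned_reply(intent: str) -> Optional[str]:
    """Stand-in for a reply produced by a trained model."""
    return None  # this sketch has no model behind it


def respond(intent: Optional[str], slots: dict) -> str:
    """Pick a response source for an already-recognised intent."""
    # Source 1: a fixed response from the predefined store.
    if intent in PREDEFINED:
        return PREDEFINED[intent]
    if intent == "weather":
        city = slots.get("city")
        if city is None:
            # Source 3: a question that clarifies the original query.
            return "Which city do you want the weather for?"
        # Source 4: a response retrieved from a third-party service.
        return fetch_weather(city)
    # Source 2: a reply based on what the AI has learned.
    if intent is not None:
        reply = learned_reply(intent)
        if reply is not None:
            return reply
    # Nothing matched, so ask the user to try again.
    return "Sorry, could you rephrase that?"


print(respond("opening_hours", {}))
print(respond("weather", {}))
print(respond("weather", {"city": "Pune"}))
```

Trying the cheap, fixed answers first is one reasonable ordering; a real system might rank the sources differently.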
Designing for VUI
The most common framework designers use when building an interaction model is design thinking.
Design Thinking involves:
- Prototype & Test
But do these principles hold good when a designer aims to build a voice user interface? Beyond researching users, developing use cases, and empathising with their requirements, a designer should keep in mind requirements that are specific to VUI.
Defining an interaction model is the first step for designers to build an appealing VUI.
How will a user use your product or service?
What would the parameters, both fixed and varying, be in their interaction with your system? An interaction model built in this manner helps define the boundaries of the system. Identify the services your system provides and those it does not.
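One way to make these boundaries concrete is to write the interaction model down as data. The sketch below is only an illustration built on the bank example from earlier: the `IntentSpec` and `InteractionModel` classes, the intent names, and the parameter names are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class IntentSpec:
    """One service the system offers, with its parameters."""
    name: str
    fixed: dict = field(default_factory=dict)  # values that never change
    varying: tuple = ()                        # values the user must supply


@dataclass
class InteractionModel:
    """The set of supported intents, i.e. the system boundary."""
    intents: dict = field(default_factory=dict)

    def add(self, spec: IntentSpec) -> None:
        self.intents[spec.name] = spec

    def supports(self, intent_name: str) -> bool:
        # Anything not registered is outside the system boundary.
        return intent_name in self.intents

    def missing(self, intent_name: str, supplied: dict) -> list:
        # Varying parameters the user has not provided yet.
        spec = self.intents[intent_name]
        return [p for p in spec.varying if p not in supplied]


# Hypothetical model for a bank's voice assistant.
bank = InteractionModel()
bank.add(IntentSpec("find_loan", fixed={"currency": "USD"},
                    varying=("loan_type", "term")))
bank.add(IntentSpec("check_balance", varying=("account_id",)))

print(bank.supports("find_loan"))
print(bank.supports("book_flight"))
print(bank.missing("find_loan", {"term": "short"}))
```

Listing the varying parameters explicitly also tells the bot which follow-up questions it may need to ask.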
Once the system boundaries are defined, identify how the user will interact with the system. For example, a user might ask a music streaming service, “Play me some blues music.” An intent expressed like this is explicit and specific, so it is easy to classify and respond to. But the same intent can also be expressed as “Play me some B.B. King classics.” The designer has to classify each expressed intent in the best possible manner, and the interaction model helps capture such differences between similar intents. Pairing voice interaction with visual cues helps offset any negative experience associated with an incomprehensible intent.
For example, a blinking red light.
Because natural language processing is inherently difficult, a VUI should give feedback with pertinent information when the intent is not clear. Without a visual cue, users will find it hard to verify whether their input has been understood. Overloading the user with unrelated information should be avoided.
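To illustrate, here is a minimal sketch of how the two music requests could map to the same intent, with short feedback and a visual cue when the input is not understood. It uses naive keyword matching purely for illustration. A production VUI would rely on a proper natural language understanding component, and the genre and artist lists, the `Result` fields, and the cue names are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabularies the service knows about.
GENRES = ("blues", "jazz", "rock")
ARTISTS = ("b.b. king", "miles davis")


@dataclass
class Result:
    intent: Optional[str]  # None when the input was not understood
    slots: dict            # details extracted from the utterance
    spoken: str            # short spoken feedback
    visual_cue: str        # what the device shows


def interpret(utterance: str) -> Result:
    text = utterance.lower()
    if "play" in text:
        # Same intent, different slot: a named artist ...
        for artist in ARTISTS:
            if artist in text:
                return Result("play_music", {"artist": artist},
                              "Playing music by that artist.",
                              "steady green light")
        # ... or a genre.
        for genre in GENRES:
            if genre in text:
                return Result("play_music", {"genre": genre},
                              f"Playing some {genre}.",
                              "steady green light")
    # Unclear intent: keep the feedback short and pertinent.
    return Result(None, {},
                  "Sorry, I did not catch that. Try asking for a genre or an artist.",
                  "blinking red light")


for phrase in ("Play me some blues music.",
               "Play me some B.B. King Classics.",
               "What is up?"):
    result = interpret(phrase)
    print(result.intent, result.slots, result.visual_cue)
```

Both requests resolve to a single `play_music` intent and differ only in the slot they fill, which is the kind of difference the interaction model is meant to capture.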
A couple of decades back, when the World Wide Web was being born, it was looked upon as the future. Back then, it was known as Web 1.0. Fast forward to the present, and the internet has become an integral part of our lives. Similarly, VUI finds itself on the threshold of becoming mainstream and widely adopted. VUI has the potential to alter the dynamics between communicating entities like never before, be it humans and machines or humans and systems. VUI is definitely here to stay.