Voice: Anatomy of the Invisible Interface

Krishnan Srinath
Apr 8, 2019 · 3 min read

My 7-year-old kid’s first interaction with computers was through voice. He asked Apple’s Siri to answer trivia like “Which is the fastest car?”. When we bought an Amazon Echo, my father commanded Alexa to sing old songs. Voice-based interactions are becoming ubiquitous in our daily lives. We find them in smartphone assistants like Apple Siri, in voice assistants like Amazon Alexa and Google Home, and in a range of other products. It is clear that voice-based interaction will soon replace the graphical user interface. According to industry experts, 30% of interactions with devices will happen over voice in the next 2–3 years.

Voice is a Natural Means of Interaction

Users have long interacted with computers by typing commands at a keyboard or through a graphical user interface (GUI). Both require users to learn the interface and recall it during every interaction, which often creates friction between user and computer. Voice reduces that friction; it is like magic: say a few words to a device and it grants your wish. Voice is a natural means of interaction.

We’re finally ready for Voice

Voice recognition is not new; it has been around for a while. IBM’s Shoebox was among the most advanced speech recognition machines of the 1960s, but such systems required lengthy training to learn a specific speaker and were limited by computing power. We are now in an age where computing is cheap, and the rapid penetration of computers and phones across the world allows machine learning algorithms to be trained on millions of samples from the internet. That gives systems the ability to recognize almost anyone’s speech.


Voice Makes the Experience Personal

Voice assistants like Siri and Alexa save time on routine tasks such as checking the weather, ordering food, playing music, and replying to messages. They make the experience more personal.

Designing Voice-Based Interaction

Designing a voice-based interaction involves three key concepts: the intent, the utterance, and the slot.

Let’s analyse the following request: “Play relaxing music on Alexa.”

Intent (the Objective of the Voice Interaction)
The intent represents the broader objective of a user’s voice command. In this example, the intent is evident: the user wants to hear music.

Utterance (How the User Phrases a Command)
An utterance reflects how the user phrases a request. In the given example, the user asks Alexa to play music by saying “Play…,” but this isn’t the only way a user could make the request. For example, the user could also say, “I want to hear music….”

You need to consider every likely variation of an utterance. This helps the engine recognize the request and link it to the right action or response.

Slots (the Required or Optional Variables)
Sometimes an intent alone is not enough, and more information is required from the user to fulfil the request. Alexa calls this extra piece of information a “slot.” Slots are like traditional form fields: they can be optional or required, depending on what’s needed to complete the request. In our example, the slot is “relaxing,” and since the request can still be completed without it, this slot is optional.

Challenges

Although computers can now recognize speech reliably and respond in natural-sounding voices, they still don’t understand context. Apple Siri, for example, simply replies that it can’t understand when a question depends on what was said before. Voice assistants are good at responding to simple commands but fail miserably at following context and holding a conversation. Voice needs to overcome this hurdle if it is to flourish and be widely adopted by consumers.

Conclusion

When Douglas C. Engelbart demonstrated the keyboard and mouse, it changed the way we interacted with computers. Voice has similar potential to drive a big shift in how we interact with them. The need for voice is real, and early experience is already having a positive impact on the way users interact with computers. Hopefully this leads to a more accessible world.


Thanks to Natasha from Hacker Noon

Krishnan Srinath

Written by

Data Generalist. Using data to make machines humane.

HackerNoon.com

Elijah McClain, George Floyd, Eric Garner, Breonna Taylor, Ahmaud Arbery, Michael Brown, Oscar Grant, Atatiana Jefferson, Tamir Rice, Bettie Jones, Botham Jean
