VOICE ASSISTANTS

Helna Saju
Published in IETE SF MEC
Jan 7, 2021

The History of Voice Assistants

A voice assistant is a digital assistant that uses voice recognition, speech synthesis, and natural language processing (NLP) to provide a service through a particular application. The key here is voice.

Voice recognition technology was around long before Apple’s Siri debuted in 2011. At the Seattle World’s Fair in 1962, IBM presented a tool called Shoebox. It was the size of a shoebox, could perform arithmetic, and recognized 16 spoken words, including the digits 0–9.

In the 1970s, scientists at Carnegie Mellon University in Pittsburgh, Pennsylvania — with the substantial support of the United States Department of Defense and its Defense Advanced Research Projects Agency (DARPA) — created Harpy. It could recognize 1,011 words, which is about the vocabulary of a three-year-old.

Once organizations came up with inventions that could recognize word sequences, companies began to build applications for the technology. The Julie doll from the Worlds of Wonder toy company came out in 1987 and could recognize a child’s voice and respond to it.

Throughout the 1990s, companies like IBM, Apple, and others created products that used voice recognition. Apple began building speech recognition features into its Macintosh computers with PlainTalk in 1993. In April 1997, Dragon came out with Dragon NaturallySpeaking, which was the first continuous dictation product. It could understand about 100 words per minute and turn them into text. Medical dictation was among the earliest uses of voice recognition technology.

Popular Voice Assistants

Siri by Apple became the first digital virtual assistant to be standard on a smartphone when the iPhone 4s was announced on October 4, 2011. Siri moved into the smart speaker world when the HomePod debuted in February 2018.

Google Now (which became Google Assistant) on the Android platform followed. It also works on Apple’s iOS, but with limited functionality.

Then the smart speakers came along, and “Alexa” and “Hey Google” became a part of many household conversations. Alexa by Amazon is part of the Echo and the Dot. Google Assistant is part of the Google Home.

Samsung has Bixby, IBM has Watson, and Nuance has Nina. Microsoft has Cortana on Windows 10, Xbox One, and Windows phones. Facebook used to have M, but its use in the Facebook Messenger app ended in January 2018.

By default, most of the voice assistants have somewhat female-sounding voices, although the user can change them to other voices. Many people refer to Siri, Alexa, and Cortana as “she” and not “it.”

Technology Behind The Voice

Deep Neural Network (DNN)

When a voice assistant receives spoken input, it follows three basic steps: the speech is converted to text, the text is analysed to come up with a reply in text, and the reply is converted back to speech. Once the speech has been transcribed, the text is analysed using Natural Language Processing (NLP), based on the dataset the assistant has.
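The three steps can be sketched in a few lines of Python. This is only an illustration of the flow, not how any real assistant is built: the `Audio` class and the functions `speech_to_text`, `generate_reply`, `text_to_speech`, and `handle_utterance` are all hypothetical stand-ins, and the "audio" here simply carries its own text so the example runs without a microphone or a trained model.

```python
from dataclasses import dataclass

# Hypothetical stand-ins: a real assistant would call a speech recogniser,
# a language model, and a speech synthesiser at these three points.


@dataclass
class Audio:
    """Toy audio container; `transcript` stands in for a recorded waveform."""
    transcript: str


def speech_to_text(audio: Audio) -> str:
    # Step 1: speech recognition (stubbed: the "waveform" already holds its text).
    return audio.transcript.lower().strip()


def generate_reply(text: str) -> str:
    # Step 2: analyse the text and pick a reply from a tiny hand-written table.
    replies = {
        "hello": "Hi there!",
        "what can you do": "I can answer simple questions.",
    }
    # Trailing punctuation is dropped before the lookup.
    return replies.get(text.rstrip("?!."), "Sorry, I did not understand that.")


def text_to_speech(text: str) -> Audio:
    # Step 3: speech synthesis (stubbed: wrap the reply text as "audio").
    return Audio(transcript=text)


def handle_utterance(audio: Audio) -> Audio:
    # Voice in, voice out: the three steps chained together.
    return text_to_speech(generate_reply(speech_to_text(audio)))


print(handle_utterance(Audio("Hello!")).transcript)
```

In a real system each stub would be replaced by a trained model, but the shape of the pipeline stays the same.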

Hybrid Emotion Inference Model (HEIM)

Humans readily pick up on emotions from the tone of other people’s voices. A machine learning model called the Hybrid Emotion Inference Model (HEIM), which uses Latent Dirichlet Allocation (LDA) to extract text features and a Long Short-Term Memory (LSTM) network to model acoustic features, is deployed to reveal the kind of emotions behind our voice.

NLP, NLG

NLP, the ability of machines to understand and learn from the languages that humans speak and write, is at the core of these assistants. But apart from NLP, a lesser-known AI technology called Natural Language Generation (NLG), which generates text and speech from predefined data, is also put to use. At its most advanced, it powers the responses given by AI assistants, such as Google Home and Amazon’s Alexa, when asked a question.
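To make the idea of generating text from predefined data concrete, here is a minimal sketch of template-based NLG, which is the simplest form of the technique. It is not how Google or Amazon build their responses; the `WEATHER` data, the `TEMPLATES` table, and the `generate` function are all made up for this example.

```python
# Hypothetical predefined data an assistant might hold about the weather.
WEATHER = {"city": "Kochi", "condition": "cloudy", "temp_c": 29}

# One sentence template per kind of question (intent) the assistant supports.
TEMPLATES = {
    "weather": "It is {condition} in {city} with a temperature of {temp_c} degrees Celsius.",
}


def generate(intent: str, data: dict) -> str:
    """Fill the template for `intent` with values taken from `data`."""
    template = TEMPLATES.get(intent)
    if template is None:
        return "Sorry, I have no answer for that."
    return template.format(**data)


print(generate("weather", WEATHER))
```

More advanced NLG systems replace the fixed templates with learned language models, but the input is still structured data and the output is still a sentence.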

The Future of Voice Assistants

The number of people using voice assistants is expected to grow. According to the Voicebot Smart Speaker Consumer Adoption Report 2018, almost 10% of people who do not own a smart speaker plan to purchase one. If this holds true, the smart speaker user base will grow by 50%, meaning a quarter of adults in the United States will own a smart speaker!
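The percentages above fit together only for a particular starting level of ownership, which the passage does not state. The short calculation below works that level out from the two quoted figures. The function name `implied_owner_share` is made up for this example, and the result is derived from the rounded numbers in this article rather than quoted from the report.

```python
def implied_owner_share(plan_rate: float, growth: float) -> float:
    """Share of adults who must already own a speaker for the figures to agree.

    New buyers are plan_rate * (1 - owners); for the user base to grow by
    `growth`, that must equal growth * owners. Solving for owners gives:
    owners = plan_rate / (growth + plan_rate)
    """
    return plan_rate / (growth + plan_rate)


# Figures taken from the paragraph above (rounded: "almost 10%" and "50%").
owners_now = implied_owner_share(plan_rate=0.10, growth=0.50)
owners_later = owners_now * (1 + 0.50)

print(f"{owners_now:.1%} -> {owners_later:.1%}")
```

Under these rounded inputs, about one in six adults would own a smart speaker today, rising to one in four once the planned purchases go through.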

Voice assistants are always improving and “learning.” AI companies use data from existing systems to improve what assistants can do. People once thought of voice assistants as a fad, but they are changing what people do in their homes, and they are here to stay. Voice assistants will be in everything, and the smart speaker itself might fade away in a few years because many devices, like televisions and refrigerators, will have their own voice assistants.

Follow us for more of the latest science and technology articles. We are also on social media: follow us on Instagram and LinkedIn to stay updated!

**We also invite science and technology enthusiasts to write for us. If you think you have interesting stuff which the world should know about, send in your articles to us!**

Interested in writing for us? Fill up this form!

HAVE A GREAT DAY!
