Voice Assistant Timeline

A Short History of the Voice Revolution

Natalia Kuzminykh
Geek Culture
Feb 24, 2022


Photo by Soundtrap on Unsplash

Long before the voice revolution made human-computer interaction and conversational AI part of our daily lives, machines had to learn how to hear, identify, and process human speech.

Voice assistant technology such as Siri or Alexa has been under development for over a century. It has come a long way from the first listening and recording devices to the cutting-edge systems now found in many homes. Here is a brief history of where speech recognition comes from.

EARLY DAYS

Computer communication was initially conceived as communication with computers themselves, not merely through them.

Until the 1950s, computers were designed with the idea of direct communication with people: the hope was that they would become true social interpreters, capable of interacting by voice. But technological progress was in its infancy, and machines could not process natural language well enough to talk to humans. In the end, computers became communication facilitators between people rather than their conversational partners.

1950S: AUDREY

Yet the desire to create talking devices did not disappear, and in 1952 a group of researchers at Bell Laboratories built Audrey, the first voice assistant for hands-free dialling.

Audrey could recognise the digits 0 to 9 but had serious drawbacks in terms of capacity and size (it stood 180 cm tall!).

What is more, its maintenance and production costs made it completely unsuitable for large-scale sale. Pressing real phone buttons turned out to be quicker and more reliable than dialling with Audrey.

1960S: SHOEBOX

1961 — a debut video from the IBM archives with a demonstration of IBM’s Shoebox

At the 1962 Seattle World’s Fair, an IBM engineer presented Shoebox, a voice-activated calculator capable of identifying the ten digits and six control words (plus, minus, total, subtotal, false, and off) spoken through a microphone.

Shoebox transformed the recognised sounds into electrical impulses and gave instructions to another machine that calculated and printed the results of simple mathematical problems provided by voice command.

Although both Audrey and Shoebox were equipped with speech-to-text technology, they crucially lacked the reverse: text-to-speech. In other words, after receiving a voice command as input, they could convert it into written text and process it, but they could not transform the written information back into speech and respond to the user’s questions.

1970S: HARPY

1976 — documentary on the CMU’s Harpy Speech Recognition System

A decisive step towards a proper conversational agent came in 1976, when a team of researchers at Carnegie Mellon University built Harpy.

With a vocabulary of 1011 words, Harpy understood entire sentences and recognised the boundaries of individual words.

Its main accomplishment was the ability to understand spoken commands containing pre-programmed vocabulary, pronunciation, and grammar structures — just like today’s voice assistants!

1980S: TANGORA

Ten years after the launch of Harpy, in 1986, IBM introduced an updated successor to Shoebox. Named after Albert Tangora, then the fastest typist in the world, the device had an improved memory capacity that let it recognise up to 20,000 words. Tangora was one of the first systems to use a probabilistic model: it processed speech by predicting the most likely word based on what it had analysed previously.
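The idea of predicting the most probable next word from previously seen data can be illustrated with a toy bigram model. This is only a sketch of the general principle, not Tangora's actual architecture (which was far more sophisticated); the corpus and function names below are invented for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count how often each word follows another in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequently observed follower of `word`, or None."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Illustrative training data
corpus = [
    "please open the file",
    "please open the door",
    "please close the window",
]
model = train_bigrams(corpus)
print(predict_next(model, "please"))  # "open" (seen twice, vs "close" once)
print(predict_next(model, "open"))    # "the"
```

Real systems of the era scaled this statistical idea up enormously, modelling both word sequences and the acoustics of speech itself.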

1990S: NATURALLYSPEAKING

Until then, the conversational systems described above had been used only in research laboratories and never reached everyday users. The situation changed in 1997, when the first consumer speech recognition software, Dragon NaturallySpeaking, was released on the market. The technology, capable of recognising and transcribing natural speech (without pauses between words), cost $695 and was far more affordable than its predecessors.

2010S: WATSON AND OUR DAYS

In 2011, the story of the voice revolution reached a decisive turning point: the question-answering system Watson competed against the best champions of the popular television quiz Jeopardy! and defeated them on total points, becoming the first system to process natural language with the speed and confidence of a human.

Watson competed against the world’s best Jeopardy! champions

This victory set the stage for a wave of digital smart products that you can control with your voice. Later that same year, Apple introduced Siri to the world, and conversational assistants began to pop up like mushrooms after the rain (2012: Google Now, 2014: Cortana, 2014: Amazon Alexa, 2016: Google Assistant and Google Home, 2017: Bixby, etc.).

Conclusion

The recent surge in the development of voice assistants is inextricably linked to advances in the field of AI. Just a few decades ago, most speech technologies relied on simple rule-based architectures; today the situation has changed dramatically, thanks largely to the introduction of cutting-edge statistical methods.

Further Reading

Here are the works used to compile this timeline; some of them go into far more detail:

  1. Moskvitch, K. (2017). The machines that learned to listen. Retrieved from: https://www.bbc.com/future/article/20170214-the-machines-that-learned-to-listen.
  2. Pieraccini, R. (2012). The Voice in the Machine: Building Computers That Understand Speech. Cambridge, MIT Press.
  3. Tavosanis, M. (2018). Lingua e intelligenza artificiale. Roma, Carocci.

If you enjoy reading stories like these and want to support me as a writer, consider signing up to become a Medium member. It’s $5 a month, giving you unlimited access to stories on Medium. If you sign up using my link, I’ll earn a small commission.

You can also support my sleepless nights of content creation by buying me a coffee.


NLP Developer & Conversational AI | A linguist from Italy navigating her passion for technology