Speech Recognition and Voice Command have arrived after 25 years of torment (part 2)
In the previous post, I talked about the absurdity of old-style user interfaces by painting a picture of two humans trying to “interface” with one another using only touch/type input or “bad” voice command.
Old tech UI, if used human-to-human, would look like a cross between a bad game of charades and pre-lingual cave people relaying information: Catch-Phrase game night at the Homo habilis family dwelling.
Today, from my perspective as a multi-modal VUI developer (that’s a long way of saying voice/touch/audio/video apps), I’m surprised that mainstream users aren’t more amazed by the current capabilities of simple voice command and complex speech recognition.
Why is that? What has blinded us to the current awesomeness? In Part 1, I answered, “25 years of false promises.” Let’s unpack that idea.
Two factors are at work: (1) incremental change, and (2) desensitization from hype (the hype contains the false promises that numb us).
1. Incremental change, while accelerating, still goes unnoticed
The first factor is incremental change.
Massive disruptive change: automobiles, telephones, penicillin, moon-shots, atomic bombs… these things get noticed.
But as the rate of innovation has increased, we’ve become increasingly numb to every science fiction concept making its way into our daily life. To say we “take technology for granted” is to massively understate the point.
Instead of seeing video conferencing for the “miracle of modern technology” that my 99-year-old grandmother considers it to be, we curse at our handheld, super-computing, quad high definition, multi-touch, GPS navigating, HD photo/video capturing, multi-media playing, global-communicating PHONES when they take too long to drop one world-wide network connection and find another, faster one.
Ray Kurzweil’s brilliant work first introduced me to the concept of accelerating innovation in the early 2000s; it has been examined extensively, from the widely accepted Moore’s Law to its broader application to all technology and innovation. Rather than talking more about it, I’m going to leave you with the above link to Ray’s early essay and this chart from Pew:
To summarize, technology change is accelerating and is virtually continuous, and to put it lightly, we totally take it for granted.
2. Desensitization from Hype
There are two sub-points about desensitization from hype: (a) smart people and companies know that increasing human-to-tech bandwidth, starting with smart voice user interfaces, will change everything about how we relate to the devices they sell, so businesses are eager, even desperate, to be positioned as leaders when this occurs; and (b) voice command (a wake word and response to a limited vocabulary) has been around for a while, and during that while, much hype has been peddled.
Yes, this was a real thing. I used it in 1994 in my tiny dorm room at Duke University.
Twenty-five years ago! Two and a half decades.
So what the heck has been going on in the meantime?
Slow, incremental progress, and promises with each increment that everything finally works easily and intuitively.
Over-hyping voice command is a primary culprit in our desensitization to the current amazing state of speech recognition and voice user interface capabilities.
Voice command is very easy to present in a demo that is orchestrated in such a way that it APPEARS to work well (after much behind-the-scenes practice), while in reality it may be incredibly fragile and easily “confused” by variations in any of multiple nuanced and subtle attributes of speech (rate, accent, dialect, etc.).
These AWESOME demos have been featured in ads for products ranging from:
Holiday-season in-home dog-biscuit-serving robots (1984) https://youtu.be/d9aykcDp8y8
to cars (2007) https://youtu.be/OvHJgF_iLZM?t=12
and TVs (2012) https://youtu.be/WtqPHp5Oy_Q?t=111
Up until very recently, these devices have been neither easy nor intuitive to use.
So you can’t blame us for not batting an eye when shown advertisements for Galaxy/Bixby, Cortana, Google, Siri, and Alexa.
“Yeah, I’ve seen this trick and I’m not buying it,” would be the prudent response to the latest promise of easy and intuitive voice user interface.
However, this time IS different, and in the next part of this series, I’ll tell you what has changed.
TO BE CONTINUED IN PART 3
This article is an update of “The Future is Here… so why hasn’t anyone noticed? (part 2),” originally published on LinkedIn in November 2018.