Are we and computers ready for routine interaction via speech?
Maybe now, as AI achieves superb speech recognition, speech synthesis, text processing, and multimodal capabilities?
Voice User Interfaces (VUIs) have long been a topic of both fascination and skepticism among UX designers and developers. But as AI evolves at a rapid pace, improving computers’ capabilities for understanding and synthesizing speech, and even for “thinking”, the question of whether we and computers are ready for routine interaction via speech has become more pertinent than ever. I discuss this often with my colleagues, and I thought I would share a distilled, compact version of these exchanges, focusing on the main reasons why I think AI is close to revolutionizing human-computer interaction, and on the problems that still lie ahead.
While the idea of speaking to our devices as naturally as we converse with other humans has long captured our imagination, largely thanks to science fiction (hence, designers: you need to read science fiction), it has also faced significant challenges in real-world applications: poor performance at understanding users’ requests, especially from speakers with strong accents or specialized jargon; creepy synthetic voices; and limited understanding and processing of…