Voice Technology Confuses Users: A View on How We Got Here
Apple’s Siri has been around since 2011, and companies like Amazon and Google have invested billions of dollars into voice technology. Why do users still find it so confusing?
Every year Adobe surveys 1,000 voice technology users in the United States to understand how they are embracing voice technology. According to the 2020 survey, the share of users who say they do not know where to begin when accomplishing a task with voice technology rose by 14% to 63%. This was alarming to me, so in this post I am going to explore how voice technology adoption got this far while leaving users so confused.
Siri Launch & Customer Expectations
The Siri voice assistant launched in 2011 as an embedded feature of the iPhone 4S, only a day before Steve Jobs passed away. If you watch the Apple Keynote, you will see that Apple executives set the bar high with consumers on what Siri was capable of doing. They promised a multi-modal voice assistant that could hold a conversation in natural language, understand context, be personable, work seamlessly with built-in apps, and offer flawless dictation across the phone. Although the use of voice assistants on mobile devices continues to rise, I would love to know how many users would say Apple has achieved that goal even today with the Siri product.
Missing the mark on user expectations had repercussions for Apple and the entire voice technology industry. This shortfall helps explain why designers and developers did not explore voice technology in depth as a viable foundation for new user experiences. Would you eagerly start building new voice technologies after the perceived struggles Apple had with the launch of Siri? Factor in that 2011 offered ample opportunities in technology outside of voice, and you can quickly see how voice features became an afterthought.
Alexa Launch & Focus on Far-Field Innovation
Amazon’s 2014 launch of the Echo smart speaker, powered by the Alexa assistant, reinvigorated the conversation and excitement around voice technology. Amazon was truly innovative with the Echo and had no direct competitors in the space when it launched. Given Amazon’s dominance in e-commerce, it was obvious why the Echo could be valuable to the company as a new way for users to shop. However, the launch also sparked questions about what other user experiences these devices could enable and what they might be capable of in the future.
Amazon was a pioneer in far-field voice technology, which enabled conversational experiences to work on smart speakers. The Echo devices had a seven-microphone array and benefited from advances in areas like wake word recognition, beamforming, noise reduction, and echo cancellation. This innovation reinvigorated the conversation around voice technology; however, it is important to understand how this conversation differed from the one sparked by Siri’s launch. The new conversation came from two perspectives: (A) far-field experiences, and (B) conversation as the dominant modality, or “Voice First,” since the experience happens entirely through a screenless speaker device.
Innovator’s Dilemma in Voice Technology
I believe the voice technology efforts of the last decade have created an Innovator’s Dilemma in the larger voice technology space. The Innovator’s Dilemma is a book by Harvard professor Clayton Christensen. The theory he presents is that incumbents are often the first to spot and develop new technologies, and can readily reorganize themselves to do so. The problem is that they fail to value these innovations properly, because they attempt to apply them to their existing customers and product architectures — their value networks.
Voice assistants and smart speakers have been a fantastic vessel for taking advantage of advances in far-field voice technology and for re-engaging the consumer market on what is possible with voice. Does this mean smart speakers are the only viable channel for businesses and product teams to take advantage of voice technology? No! Viewed through the lens of the Innovator’s Dilemma, it makes sense why so much attention has gone to smart speakers from the start. The Fire Phone launch was a failure, and end users disliked how obviously the phone was marketed as a way to make it even easier to spend money with Amazon. This context incentivized Amazon to look outside the smartphone domain when building Alexa, while still creating a product that could extract value from its juggernaut e-commerce business. This model of fitting innovation into projects that drive ROI for big technology companies does not put the third-party developer ecosystem that supports these platforms first. Understanding this raises the question: are there other domains better suited for voice technology innovation that are not constrained by big technology business models?
Opportunities in Near-Field Voice
Amazon took advantage of far-field voice technology innovation to create smart speakers. However, voice technology across the board has kept improving since we first became fascinated with speech recognition back in the 1950s. Alongside this advancement has come the democratization of access to the underlying voice technologies that make platforms like Alexa and Siri possible. I believe this access gives developers interested in voice technology the ability to reassess platforms like mobile and web, where the industry still needs to define best practices for using voice as an interface.
I believe the industry has innovated with voice technology to a point that could satisfy the user expectations initially set by the launch of Siri in 2011. However, I would propose one slight tweak to the approach: eliminate the need for back-and-forth conversation in every voice-enabled experience, and for an all-encompassing assistant that controls it. Instead, product teams should lean into the rich graphical user interfaces that the mobile and web domains provide and treat voice as the most efficient interface to them. This approach allows product teams to create a true Natural User Interface that keeps solving end-user tasks at the forefront.
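To make the proposal concrete, here is a minimal sketch of what “voice as an interface to the GUI” could look like on the web, using the browser’s Web Speech API (SpeechRecognition). The intent names and command phrases below are hypothetical illustrations, not a production grammar; the point is that a single spoken command maps directly to a UI action, with no back-and-forth dialog.

```javascript
// Pure intent matcher: maps a spoken transcript to a UI action name.
// The intents and phrases here are hypothetical examples.
function matchIntent(transcript) {
  const text = transcript.trim().toLowerCase();
  const intents = [
    { name: "search",   phrases: ["search for", "find"] },
    { name: "navigate", phrases: ["go to", "open"] },
    { name: "dictate",  phrases: ["type", "write"] },
  ];
  for (const intent of intents) {
    for (const phrase of intent.phrases) {
      if (text.startsWith(phrase)) {
        // Everything after the command phrase becomes the argument.
        return { intent: intent.name, argument: text.slice(phrase.length).trim() };
      }
    }
  }
  return { intent: "unknown", argument: text };
}

// Browser-only wiring, guarded so the module also loads outside a browser.
if (typeof window !== "undefined") {
  const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  if (Recognition) {
    const recognizer = new Recognition();
    recognizer.continuous = false;     // one command at a time, not a dialog
    recognizer.interimResults = false; // act only on final transcripts
    recognizer.onresult = (event) => {
      const transcript = event.results[0][0].transcript;
      const { intent, argument } = matchIntent(transcript);
      // Hand off to the existing GUI here, e.g. fill a search box or route.
      console.log(intent, argument);
    };
    recognizer.start();
  }
}
```

Note the design choice: the assistant layer is gone entirely. The recognizer produces one transcript, the matcher turns it into one action against the existing graphical interface, and the screen — not a voice persona — carries the rest of the interaction.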