This morning, I had the great fortune of speaking at the VOICE Summit at NJIT. The VOICE Summit has turned out to be one of the largest conferences on Alexa and voice interaction, with 1,500+ attendees, hundreds of speakers, and over a hundred sessions.
Voice has evolved enormously since UCIC started six years ago. Back then, voice was an oddity, and many investors we pitched didn't see the point. "Why use your voice when you can just walk over to a light switch?" asked one VC.
Today, millions of households in the US and around the world have an Echo or Google Home. Besides Google and Amazon driving the market through their huge advertising engines and sales channels, the other main driver is the commoditization of the technologies that make ambient voice interaction possible: voice trigger / local speech recognition, far-field digital signal processing chips, speech recognition engines, natural language understanding engines, and speech synthesis APIs. What might have cost millions of dollars to implement and added $15 to the bill of materials five years ago can now be done for tens of thousands of dollars and a third of the BOM cost.
Developers of Alexa Skills (and Google Actions) need to think about a few new modalities of interaction:
- AI assistants will be in third party hardware more than they’ll be in hardware sold by Amazon or Google
- More visual interfaces will be used in conjunction with voice
- New interfaces will be available for interaction that include gesture, projections, AR/VR, and touch
- Interfaces will disappear and become ubiquitous. They’ll blend into the background
- Alexa Skills will be able to coordinate with hardware, similar to Alexa Buttons and Alexa Gadgets API
- More data may become available to use, such as authentication and information on gender, age, ethnicity, etc.
- Voice interaction will become ubiquitous and continuous. The idea of "your Echo" will become obsolete; what will matter is your Alexa, accessible from anywhere through biometrics.
The shift for developers will be moving from building a simple Skill that mimics an app through voice to designing an experience that builds on the intelligence embedded everywhere.
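For readers who haven't built a Skill yet, the plumbing underneath is simpler than it sounds: Alexa sends your endpoint a JSON request envelope and expects a JSON response with the speech to render. Below is a minimal sketch of that raw contract in Python; the intent name `HelloIntent` is a hypothetical placeholder, and a production Skill would typically use the ASK SDK rather than hand-rolling the JSON.

```python
import json

def handle_alexa_request(event: dict) -> dict:
    """Route an Alexa request envelope to a spoken response.

    A minimal sketch of the raw JSON request/response contract;
    real Skills usually build on the ASK SDK instead.
    """
    request_type = event["request"]["type"]

    if request_type == "LaunchRequest":
        # User opened the Skill without asking for anything specific.
        speech = "Welcome. What would you like to do?"
        end_session = False
    elif request_type == "IntentRequest":
        # The intent name comes from the Skill's interaction model;
        # "HelloIntent" here is just an illustrative placeholder.
        intent = event["request"]["intent"]["name"]
        speech = f"You invoked {intent}."
        end_session = True
    else:
        # SessionEndedRequest and anything else: close out quietly.
        speech = "Goodbye."
        end_session = True

    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": end_session,
        },
    }

# Example: a LaunchRequest envelope, trimmed to the fields used above.
launch = {"request": {"type": "LaunchRequest"}}
print(json.dumps(handle_alexa_request(launch), indent=2))
```

The point of the exercise is that the voice layer is thin; the interesting work is the experience design around it, across the hardware, screens, and sensors described above.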