We’ve heard about voice technology for a long time. Voice technologies date as far back as the 1970s, so the big question we need to answer is “why now?” The answer: it actually works.
It’s all about the cloud
The first reason that voice is really taking off now is because of the availability of the cloud, where so much of voice processing has to take place. Voice processing needs a very large vocabulary and a lot of neural net type processing. If you don’t have good internet and broadband speeds, you’re not really able to use voice control well.
Check out the chart above. On the left, we see that in 2000 only 15% of Americans had access to fast broadband internet speeds. You can’t have good voice control when that few people have access to fast internet. Ten years later, that number had climbed to 94%. Keep in mind that this chart only covers 2000–2010, and it is now 2016. So if you think about voice control prior to 2010, the internet simply wasn’t good enough to support it.
The second reason voice technology is taking off is advancements in accuracy, and voice control requires a very high degree of it.
Per the figure below, back in the 1970s we had voice control systems that could handle only a small number of hardcoded words, maybe 10 or 20. The commands were robotic, such as “lights on” or “lights off.”
If you fast forward, even into the early 2000s you start seeing around 80% accuracy with voice technology. This means 1 in every 5 words is wrong. You can’t control your home with voice if 1 in every 5 words is going to be heard incorrectly. So if you look at where things are going, it’s only in the last 12–24 months that we’ve actually broken through to the 95–98% accuracy benchmark.
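To see why those last few percentage points matter so much, here is a quick back-of-the-envelope sketch (my own illustration, not data from the post): if each word is recognized independently, per-word accuracy compounds across a multi-word command, so the odds of a whole command being heard correctly fall off fast.

```python
# Hypothetical illustration: how per-word accuracy compounds over a command.
# The accuracy figures are the rough eras mentioned in the text.
per_word_accuracy = {
    "early-2000s (~80%)": 0.80,
    "recent (~95%)": 0.95,
    "recent (~98%)": 0.98,
}

words = 5  # a short command like "turn on the living room lights"

for label, p in per_word_accuracy.items():
    # Assuming independent per-word errors, P(whole command correct) = p^words.
    print(f"{label}: P(all {words} words correct) = {p ** words:.0%}")
```

At 80% per-word accuracy, a five-word command comes through fully correct only about a third of the time; at 98%, about nine times out of ten. That gap is the difference between a novelty and something you can actually run a home with.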
For example, consider the experience around Siri. When people say Siri is not that good, it has more to do with timing. Siri is actually amazing; it just arrived in late 2011, when accuracy was only around 70%. The technology wasn’t nearly good enough when Siri came out. Today, in 2016, and only in about the last 12–24 months, can we finally handle voice technology at the level we need to talk to it intuitively.
Also keep in mind, this is in English. Other languages are close behind and catching up. Regardless of anything else, if you don’t have good language and vocabulary recognition, voice technology is just not going to work.
User adoption is critical
One other thing that’s really interesting to note is that the way we interact with our home technology really follows personal computing. If we’re using a personal computer, such as a laptop, we can put the UI on that machine and people will use it. But if you ask people to learn a new technology or adapt to something unfamiliar just to control their home, it won’t really happen.
So if we look back at the history of personal computing, you’ll notice that the way we interact with our home has followed that trend almost linearly. It is nearly spot on.
When we talk about how to use voice for home control, we have to make sure the technology allows for a natural and intuitive experience. Right now, for example, people are excited about wearables in the home. I’ve heard requests to build functionality for a smart watch with a microphone, which sounds like it would make sense. I, for example, am wearing an Apple Watch right now. The reality, though, is that consumers aren’t adopting those products yet. And if they’re not adopting the products, asking them to use these products for home control is going to feel more like a handcuff.
The trend that we’re really watching right now is hands-free microphones, such as the Amazon Echo. If these really do take off and keep growing at the pace we’re seeing, people are going to have them in their homes anyway. And if the UI and the device are already there, we can control the home with them.
So as home technology professionals, we need to think about not reinventing the way someone lives in the home. This is especially true as a manufacturer. We’re not inventing some new technology and hoping that people will all of a sudden start to use it. Instead, we’re adapting to the way users already work with technology, such as a personal computer, and trying to create a natural environment in which they can live.
Be sure to check out our next blog post, where we explore why the best user interface may actually be no interface at all.
This was written by Alex Capecelatro, Co-Founder & CEO of JStar. Previously, Alex was a research scientist for NASA, Sandia National Lab, and the NRL. Before that, Alex worked at Fisker Automotive and founded At The Pool and Yeti. Alex has an engineering degree from UCLA, lives in LA, and likes to tweet about AI, Startups, and Design.