2018 was the year of the voice assistant. So what’s next?

Tobias Dengel
WillowTree®
6 min read · Jan 8, 2019

Last December, we predicted that 2018 would be the year voice went mainstream. A recent report from voicebot.ai bears this out conclusively: there are now over 1 billion voice-assistant-connected devices in circulation, including 57.8 million smart speaker users in the U.S.

Furthermore, we predicted that the smartphone would remain the hub of a voice-connected experience for the average user. According to the same report, close to 70% of monthly active voice assistant users in the U.S. accessed their assistants most regularly through their smartphone in 2018:

Source: voicebot.ai

The numbers are clear — voice is here to stay. Here’s a look at some of the most significant advancements in voice in 2018, and our voice predictions for 2019.

2018 saw voice assistants become more naturally conversational than ever, both in terms of input and output. Advancements in Natural Language Processing (NLP) made it easier than ever for your voice device to parse your intent.

Just a couple of weeks ago, Amazon announced Neural Text-To-Speech (NTTS) technology, which enables the Alexa assistant to switch between different styles of speech based on context, intent, and audience.

One of NTTS's strengths for brands is the ability to set the context for a voice skill's interactions. That's a big win for companies that have been waiting for voice assistants to be able to capture their brand voice and tone.

For instance, a healthcare app might opt for a speaking style with a sense of bedside manner, while a broadcast news channel's morning-update skill can read the news with the authority of a conventional broadcaster (Amazon just rolled that last one out, by the way).
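For skill developers, speaking styles like these surface as SSML markup in the skill's JSON response. Here's a minimal sketch of what that could look like, using the newscaster style Amazon has publicized (the `amazon:domain` SSML tag); the response text itself is made up for illustration:

```python
import json

# A minimal Alexa skill response using SSML to request the
# newscaster speaking style for a morning news update.
# <amazon:domain name="news"> is the style selector Amazon has
# publicized; everything else here is illustrative.
response = {
    "version": "1.0",
    "response": {
        "outputSpeech": {
            "type": "SSML",
            "ssml": (
                "<speak>"
                '<amazon:domain name="news">'
                "Here are this morning's top headlines."
                "</amazon:domain>"
                "</speak>"
            ),
        },
        "shouldEndSession": True,
    },
}

print(json.dumps(response, indent=2))
```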

Being able to set a speaking style for Alexa is great, but wouldn’t it be even better if our voice assistants were context-aware — that is, able to parse context from the tone of the user’s request and respond accordingly?

Amazon has already demonstrated one limited version of context awareness this year with whisper mode, in which Alexa responds to a whispered command in kind, on the assumption that the user likely has a reason for keeping their voice low (so as not to disturb a sleeping loved one, for instance).
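Skill developers can already trigger the output half of this explicitly: Alexa's SSML includes a whispered effect, so a skill can lower its voice on demand rather than only in response to a whispered request. A quick sketch (the response text is hypothetical):

```python
# Alexa's SSML supports an explicit whisper effect via
# <amazon:effect name="whispered">; a skill might use it for a
# late-night or do-not-disturb context. The surrounding response
# structure is the same as any other SSML output.
ssml = (
    "<speak>"
    '<amazon:effect name="whispered">'
    "Your alarm is set for six a.m. Sleep well."
    "</amazon:effect>"
    "</speak>"
)
```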

If NTTS can allow Alexa to trigger a wider range of voices the way it triggers whisper mode — to learn to "read a room," as they say — that would be a huge step forward for brands looking to have meaningful interactions with customers via voice assistants. What if Alexa could process a customer service request and respond in kind to the tone and level of agitation in the customer's voice?

Watching the Google Assistant book an appointment with a hair salon over the phone was one of the most talked-about demos from any of the major tech companies in all of 2018.

With the help of Google's new recurrent neural network (RNN)-powered system, which they're calling Duplex, users can now have AI schedule appointments, dinner reservations, and presumably a wider range of administrative tasks over time on their behalf via Google Assistant, without ever being on the call themselves.

Sending AI out into the world on your behalf to carry out tasks like this is a huge step forward. But let’s not lose sight of the ultimate potential for this technology: to do away with the booking call altogether.

It's comical to imagine a scenario in which Google Duplex calls to book an appointment with a hair salon that uses a similar neural-network-powered bot to field its calls and save its human employees time and money. But even that scenario betrays our human-centered view of these kinds of tasks.

The real power of AI lies in the ability to handle all of these tasks instantly and completely in the background. In the hair salon scenario, instead of dialing the salon for a negotiated conversation in real time, the AI client would simply compare schedules and make an intelligent selection that works for both the client and the service provider, without involving a single human in the exchange.
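To make that concrete, here's a minimal sketch of the background-scheduling idea. Everything in it is hypothetical: we assume both the client's assistant and the salon expose availability as simple time windows, so the "negotiation" reduces to intersecting those windows and picking the earliest mutually open slot.

```python
from datetime import datetime, timedelta

# Hypothetical availability windows (start, end), e.g. pulled from
# the client's calendar and the salon's booking system.
client_free = [
    (datetime(2019, 1, 10, 9, 0), datetime(2019, 1, 10, 11, 0)),
    (datetime(2019, 1, 10, 15, 0), datetime(2019, 1, 10, 17, 0)),
]
salon_open = [
    (datetime(2019, 1, 10, 10, 0), datetime(2019, 1, 10, 16, 0)),
]

APPOINTMENT_MINUTES = 45

def first_mutual_slot(a_windows, b_windows, minutes):
    """Return the earliest slot of `minutes` length open in both schedules."""
    need = timedelta(minutes=minutes)
    for a_start, a_end in sorted(a_windows):
        for b_start, b_end in sorted(b_windows):
            # Overlap of the two windows, if any.
            start = max(a_start, b_start)
            end = min(a_end, b_end)
            if end - start >= need:
                return (start, start + need)
    return None

slot = first_mutual_slot(client_free, salon_open, APPOINTMENT_MINUTES)
if slot:
    print(f"Booked: {slot[0]:%a %b %d %H:%M} to {slot[1]:%H:%M}")
```

No phone call, no back-and-forth: the whole exchange collapses into a lookup both parties' software can complete in milliseconds.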

While this kind of thing is already possible today, we haven't seen widespread adoption on the service-provider end of the transaction just yet. But if Duplex proves popular and useful, we expect to see businesses of all kinds start to handle scheduling this way.

This year we saw the breadth of voice-connected devices grow dramatically. Amazon's unveiling of numerous Alexa-connected home products, ranging from smart plugs to microwaves, and its investment in a smart-home building company demonstrated a clear play for dominance in the home-based voice market: an all-too-important move considering the company's painful and complete absence from the smartphone market. Google made a similar move with a homebuilder in September.

In fact, just about all of the major tech companies made a play for putting their voice assistants at the center of users' connected lives: Apple's HomePod, Google's Home Hub, and Facebook's Portal were all released in 2018 as answers to Amazon's flagship home devices, the Echo and Echo Show. Amazon, for its part, expanded the range of Echo devices, each designed with a particular room of the home in mind.

In terms of sheer breadth of investment in voice and devices sold, Amazon leads the pack (the Echo Dot was, by many measures, the best-selling device of this year's Black Friday shopping period).

Again, Amazon's play for dominance in the home is built on its need to circumvent the smartphone-as-hub approach to voice currently favored by Google and Apple. So far, this strategy is working, which might force competitors to pursue alternative avenues of relevance in the voice space.

For instance, Apple and Amazon recently announced a partnership that will, for the first time ever, allow Amazon to sell Apple products directly (forcing out a great many unauthorized third-party Apple dealers in the process). Late last year, Amazon's Prime Video app found its way onto Apple TV for the first time, and Apple Music just became available on Echo products (with availability for third-party Alexa devices slated for next year).

This kind of making-nice could be a signal that the companies are approaching some kind of symbiotic relationship of devices and content, at least in the short term, in order to put a squeeze on the common enemy they have in Google. But Google is finding ways to capture user share on its competitors' devices, too: 2018 saw users access OK Google via Siri Shortcuts on iPhone, and Google Maps via CarPlay.

Perhaps, then, the clearest thing about the tech giants' voice strategies going into 2019 is that they're still shaking out. We expect the picture to come into sharper focus in 2019.

With that in mind, here are our voice predictions for 2019:

  1. Clearer monetization paths for brands. As voice-enabled homes and multimodal voice experiences become the norm, brands will invest in new and exciting ways to reach consumers via this channel. For starters, what about sending push notifications via voice?
  2. Voice disrupts enterprise apps. Voice assistants could offer dramatic efficiency gains, from faster service calls to better warehouse automation.
  3. More brands will launch their own voice assistants. BMW has already announced theirs, due out in March 2019 and designed to carry out tasks directly related to your vehicle. As platform-based voice assistant options become more robust (BMW's assistant was built on top of Microsoft Azure and IBM's Watson, for instance), we expect to see more of these brand- and product-specific assistants on the market. We'll also see brands begin to voice-enable their own iPhone and Android apps, to provide a level of control, customization, and discoverability that Siri, Alexa, and Google Assistant simply can't match at this point.

Is there something else you hope to see voice do in 2019? Want to talk about how voice makes sense for your brand? Feel free to reach out to us with any questions!

Originally published at willowtreeapps.com.
