What does your Brand Sound like? Voice Technology in Mobile

Published in Midwest VC Musings

5 min read · Sep 10, 2018

Prior to venture capital, I ran mobile for RetailMeNot, a consumer deals site that had over 28M app downloads. From that vantage point I could see lots of data on how consumers were using apps. Voice was one of the more interesting mobile UX technologies I had high hopes for, but it only reached an inflection point after I left product management.

Voice holds massive potential to reduce friction via “hands-free”, efficient interactions, which should improve task conversion rate and user satisfaction. It represents a new interface the same way windows and icons or touchscreens did. Just as websites needed to be redesigned for SEO or optimized for mobile form factors and “touch”, UX redesigns optimized for voice are coming. In the future, it won’t be strange for big brands and startups to have to think about what their brand “sounds” like (e.g., Is the voice a celebrity? Male or female? Accented? What is the tone & tenor?).

Why now?

Voice in mobile is at a tipping point now due to the amalgamation of 3 things:

AI (computing capabilities) — Advances in AI and machine learning capabilities and tools are enabling voice algorithms that were previously hard to build.
IoT (device penetration) — Not only does everyone have a phone equipped with a mic & speaker, but smart speaker penetration of U.S. households is strong. Today 50M U.S. adults have smart speakers.
UX (interface acceptance) — Voice is intuitive; it’s actually a more natural behavior than learning to type. As smart speaker penetration has increased, we’ve seen this proven out, and internet videos abound of toddlers asking Alexa to play music or their favorite video. Voice as an interface is being rapidly adopted: Google says 20% of searches are already voice searches, and Comscore predicts 50% of searches will be voice searches by 2020.

What role do the big tech companies play?

Over the past few years the big tech companies have laid the foundation for voice, building platforms and supporting developer tools. All the major tech companies & phone hardware companies have their own play to own the ecosystem of apps and experiences, though the smart speaker market is already a duopoly (Amazon has 60% market share, followed by Google at 27%).

Amazon has a strong lead, and has arguably done the best job building a platform and app ecosystem. There are already 15,000 Alexa skills and their developer platform is easy to use. Last fall Amazon launched Alexa for Business (which probably deserves its own article). Marketplaces have ‘winner takes all’ dynamics: developers want to build apps where the consumers are, and consumers want to go where there are lots of experience-enriching apps. As a result, the stage may already be set in voice for Amazon and Google to be the platform winners, but Apple does have 86M U.S. iPhone users, so it’s hard to count them out. Microsoft (Cortana), IBM (Watson), and Samsung (Bixby) are all further behind, and will likely need to work out integrations with the dominant platforms (e.g., Google & Amazon) to be most relevant.

Examples of Voice Technology in the Wild Today

There are interesting examples across industries of voice being leveraged today:

Automotive — OEMs were early to this game. For safety reasons hands-free OnStar, entertainment center and phone operations have long utilized voice.

Consumer / Retail — The most obvious example is that Alexa has made it easy to order items from Amazon using voice, but a host of other ‘smart’ devices (e.g., TVs, refrigerators) are making their way into homes with voice as an interface. This series of Samsung Bixby commercials does a great job illustrating this point.

Finance — Capital One has developed apps for Alexa and Cortana that let bank customers check their balance. British bank Santander lets customers make payments via voice in its iPhone app, and UBS wealth-management clients in Europe can ask Alexa for the chief investment office’s answers to financial and economic questions.

Healthcare — The Auvi-Q is an EpiPen alternative that uses voice instructions to walk a user through how to administer a dose.

Hospitality — The Wynn Las Vegas uses Echo devices to control lights, play music, and more.

Pharma — Apprentice combines AR and voice to allow lab workers to work hands-free, improving efficiency and accuracy.

In essence, all the examples above can be boiled down to 3 core use cases for voice today:

Voice to text — Better known as voice as an alternative to typing. Every mobile keyboard has a microphone icon to enable this, and mobile search bars are also adding mic icons and training us to expect this as a welcome alternative to tapping out words one letter at a time on tiny keyboards.
Question & Answer — Voice works best when a question has a single correct answer (e.g., What is the weather today? What time does this store open?). Voice alone is less well suited to answering complex questions (e.g., What is the best wine?), though as AI is trained, you can imagine a question sparking a series of follow-up questions that convince a user a single recommendation is correct. Otherwise, a list of options could be shown on a screen (e.g., a list of related articles). It’s worth noting that Amazon’s Echo Show takes the approach of screen + voice, and all mobile phones obviously pair voice with a screen.
Limited input / output actions — Very simple 1:1 actions can be managed by voice today (e.g., data entry, scheduling a meeting, ordering items, asking Alexa to play you a song or read you the newspaper). As with Question & Answer, there can be only 1 correct action: if too much complexity or follow-up clarification is required, voice becomes inefficient vs. touchscreen / menu alternatives.

And as with every interesting new technology, the best minds of our generation will come up with ways to build ad tech on top of voice. While the major tech players will likely own this ecosystem (and Amazon notably shut down an enterprising startup that tried to build the first Alexa ad network itself), there should still be an opportunity for startups to define new ad formats, and certainly to build developer studios and agencies that help big brands define their strategy for working with this new technology.

Building something interesting in voice technology? I’d love to hear about it.

Written by Sonia Sahney Nagar