Illustration by Nikos Mavrikakis

Voice apps are websites.

Let’s say it again: voice apps are websites.

Braden Ream
Oct 21, 2018

It’s 2018, and smart speakers are the fastest-adopted technology in history, expected to reach 75% of Americans by 2020. The curve looks like this:

Courtesy of Google

We all should be paying attention to a curve like that, regardless of the industry we’re in.

Despite the adoption curve, many feel voice is a ‘fad’, or that it will go the way of VR in recent years. We hear that a lot at Voiceflow — and we totally get it. Many of us don’t have a smart speaker yet (Alexa or Google Home), or even if we do, we might rarely use it for more than streaming music or setting alarms.

At Voiceflow, we make it easy for any person or business to build Alexa skills without coding. Our goal is to give everyone the power of programming for interactive technology. Because of this, we think about voice, and the impact it will have, quite often.

We’re firm believers in a voice-forward future and have outlined some of our collective thoughts below on the future of voice.

Voice is an operating system, not an interface

Voice platforms like Alexa & Google Assistant are designed to be operating systems for your life — but they’re not a good interface (way to get information).

Companies like Netflix, Facebook, and Uber are the kind you need to drive a technology ecosystem forward, and platforms like theirs just aren’t possible yet at scale on voice. Think of voice applications like people. Imagine, for a second, a voice app called ‘Fable’ that aims to be the ‘Netflix of Voice’. If you know the exact title of the movie you want, Fable will serve it to you faster than scrolling & typing on a computer. However, if you don’t know what you want and just ask Fable to find a ‘scary movie’, Fable has to read hundreds of movies out to you by name, with a short description after each one, then listen for your pick. With hundreds of items to choose from, this becomes incredibly inefficient very quickly. With a visual interface, the same task is a breeze: you can scroll through the list of scary movies to see what catches your eye.

The finding? The amount of information that can be consumed at once with voice is incredibly limited. Platforms at scale cannot be built on voice until a visual component is added that allows rapid information output to match the rapid input voice allows (e.g. asking for a movie by name). Visual interfaces powered by a ‘voice operating system’ are the future of voice, and Amazon & Google are already creating screened devices powered by their voice assistants.

Voice is a democratizing technology

With voice, anyone can create the next big thing, and that is something we want to help make happen at Voiceflow. We first found this when creating our own voice entertainment. For a fraction of the cost, we were able to make interactive voice stories rivaling, and eventually surpassing, Universal Studios’ professionally made content. Because there is no visual component (yet), a person with just an idea and the tool to make it can produce engaging interactive stories & games for next to no cost.

Unlike recent technology platforms, voice has no visual interface. This means voice apps have, for the most part, only a backend, and that backend is almost entirely just logic. This opens the gates for visual tools like Voiceflow to allow for the advanced creation of logic, without the limitations & restrictions that come with a visual interface. The best web apps & phone apps have always been custom because of the unique ways they combine the front end and backend to serve their role. With voice, however, you always interface with an application through conversation, which is powered entirely by backend logic. This means that with a tool like Voiceflow that allows for the building of complex logic, anyone, without any coding experience, could build the next Facebook or Netflix. With web or mobile applications, this would have been considered impossible.
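
To make the ‘almost entirely just logic’ point concrete, here is a minimal sketch of a voice app written as a plain state machine. The `STORY` layout, the block names, and the `run_turn` helper are all hypothetical illustrations made up for this example. They are not Voiceflow’s format or the Alexa API.

```python
# Hypothetical sketch: a voice app as nothing but conversational logic.
# Each block says a line and maps what the user might answer to the next block.
STORY = {
    "start": {
        "say": "You are at a fork in the road. Do you go left or right?",
        "choices": {"left": "forest", "right": "river"},
    },
    "forest": {"say": "You walk into a quiet forest. The end.", "choices": {}},
    "river": {"say": "You reach a wide river. The end.", "choices": {}},
}


def run_turn(block_id, utterance=None):
    """Return (next_block_id, speech) for one conversational turn.

    With no utterance, the current block is simply spoken. An utterance that
    matches no choice triggers a reprompt of the same block.
    """
    block = STORY[block_id]
    if utterance is None:
        return block_id, block["say"]
    for keyword, target in block["choices"].items():
        if keyword in utterance.lower():
            return target, STORY[target]["say"]
    return block_id, "Sorry, I didn't catch that. " + block["say"]


state, speech = run_turn("start")
print(speech)
state, speech = run_turn(state, "Let's go left")
print(speech)
```

There is no layout, styling, or screen state anywhere in this sketch. The whole ‘app’ is a table of lines to say and rules for what comes next, which is the kind of logic a visual builder can represent as connected blocks.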

Another democratizing feature of voice is the low cost per engaged user. Consider this: at Voiceflow, we’re already used in several US classrooms to tell engaging stories. Unlike movies, the kids can all play a role in the story, and the teacher can create the curriculum on our tool the night before (e.g. a story about traveling the planets, or using math to solve a mystery). And unlike an expensive computer lab, or handing each child an iPad, the teacher needs only one device to engage an entire class, because voice requires no physical peripheral: no controllers or separate devices. This means less hassle for the teacher, less cleaning for the janitors, less device maintenance for the schools, and most importantly, 1% of the cost of traditional methods of classroom technology engagement. Voice is truly democratizing in the classroom.

Me chatting with a Teacher using Voiceflow

True mobility

Too many people grade voice against current technologies like mobile and declare it useless. Why have a back & forth conversation with your phone just to check your bank statement when you can open an app in one tap? Why play a conversational trivia game on your smart speaker when you can play Fortnite on your phone?

Voice is not a good interface for information transfer. If you have your phone in your hand, it’s almost always faster to just use its visual interface to access the information you need.

Where voice excels is on a metric of its own: mobility. Voice requires no physical peripheral, meaning that whenever you can’t control a device like a phone or a controller, voice is far superior to previous technologies. It’s true mobility. One of the use cases we found fascinating when building interactive entertainment was bedtime, because unlike visual entertainment (books or screens), you could engage in the immersive worlds we built with your eyes closed.

Smart speakers drive voice adoption, not technology

Many may be surprised to learn that their beloved Alexa or Google Home device is not at all a ‘smart’ speaker. In fact, for the most part, it’s simply a regular speaker with kickbutt microphones. What makes voice possible is not the speaker, but the artificial intelligence that powers it, and that AI does not live on the physical device. Every time you make a request to your smart speaker, the request is sent to a Google or Amazon data center to be processed by their advanced AI, and the response is sent back down to the speaker.

Smart speakers are driving the adoption of voice technology and making it more mainstream, but they’re not what makes it possible. In 2018 and beyond, your microwave and clock can be just as ‘smart’ as your smart speaker.
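
The division of labor described in this section can be sketched in a few lines. Everything here is a hypothetical stand-in: `cloud_handle` plays the role of the remote AI, `ThinDevice` plays the role of any microphone-plus-speaker gadget, and the canned replies are made up. Real assistants process audio over the network, which this sketch skips by passing text directly.

```python
# Hypothetical sketch of the round trip: the device only relays the request.
def cloud_handle(transcript):
    """Stand-in for the cloud service: works out the intent and builds a reply.

    A real service would run speech recognition on raw audio; here the
    request is already text so the example stays self-contained.
    """
    text = transcript.lower()
    if "time" in text:
        return "It is twelve o'clock."  # fixed reply, no real clock lookup
    if "weather" in text:
        return "It is sunny today."
    return "Sorry, I don't know that one."


class ThinDevice:
    """Any microphone-plus-speaker device; it holds no intelligence itself."""

    def __init__(self, name):
        self.name = name

    def ask(self, spoken_request):
        reply = cloud_handle(spoken_request)  # request goes up to the cloud
        return f"[{self.name}] {reply}"       # response comes back down


speaker = ThinDevice("smart speaker")
microwave = ThinDevice("microwave")
print(speaker.ask("What time is it?"))
print(microwave.ask("What time is it?"))
```

Both devices give the same answer because neither one computes it. Swapping the hardware changes nothing about how ‘smart’ the reply is.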

How voice will change entertainment

We frequently used voice actors for some of our higher-end content when we were making children’s stories. Our reasoning was that voice actors offer a greater range of vocal tones and an overall more ‘human’ experience. However, as text-to-speech technology advances, the cost to produce this ‘higher-end’ content on voice will dramatically decrease, potentially to the point where even the most cost-insensitive creators choose text-to-speech over voice actors, if only for speed & convenience.

Interactivity is the next phase for entertainment. Over the past decade, businesses have continually pushed to personalize their products for each individual user, so why not entertainment too? This shift in mentality can already be seen with Netflix introducing interactive elements to their upcoming Black Mirror series. Voice, with its lack of a visual interface and its reliance on conversation to tell a story, will further drive this change. We foresee the merging of visual & voice entertainment within 5 years, with anyone being able to create visual but voice-powered interactive entertainment.

The role Voiceflow plays in voice

Our goal is for any person or organization to be able to create powerful voice applications without needing to code. We feel the democratizing power of voice is only fully realized when anyone can get past the hurdle of needing to code in order to create a voice app.

If you’d like to build your own Alexa Skill on Voiceflow, you can get started here: https://getVoiceflow.com
