Voice-activated services for busy people 101.

Chas Sweeting
Published in voiceflow
Sep 7, 2017

If your business hasn’t yet considered voice-activated services of the kind that debuted with the Amazon Echo and Google Home devices, this primer is for you.

1. Alexa Skills and Google Actions

Google and Amazon provide APIs and SDKs for third parties to create their own voice-activated services. (Amazon refers to such applications as “Alexa Skills” and Google calls them “Actions”.) This is to voice what opening iOS to third-party native apps was to the iPhone: a pretty big deal.

Google currently trails, but it is adding more spoken languages and will piggyback on Android for wider global distribution, so expect it to catch up soon.

Although independent developers and media companies were the first to jump on board (as evidenced by the chart above), more companies are now making their services accessible by voice. The pressure will soon be on their peers (that’s you) to follow suit or lose business.

2. The market is set to take off

Today, Amazon’s Alexa-powered devices are sold in the U.S., UK and Germany, though you can order them online and use them anywhere in the world. RBC Capital Markets predicts 500 million active users for Amazon by 2020, and a recent survey of current Echo users found that 17% already use it for online purchases.

Google will reach even more people. Although it launched later, Google Home is now in the UK, Australia, Canada, France, Germany, and Japan. Google Assistant (the underlying natural voice technology) will also be able to speak Italian, Spanish, Korean and Brazilian Portuguese by the end of the year, and it enjoys the advantage of distribution with Android.

Meanwhile China’s triumvirate of Baidu, Alibaba and Tencent have entered the fray with their own voice-activated AI in Mandarin.

Finally, Apple’s HomePod hits the shelves this December.

3. How the technology works

Keeping this high level, and using the example of an Amazon Echo device:

  1. Your device sits on the sideboard at home and is always listening — waiting to detect the ‘wake word’ (which is ‘Alexa’ in the case of the Amazon devices, and ‘Hey Google’ for Google Home).
  2. When you say ‘Alexa’, audio is streamed from the device to the Alexa cloud which translates the natural voice to text and tries to determine the ‘intent’.
  3. When a user requests information from your custom Skill/Action, the request is then sent to your application which returns an appropriate text response.
  4. The response is converted back to speech and is heard on the device at home. If you have multiple Echo devices in the same building, several may detect your voice but only the nearest one will respond thanks to Echo Spatial Perception (ESP).
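To make step 3 concrete, here is a minimal sketch of the "your application" part. It assumes a JSON envelope modelled on Alexa's request and response format; the intent name `GetOpeningHoursIntent` and the spoken replies are hypothetical, and a real Skill would normally use the platform's SDK rather than hand-built dictionaries.

```python
def build_response(text: str, end_session: bool = True) -> dict:
    """Wrap plain text in a response envelope modelled on Alexa's JSON format."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def handle_request(event: dict) -> dict:
    """Route an incoming request to a text reply; the platform turns it into speech."""
    request = event.get("request", {})
    request_type = request.get("type")

    if request_type == "LaunchRequest":
        # The user opened the Skill without asking for anything specific.
        return build_response("Welcome. Ask me when we are open.", end_session=False)

    if request_type == "IntentRequest":
        intent_name = request.get("intent", {}).get("name")
        if intent_name == "GetOpeningHoursIntent":  # hypothetical intent name
            return build_response("We are open nine to five, Monday to Friday.")
        return build_response("Sorry, I can't help with that yet.")

    # Anything else (for example, the session ending) gets a simple sign-off.
    return build_response("Goodbye.")


# The cloud service has already turned speech into an intent by this point.
example_event = {
    "version": "1.0",
    "request": {
        "type": "IntentRequest",
        "intent": {"name": "GetOpeningHoursIntent", "slots": {}},
    },
}
print(handle_request(example_event)["response"]["outputSpeech"]["text"])
```

Notice that the handler never touches audio: it receives an intent and returns text, which is why ordinary application developers can build these services.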

The key thing to note here is: the “heavy lifting” is done for you.

Google and Amazon handle the natural language processing. They do the AI magic of translating natural speech into text and intent with over 90% accuracy, leaving you free to focus on developing applications and services. You don’t need experts in data science and AI to start developing voice-activated digital services. Yes, you really CAN create applications for voice.

4. User Experience design will be different

This should be obvious really — it’s voice and audio after all. No graphics, no images, no colours & fonts, no labouring over CSS and browser quirks. It’s honestly liberating!

Instead, you get to focus on the message and communication. There’s also a new design vocabulary for conversational UI. For example, Alexa’s nomenclature for spoken commands includes the “wake word”, “invocation name”, “utterance” and “slots”.

(That they chose the word ‘utterance’ to describe human speech is humorous but also quite fitting: one of the biggest challenges of designing voice applications is accounting for all the different variations people could, and do, use.)
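As a rough illustration of how utterances and slots relate, the toy matcher below treats each sample utterance as a template with `{slot}` placeholders. The templates and slot names are made up for this example, and the regular-expression matching is only a stand-in: the real platforms resolve utterances with trained language models, not pattern matching.

```python
import re
from typing import Optional

# Hypothetical sample utterances for a weather intent; {city} and {day} are slots.
SAMPLE_UTTERANCES = [
    "what is the weather in {city}",
    "tell me the weather for {city}",
    "will it rain in {city} {day}",
]


def template_to_regex(template: str) -> "re.Pattern[str]":
    """Turn 'weather in {city}' into a regex with one named group per slot."""
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)         # literal words
        else:
            pattern += f"(?P<{part}>.+?)"      # slot placeholder
    return re.compile("^" + pattern + "$", re.IGNORECASE)


def match_utterance(spoken: str) -> Optional[dict]:
    """Return the slot values for the first matching template, or None."""
    for template in SAMPLE_UTTERANCES:
        match = template_to_regex(template).match(spoken.strip())
        if match:
            return match.groupdict()
    return None


print(match_utterance("What is the weather in Hong Kong"))
print(match_utterance("play some jazz"))
```

Even in this toy form, the design problem is visible: every phrasing a user might try needs a sample utterance, and the list grows quickly.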

Move aside, HTML. We now have the ability to fine-tune responses with SSML (Speech Synthesis Markup Language) — to specify exactly how words are said. You can even control the speed and volume of speech, while the linguists in the room can knock themselves out with individual phonemes.
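For a flavour of what that markup looks like, here is a small sketch that assembles an SSML string using standard elements (`speak`, `prosody`, `break`, `phoneme`). The helper functions and the sample sentence are invented for illustration, and each platform supports its own subset of SSML, so check the attribute values against the platform documentation before relying on them.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody(text: str, rate: str = "medium", volume: str = "medium") -> str:
    """Control the speed and loudness of a phrase."""
    return (
        f"<prosody rate={quoteattr(rate)} volume={quoteattr(volume)}>"
        f"{escape(text)}</prosody>"
    )


def pause(milliseconds: int) -> str:
    """Insert a silence of the given length."""
    return f'<break time="{int(milliseconds)}ms"/>'


def phoneme(text: str, ipa: str) -> str:
    """Spell out the pronunciation of a word using IPA symbols."""
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(text)}</phoneme>'


def speak(*parts: str) -> str:
    """Wrap already-escaped fragments in the root <speak> element."""
    return "<speak>" + "".join(parts) + "</speak>"


ssml = speak(
    escape("Your order is "),
    prosody("on its way", rate="slow", volume="loud"),
    pause(400),
    escape("Enjoy your "),
    phoneme("tomato", "təˈmɑːtəʊ"),
    escape(" soup."),
)
print(ssml)
```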

After two decades of web and mobile UX, this stuff is positively fun!

5. Different skill sets required.

Just because this is voice, please do NOT leave this to whoever designed your deplorable call-centre IVRS. Ideally, you want somebody who knows application development and user-experience, but with a great command of language. Yeah, the copy/content-oriented person in your UX team finally gets their day in the sun.

That’s not to say the interaction designer who has built a career on pixel-perfect web templates and custom easing will be redundant. Their core UX competency should always have been the ability to create a simple, intuitive user journey. If, however, words are not their forte, look elsewhere.

6. Omni-channel content just got bigger

This isn’t like making your website responsive for mobile viewports. Whilst technically you could just feed your Skills/Actions the same content from your website CMS, in most real-world circumstances that would probably suck.

Content will be required in smaller snippets, addressing specific requirements. If you’re working with chat-bot tech at the moment, you’ve at least started thinking in terms of conversational dialog.

7. Smarter voice UX is hard but worth it.

Applications which provide more precise answers to specific questions will distinguish themselves. On the web, you get away with sort-of-not-knowing-what-the-user-wants-exactly and providing a page of text for the user to skim and hopefully extract the actual information they require. In short, you get away with just chucking the content online. It won’t work with audio.

Recommender systems are going to become even more important, as will the ability to extract precise intent. As you seek to develop smarter, more intuitive applications, NLP (natural language processing) experience in your development team will definitely help.

It’s the difference between your system being able to comprehend a request for a “flight from Dubai to Hong Kong, leaving tomorrow afternoon returning Friday morning, preferably with Cathay” and every permutation thereof … or having to ask the user several tedious questions, each eliciting one additional data point (“where are you flying from?”, “where would you like to go?”, “when would you like to leave?”, “when do you … “ <ugh>.)
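The difference can be sketched in a few lines. Assume the platform has already extracted whatever slots it could from the user's request; the slot names and prompts below are hypothetical. The application then only needs to ask about what is still missing.

```python
from typing import Optional

# Hypothetical required slots for a flight search, with a fallback question each.
PROMPTS = {
    "origin": "Where are you flying from?",
    "destination": "Where would you like to go?",
    "depart": "When would you like to leave?",
    "return": "When would you like to return?",
}


def next_prompt(slots: dict) -> Optional[str]:
    """Return the next question to ask, or None when every slot is filled."""
    for name, question in PROMPTS.items():
        if not slots.get(name):
            return question
    return None


def questions_needed(slots: dict) -> int:
    """Count how many follow-up questions the user still has to sit through."""
    return sum(1 for name in PROMPTS if not slots.get(name))


# A system that understood the whole sentence in one go:
one_shot = {
    "origin": "Dubai",
    "destination": "Hong Kong",
    "depart": "tomorrow afternoon",
    "return": "Friday morning",
    "airline": "Cathay",  # optional preference, not in PROMPTS
}
# A system that only caught the destination:
partial = {"destination": "Hong Kong"}

print(questions_needed(one_shot), next_prompt(one_shot))
print(questions_needed(partial), next_prompt(partial))
```

The better the intent and slot extraction, the fewer rounds of questioning the user endures.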

8. You’re not limited to just text-to-speech.

News and media companies have jumped on this: the 2–3 minute Flash Briefings from news providers are among the most popular Skills on Alexa. Good Alexa game developers incorporate sound effects to add atmosphere to their otherwise voice-driven role-play games.

Expect top design firms to put sound designers to work, creating custom sound effects for their clients … and many other companies to overuse cheesy stock sound effects like it’s 1996 all over again.

Some things are just better communicated visually, like a long list of ingredients for a recipe or the performance of a stock over the past week. Companion screens like Amazon’s Echo Show, or even the smartphone, let you send back images too, but please use them sparingly in your Skills/Actions. Build for the lowest common denominator: assume that the user only has audio input/output.
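One way to honour that audio-first rule is to build the spoken reply first and treat anything visual as an optional extra. The sketch below uses a made-up response shape (`speech` and `visual` keys) and a `has_screen` flag that the application would have to derive from the device information in the request; none of these names come from a real SDK.

```python
def build_recipe_response(ingredients: list, has_screen: bool = False) -> dict:
    """Always return usable speech; add a visual list only when a screen exists."""
    count = len(ingredients)
    preview = ", ".join(ingredients[:3])
    speech = f"This recipe needs {count} ingredients, starting with {preview}."

    response = {"speech": speech}
    if has_screen:
        # The screen is a bonus: the spoken part still makes sense without it.
        response["speech"] += " The full list is on your screen."
        response["visual"] = {"title": "Ingredients", "items": list(ingredients)}
    else:
        # Audio-only devices get a way to continue by voice instead.
        response["speech"] += " Say 'next' to hear more."
    return response


ingredients = ["flour", "butter", "eggs", "sugar", "vanilla", "salt"]
print(build_recipe_response(ingredients)["speech"])
print(build_recipe_response(ingredients, has_screen=True)["speech"])
```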

9. Think ‘experience’ or ‘service’, not advertising.

One of the great things about voice is that marketers haven’t ruined it yet. Though it’s not for lack of trying — three months ago Amazon shut down the first ad network developed for Alexa Skills.

Instead of looking for ways to advertise and tell the world how great you are, you have a remarkable opportunity to use this new channel to create new services or to simplify the customer experience. (Amazon has done this with the shopping list for example — say “add butter to the shopping list” whilst you’re cooking and it’s added, extending the purchase funnel and making it even more slippery).

10. Finally, this is an opportunity to bring your brand to life.

Surprise and delight, have some personality, bring your brand to life.

Ask Alexa what she thinks of Jeff Bezos and she’ll reply with either “he’s prime” or “I rate him 5 out of 5 stars”. That’s sass, that’s smart, that makes me want to engage with a company more.

“Alexa, what do you think of Jeff Bezos?” “He’s prime.”

Originally published at https://www.linkedin.com on September 7, 2017.
