Having a conversation with Alexa

One of the best things I did recently was to attend a half-day workshop on how to build an Alexa skill. Having a web/mobile mindset, I found it interesting to see how you would design and develop for voice.

In web/mobile, you click on the OK button and it does something. It’s easy and consistent for the user — happy days! With voice, it’s a completely different interaction because you have to think about having a conversation, and as we know, every conversation is different. And you have no drop down menus!

In the workshop, we looked at how Alexa works and at building skills with the Alexa Skills Kit (ASK), a collection of self-service APIs, tools, documentation and code samples that make it easy to create voice applications. It didn’t take long to see the excitement on people’s faces as they created voice-first experiences.

Overview of how Alexa works

  1. The user speaks to the device.
  2. The audio is streamed to the Alexa service.
  3. The Alexa service converts the voice to text.
  4. A request is sent to the skill.
  5. The skill returns a response, and the process is complete.
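To make steps 4 and 5 a bit more concrete, here is a rough sketch of what a skill does with the request it receives. It uses plain Python dictionaries rather than any SDK, and the function name handle_request and the spoken phrases are my own inventions for illustration. The field names follow the JSON request and response format as I understand it, so treat this as a simplified picture rather than a complete skill.

```python
def handle_request(event):
    """Take the JSON request sent to the skill and build a JSON response."""
    request_type = event["request"]["type"]

    if request_type == "LaunchRequest":
        # The user opened the skill without asking for anything specific.
        speech = "Welcome. How can I help?"
        end_session = False
    elif request_type == "IntentRequest":
        # The Alexa service has already turned the speech into an intent name.
        intent_name = event["request"]["intent"]["name"]
        speech = f"You asked for {intent_name}."
        end_session = True
    else:
        # For anything else (e.g. the session ending) there is nothing to say.
        return {"version": "1.0", "response": {}}

    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": end_session,
        },
    }
```

The skill never sees audio. It only receives structured text and sends text back for Alexa to speak.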

Help Me

One of the guys at the workshop told us about his dad, who had suffered a heart attack and didn’t have his phone with him. If he could have spoken to an Alexa device, it might have saved his life. So this guy developed an emergency skill called Help Me to make similar circumstances less tragic. It could have worked like this:

User: Alexa, Help Me.

Alexa: Do you need help?

User: Yes.

Alexa: Calling the Ambulance.

You could have other options such as calling a friend, a family member, a next-door neighbour, the police, and more.

Travel Helper

My idea was a travel helper skill that helps you find out the costs of going on holiday.

User: Alexa, open Travel Helper.

Alexa: Which country are you going to?

User: France.

Alexa: The average cost of accommodation is X and the average price of a meal for two is X.

Other ideas included:

Bin Day — Find out what colour bin you need to take out! This is a very popular skill in the Amazon Alexa Store.

Hipster Coffee — Find out where your nearest hipster coffee shop is

Truth, Dare or Promise — Based on the game played at school. How it works:

Truth: Ask somebody something and they have to answer truthfully.

Dare: Dare somebody to do something.

Promise: You have to tell the players a secret and they have to make a promise not to tell.

Study Time — Time tracker for students.

When developing voice-first applications, come up with ideas that are easier, faster, and more natural to do with your voice than on the web or a mobile platform.

My Thoughts on using Alexa

Great for turning the heating on!

If you live in the UK for 11 months of the year, you’ll soon want an easy way to turn the heating on. Alexa is great for getting that done. You just need to say something like, “Alexa, turn the thermostat to 22 degrees.” It’s great for simple interactions like that. It’s not great for conversations. So if you’re designing a skill, keep the interactions brief and to the point.

It’s a lot of fun

Voice is the most natural way to connect. It’s fun and elaborate. It’s not like designing and developing websites, where you have limited screen real estate and customers have only one or two buttons to choose from. You have to get things right the first time. It requires a new kind of UI knowledge.

No personality

You know that guy you meet at a networking event who talks about nothing but accounting for half an hour? That’s Alexa. It gets very boring after a while.

You could try giving Alexa some personality by hiring a voiceover artist, but that could be expensive in the long run and not everyone will like that voice. However, you could also jazz things up by varying the responses. For example,

You: Alexa turn up the heat

Alexa: Aye-aye captain!
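Varying the responses can be as simple as picking a phrase at random and avoiding the one you used last time. This is only a small sketch: the function name pick_response and every phrase other than "Aye-aye captain!" are made up for illustration.

```python
import random

# Only the first phrase comes from the example above; the rest are made up.
HEAT_RESPONSES = [
    "Aye-aye captain!",
    "Warming things up.",
    "Heat coming right up.",
]


def pick_response(options, last=None):
    """Pick a random phrase, skipping the one used last time if possible."""
    candidates = [phrase for phrase in options if phrase != last]
    if not candidates:
        # Only one phrase available, so repeating it is unavoidable.
        candidates = list(options)
    return random.choice(candidates)
```

Calling pick_response(HEAT_RESPONSES, last="Aye-aye captain!") would return one of the other two phrases, so the user does not hear the same reply twice in a row.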

If the customer is likely to make ambiguous statements, then as a designer or developer you need to build some tolerance for ambiguity into your skill. An exchange like this could happen:

Alexa: How was your day?

You: Yeah, whatever.

Alexa: Okay, tomorrow will be good.

Alexa hates your accent

Alexa is a very clever thing, but she is not great on accents yet. Someone from Jamaica will make different sounds to someone who lives in South Africa. Have no fear! Amazon provides slot confirmations. Alexa will ask a simple question like, “Are you sure?” which gives you an opportunity to confirm. This prevents any major miscommunications from misunderstanding someone’s pronunciation.
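On the skill side, a confirmation shows up as a status on the slot in the incoming request. The sketch below assumes the request JSON carries a confirmationStatus field on each slot with values such as NONE, CONFIRMED and DENIED, which is how I understand the format. The function name confirmed_slot_value is hypothetical.

```python
def confirmed_slot_value(event, slot_name):
    """Return the slot's value only if the user has confirmed it, else None."""
    slots = event["request"]["intent"].get("slots", {})
    slot = slots.get(slot_name, {})
    if slot.get("confirmationStatus") == "CONFIRMED":
        return slot.get("value")
    # Not confirmed (or denied): the skill should ask again rather than guess.
    return None
```

If the function returns None, the safest move is to ask the question again instead of acting on a value Alexa may have misheard.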

Some eye-popping stats

  1. 30m+ Echos sold
  2. 58 million smart speakers forecast to be sold in 2018
  3. 75% of US households with smart speakers by 2020
  4. 50% of ALL search queries made by voice by 2020
  5. 200 BN voice searches per month by 2020
  6. 43% of smart speaker owners would be interested in using skills from companies or brands they follow on social media

Building a Travel App Skill

Start with the customer and work backwards.

What value do you want the customer to get from using your skill? For the Travel Helper skill, I listed possible use cases, then selected three to get started. I also had a look at what everyone else is doing by going to the Amazon Alexa store.

Some ideas on what users might want to check:

  1. Average daily price for travelling in a country
  2. The average food price for one day
  3. Hotel prices
  4. Local transportation (taxis, local buses, subways)
  5. What to tip waiters or guides
  6. Alcohol (price of beer or wine)
  7. Cost of a cup of coffee
  8. Cost of a bottle of water

Try this first!

Have a look at the Alexa store, which has 30,000+ skills already. Chances are something similar to your idea already exists, so read the reviews and see what people are saying.

Designing the voice interface

A great way of designing the voice interface: go to a friend and have a conversation. Don’t tell them about the skill, but listen to what phrases they say and use them in building your skill (no fart apps, please!).

When designing for voice, a good idea is to start with Paul Grice’s Cooperative Principle. According to Grice, there are four main ideas we all try to adhere to in conversation.

  1. Give the most helpful amount of information.
  2. Do not say what you believe to be false.
  3. Be relevant.
  4. Put what you say in the clearest, briefest, and most orderly manner.

Ref — Grice, H. P. (1975) ‘Logic and conversation’. In P. Cole and J. Morgan (eds) Studies in Syntax and Semantics III: Speech Acts, New York: Academic Press, pp. 183–98.

This is where the hard work starts!

Open up Google Docs and start turning the conversations you’ve had into a rough script. You need to come up with an invocation name, slots, intents, utterances and a script. If you don’t spend the time to do this, it will make for a poor user experience.

Invocation name: This is the trigger phrase that tells Alexa the user is trying to activate the skill.

E.g., “Alexa, start Travel Helper.”

Intents: These represent the actions your skill performs, based on what a customer says. There are two types of intents: custom ones and the built-in intents that Amazon already provides. For the travel skill, I created a travel costs intent called TravelCostsIntent.

Slots: Also known as arguments, slots are used to extract specific pieces of information. So for a travel skill, you could have a slot for the country. For example, “I want to go to <country>.”

Sample utterances: These are the phrases the user could say to interact with the Travel Helper skill. This list could get very long indeed! Start with the simple ones. As a general rule, make sure the sample utterances pass the one-breath test. The more utterances you have, the smarter your Alexa skill will be.

Amazon has a built-in slot type called AMAZON.Country, so you don’t have to manually type in all the countries yourself.

Example Sample Utterances for Travel Helper Skill

How much does it cost to go to <country>

I want to go to <country>

How expensive is <country>

How much money do I need to travel to <country>

<country> costs

Going to <country>

How much is it to go to <country>

How much do I need to go to <country>

Tell me how much I need to go to <country>
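Put together, the invocation name, intent, slot and sample utterances form what Amazon calls the interaction model. Here is a sketch of how the pieces above might look, written as a Python dictionary that can be dumped to JSON. The layout follows the interaction model JSON as I understand it. The slot name country and the helper to_console_json are my own choices, and a real skill would also need the required built-in intents such as help and stop.

```python
import json

# The nine sample utterances from the list above, with the slot in curly braces.
SAMPLES = [
    "how much does it cost to go to {country}",
    "i want to go to {country}",
    "how expensive is {country}",
    "how much money do i need to travel to {country}",
    "{country} costs",
    "going to {country}",
    "how much is it to go to {country}",
    "how much do i need to go to {country}",
    "tell me how much i need to go to {country}",
]

INTERACTION_MODEL = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "travel helper",
            "intents": [
                {
                    "name": "TravelCostsIntent",
                    # "country" is a slot name chosen for this sketch.
                    "slots": [{"name": "country", "type": "AMAZON.Country"}],
                    "samples": SAMPLES,
                }
            ],
            # No custom slot types needed: the built-in one covers countries.
            "types": [],
        }
    }
}


def to_console_json(model):
    """Serialise the model so it can be pasted into a JSON editor."""
    return json.dumps(model, indent=2)
```

Keeping the utterances in one list makes it easy to add the phrases you hear when you test the script on friends and family.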

Next, create a very simple dialogue script

No need to write War and Peace! For example, the Travel Helper skill could be like this.

User: Alexa open travel helper

Alexa: Welcome to Travel Helper. Tell me what country you are going to, and I will tell you how much you need to spend on average.

User: I want to go to France

Alexa: $175 is the average daily price for traveling in France. The average price of food for one day is $36. The average price of a hotel for a couple is $20.
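Once the script exists, turning it into code is mostly a lookup plus a sentence template. The sketch below is only an illustration: the TRAVEL_COSTS table holds just the France figures from the script above, and the names TRAVEL_COSTS and build_speech, as well as the fallback wording, are hypothetical. A real skill would pull its prices from a proper data source.

```python
# Figures for France are taken from the script above; nothing else is filled in.
TRAVEL_COSTS = {
    "france": {"daily": 175, "food": 36, "hotel": 20},
}


def build_speech(country):
    """Turn the country slot value into the sentence Alexa should say."""
    name = country.strip()
    costs = TRAVEL_COSTS.get(name.lower())
    if costs is None:
        # Don't leave the user stuck: say what went wrong and ask again.
        return (
            f"Sorry, I don't have prices for {name} yet. "
            "Which other country are you going to?"
        )
    return (
        f"${costs['daily']} is the average daily price for traveling in {name}. "
        f"The average price of food for one day is ${costs['food']}. "
        f"The average price of a hotel for a couple is ${costs['hotel']}."
    )
```

The fallback branch matters as much as the happy path, because with voice there is no screen to show an error message.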

Next, do some role-playing with friends and family while you pretend to be Alexa. Tell them what the skill is going to do, but don’t show them the script. Listen to what they say, then refine your dialogue.

Development, Testing, Analytics

I will talk more about this in another blog post.

Earn Money!

Once you bring your idea to life and your skill is really popular, you can earn money. Developers also receive other perks from Amazon like hoodies, t-shirts, Echo Dots and $100 in promotional credits.

Two new voice monetization opportunities are available:

Alexa In-Skill Purchasing: You can sell premium content or digital subscriptions within your skills. For example, if you have a history quiz game, you can offer a user a chance to purchase more question packs.

Amazon Pay for Alexa skills: Customers can purchase goods and services in your skill using the information already in their Amazon account. No need to remember a username or password! This would be great if the user wanted to do something like buy movie tickets quickly.

I don’t want to make this into just an Amazon lovefest — make sure you check out the Google Assistant and Microsoft Cortana platforms as well.

Where are we going with all this?

Voice is the new interaction medium for users. It’s best to start with users, not computers. If you have an idea for a skill, create a script and try it out on friends and family.

As a general rule, when it comes to the dialogue, cut half of it out. Avoid having too many choices. Write for the ears, not the eyes. Try to make it a great experience by having different responses. Finally, don’t assume the user knows what to do.

If all goes well and users get value out of your skill, then hopefully Amazon will start writing you a check or two, or at least send you a hoodie and some AWS credits.

Useful Links

How the BBC prototype for Voice

Learn what you can build with Alexa

Example code

AWS Credits

Tips on Promoting your Alexa skill