“Alexa, How Do I Build A Skill?”

Lessons from an Alexa Dev Day and tips on how to start building your own Alexa Skill.

The Amazon Echo surprised a lot of people when it was introduced in 2015. It was pitched as a physical device designed around your voice, an always-on speaker that connected you to Alexa, Amazon’s voice assistant. At the time, people were skeptical of Amazon’s non-Kindle hardware ambitions, having recently seen the Fire Phone flame out. Two years on, the family of Echo devices has grown and developers have created over 15,000 Alexa Skills, a testament to Amazon’s ability to learn from failure and create compelling platforms. Skills are to Alexa what apps are to an iPhone: third-party applications that augment the default experience.

Collection of skills from amazon.com/skills

The purpose of this post is to share lessons from an Alexa Dev Day, a workshop put on by Amazon to educate would-be skill creators. It is my hope that readers will gain a high-level understanding of how custom Alexa skills work before they dive into the more detailed documentation.

Echo vs. Alexa

Let’s first clarify the difference between Echo devices and Alexa. It is simplest to think of Echo devices as the hardware and Alexa as the software. Amazon’s goal is to make Alexa the voice assistant of choice within homes, and as such they are quite happy to have third-party hardware manufacturers integrate Alexa. One can think of Alexa as the Windows of voice assistants, available on all sorts of hardware, and Siri as, well, the macOS of voice assistants. You’ll never converse with Siri from a non-Apple device.

Echo devices provide the best interface to Alexa. With competition heating up in the voice assistant space, it will be interesting to see how we converse with Alexa in the future, be it via an Echo or some other device. The rest of this post will focus on the software side of the ecosystem, and specifically on how to build Custom Skills for Alexa, which give the creator the most control. Alternate types of skills include Smart Home Skills, for controlling smart home devices such as lights, and Flash Briefing Skills, for adding additional content to a user’s daily briefing.

If you’d like to read more about Amazon’s business incentives in this space, I recommend reading Amazon’s Operating System. To experiment with integrating Alexa into custom hardware, you’ll find this Raspberry Pi project interesting.

What happens after “Alexa”

“Alexa” is a wake word, a sound that springs an Alexa-enabled device (referred to as “Echo” from this point on) into action. Whatever is said directly after the wake word is streamed to Amazon’s Alexa service for natural language processing. If the request is intended for a third-party skill, Amazon will deliver a structured JSON request to the skill’s associated web service. The skill can then return a response that includes Speech Synthesis Markup Language (SSML), resulting in Alexa dictating the response to the user.

It should be noted that custom skills do not have access to audio streamed to the Alexa service, instead they receive structured text outlining the request that needs to be handled. This is a nice privacy and security feature of the platform.
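To make this concrete, here is a rough sketch of the structured request a skill’s backend receives for an intent. The exact fields vary by request type, and the identifiers, intent name, and slot below are invented for illustration rather than taken from a real skill:

```json
{
  "version": "1.0",
  "session": {
    "new": true,
    "sessionId": "amzn1.echo-api.session.example"
  },
  "request": {
    "type": "IntentRequest",
    "requestId": "amzn1.echo-api.request.example",
    "timestamp": "2017-08-01T12:00:00Z",
    "intent": {
      "name": "RecommendationIntent",
      "slots": {
        "Category": { "name": "Category", "value": "coffee" }
      }
    }
  }
}
```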

High-level sequence diagram of custom Alexa Skill interaction

To facilitate efficient delegation of requests to third-party skills, Amazon had to make some trade-offs regarding sentence structure. While standard requests such as “Alexa, what time is it?” or “Alexa, what’s the weather like?” will be handled directly by Alexa, users must include a name identifying the third-party skill they wish to interact with. Let’s explore this more using the Lyft Alexa Skill:

Alexa will parse such a phrase into four parts (an illustrative breakdown follows the list):

  • Wake Word - As we’ve seen, this is what activates Alexa.
  • Launch command - Common phrases recognized by Alexa’s built-in invocation model. They can vary based on the type of skill being requested; for example, “turn on the” will be recognized by the Smart Home Skills API.
  • Invocation Name - The name of the skill to be accessed; this is configured by the skill creator.
  • Utterance - The phrase that Alexa’s Automatic Speech Recognition (ASR) will parse and map to a supported ability, known as an Intent, of the Lyft skill. I think of intents as the types of requests a skill needs to support; this example might map to the Lyft skill’s RequestIntent.
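The exact phrase used at the Dev Day isn’t reproduced here, but a request to the Lyft skill might sound something like the breakdown below. The wording, and the mapping to a RequestIntent, are my own illustration rather than Lyft’s published utterances:

```
"Alexa, ask Lyft for a ride"

  Wake word:        Alexa
  Launch command:   ask
  Invocation name:  Lyft
  Utterance:        for a ride   (might map to the skill's RequestIntent)
```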

Building Your First Skill

There are two sides to an Alexa skill, a frontend which controls the voice interface, and a backend, which handles requests from the Alexa service. The frontend is configured via developer.amazon.com and the backend can be any web service, although AWS Lambda functions are most commonly used in tutorials.

In this section, we’ll configure a local business recommendation skill that Amazon provides, called Gloucester Guide, which supports phrases like this:

Alexa, ask Gloucester Guide for coffee recommendations

Configure Skill
The first step in configuring a skill is to create an account on developer.amazon.com. This is the management console for Alexa skills: a place to configure, monitor, and publish them. Once your account is created, click Add a New Skill under the Alexa Skills Kit section. Note: since we’re not integrating Alexa into our own hardware, we’ll be ignoring the Alexa Voice Service section.

Get started with “Alexa Skills Kit”; this post doesn’t use “Alexa Voice Service”

Skill Information
The skill type should be set to Custom and Gloucester Guide should be entered for both name and invocation name. All other settings can be left in their default state.

Interaction Model
This section can be a little tricky since the Skill Builder interface is currently in beta. Upon launching the builder, we need to navigate to the Code Editor section and replace the existing code with the contents of InteractionModel.json from the example repository.

Screenshot of Skill Builder’s Code Editor interface

Once the code has been changed, click Build Model and then Save Model. Note: as of this writing, the build process can sporadically fail. This was confirmed as an open bug during the Dev Day; if encountered, it’s best to retry.

Once the model has been built, the interface will show additional Intents, such as BreakfastIntent, reflecting the planned interaction model for this skill. A full exploration of the Skill Builder interface is beyond the scope of this post, but it enables the creation of more advanced interaction experiences.
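For reference, an interaction model pairs intent names with the sample utterances that should trigger them. The snippet below is a simplified sketch, not the actual contents of the example repository’s InteractionModel.json, and the exact schema expected by the Skill Builder may differ:

```json
{
  "intents": [
    {
      "name": "BreakfastIntent",
      "samples": [
        "breakfast recommendations",
        "where should I get breakfast"
      ],
      "slots": []
    },
    {
      "name": "AMAZON.HelpIntent",
      "samples": []
    }
  ]
}
```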

Create Lambda Function
Now that the skill has been configured, it’s time to create an AWS Lambda function that will respond to requests from the Alexa service. An active AWS account is needed for this step.

On the Create function interface, choose the alexa-skill-kit-sdk-factskill blueprint. This will automatically include some required libraries and save time.

Lambda function blueprint that will be used

The function should be called gloucesterGuide and given a basic Lambda execution role. Full details on setting up a Lambda execution role can be found here.
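If you’re creating the role by hand, a basic execution role only needs permission to write CloudWatch logs. A minimal policy along these lines is sufficient (shown as a sketch; the managed AWSLambdaBasicExecutionRole policy covers the same ground):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    }
  ]
}
```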

Once the function is created, we’ll want to replace the default code with the contents of index.js from the example repository.
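The actual index.js in the example repository contains the Gloucester Guide data and handlers; the sketch below only illustrates the overall shape of a skill backend built on the alexa-sdk Node.js module that the fact-skill blueprint uses. The intent name and response text are placeholders of my own:

```javascript
'use strict';

const Alexa = require('alexa-sdk'); // included by the fact-skill blueprint

const handlers = {
    // Fired when the skill is opened without a specific request
    'LaunchRequest': function () {
        this.emit(':ask', 'Welcome to Gloucester Guide. What are you looking for?',
            'You can ask for coffee or breakfast recommendations.');
    },
    // Placeholder intent; the real skill defines intents such as BreakfastIntent
    'CoffeeIntent': function () {
        this.emit(':tell', 'You might enjoy the coffee shop on Main Street.');
    },
    'AMAZON.HelpIntent': function () {
        this.emit(':ask', 'Try asking for coffee recommendations.');
    },
};

// Lambda entry point: hand the incoming Alexa event to the SDK
exports.handler = function (event, context) {
    const alexa = Alexa.handler(event, context);
    alexa.registerHandlers(handlers);
    alexa.execute();
};
```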

We now need to add a trigger to the Lambda function. Conveniently, there’s already an Alexa Skills Kit trigger:

Connecting Skill To Lambda Function
The final step is to connect the frontend and backend of our Alexa skill. This is achieved by copying the ARN (Amazon Resource Name) of the Lambda function into the configuration settings of the skill. The ARN is available from the function overview screen:
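A Lambda ARN encodes the region, account ID, and function name; the one you copy will look roughly like this (the region and account ID below are placeholders):

```
arn:aws:lambda:us-east-1:123456789012:function:gloucesterGuide
```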

Once copied, the ARN should be pasted into the configuration section of the Alexa skill:

Test The Skill
Under the Test section of the Alexa skill interface, we can use the Service Simulator to test our skill; typing in Coffee will result in the following output. Click Listen to hear Alexa dictate the response.

This simulator makes it easy to explore the structure of the JSON requests and responses that power the skill. It’s interesting to examine the SSML contained in the skill responses, particularly the use of an interjection to customize how Alexa provides additional information about a business:

Enjoy your meal! <say-as interpret-as="interjection">bon appetit</say-as>
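That interjection lives inside the outputSpeech portion of the skill’s JSON response. A trimmed-down sketch of such a response, with illustrative text, looks like this:

```json
{
  "version": "1.0",
  "response": {
    "outputSpeech": {
      "type": "SSML",
      "ssml": "<speak>Enjoy your meal! <say-as interpret-as=\"interjection\">bon appetit</say-as></speak>"
    },
    "shouldEndSession": true
  }
}
```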

This Alexa skill can be deployed to Echo devices with some additional configuration, which is left as homework for the reader.

Takeaways

I have been intrigued by voice assistants for some time and have owned several Echo devices, while also building my own using a Raspberry Pi. Only time will tell whether Alexa, and other voice assistants, can expand beyond their most celebrated use cases, such as controlling smart home devices or answering trivia questions. I personally find the application of Alexa in the realms of accessibility and education most interesting. Could Alexa help school children with homework in a manner that is less distracting than screen-based alternatives? To its credit, Amazon is attempting to fund development of alternate use cases by compensating the creators of the most engaging skills.

My own attempt to build an Echo using a Raspberry Pi

I hope this post has provided a helpful introduction to Alexa Skill development. It will be interesting to see how the Alexa platform evolves. In the meantime, two resources I plan to explore are:

  • The Alexa Skills Kit Command Line Interface (ASK CLI), which allows developers to manage and test skills from the command line; a few representative commands are sketched after this list. This tool should prove indispensable as developers build more complex skills.
  • The Alexa Cookbook, which provides a large collection of example code, including examples of more advanced features such as persisting data across sessions.
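As a taste of the ASK CLI, commands along these lines let you scaffold, deploy, and simulate a skill from the terminal (consult the CLI documentation for the exact subcommands and flags, which may have changed since this was written):

```
ask new        # scaffold a new skill project
ask deploy     # push the skill manifest, interaction model, and Lambda code
ask simulate --text "ask Gloucester Guide for coffee recommendations" --locale en-US
```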

My thanks to the Amazon team that hosted the Dev Day, it was a fun experience and they did an excellent job.