Understanding Amazon’s Alexa and Building an Alexa Skill

Ashish
Jun 12, 2017

Amazon Alexa

Alexa, Amazon’s cloud-based voice service, powers voice experiences on millions of devices, including Amazon Echo, Echo Dot, Amazon Tap and Fire TV devices. Alexa provides capabilities, or Skills, that let customers interact with devices in a more intuitive way using voice. Examples of Skills include the ability to play music, answer general questions, set an alarm or timer, and more. The Alexa Skills Kit (ASK) is a collection of self-service APIs, tools, documentation and code samples that make it fast and easy for you to add Skills to Alexa.

Alexa Skill:

An Alexa Skill consists of two main components: a Skill Service and a Skill Interface. As developers, we write the code for the Skill Service and configure the Skill Interface in Amazon’s Alexa Skill developer portal. The interaction between the code in the Skill Service and the configuration in the Skill Interface results in a working Skill. Let’s look at each component in depth.

What Is an Alexa Skill? (Image Source)

The Skill Service is the first component in creating a Skill. It lives in the cloud and hosts the code we write to receive JSON payloads from Alexa. The Skill Service is the business logic of the Skill: it determines what actions to take in response to a user’s speech. This layer handles HTTP requests, user accounts, information processing, sessions and database access.
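To make the “JSON payloads” concrete, here is a minimal sketch of an abbreviated request envelope and of how the Skill Service might decide what to do with it. The IDs are placeholders and many real fields are omitted; the structure (request.type, request.intent.name) follows the Alexa request format, while routeKey is a hypothetical helper, not part of any SDK.

```javascript
// Abbreviated example of a request envelope Alexa might send to the Skill
// Service when the user says "Alexa, ask Greeter to say Hello".
// IDs are placeholders; many real fields (context, user, etc.) are omitted.
const sampleRequest = {
  version: "1.0",
  session: { new: true, sessionId: "amzn1.echo-api.session.EXAMPLE" },
  request: {
    type: "IntentRequest",
    requestId: "amzn1.echo-api.request.EXAMPLE",
    locale: "en-US",
    intent: { name: "HelloAlexaIntent", slots: {} },
  },
};

// Hypothetical helper: reduce an envelope to the key the business logic
// branches on, e.g. "LaunchRequest" or "IntentRequest:HelloAlexaIntent".
function routeKey(envelope) {
  const req = envelope.request;
  if (req.type === "IntentRequest") {
    return `IntentRequest:${req.intent.name}`;
  }
  return req.type;
}

console.log(routeKey(sampleRequest)); // IntentRequest:HelloAlexaIntent
console.log(routeKey({ request: { type: "LaunchRequest" } })); // LaunchRequest
```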

“Hey Alexa! How does an Alexa Skill work?”

Building A Skill:

Let’s now look at how to use the Alexa Skills Kit (ASK) to build an Alexa Skill, a voice-driven application for Alexa. The Skill will be called the Greeter Skill: it says “Hello” to users when they invoke it using the words (utterances) that we specify, and it responds with this greeting on any Amazon Echo or other Alexa-enabled device.

In the Greeter Skill, the Skill Service is where the response “Hello” is generated and returned to the Alexa-enabled device. A Skill Service can be implemented in any language that can be hosted on an HTTPS server and return JSON responses. We will implement the Skill in Node.js running on AWS Lambda, Amazon’s serverless compute platform.

AWS Lambda is a good option for hosting the Skill Service because the Alexa Skills Kit can be configured as a trusted trigger for a Lambda function, which lets the Alexa service communicate with it securely. It is possible to use your own HTTPS server, but that requires additional configuration to enable SSL and a signed digital certificate. No such configuration is required with AWS Lambda.

Node.js is a great option because JavaScript is a supported language on AWS Lambda, and both Node.js and JavaScript have very active developer communities. They are also convenient to develop in and debug.
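A minimal sketch of what the Greeter Skill Service could look like as a Node.js Lambda function, assuming the Alexa Skills Kit is configured as the function’s trigger and no SDK is used. The helper buildSpeechResponse and the fallback message are hypothetical; the field names (version, outputSpeech, shouldEndSession) follow the Alexa response JSON format.

```javascript
// Hypothetical helper (not part of any SDK): wrap plain text in an
// Alexa response envelope.
function buildSpeechResponse(text, shouldEndSession) {
  return {
    version: "1.0",
    response: {
      outputSpeech: { type: "PlainText", text: text },
      shouldEndSession: shouldEndSession,
    },
  };
}

// Lambda entry point: `event` is the request envelope Alexa sent, and the
// value the returned promise resolves to is sent back to Alexa.
async function handler(event) {
  const request = event.request;

  if (request.type === "LaunchRequest") {
    // "Alexa, open Greeter"
    return buildSpeechResponse("Hello", true);
  }
  if (request.type === "IntentRequest" && request.intent.name === "HelloAlexaIntent") {
    // "Alexa, ask Greeter to say Hello"
    return buildSpeechResponse("Hello", true);
  }
  if (request.type === "SessionEndedRequest") {
    // Alexa does not speak a reply to this request, so return no speech.
    return { version: "1.0", response: {} };
  }
  // Assumed fallback for any intent this sketch does not handle.
  return buildSpeechResponse("Sorry, Greeter can only say hello.", true);
}

module.exports = { handler };
```

Because Lambda resolves the returned promise and passes the JSON back to Alexa, an async function is enough here; nothing in the code deals with HTTPS or certificates.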

Alexa Skills Kit (ASK) (Image Source)

Skill Service:

A Skill Service implements event handlers. These methods define how the Skill behaves when the user triggers an event by speaking to an Alexa-enabled device.

We define event handlers on the Skill Service to handle particular events, such as the onLaunch event:

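var SPEECH_OUTPUT = "Hello";

// Define the handler before registering it: with "var", only the declaration
// is hoisted, so assigning it earlier would register undefined.
var helloAlexaResponseFunction = function (request, session, response) {
    // response.tell() comes from the Skill base class (not shown).
    response.tell(SPEECH_OUTPUT);
};

GreeterService.prototype.eventHandlers.onLaunch = helloAlexaResponseFunction;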

The onLaunch event is sent to the Greeter Skill Service when the user first launches the Skill by saying “Alexa, open Greeter” or “Alexa, start Greeter”. Another type of handler a Skill Service can implement is an intent handler:

var helloAlexaResponseFunction = function (intent, session, response) {
    response.tell(SPEECH_OUTPUT);
};

// Keys are intent names declared in the Intent Schema; values are handlers.
GreeterService.prototype.intentHandlers = {
    "HelloAlexaIntent": helloAlexaResponseFunction
};

An intent is a type of event that indicates something the user would like to do. The basic Greeter Skill has only one intent, saying “Hello”, which we call HelloAlexaIntent.

Intent handlers map to the features or interactions a Skill offers. A Skill Service can have many intent handlers, each reacting to a different intent triggered by spoken words that we, as developers, specify.
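The snippets above rely on a base class, not shown in this article, that supplies eventHandlers, intentHandlers and a response object with tell(). The following self-contained sketch fills in hypothetical stand-ins for those pieces so the handler-table pattern can run on its own; Response, execute and the dispatch rules are assumptions for illustration, not the Alexa Skills Kit API.

```javascript
const SPEECH_OUTPUT = "Hello";

// Hypothetical stand-in for the response object passed to handlers:
// tell() records the speech and ends the session.
function Response() {
  this.envelope = null;
}
Response.prototype.tell = function (speech) {
  this.envelope = {
    version: "1.0",
    response: {
      outputSpeech: { type: "PlainText", text: speech },
      shouldEndSession: true,
    },
  };
};

function GreeterService() {}

const helloAlexaResponseFunction = function (requestOrIntent, session, response) {
  response.tell(SPEECH_OUTPUT);
};

// Event handlers are keyed by lifecycle event...
GreeterService.prototype.eventHandlers = { onLaunch: helloAlexaResponseFunction };
// ...and intent handlers by the intent names declared in the Intent Schema.
GreeterService.prototype.intentHandlers = { HelloAlexaIntent: helloAlexaResponseFunction };

// Hypothetical dispatcher: choose a handler based on the request type.
GreeterService.prototype.execute = function (envelope) {
  const request = envelope.request;
  const response = new Response();
  if (request.type === "LaunchRequest") {
    this.eventHandlers.onLaunch(request, envelope.session, response);
  } else if (request.type === "IntentRequest") {
    const name = request.intent.name;
    if (!Object.prototype.hasOwnProperty.call(this.intentHandlers, name)) {
      throw new Error("Unsupported intent: " + name);
    }
    this.intentHandlers[name](request.intent, envelope.session, response);
  }
  // Other request types (e.g. SessionEndedRequest) produce no speech here,
  // so the envelope stays null.
  return response.envelope;
};

const greeter = new GreeterService();
const reply = greeter.execute({
  request: { type: "IntentRequest", intent: { name: "HelloAlexaIntent" } },
});
console.log(reply.response.outputSpeech.text); // Hello
```

Adding a feature to the Skill then means adding one entry to intentHandlers (plus the matching intent in the Skill Interface), without touching the dispatcher.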

Skill Interface:

The Skill Interface configuration is the second part of creating a Skill. This is where we specify the words that trigger the intents of the Skill Service defined above.
The Skill Interface is responsible for processing the user’s spoken words. It translates audio from the user into events the Skill Service can handle, and sends those events to the Skill Service so it can do its work. The Skill Interface is also where we specify what a Skill is called, so that users can invoke it by name when talking to an Alexa-enabled device, for example:

“Alexa, ask Greeter to say Hello”

The name users address a Skill by is called the invocation name. We are naming our Skill Greeter.

Within the Skill Interface, we define the Skill’s Interaction Model.

Interaction Model:

The interaction model is what trains the Skill Interface to listen to users’ spoken words and resolve them into specific intent events. In the interaction model, you define which words map to which intent names by providing a list of sample utterances. A sample utterance is a string that represents a possible way a user may talk to the Skill. These utterances are used to generate a natural language understanding model, which resolves the user’s speech to our Skill’s intents.
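Alexa builds a real natural language understanding model from the sample utterances, and that model is far more flexible than string matching. Purely to illustrate the idea of resolving words to intent names, here is a toy exact-match resolver; resolveIntent and its normalization rules are hypothetical and are not how Alexa works internally.

```javascript
// Sample utterances grouped by intent name, as declared in the interaction model.
const sampleUtterances = {
  HelloAlexaIntent: ["say hello", "say hi"],
};

// Toy normalization: lowercase, strip punctuation, collapse whitespace.
function normalize(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Toy resolver: return the intent whose sample utterance matches exactly,
// or null when nothing matches. A real NLU model generalizes beyond this.
function resolveIntent(spoken, utterancesByIntent) {
  const phrase = normalize(spoken);
  for (const [intentName, utterances] of Object.entries(utterancesByIntent)) {
    if (utterances.some((u) => normalize(u) === phrase)) {
      return intentName;
    }
  }
  return null;
}

console.log(resolveIntent("Say Hello!", sampleUtterances)); // HelloAlexaIntent
console.log(resolveIntent("say howdy", sampleUtterances)); // null
```

The miss on “say howdy” shows, in exaggerated form, why the list of sample utterances matters: phrasings the model has not been given are harder to resolve.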

Intent Schema:

We also declare an Intent Schema in the interaction model. An intent schema is a JSON structure that declares the set of intents a service can accept and process.

The Intent Schema tells the Skill Interface which intents the Skill Service implements. Once we provide the sample utterances, the Skill Interface can resolve the user’s spoken words to the specific events the Skill Service can handle. An example is the HelloAlexaIntent event in the Skill we are building.

It lists each intent by name, along with any slots (parameters) the intent takes.
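For reference, the 2017-era developer portal took the Intent Schema as JSON with an "intents" array, one entry per intent. The sketch below builds that legacy structure for the Greeter Skill as a JavaScript object and serializes it; the current Alexa developer console uses a newer interaction-model JSON format, so treat this as illustrative. The missingHandlers check is a hypothetical helper that ties the schema back to the Skill Service’s intentHandlers table.

```javascript
// Legacy-style Intent Schema for the Greeter Skill. HelloAlexaIntent
// takes no slots, so its entry only needs the intent name.
const intentSchema = {
  intents: [{ intent: "HelloAlexaIntent" }],
};

// The portal expected raw JSON text, so serialize with 2-space indentation.
console.log(JSON.stringify(intentSchema, null, 2));

// Hypothetical consistency check: every intent declared in the schema
// should have a matching function in the Skill Service's intentHandlers.
function missingHandlers(schema, intentHandlers) {
  return schema.intents
    .map((entry) => entry.intent)
    .filter((name) => typeof intentHandlers[name] !== "function");
}

const intentHandlers = { HelloAlexaIntent: function () {} };
console.log(missingHandlers(intentSchema, intentHandlers)); // []
```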

We provide both the sample utterances and the intent schema in the Alexa Skill Interface. When defining the sample utterances, consider the variety of ways a user might ask for an intent. A user might say “Alexa, ask Greeter to say Hello”, or “Alexa, ask Greeter to say Hi”. Providing a comprehensive list of sample utterances to the interaction model makes the user experience smoother by increasing the chances of a match.
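In the same legacy portal, sample utterances were entered one per line, each starting with the intent name followed by the phrase. A small sketch that expands the greetings mentioned above into that list; expandUtterances is a hypothetical helper, and the current console instead keeps utterances inside each intent’s JSON definition.

```javascript
// Hypothetical helper: combine a verb with each greeting word to produce
// sample-utterance lines in the legacy "IntentName phrase" format.
function expandUtterances(intentName, verb, greetings) {
  return greetings.map((greeting) => `${intentName} ${verb} ${greeting}`);
}

const lines = expandUtterances("HelloAlexaIntent", "say", ["hello", "hi"]);
console.log(lines.join("\n"));
// HelloAlexaIntent say hello
// HelloAlexaIntent say hi
```

Generating variants this way keeps the list consistent as it grows, though each phrase should still be one a real user would plausibly say.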

Once the Skill Interface is set up with sample utterances that match voice patterns to our Skill Service’s intents, the fourth leg of the request’s journey, between the Skill Interface and the Skill Service, can take place.

Example of a User Interaction Flow

Interaction Flow (Image Source)

Here is how the user’s spoken words are processed by the Greeter Skill.

The user says, “Alexa, ask Greeter to say Hello”. The Skill Interface resolves the audio to an intent event because we configured the interaction model: we set the invocation name to Greeter and provided the interaction model with sample utterances in the Skill Interface, and that list includes “say Hello”.

So the Skill Interface is able to match the user’s spoken words to the intent name. Once the Skill Interface recognizes the event, it sends the event to the Skill Service, where the matching intent handler is triggered. The intent handler returns an output speech response of “Hello” to the Skill Interface, which then sends it to the Alexa device. Finally, the device speaks the response.

Skill Request Lifecycle (Image Source)

Summary:

An Alexa Skill is made up of a Skill Interface and a Skill Service. The Skill Interface configuration defines how users’ verbal commands are resolved to events, which are then routed to the Skill Service. To build an Alexa Skill, you write a Skill Service, configure a Skill Interface, and then test and deploy the Skill. I hope this article helps you understand what an Alexa Skill is and how it is architected and developed.

My name is Ashish (@ashish_fagna). I am a software developer. If you enjoyed this article, please recommend and share it! Thanks for your time.

You can also contact me on ashish [dot] fagna [at] gmail.com
