Amazon’s Alexa meets DevOps (1/2)

David Mukiibi
Published in The Andela Way · 6 min read · Feb 23, 2020

Amazon’s Alexa, along with many other voice assistants, is taking the world by storm by automating daily routine tasks. Given how fast the world moves today, automation seems more appealing than ever.

Photo by Austin Distel on Unsplash with my own twist

As a DevOps engineer, one of the kings of automation, I chose to take Alexa for a spin into our world with this series of articles on how Amazon’s Alexa meets DevOps.

In this two-part series, I am going to take you through developing an Alexa skill using the Alexa Skills Kit, AWS boto3, and AWS S3, with the skill hosted on AWS Lambda.

In this part, I am going to take you through the basics of building an Alexa skill with the Alexa interaction model (the “frontend”). In the second part, I will take you through the “backend” with the Alexa SDK and then bring it all together with AWS S3. Let’s begin.

Amazon Alexa
source: Google

A brief introduction to Alexa Development

When building for Alexa, we use the Alexa Skills Kit to create Alexa skills with the help of the Alexa Voice Service interaction model, which is supported by a backend built using the Alexa SDK.

Just like a normal web application, an Alexa skill has three tiers: a frontend (the Alexa Voice Service interaction model), a backend (the Alexa SDK in Python or Node.js), and storage (DynamoDB), which is optional unless you want Alexa to remember you between sessions.

For this article, we are going to stick to the first two tiers for simplicity.

As we build for Alexa in this article, we are going to look at three main parts: the Alexa Voice Service, the Alexa SDK, and the platform/service we are going to interact/integrate with using Alexa (AWS S3).

To get started, you should have accounts with AWS and the Alexa developer platform. If you don’t already have them, create an AWS account here and an Alexa developer account here. With that out of the way, let’s start.

To create an Alexa skill, head over to this link, where you should see a page that looks like the one below.

Clicking the “Create Skill” button starts the skill creation process. It takes you to a page where you give your new skill a name, choose a default language, choose a model that suits the needs your skill will address (for this article I chose Custom), and finally choose the method to host your skill’s backend. See the image below for reference.

alexa skill creation

For the hosting method, choose one that fits your tech stack, Node.js or Python (Alexa-hosted backends), or choose “Provision your own” if you are going to host the skill yourself somewhere other than the default Alexa-hosted platform.

With all that set, click the “Create Skill” button which will segue you to this page.

skill template

The developers in charge of the Alexa development project have put up awesome templates to kickstart anyone creating an Alexa skill: ready-made skills whose templates have useful behaviour that a new skill is likely to need.

Go ahead and select a template whose features closely match your desired ones. You can tell from the tags on each template, as highlighted in the screenshot above.

With that selected, click the “Create” button, which will create your new skill from the template you chose in the previous section. This template code can then be edited to your custom specification.

With that done, you will be redirected to your new skill’s dashboard console.

To start/use your newly created skill, you have to say a specific phrase to invoke it. This invocation phrase/name (the words you speak to the Alexa device to start your skill) defaults to your skill’s name, but you can change it in the “Invocation” tab of your skill console under the Build tab, as shown below.

With that out of the way, let’s build the frontend.

Frontend

In the Alexa Voice Service interaction model, we define and build the “frontend” for the Alexa skill.

The interaction model is the interface through which a user’s interaction with the skill is digested and forwarded to the backend.

alexa frontend builder (https://developer.amazon.com/alexa/console/ask/build/)

To define the interaction model, we map a user’s spoken input (phrase) to intents defined in the backend, the cloud-hosted Alexa skill service.

Frontend building blocks:

  • Intent
  • Slot
  • Utterance

An intent represents an action that fulfils a user’s spoken request/phrase. These can optionally have arguments called slots.

A slot is a variable value that is supplied to the intent at runtime through an utterance spoken by the user.

An utterance is a phrase a user may speak to the skill.

Slots have types, much like a variable in Python of type int whose values will all be integers.
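To make these building blocks concrete, here is a minimal sketch (written as a Python dict mirroring the interaction model JSON) of a hypothetical CreateBucketIntent with one slot and a few sample utterances. The intent name, slot name, and phrases are illustrative assumptions, not the skill’s final code.

```python
# Sketch of an interaction model fragment: invocation name, one intent,
# one typed slot, and sample utterances that reference the slot.
interaction_model = {
    "interactionModel": {
        "languageModel": {
            # What the user says to open the skill.
            "invocationName": "bucket manager",
            "intents": [
                {
                    "name": "CreateBucketIntent",
                    "slots": [
                        # A slot is a typed variable filled in at runtime.
                        {"name": "bucketName", "type": "AMAZON.SearchQuery"}
                    ],
                    "samples": [
                        # Utterances: representative phrases for this intent.
                        "create a bucket called {bucketName}",
                        "make a new bucket named {bucketName}",
                    ],
                }
            ],
        }
    }
}
```

In the real console, this JSON lives in the JSON Editor under the Build tab; the `{bucketName}` placeholder in each utterance is how a slot value is captured from the spoken phrase.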

Backend

The Alexa backend is built using the Alexa Skills SDK, which is available in a number of languages, including Python, Java, and Node.js. For this series we shall use Python.

When a user speaks an utterance, the Alexa Voice Service “digests” it, checks the “frontend” (interaction model) for an utterance that matches the spoken phrase, and if there is a match, sends the request to the intent under which that utterance was registered.

Utterances specify the words/phrases users can say to invoke a skill’s intents. These should include as many representative phrases as possible that you think your users will say to invoke that specific intent, all of which are mapped to their respective intents in the interaction model as shown in the image below.

Sample utterances for the CreateBucket intent

The Alexa Voice Service then queries the backend for an intent handler with the same name as the invoked intent, which must also be “labelled” as the handler for that intent. We shall see this in action soon when we get to the code.

An intent handler is the part of the Alexa skill backend where a user request is sent to be digested.

The backend (intent handler) then digests the request, constructs a response and hands it over to the Alexa Voice Service to speak to the user through an alexa device like an echo dot or echo show.
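The flow above can be sketched in plain Python: an utterance is matched to an intent, the intent is routed to a handler registered under the same name, and the handler builds the text Alexa speaks back. This is a toy illustration of the routing concept only; the real backend in part 2 uses the Alexa Skills SDK for Python, and all names here are my own assumptions.

```python
# "Interaction model": registered utterances mapped to intents.
UTTERANCE_TO_INTENT = {
    "create a bucket": "CreateBucketIntent",
    "list my buckets": "ListBucketsIntent",
}

def create_bucket_handler():
    """Handler for CreateBucketIntent: build the spoken response."""
    return "Okay, creating your bucket."

def list_buckets_handler():
    """Handler for ListBucketsIntent."""
    return "You have three buckets."

# "Backend": intents routed to handlers labelled with matching names,
# mirroring how handlers are registered against intents in the SDK.
INTENT_HANDLERS = {
    "CreateBucketIntent": create_bucket_handler,
    "ListBucketsIntent": list_buckets_handler,
}

def handle_utterance(phrase):
    """Digest a spoken phrase and return the text Alexa would speak."""
    intent = UTTERANCE_TO_INTENT.get(phrase)
    if intent is None:
        return "Sorry, I didn't get that."
    return INTENT_HANDLERS[intent]()
```

In practice the matching is done by the Alexa Voice Service itself (and is far more flexible than an exact dictionary lookup), but the intent-to-handler dispatch works just like this.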

When creating a new Alexa skill, you have to declare where the backend will be hosted; it can be any platform that can expose an HTTP(S) API endpoint, such as Kubernetes, EC2, GCE, or AWS Lambda.

For this article we are using an Alexa-hosted backend/endpoint, which runs on AWS Lambda but is deployed and maintained by the Alexa developer platform, so that all we focus on is the actual Alexa skill and not the underlying infrastructure. You know the drill: serverless.

In part (2/2) of this article series, we shall go ahead and build out the backend of this skill and set all the required user permissions on the AWS IAM side of things and eventually deploy the completed skill and test it.


Just as the screenshots above may have hinted, yes, the skill is going to query AWS S3 to create, delete, and list a registered AWS user’s S3 buckets.
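As a preview of part 2, here is a minimal sketch of the S3 operations the skill’s intent handlers will perform. The helper names and the injected client argument are my own assumptions for illustration; with real AWS credentials you would pass in `boto3.client("s3")`.

```python
def create_bucket(s3, name):
    """Create an S3 bucket and return its name."""
    s3.create_bucket(Bucket=name)
    return name

def list_bucket_names(s3):
    """Return the names of all buckets owned by the caller."""
    return [b["Name"] for b in s3.list_buckets()["Buckets"]]

def delete_bucket(s3, name):
    """Delete an (empty) S3 bucket."""
    s3.delete_bucket(Bucket=name)

# Usage with the real client (requires AWS credentials in the environment):
#   import boto3
#   s3 = boto3.client("s3")
#   create_bucket(s3, "my-demo-bucket")
#   print(list_bucket_names(s3))
```

Passing the client in as an argument (rather than creating it inside each helper) also makes the handlers easy to test without touching AWS.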

Feel free to drop a comment, suggestion, or question about anything in this article, whether for improvement, addition, or collaboration. LinkedIn/Twitter

With that said, follow me to part 2/2 here to continue the journey into building for voice automation with Amazon’s Alexa in the DevOps realm.
