Building an Alexa Skill as a Web Service on Heroku

I love playing around with new technologies; I am truly a tinkerer at heart. As a consultant and custom software developer, I frequently have to answer questions like, “What does this latest iOS update mean for our app? Can we leverage any new features?” or, “What benefits can we gain from this new Bluetooth standard?” Keeping up with the latest languages, platforms, IoT gadgets, etc., can be a daunting task, but it can also be really fun if you take it in small bites.

One of the most important skills that we professionals can have, regardless of our profession, is the ability to teach ourselves new things. Honing that skill is never a waste of time.

I’ve been hearing a lot of buzz about Amazon’s Alexa, and I decided it was time to give it a try. I asked my wife for an Echo Dot for Christmas. She graciously complied, and within a couple hours of unboxing it, my mind started to churn through possible Alexa Skills that I could write. I started to think about some old web applications that I had written in the past and wondered how difficult it would be to make an Alexa Skill that could interact with one of them. I set out on my mission: to make an Alexa Skill that is implemented as a web service hosted on Heroku.

Since this was just an experiment, I didn’t want to mess with an existing application just yet. I decided to build something new, completely from scratch. My goal was to make a Skill that would read me the latest blog post published on Atomic Spin! It turned out to be much simpler than I thought.

Starting The Alexa Skill

The first step in making a custom Alexa Skill is creating an Amazon developer account. Once you’re signed in, click the Alexa tab at the top of the Developer Console, choose ‘Get Started’ under the Alexa Skills Kit, and click ‘Add New Skill.’ On the left side of the screen, you will see a checklist which shows you all of the configuration steps you need to publish your Skill. I am not going to cover every option here since Amazon’s documentation is quite good, but I will point out some of the interesting pieces.

Configuring your Skill

Skill information

One of the first decisions you’ll need to make is what to use for your Skill’s ‘invocation name.’ This is the keyword that people will have to say to Alexa to signal that they want to interact with your Skill. For example, I chose ‘Atomic Spin’ for my Skill’s invocation name so that people can say phrases like, “Alexa, ask Atomic Spin to read the latest post.” Every directive (a.k.a. utterance) that users give to your Skill must be accompanied by your invocation name.

Interaction model

Intent schema

The intent schema is where you define the structure of the capabilities that your Skill will provide. In my case, the structure is really simple, but if your Skill will require users to specify options or choices, the schema will get more complex. My intent schema just defines a single intent for getting the latest post.

See the code in the full post.
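As a rough stand-in for readers who don't have the full post handy, here is a minimal sketch of a single-intent schema in the JSON format the developer console accepted when this was written. It is built as a Ruby hash to match the rest of the project. The intent name GetLatestPostIntent is an assumption for illustration, not necessarily the name used in the actual Skill.

```ruby
require 'json'

# Hypothetical intent name; the schema from the original post is not shown here.
INTENT_SCHEMA = {
  intents: [
    { intent: 'GetLatestPostIntent' } # no slots: the Skill takes no options
  ]
}.freeze

# Print the JSON that would be pasted into the intent schema field.
puts JSON.pretty_generate(INTENT_SCHEMA)
```

A Skill that accepts options would add a slots array to the intent entry, which is where the schema starts to grow.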

Sample utterances

Sample utterances are the actual phrases that people can speak to interact with your Skill. Since experimental discovery is a big part of using Alexa, people will be phrasing requests in many different ways. In order to maximize usability, you’ll want to document as many variations of utterances as you can think of. For my Skill, I added entries like:

See the code in the full post.
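For illustration, sample utterances in this format are plain lines that start with the intent name followed by a spoken phrase. The sketch below generates a few such lines; both the intent name and the phrases are assumptions, not the entries from the actual Skill.

```ruby
# Hypothetical intent name and phrases, chosen only to show the line format.
INTENT_NAME = 'GetLatestPostIntent'
PHRASES = [
  'read the latest post',
  'read me the latest post',
  'what is the latest post',
  'get the newest post'
].freeze

# Each sample utterance line is "<IntentName> <spoken phrase>".
def sample_utterances(intent_name, phrases)
  phrases.map { |phrase| "#{intent_name} #{phrase}" }.join("\n")
end

puts sample_utterances(INTENT_NAME, PHRASES)
```

Keeping the phrases in a list like this makes it easy to keep adding variations as you discover how people actually phrase their requests.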

Configuration

Service endpoint type

AWS Lambda services are Amazon’s preferred way of hosting Alexa Skills, but since we’re using our own web service instead of Lambda, we’ll need to select the ‘HTTPS’ option. Amazon requires your service to support HTTPS and respond on the standard HTTPS port 443.

Once you choose your service’s location (North America or Europe), you must enter the full URL for the endpoint where you want requests to be sent. I’ll get into setting up the web service a little later, but for now, I set my Skill’s endpoint to https://alexa-atomic-spin.herokuapp.com/latest-post.

In hindsight, I really should have used something more generic than latest-post for the route because this same route will be used for all requests made to my Skill. If I later add another feature, like the ability to search for posts by title, those requests will go through this same endpoint.

SSL certificate

All web services that support HTTPS must have a valid SSL certificate. The source of your certificate will vary depending on where you’re actually hosting your service and the server’s setup, but services like Heroku provide a wildcard certificate that you can use. If you wish to use the wildcard certificate, just choose the option “My development endpoint is a sub-domain of a domain that has a wildcard certificate from a certificate authority.”

Test

That’s basically all the configuration you need to set up a simple Alexa Skill. The last thing you need to do before publishing is to test it out. The Test tab allows you to submit example utterances which will be sent to your actual web service. Pretty cool, eh? Of course, we haven’t made the web service yet, so let’s go to that now.

Web Service Setup

Ruby + Sinatra

I love writing apps in Ruby, but of course you can use any language that is supported by Heroku. Sinatra is a really nice, lightweight web framework for Ruby that you can use to build web applications with very little code. I chose to use Sinatra for my Atomic Spin Alexa Skill. You can access the full source of this project on GitHub: alexa-atomic-spin.

All requests made to my service by Alexa will come through the same route: a POST request to ‘/latest-post’. I defined a route in my application that looks like this:

See the code in the full post.
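Because every request arrives at the same route, the handler has to look inside the JSON body to decide what to do. The following is a minimal sketch of that dispatch step, not the route from the actual project. The function names, the intent name, and the spoken strings are all assumptions, and the logic is kept in plain Ruby so it runs without Sinatra installed.

```ruby
require 'json'

LATEST_POST_INTENT = 'GetLatestPostIntent' # hypothetical intent name

# Decide what to say based on the request type and intent name.
def speech_for(raw_body)
  request = JSON.parse(raw_body).fetch('request', {})
  case request['type']
  when 'LaunchRequest'
    'Welcome to Atomic Spin. Ask me to read the latest post.'
  when 'IntentRequest'
    if request.dig('intent', 'name') == LATEST_POST_INTENT
      'Here is the latest post.' # a real service would fetch the post here
    else
      "Sorry, I don't know how to do that yet."
    end
  end
end

# Wrap the speech in the response envelope Alexa expects.
def handle_alexa_request(raw_body)
  text = speech_for(raw_body)
  response = if text
               { outputSpeech: { type: 'PlainText', text: text }, shouldEndSession: true }
             else
               {} # nothing to say for other request types
             end
  JSON.generate({ version: '1.0', response: response })
end

# In a Sinatra app this might be wired up roughly as:
#   post '/latest-post' do
#     content_type :json
#     handle_alexa_request(request.body.read)
#   end
```

Adding a second feature later would mean adding another branch on the intent name, which is why a more generic route name would have been a better choice.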

Certificate verification

The first really important thing that happens here is the verification of the request signature. All requests that Alexa makes to your Skill will be signed with a valid signing certificate. In order to verify that the request is legitimately from Alexa, and not from a malicious attacker, you need to verify the certificate URL and the signature. I found a Ruby gem, alexa_verifier, that does just that, so I use it to perform the verification. If you are writing your service in Java, you can use a function that Amazon provides in the Alexa Skills Kit to do this verification. If you do not verify the certificate, Amazon will not accept your Skill submission.
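To give a feel for what a verifier has to check, here is a minimal sketch of two of the rules Amazon documents: the certificate URL (sent in the SignatureCertChainUrl header) must point at Amazon's S3 host over HTTPS under the /echo.api/ path, and the request timestamp must be recent. This is not the alexa_verifier gem's code, the function names are assumptions, and the cryptographic signature check itself is left out, so treat it as an illustration rather than a replacement for the gem.

```ruby
require 'pathname'
require 'time'
require 'uri'

# Requests older (or newer) than this many seconds should be rejected.
TIMESTAMP_TOLERANCE_SECONDS = 150

# Check the certificate URL against Amazon's documented rules.
def valid_cert_url?(url)
  uri = URI.parse(url)
  uri.scheme.to_s.downcase == 'https' &&
    uri.host.to_s.downcase == 's3.amazonaws.com' &&
    uri.port == 443 && # URI reports 443 when no port is given for https
    # Normalize "." and ".." segments before the case-sensitive path check.
    Pathname.new(uri.path).cleanpath.to_s.start_with?('/echo.api/')
rescue URI::InvalidURIError
  false
end

# Check that the ISO 8601 timestamp in the request body is recent.
def fresh_timestamp?(timestamp, now = Time.now)
  (now - Time.iso8601(timestamp)).abs <= TIMESTAMP_TOLERANCE_SECONDS
rescue ArgumentError
  false
end
```

A full verifier would go on to download the certificate chain, validate it, and use its public key to check the Signature header against the raw request body.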

Constructing a post object

Once the certificate has been verified, I can actually do the work to get the latest Spin post. I made a class called “Spin” that encapsulates all the work of making the request to the WordPress API. It gets the content of the latest post and constructs a post object that includes the title, author, and an array of strings which contain sections of body text. You can see the full source code on GitHub.
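As an illustration of the "construct a post object" step, the sketch below turns one already-parsed post from a WordPress-style JSON response into a small struct with a title, an author, and an array of paragraph strings. This is not the Spin class from the project. The field names (title.rendered, content.rendered, _embedded.author) are assumptions based on the WordPress REST API, the network request is omitted, and the tag stripping is deliberately naive (it does not decode HTML entities).

```ruby
require 'json'

# Hypothetical post object: title, author name, and body paragraphs.
Post = Struct.new(:title, :author, :paragraphs, keyword_init: true)

# Naive tag removal; good enough for a sketch, not for arbitrary HTML.
def strip_tags(html)
  html.gsub(/<[^>]+>/, '').gsub(/\s+/, ' ').strip
end

# Build a Post from one parsed post hash (assumed WordPress REST shape).
def build_post(wp_post)
  paragraphs = wp_post.dig('content', 'rendered').to_s
                      .split(%r{</p>}i)
                      .map { |chunk| strip_tags(chunk) }
                      .reject(&:empty?)
  Post.new(
    title: strip_tags(wp_post.dig('title', 'rendered').to_s),
    author: wp_post.dig('_embedded', 'author', 0, 'name') || 'an unknown author',
    paragraphs: paragraphs
  )
end
```

Splitting the body into paragraphs at this stage is what makes it easy to insert pauses between them later when building the spoken response.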

Alexa responses

Amazon’s Alexa service expects the response our Skill returns to be a JSON object with a specific structure. There are a lot of options for what your response can contain. One option allows you to specify a response formatted with SSML (Speech Synthesis Markup Language), a W3C standard of which Alexa supports a subset. SSML allows you to specify things like phonetic pronunciation of words, spelling out words, pauses of various lengths, etc.

I made a function called post_to_ssml that takes the blog post information, formats it with proper SSML tags, and inserts breaks between the paragraphs so that it sounds more natural when it’s spoken to the user.

See the code in the full post.
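For readers without the full post, here is a guess at what a function like this might look like; it is a sketch, not the project's actual post_to_ssml. The argument list, the spoken introduction, and the one-second pause length are assumptions. It escapes XML special characters so that a title such as "Q&A" does not produce invalid SSML.

```ruby
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

# Escape characters that would otherwise break the SSML markup.
def escape_ssml(text)
  text.gsub(/[&<>]/, XML_ESCAPES)
end

# Build an SSML document: an intro line, then paragraphs separated by pauses.
def post_to_ssml(title, author, paragraphs)
  pause = '<break time="1s"/>' # assumed pause length
  intro = "#{escape_ssml(title)}, by #{escape_ssml(author)}."
  body = paragraphs.map { |paragraph| escape_ssml(paragraph) }.join(pause)
  "<speak>#{intro}#{pause}#{body}</speak>"
end

puts post_to_ssml('Q&A', 'Jane Doe', ['First paragraph.', 'Second paragraph.'])
```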

Finally, another function, make_ssml_response, constructs the actual JSON object containing the SSML text. See the Alexa Skill documentation for more information on constructing responses.

See the code in the full post.
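The response envelope for SSML output is small. The sketch below shows one way such a function might build it, assuming the Skill always ends the session after speaking. It is an illustration, not the project's actual make_ssml_response.

```ruby
require 'json'

# Wrap SSML text in the JSON envelope that Alexa expects from a Skill.
def make_ssml_response(ssml)
  JSON.generate({
    version: '1.0',
    response: {
      outputSpeech: { type: 'SSML', ssml: ssml },
      shouldEndSession: true # assumption: nothing more to ask the user
    }
  })
end

puts make_ssml_response('<speak>Hello from Atomic Spin.</speak>')
```

Setting shouldEndSession to false instead would keep the session open so the user could make a follow-up request.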

That’s it! Our web service is complete. All I had to do was create a free project on Heroku, push my application to it, and boom: I have a fully functioning Alexa Skill implemented as a web service hosted on Heroku. I can test it in the Amazon Developer portal, and I can see that my Skill returns a valid response. I can even test it on my Echo Dot that is signed in with the same Amazon account I used to create my Skill!

Submission…Rejected

At the time of writing this, my Skill hasn’t actually been accepted by the Amazon review process. Because my Skill contains the Atomic Object logo, which is a registered trademark owned by “Atomic Object LLC,” Amazon wants proof that I have permission to use said trademark. I am currently working on that. I will update this post with my results.


Originally published at spin.atomicobject.com on January 13, 2017.