Making an Amazon Echo Skill, Part 2: Development

This is a four-part post on building an Amazon Echo skill. Part 1 is the what and why, part 2 is the development, part 3 looks at outreach & marketing, and part 4 wraps up with platform issues and opportunities.

Jake Simms
6 min read, Jul 6, 2017

In our first post, we laid out the what and the why. This post is a description of #theprocess of making the skill and submitting it.

Getting Started

The Echo allows for streaming audio, but neither Nick nor I is a professional programmer. I know enough to be dangerous, but I'm limited when it comes to designing and writing a system. The good news is that Amazon has an open-source template for audio streaming skills.

The template readme has a true step-by-step. If you can navigate basic JavaScript, and you know the basics of the command line (like checking something into git), then you’ll be able to slog your way through the Node.js template and get something up on AWS Lambda in a handful of hours.

I won’t recreate that below. What I’ll do instead is give more color around the process and the experience of working with the Alexa Skills Kit.

Scoping

We really wanted to accomplish two things. First, have the Echo respond to “take me out to the ball game” by choosing a random public domain broadcast and playing it.

Second, we wanted to scope this to a weekend. Given our skill sets, that goal was reasonable if we kept within the bounds of the template. I’d handle the code. Nick would handle the creative.

System Overview

The first part of the work was getting on the same page about how everything works.

All things considered, Amazon makes it easy for us.

Fortunately, the Amazon template solved pretty much all of the difficult problems with audio playback, leaving us with just three main parts to change in the template:

  • The Voice UI
  • Output Speech Copy
  • Game Data

Then you host and configure all of those together through the Amazon developer dashboards.

The Voice UI

Language can be a complex thing.

In general, the Amazon model for Voice UI is made of three parts: Invocation Phrase, Intents, and Utterances.

The invocation phrase is the unique name for your skill. When a user says this name, Alexa knows they want to use your skill. It’s unique to your skill and no one else can have it.

Intents are like the feature set of the skill. They are the actions it can do. Amazon supplies functionality for common intents like ‘help’ or ‘cancel,’ and audio controls like pause, stop, shuffle, etc. You are responsible for your own custom intents like ‘play audio’ or ‘tell horoscope’.

Utterances are what you say to actually have the skill perform an intent. You don’t click a play button, you say ‘play audio.’
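To make the three parts concrete, here is a rough sketch of how they relate, written as a plain JavaScript object. This is not the Alexa Skills Kit format or the template's code: the object shape, the `matchIntent` helper, and the sample utterances are all made up for illustration, and Alexa's real matching is far more forgiving than the exact string comparison used here.

```javascript
// Hypothetical sketch of the three-part voice model; every name here is illustrative.
const voiceModel = {
  // The invocation phrase: how a user addresses the skill.
  invocationName: 'take me out to the ball game',
  intents: {
    // A custom intent, with utterances we have to write ourselves (made-up samples).
    PlayAudio: ['play a game', 'play a ball game', 'put on a game'],
    // Built-in intents: Amazon supplies the utterances, so none are listed here.
    'AMAZON.HelpIntent': [],
    'AMAZON.PauseIntent': [],
  },
};

// Normalize a spoken phrase and return the intent whose utterance list contains it.
// Returns null when nothing matches.
function matchIntent(model, phrase) {
  const spoken = phrase.trim().toLowerCase();
  for (const [intent, utterances] of Object.entries(model.intents)) {
    if (utterances.includes(spoken)) return intent;
  }
  return null;
}
```

Calling `matchIntent(voiceModel, 'Play a game')` returns `'PlayAudio'`, which is the whole idea: many utterances funnel into one intent.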

(Side note: I’m not going into custom slots because we didn’t need to for our skill. Custom slots help you request something more specific.)

We didn’t need to change the intent schema, so we set our invocation phrase to ‘take me out to the ball game’ and came up with our utterances for the playAudio intent. Here are the different ways to say ‘play a game’ we came up with off the top of our heads in 5 minutes.

…and on, and on, and on. You get the idea.

Output Speech Copy & Test Run

Next was a scan of the template code for places to change the output speech copy. That’s pretty straightforward, so I won’t go into much detail here.

I used this time to get a better understanding of the skill architecture. Again, the template’s readme does a great job of this, so I won’t recreate it here.

I also swapped out URLs in the sample data file with some of the MLB broadcasts and followed the template’s readme to set everything up for testing.
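For a sense of what that swap looks like, here is a minimal sketch of a data file. I'm assuming a list of entries with a `title` and a `url`; the variable name, the placeholder URLs, and the `findInvalidEntries` helper are mine, not the template's. The HTTPS check reflects my understanding that Alexa audio streams need to be served over HTTPS.

```javascript
// Assumed shape of the sample data: a list of { title, url } entries.
// The URLs are placeholders, not the real broadcast locations.
const audioData = [
  { title: '1934 All Star Game', url: 'https://example.com/1934-all-star-game.mp3' },
  { title: 'Placeholder game', url: 'https://example.com/placeholder-game.mp3' },
];

// Return the entries that are missing a title or do not use an HTTPS URL,
// so problems are caught before the skill is deployed.
function findInvalidEntries(entries) {
  return entries.filter(
    (entry) =>
      typeof entry.title !== 'string' ||
      entry.title.length === 0 ||
      typeof entry.url !== 'string' ||
      !entry.url.startsWith('https://')
  );
}
```

Running `findInvalidEntries(audioData)` before uploading is a cheap sanity check when you are pasting in URLs by hand.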

Once the skill was hosted and configured, I tested “Echo, take me out to the ballgame.” Success. We were listening to the 1934 All Star Game.

Data Transform

The next big step was data entry. It took roughly 3 hours to transform about 150 games from the archive.org page into structured data for the skill.

A perk of not outsourcing this was that I could manually change each title to better match spoken English. We say things differently than we read and write them, and it seemed more natural in a voice UI to lead with the team name rather than a date.

A title that leads with the team name is easier to listen to and understand than: “1971 09 30 San Francisco Giants vs Padres Giants Clinch NL West Marichal vs Roberts”
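I did the retitling by hand, but the basic reordering could be sketched in code. The helper below is hypothetical: it assumes a raw title starts with a `YYYY MM DD` date, and it only moves the date to the end in a spoken-friendly form. It does not do the rest of the cleanup a person would.

```javascript
const MONTHS = [
  'January', 'February', 'March', 'April', 'May', 'June',
  'July', 'August', 'September', 'October', 'November', 'December',
];

// Hypothetical helper: move a leading "YYYY MM DD" date to the end of the title.
// Titles that do not start with that pattern are returned unchanged.
function toSpokenTitle(rawTitle) {
  const match = /^(\d{4}) (\d{2}) (\d{2}) (.+)$/.exec(rawTitle);
  if (!match) return rawTitle;
  const [, year, month, day, rest] = match;
  const monthName = MONTHS[Number(month) - 1];
  if (!monthName) return rawTitle; // month outside 01-12
  return `${rest}, ${monthName} ${Number(day)}, ${year}`;
}
```

Given the raw title above, this returns “San Francisco Giants vs Padres Giants Clinch NL West Marichal vs Roberts, September 30, 1971”. That is still a mouthful, which is why the manual pass was worth the time.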

Failed Tweaks

With the data in, there were a few other small tweaks I wanted to make before we published, to make the experience what we really wanted it to be:

1. Automatically shuffle the games when selecting one to play
2. Tell the user what game is playing before playing it
3. Select another random game if the user says “no” to resuming where they left off when restarting a session

They seemed easy enough, but when I went in to test the changes: nope. The skill zonked out.

After an hour or two of struggling to troubleshoot (more on that later), I rolled everything back. A copy change on the welcome message about turning on shuffle was “good enough” and kept us in the scope of keeping this a weekend project.
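For the curious, the first two tweaks boil down to something like the sketch below. To be clear, this is not the code I tried (that got rolled back) and it is not wired into the template. It is a standalone illustration of a Fisher-Yates shuffle and a spoken announcement, with function names I made up.

```javascript
// Return a shuffled copy of the list using the Fisher-Yates algorithm.
// The random source is injectable so the function can be tested deterministically.
function shuffle(items, random = Math.random) {
  const result = items.slice(); // leave the caller's array untouched
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1)); // pick an index in 0..i
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

// Build the sentence to speak before playback starts.
function announce(game) {
  return `Now playing ${game.title}.`;
}
```

The hard part was never these few lines. It was fitting them into the skill's existing playback flow without breaking it.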

Approval

With everything working again, I hit submit. A half day later, we got the news! Our skill was rejected!

D’oh.

Remember the invocation phrase part of the voice UI? Turns out, “Take me out to the ballgame” wasn’t going to fly (see rule 5).

Total bummer. That phrase was a big part of the skill personality and had gotten good responses from friends and family in testing.

After a short text brainstorm, Nick and I settled on ‘background baseball.’ Literal, but quickly gets at the value proposition. I changed the phrase and some copy, resubmitted, and a little later we got the news that this one passed the test.

👍

While I was doing this, Nick was working in tandem on the icons, branding, domains, copy, and more. We’ll go into that in part 3.

Next Post: Part 3 — Outreach & Marketing

Background Baseball is an Amazon Echo skill that allows you to stream classic baseball games from the 1930s-70s.

If you have an Echo, you can enable it in Amazon’s App Store.

If you like it, do us a solid and do one or more of the following:
1. Leave a review on Amazon
2. Give us an upvote on Product Hunt
3. Share it with friends

We aren’t making money off this, but having people use it and enjoy it makes us feel good. Thanks for the help spreading the word.
