In our first post, we laid out the what and the why. This post describes the process of making the skill and submitting it.
Getting Started
The Echo allows for streaming audio, but neither Nick nor I are professional programmers. I know enough to be dangerous, but I'm limited when it comes to designing a system and writing the code for it. The good news is that Amazon has an open source template for audio streaming skills.
The template readme has a true step-by-step guide. If you can navigate basic JavaScript, and you know the basics of the command line (like checking something into git), you'll be able to slog your way through the Node.js template and get something running on AWS Lambda in a handful of hours.
I won’t recreate that here. Instead, I’ll give more color on the process and the experience of working with the Alexa Skills Kit.
Scoping
We really wanted to accomplish two things. First, have the Echo respond to “take me out to the ball game” by choosing a random public domain broadcast and playing it.
Second, we wanted to scope this to a weekend. Given our skill sets, that goal was reasonable if we kept within the bounds of the template. I’d handle the code. Nick would handle the creative.
System Overview
The first part of work was getting on the same page around how everything works.
Fortunately, the Amazon template pretty much solved all the difficult problems with audio playback, leaving us with just three main parts to change:
- The Voice UI
- Output Speech Copy
- Game Data
Then you host and configure all of those together through the Amazon developer dashboards.
The Voice UI
In general, the Amazon model for Voice UI is made of three parts: Invocation Phrase, Intents, and Utterances.
The invocation phrase is the unique name for your skill. When a user says this name, Alexa knows they want to use your skill; no one else’s skill can use the same phrase.
Intents are the feature set of the skill: the actions it can perform. Amazon supplies built-in intents for common actions like ‘help’ or ‘cancel,’ and audio controls like pause, stop, and shuffle. You are responsible for your own custom intents, like ‘play audio’ or ‘tell horoscope.’
Utterances are what you say to have the skill perform an intent. You don’t click a play button; you say ‘play audio.’
(Side note: I’m not going into custom slots because we didn’t need to for our skill. Custom slots help you request something more specific.)
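To make this concrete, an intent schema in the (original) Alexa Skills Kit format is a small JSON document. This is an illustrative sketch, not the template’s exact schema; the AMAZON.* entries are Amazon’s built-in intents, and playAudio is the custom one:

```json
{
  "intents": [
    { "intent": "playAudio" },
    { "intent": "AMAZON.HelpIntent" },
    { "intent": "AMAZON.CancelIntent" },
    { "intent": "AMAZON.PauseIntent" },
    { "intent": "AMAZON.ResumeIntent" }
  ]
}
```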
We didn’t need to change the intent schema, so we set our invocation phrase to ‘take me out to the ball game’ and came up with our utterances for the playAudio intent. Here are the different ways to say ‘play a game’ we came up with off the top of our heads in five minutes.
…and on, and on, and on. You get the idea.
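For anyone writing their own, sample utterances in the classic skill builder are just a flat text file mapping an intent name to a phrase, one per line. These examples are mine for illustration, not our actual list:

```
playAudio play a game
playAudio play a baseball game
playAudio put on a ball game
playAudio play an old broadcast
```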
Output Speech Copy & Test Run
Next was a scan of the template code for the output speech copy to change. That’s pretty straightforward, so I won’t go into it much here.
I used this time to get a better understanding of the skill architecture. Again, the template’s readme does a great job of this, so I won’t recreate it here.
I also swapped out the URLs in the sample data file with some of the MLB broadcasts and followed the template’s readme to set everything up for testing.
Once the skill was hosted and configured, I tested “Echo, take me out to the ballgame.” Success. We were listening to the 1934 All Star Game.
Data Transform
The next big step was data entry. It took roughly 3 hours to transform about 150 games from the archive.org page into structured data for the skill.
A perk of not outsourcing this was that it was easier to manually change each title to better match spoken English. We say things differently than we read and write them, and it seemed more natural in a voice UI to lead with the team names rather than a date.
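For a sense of the shape of the data, each game boiled down to a spoken-friendly title plus a direct URL to the audio file. The field names and URLs here are assumed for illustration, not necessarily the template’s exact schema:

```javascript
// Illustrative game entries; titles lead with the team names rather than the
// date so they sound natural when Alexa reads them aloud.
const audioData = [
  {
    title: 'Tigers versus Cardinals, 1934 World Series, Game 1',
    url: 'https://archive.org/download/example/1934-ws-game1.mp3' // hypothetical URL
  },
  {
    title: 'Yankees versus Dodgers, 1941 World Series, Game 4',
    url: 'https://archive.org/download/example/1941-ws-game4.mp3' // hypothetical URL
  }
];
```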
Failed Tweaks
With the data in, there were a couple of other small tweaks I wanted to make before we published, in order to make the experience what we really wanted it to be:
1. Automatically shuffle the games when selecting one to play
2. Tell the user what game is playing before playing it
3. Select another random game if the user says “no” to resuming where they left off when restarting a session
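For a sense of scale, the core of tweak #1 is only a few lines. This is a hypothetical sketch, not the template’s actual code:

```javascript
// Pick a random game instead of always starting from the first entry.
function pickRandomGame(games) {
  const index = Math.floor(Math.random() * games.length);
  return games[index];
}
```

Standing alone it’s trivial; integrating it with the template’s playback state handling was another story.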
They seemed easy enough, but when I went in to test the changes: nope. The skill zonked out.
After an hour or two of struggling to troubleshoot (more on that later), I rolled everything back. A copy change on the welcome message about turning on shuffle was “good enough” and kept us in the scope of keeping this a weekend project.
Approval
With everything working again, I hit submit. A half day later, we got the news! Our skill was rejected!
Remember the invocation phrase part of the voice UI? Turns out, “Take me out to the ballgame” wasn’t going to fly (see rule 5).
Total bummer. That phrase was a big part of the skill personality and had gotten good responses from friends and family in testing.
After a short text brainstorm, Nick and I settled on ‘background baseball.’ Literal, but quickly gets at the value proposition. I changed the phrase and some copy, resubmitted, and a little later we got the news that this one passed the test.
While I was doing this, Nick was working in tandem on the icons, branding, domains, copy, and more. We’ll go into that in part 3.
Background Baseball is an Amazon Echo skill that allows you to stream classic baseball games from the 1930s-70s.
If you have an Echo, you can enable it in Amazon’s App Store.
If you like it, do us a solid and do one or more of the following:
1. Leave a review on Amazon
2. Give us an upvote on Product Hunt
3. Share it with friends
We aren’t making money off this, but having people use it and enjoy it makes us feel good. Thanks for the help spreading the word.