A Voice-Activated Future
Last year I purchased an Amazon Echo as a family gift for Christmas, primarily as a music player. It also gave me a new platform for developing “skills” that interact through voice rather than a traditional keyboard. The ability to play music was an instant hit, and I quickly dove into skill development, starting with simple trivia skills, leveraging the Node.js templates and motivated by some sweet gear.
Fast forward to the summer, when I ventured further from the basics. Here’s an overview of a project from this summer that integrates a Raspberry Pi and a homemade pitching machine controlled by the Robot Roxie skill. Full details are on the Hackster website here.
Background to Getting Started
There are two parts to the Amazon Alexa platform. The first is the Alexa Skills Kit (ASK) on the main Amazon Developer site. This is where the configuration and publishing of the skill are handled, including the voice/text translation instructions that ultimately get pushed down to the Echo or a compatible device. The second part is writing the code that the ASK invokes. Starting with the trivia skills, I’ve been using AWS Lambda functions written in Node.js.
It’s been helpful to first design how the voice interaction between user and device should flow. This helps in building the “utterance” file required for the skill in the developer console, as well as in determining which functions need to be written in Node.js. Here’s a diagram that I used for the Robot Roxie skill, which helped design the interaction that drives the gameplay.
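For reference, the sample utterances file in the developer console maps spoken phrases to intent names, one per line. The intent name and phrases below are hypothetical examples of the format, not the actual Robot Roxie utterances:

```
PitchIntent throw me a pitch
PitchIntent pitch the ball
PitchIntent send one over the plate
```

The more phrasings you enumerate per intent, the more reliably Alexa matches what users actually say.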
- The Amazon certification process for approving skills is similar to what Apple has for apps. I’ve now published five different skills, and the turnaround time is normally around 2–3 days. Most of the feedback that I’ve worked through is around making sure the voice dialog doesn’t throw exceptions on non-happy paths, as the testing process is quite thorough.
- The AWS Lambda service is stateless, so if you’re going to build a more complex skill, it’s important to choose a way to persist session details. This can be done in the JSON session object that is passed back and forth in the dialog and read on each turn. I’ve seen other complex skills doing this in persisted storage like DynamoDB. Either way works, but it is a key consideration when going beyond a basic single-response skill.
- Using object storage like AWS S3 is a good way to abstract verbose dialog from the code in the skill. For example, the Hurricane Center skill that I’ve written stores historical data about storms for the past twenty-five years in JSON objects in an S3 bucket. That keeps the function focused on rules, and allows the content to grow after the skill launches.
- Metrics and logs are great ways to gain user feedback and should be monitored in the weeks after a skill launches. After launching Beer Bot, I got great insight into usage patterns that I hadn’t anticipated during design, and I adjusted with a subsequent release after monitoring the logs.
- The skills store isn’t well moderated, and the feedback posted there often isn’t relevant to your skill: a decent number of comments I’ve received are really about how users connect their Alexa device to the network, or about basic capabilities of the device itself. Very few people take the time to leave a star rating at all; I’ve written skills that get used by more than 100 people per day, yet have no ratings, positive or negative.
The platform’s feature set is growing explosively right now, and the press around the device continues to be positive. Back in the spring, the milestone of 500 skills was surpassed, but Amazon recently announced that the catalog has grown to more than 3,000. Given that this might be a hot item again this holiday season, and that it’s now available in the UK and Germany, I’d expect it might see 10x growth in 2016!