How we Built our Alexa app at Kit

Things we learned developing for Amazon’s voice assistant

Will Chiong
Kit Blog
Nov 16, 2016

It’s an exciting time to be developing skills for the Amazon Echo.

The current state of the Alexa Skill Catalog reminds me of the early days of the iOS App Store; the potential of the platform is still being explored by new developers. I wanted to share my team’s learnings developing our first skill — a quick and easy way to help you discover the best product recommendations on Kit.

Developing for the Alexa Skill Catalog

Amazon launched the voice-controlled Echo, which is powered by the Alexa assistant, in November 2014. In June 2015 they announced the addition of a developer marketplace for “Custom Skills”, the Alexa equivalent of an app. If you own the newest Echo or another Alexa-enabled device (such as the Dot or the Tap), you can download skills that expand Alexa’s functionality.

The Echo was Amazon’s top-selling electronics device last Black Friday, and Amazon is marketing it aggressively again this year. At just over a year old now, Alexa’s Skill Catalog has reached an equilibrium where the documentation and processes have become more stable than when they first launched, but the market is not yet oversaturated.

Skills are accessed with an “invocation name” which is globally unique. As the Alexa platform becomes more popular, securing a good invocation name for your brand or application may be like getting a good domain name or username on a new platform. We were happy to be able to get our brand name “Kit” as an invocation keyword, so you can invoke our skill using the phrase “Alexa, Ask Kit …”

Designing and implementing your Custom Skill

Have you ever questioned the nature of your Custom Skill?

Your Skill should be able to respond to prompts that are in the format “Alexa, Ask <Invocation Name> …” We allowed our skill to respond to questions about the recommendations made by creators on our platform:

  • Alexa, Ask Kit what’s the best gaming keyboard?
  • Alexa, Ask Kit for yoga mat recommendations
  • Alexa, Ask Kit for a good under eye concealer

An Alexa Custom Skill has three basic requirements: an Intent Schema, Sample Utterances, and a service endpoint.

An Intent Schema

For any type of interaction that would be enabled with your invocation keyword, you must define an Intent Schema in a JSON file. For each Intent in your schema you will also define “slots” which are parameters that can be sent to your service endpoint — these can be any of a number of predefined types that Amazon has made, or a custom slot type that you define as an enumeration of acceptable values.
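As a rough illustration, a schema with one custom intent and one slot could look like the sketch below. It is written as a JavaScript object and serialized to the JSON text you would paste into the developer portal. The intent name “WhatProduct” and slot name “Category” are the ones used in this post; the custom slot type name `LIST_OF_CATEGORIES` is made up for the example and is not necessarily what we used.

```javascript
// Sketch of an Intent Schema with one custom intent and one slot.
// LIST_OF_CATEGORIES is a hypothetical custom slot type; its allowed
// values would be defined separately in the developer portal.
const intentSchema = {
  intents: [
    {
      intent: "WhatProduct",
      slots: [{ name: "Category", type: "LIST_OF_CATEGORIES" }],
    },
    // Built-in intents such as the help intent need no slots.
    { intent: "AMAZON.HelpIntent" },
  ],
};

// The JSON text to paste into the portal (and to keep in version control).
const intentSchemaJson = JSON.stringify(intentSchema, null, 2);
console.log(intentSchemaJson);
```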

Sample Utterances

You must give Alexa the formula for all types of questions that your Skill knows how to handle.

We have a number of different question formulations to which we want Alexa to respond, so we define all possible formats in our Sample Utterances file. For example, these two utterance declarations:

WhatProduct what’s the best {mouse|Category}
WhatProduct about the best {speakers|Category}

let Alexa respond to both “Alexa, Ask Kit what’s the best mouse” and “Alexa, Ask Kit about the best mouse”. In the above example, “WhatProduct” is the name of our Intent, and the part in curly braces pairs a sample value with the name of the slot it fills (“Category”), which in our case can represent any product inquiry.
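Since every phrasing needs its own line, it can help to generate the file rather than type it by hand. The helper below is a minimal sketch of that idea, not part of any Amazon tooling; the function name `buildUtterances` and the third phrasing are assumptions made for illustration.

```javascript
// Build Sample Utterances lines in the "{sample value|SlotName}" format.
// Each phrasing gets one line, cycling through the sample slot values.
function buildUtterances(intentName, slotName, phrasings, sampleValues) {
  return phrasings.map((phrase, i) => {
    const sample = sampleValues[i % sampleValues.length];
    return `${intentName} ${phrase} {${sample}|${slotName}}`;
  });
}

const utterances = buildUtterances(
  "WhatProduct",
  "Category",
  ["what's the best", "about the best", "for a good"],
  ["mouse", "speakers"]
);

// One utterance per line, ready to paste into the Sample Utterances field.
console.log(utterances.join("\n"));
```

Keeping the phrasings in one list also makes it easier to review changes in version control before pasting the result into the portal.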

Amazon’s web-based developer portal does not version either the intentSchema.json or the Sample Utterances, so I recommend backing them up on an external version control system in case you ever need to track or revert your changes.

A service endpoint (Lambda or web service)

Finally, you will link your Alexa Skill to a service endpoint. While Amazon allows any web service, I highly recommend using AWS Lambda, which lets you run code without a dedicated server. You only pay for the compute time used in executing your code, so it is incredibly inexpensive to use while you are developing and testing the Skill. It also provides built-in integration with Alexa intents, which is very useful when testing the skill.

You can define a Lambda using either Node or Java, but I recommend using Node; developers have complained of very slow startup times initializing a Java Lambda — as much as five seconds. While this is not an issue for some use cases, for a voice interface five seconds might as well be an eternity.

You don’t need to handle any of the speech input or output processing; all of that is handled by the Alexa infrastructure. Your service endpoint will receive an event with the intent invoked and the slots inferred from the user’s voice request. For the above example intent and utterance, a user’s voice request of “Alexa, Ask Kit what’s the best mouse” will be translated into the following JSON payload and sent to your service endpoint:

"request": {
  "type": "IntentRequest",
  "requestId": "<REQUESTID>",
  "locale": "en-US",
  "timestamp": "2016-11-14T18:24:36Z",
  "intent": {
    "name": "WhatProduct",
    "slots": {
      "Category": {
        "name": "Category",
        "value": "mouse"
      }
    }
  }
}
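Reading the intent and slot out of that payload takes only a few lines. The following is a small sketch of one way to do it defensively; `parseIntent` is a hypothetical helper name, and the sample event only mirrors the fields shown above.

```javascript
// Pull the intent name and one slot value out of an Alexa request event.
// Both fields come back as null when the request is not an IntentRequest,
// and slotValue is null when the user did not fill the slot.
function parseIntent(event, slotName) {
  const request = event.request || {};
  if (request.type !== "IntentRequest") {
    return { intent: null, slotValue: null };
  }
  const intent = request.intent || {};
  const slots = intent.slots || {};
  const slot = slots[slotName];
  return {
    intent: intent.name || null,
    slotValue: slot && slot.value ? slot.value : null,
  };
}

// Trimmed-down copy of the payload above, used only for this example.
const sampleEvent = {
  request: {
    type: "IntentRequest",
    intent: {
      name: "WhatProduct",
      slots: { Category: { name: "Category", value: "mouse" } },
    },
  },
};

console.log(parseIntent(sampleEvent, "Category"));
```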

Your Lambda can then execute its own code using the data from this structure as input, as well as context information provided based on previous intents from this session. Your function should then respond with a success callback containing the string that will be spoken to the user. If you have any worries about how your text will be pronounced by Alexa, you can customize the output speech using SSML. You can also test it using the provided Voice Simulator in the Amazon developer console.
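To make that concrete, here is a minimal sketch of a callback-style Node handler that answers the “WhatProduct” intent. It is not our production code: the `RECOMMENDATIONS` table and the product name in it are invented stand-ins for a real lookup, and `buildSpeechResponse` is a hypothetical helper that wraps text in the response envelope, as plain text or as SSML.

```javascript
// Made-up data standing in for a real recommendations service.
const RECOMMENDATIONS = new Map([["mouse", "the Example Pro mouse"]]);

// Wrap the text to be spoken in the response envelope Alexa expects.
// With useSsml set, the text is sent as SSML inside <speak> tags.
function buildSpeechResponse(text, useSsml) {
  const outputSpeech = useSsml
    ? { type: "SSML", ssml: `<speak>${text}</speak>` }
    : { type: "PlainText", text: text };
  return {
    version: "1.0",
    response: { outputSpeech: outputSpeech, shouldEndSession: true },
  };
}

// Lambda entry point: look up the requested category and reply.
function handler(event, context, callback) {
  const intent = (event.request && event.request.intent) || {};
  const slots = intent.slots || {};
  const category = slots.Category && slots.Category.value;
  const product = RECOMMENDATIONS.get(category);
  const text = product
    ? `The top recommendation for ${category} is ${product}.`
    : "Sorry, I don't have a recommendation for that yet.";
  callback(null, buildSpeechResponse(text, false));
}

// Export for Lambda when running as a CommonJS module.
if (typeof exports !== "undefined") exports.handler = handler;
```

Passing `true` as the second argument to `buildSpeechResponse` switches the output to SSML, which is where you would add pronunciation hints.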

Things you CAN do with your Custom Skill:

  • Show an image in the related Alexa app. Even though the primary interface for your Alexa Skill is voice, users can see their history in the Alexa App on their phone, and you can use visual images there.
You can enrich the customer experience with your skill by including images that will be displayed in their linked phone app.
  • Retain the context and conversation history. You can enrich future responses in this session and make your Skill more useful. In our case, if someone makes a request for a search term that we don’t have an answer for, we then re-prompt them with similar search terms from our database.
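Both of these show up as extra fields in the response your service returns. The sketch below assumes a “no match” case like the one described above: it attaches a card with an image, stores the suggested terms in the session attributes, and keeps the session open with a re-prompt. The helper names, the suggestion data, and the image URL are placeholders for illustration.

```javascript
// Build a "Standard" card, which can carry an image for the Alexa app.
function buildImageCard(title, text, imageUrl) {
  return {
    type: "Standard",
    title: title,
    text: text,
    image: { smallImageUrl: imageUrl, largeImageUrl: imageUrl },
  };
}

// Build a response that keeps the session open and re-prompts the user.
// sessionAttributes are sent back with the next request in this session,
// so they can carry conversation context between turns.
function buildRepromptResponse(speech, repromptText, card, sessionAttributes) {
  return {
    version: "1.0",
    sessionAttributes: sessionAttributes,
    response: {
      outputSpeech: { type: "PlainText", text: speech },
      card: card,
      reprompt: { outputSpeech: { type: "PlainText", text: repromptText } },
      shouldEndSession: false,
    },
  };
}

// Example: no answer for the requested term, so suggest similar ones.
const suggestions = ["gaming keyboard", "gaming mouse"]; // made-up data
const noMatchResponse = buildRepromptResponse(
  `I couldn't find that. Did you mean ${suggestions.join(" or ")}?`,
  "Which one would you like to hear about?",
  buildImageCard(
    "Kit suggestions",
    suggestions.join("\n"),
    "https://example.com/placeholder.png" // placeholder image URL
  ),
  { suggestions: suggestions }
);

console.log(JSON.stringify(noMatchResponse, null, 2));
```

On the follow-up request, the stored terms would be available under `event.session.attributes`.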

Things you CANNOT do with your Custom Skill:

Custom Skills cannot yet pay the bills
  • Make purchases, or interact with your Amazon cart
  • Use external links to a website or app in the Alexa App information card

Testing your Skill

A good developer knows that testing at every stage is very important for avoiding unforeseen malfunctions.

Via AWS Lambda Management Console

Lambda’s web interface provides some built-in sample Alexa intents that you can edit to match the intents and slot values to mock up requests to your service.

Lambda’s interface is already optimized for integration testing with sample Alexa Intents.

The web-based editor is very barebones, so I prefer to edit and save my test requests in an external git repo, and copy and paste them into the editor as needed.
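Once the test requests live in a repo, they can also be replayed against the handler locally before anything is pasted into the console. This is a rough sketch of that workflow, assuming a callback-style handler that responds synchronously; `exampleHandler` is a stand-in for your real Lambda code and `invokeLocally` is a hypothetical helper.

```javascript
// A saved test request, as you might keep in a JSON file in your repo.
const savedEvent = {
  version: "1.0",
  session: { new: true },
  request: {
    type: "IntentRequest",
    intent: {
      name: "WhatProduct",
      slots: { Category: { name: "Category", value: "yoga mat" } },
    },
  },
};

// Stand-in handler for illustration; substitute your real Lambda handler.
function exampleHandler(event, context, callback) {
  const value = event.request.intent.slots.Category.value;
  callback(null, {
    version: "1.0",
    response: {
      outputSpeech: { type: "PlainText", text: `You asked about ${value}.` },
      shouldEndSession: true,
    },
  });
}

// Invoke a callback-style handler locally and capture what it returns.
// Returns null if the handler has not called back by the time it returns.
function invokeLocally(handler, event) {
  let outcome = null;
  handler(event, {}, (error, response) => {
    outcome = { error: error, response: response };
  });
  return outcome;
}

const localResult = invokeLocally(exampleHandler, savedEvent);
console.log(localResult.response.response.outputSpeech.text);
```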

Via Service Simulator

If your Lambda responses look good, then you can go back to the Amazon developer console and test the service directly using phrases. Remember when you’re filling this in to omit the invocation keyword (you can leave out “Alexa, Ask Kit”).

Alexa’s Service Simulator lets you type in sample input for testing instead of relying on a voice interface. You can still preview the voice response from your computer.

On a physical device

If you own an Alexa-enabled device, you can load a development version of your Skill onto your device for more complete end-to-end testing. The device must be registered to the same account as your Amazon Developer console. Once you enable the development version of the Skill on your developer console, the Skill will automatically show up in the “Your Skills” section of the Alexa app on your phone.

Gotchas

There are a few small holes in the development process.

I did notice a few issues I wanted to highlight for future developers:

  1. The Amazon Developer portal, where you will configure and submit your Custom Skill, is not tied to AWS. You won’t use an IAM login, and it is not linked to your AWS organization account. You will spend a lot of time flipping between the Amazon developer console and your Lambda dashboard in AWS.
  2. Additionally, this means the Alexa Skill doesn’t use an AWS VPC for security, and you may have to rely on other methods to secure your service endpoint.
  3. You can’t use the AWS CLI or other command line tools. You will have to copy and paste the Intent Schema and Sample Utterances into a textarea in your browser, which introduces the possibility of error and makes it difficult to diff new changes.
  4. Testing on a physical device requires that the Amazon developer account matches the email of the account registered to the device, which is likely your personal email, not your enterprise account. This seems to be the only way to “sideload” skills onto a physical device.
  5. Lastly, and most importantly: every time you say “Alexa” when talking to your co-workers she wakes up and chimes in.
Alexa will wake up and chime in on just about every conversation you have during the development and testing process. Get used to it.
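On the second gotcha, one simple check a service can make is to compare the application ID sent with each request against the ID of its own skill. The sketch below shows the idea only and is not a complete security setup; a web service endpoint would need further checks that are out of scope here. The ID value and function names are placeholders.

```javascript
// Placeholder: replace with the application ID shown for your skill
// in the Amazon developer console.
const EXPECTED_APPLICATION_ID = "amzn1.ask.skill.example-id";

// True only when the request carries our skill's application ID.
function isFromOurSkill(event) {
  const application = event.session && event.session.application;
  return Boolean(application) &&
    application.applicationId === EXPECTED_APPLICATION_ID;
}

// Reject requests from other skills before doing any real work.
function guardedHandler(event, context, callback) {
  if (!isFromOurSkill(event)) {
    callback(new Error("Invalid application ID"));
    return;
  }
  callback(null, {
    version: "1.0",
    response: {
      outputSpeech: { type: "PlainText", text: "Hello from Kit." },
      shouldEndSession: true,
    },
  });
}
```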

Navigating the Skill certification process

The actual skill certification process also took much longer than expected. It took about six weeks from our initial app submission to when it was actually certified and available in the skill catalog.

The Skill certification submission form does not allow for very much detail, and even when it is provided, it is unclear if the information is processed correctly by the testers. As an example from our own certification process, Amazon’s Invocation Name guidelines indicate that you need to provide proof of trademarks used in invocation keywords, but the Skill certification submission form does not actually provide any method to upload or link to this kind of documentation. We ended up putting this information in the “Testing Instructions” field, but our initial submission was still rejected for not providing our trademark information.

What ultimately worked for us was opening a ticket via the “Contact Us” form. We had to exchange the relevant documents back and forth through email before we finally did get certified.

In summary

  • It’s a great time to develop a Custom Skill. The catalog is still small enough that you could get discovered by new users, and you can reserve your brand as an invocation name.
  • Tight integration with AWS Lambda makes the actual development process very fast, inexpensive, and easy to test.
  • Lack of integration with pretty much all other parts of AWS makes login, code versioning, and permissioning surprisingly frustrating.
  • The final certification can be a bit of a hurdle and may take more time than you expect!

Try it out!

If you have an Amazon Alexa device, you can enable our Custom Skill to “Ask Kit” for recommendations on products from experts in our network on kit.com.

And if you like building new platforms and experimenting with new interfaces, we’re hiring! Please check out kit.com/jobs for open positions!

Will Chiong
Kit Blog

I love coffee, beer, Scala, long walks on the beach, and oxford commas.