Some tips on building the core of your Alexa skill — the Interaction Model

Daniel Kremerov
4 min read · Aug 25, 2017


This blog post describes how to set up the interaction model for a skill, with a clear focus on potential problems that I encountered along the way. The next two posts will help you set up the backend, consisting of business logic hosted on AWS Lambda and a persistence layer, and connect it to the Interaction Model you create here.

During my project, I developed a total of three fully working Alexa skills (and many test skills along the way). As there is already a lot of advice out there, I want to focus on the aspects that were problematic for me. Most of them are not particularly complex, as Amazon does a good job of simplifying development. However, a big challenge is that the environment grows and evolves at a fast pace. As a result, many recommendations and third-party frameworks get outdated quickly, which for me meant a lot of trial and error and even some periods of frustration.

Setting up a basic skill is relatively straightforward and well documented. After selecting a skill name, it is important to choose an invocation name that is easy to pronounce: a complicated name can be a bottleneck when testing, as Alexa often has trouble recognising it (although this improves over time, thanks to the “AI inside”). Another thing to watch out for is to select exactly the language that you want to use, differentiating even between American and British English. At the latest when the skill is linked to a Lambda backend, selecting the wrong language will break the whole skill. I once developed a whole skill, connected it to the backend, and deployed it to my Echo Dot, and all Alexa said was “Sorry, I don’t know that skill”. The only reason was that my interaction model had been built in American instead of British English.
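To catch such a locale mismatch early, it can help to log the `locale` field that Alexa includes in every request. Here is a minimal, hypothetical Lambda handler sketch (the handler name and the fallback behaviour are my own assumptions, not from any official sample):

```python
def lambda_handler(event, context):
    # Every Alexa request carries a locale such as 'en-US' or 'en-GB' in
    # event['request']['locale']; logging it makes mismatches visible fast.
    locale = event.get("request", {}).get("locale", "unknown")
    print(f"Incoming request locale: {locale}")

    if not locale.startswith("en"):
        # Hypothetical guard: warn if the skill only supports English locales
        print("Warning: unsupported locale, responses may fail")

    # Return an empty but well-formed Alexa response envelope
    return {"version": "1.0", "response": {}}
```

Seeing `en-US` in the logs while your interaction model was built for `en-GB` (or vice versa) is exactly the kind of silent mismatch described above.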

Next, you are building the core of the skill — the interaction model. Terms like intents, utterances and slots are defined all over the internet, so I skip them here. More importantly, there are currently two ways to create an interaction model: writing a plain JSON file, or using the new Skill Builder, a UI that simplifies the process. I strongly recommend using the Builder, but keep in mind that most of the documentation online was written for the plain JSON version, which is certainly a tradeoff. Besides the supportive UI, the big advantage of the Builder is the dialog model. It allows you to handle different dialogue flows inside the Interaction Model, whereas with the old version these had to be addressed in the backend code.
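For orientation, a stripped-down intent schema in the plain JSON format might look like the following (the intent name and the custom slot type are illustrative examples, not taken from my actual skill):

```json
{
  "intents": [
    {
      "intent": "TakeVitalsIntent",
      "slots": [
        { "name": "VitalType", "type": "VITAL_TYPES" },
        { "name": "VitalValue", "type": "AMAZON.NUMBER" }
      ]
    },
    { "intent": "AMAZON.HelpIntent" },
    { "intent": "AMAZON.StopIntent" }
  ]
}
```

Custom slot types like `VITAL_TYPES` get their possible values (body weight, blood glucose, and so on) defined separately, alongside the sample utterances.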

For instance, if you have an intent that takes in a vital reading (body weight, blood glucose etc.) and the value of this reading, Alexa eventually needs to retrieve two slots, namely type and value. To give the user maximum flexibility, it is important that they can invoke this intent by saying something like “Take my vitals”, “Take my body weight” or “My body weight is 90kg”. The last utterance contains all slot values, whereas for the other two Alexa needs to gather more information by asking the user something like “Which vital do you want to take?” or “How much is your body weight today?”. Alexa can also ask for confirmation, either for each slot value or for the overall result, as in “So your body weight today is 90kg?”. To sum up, dynamically asking for slot values and prompting for confirmation can both be handled inside the interaction model when using the new Skill Builder.
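As a rough sketch, the Skill Builder expresses such elicitation and confirmation behaviour declaratively in the interaction model JSON it generates. The structure below illustrates the idea; the intent name, slot name and prompt ID are my own examples:

```json
{
  "dialog": {
    "intents": [
      {
        "name": "TakeVitalsIntent",
        "confirmationRequired": true,
        "slots": [
          {
            "name": "VitalType",
            "elicitationRequired": true,
            "prompts": {
              "elicitation": "Elicit.Slot.VitalType"
            }
          }
        ]
      }
    ]
  },
  "prompts": [
    {
      "id": "Elicit.Slot.VitalType",
      "variations": [
        { "type": "PlainText", "value": "Which vital do you want to take?" }
      ]
    }
  ]
}
```

If the user already filled the slot (“Take my body weight”), the elicitation prompt is skipped; otherwise Alexa asks it automatically, without any backend code.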

It is also worth mentioning how important it is to provide as many sample utterances as possible for each intent; testing makes clear that this trains the Alexa AI and helps it understand your skill’s users much better. Rather than typing in all utterances by hand, I recommend generating them with regular expressions in combination with a third-party script such as this one, which will save you a lot of time when creating a sophisticated interaction model.
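The expansion idea behind such scripts can be sketched in a few lines of Python (this is my own illustrative version, not the linked script):

```python
import itertools
import re

def expand(template):
    """Expand a template like '(take|record) my (body weight|blood glucose)'
    into every concrete sample utterance."""
    # Split on parenthesised groups; the capture group keeps their contents,
    # so odd indices hold alternatives and even indices hold fixed text.
    parts = re.split(r"\(([^)]*)\)", template)
    options = [p.split("|") if i % 2 else [p] for i, p in enumerate(parts)]
    # Cartesian product over all groups yields every utterance variant
    return ["".join(combo).strip() for combo in itertools.product(*options)]

for utterance in expand("(take|record) my (body weight|blood glucose)"):
    print(utterance)
```

A template with two groups of two alternatives already yields four utterances; with a handful of groups this quickly generates the large utterance lists that make the model robust.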

In the next post, I will describe some of the configuration options and explain things to watch out for when creating your backend function and linking it to your skill.

About the author:
I am an entrepreneurially minded MSc Computer Science student at University College London. Before that, I studied business and worked in consulting and in the start-up sphere. This summer, I have the unique opportunity to dive deep into the topic of personal assistants in telehealth, fully supported by UCL and NHS Digital UK. I want to give back, so I strive to provide my readers with unique insights, from both a technical and a non-technical perspective.

Also, check out my StartUp Permitly. ;)
