Getting started with Voxa: Creating an Alexa skill Part 1

Wuelber Castillo
NicaSource
Published in
4 min read · Jul 2, 2018

Voice technology is on the rise, there's no doubt about it. Big companies like Amazon, Google and Microsoft are betting on this trend, and I think voice will become more prominent in the next few years. In 2017 alone, 24 million smart speakers were sold globally. People are really getting into smart speakers and voice technology.

What does that mean for you as a developer?

Right now web development dominates developer conferences, tutorials and, of course, job opportunities. Voice technology is opening new opportunities for developers to go beyond the web app. Amazon and Google are encouraging more and more developers to get involved in the voice ecosystems around their respective smart speakers, improving the AI inside their assistants and releasing new tools for building rich voice experiences. I'm sure there will be more conferences, tutorials and job opportunities around voice technology.

Amazon has Alexa, Google has Google Assistant, Microsoft has Cortana. Where should I start?

There are a lot of comparison videos on YouTube between the three assistants. I don't intend to promote one assistant over the others, but Amazon has done a great job taking the lead in this new tech war: it holds over 65% of the smart speaker market with the Amazon Echo, has opened its platform to developers, keeps improving the development experience, and even offers rewards for expanding Alexa's skills. I think Amazon is a great starting point if you want to start developing voice apps. And yes, pun intended in that last sentence, since the way to expand Alexa is by developing "Alexa Skills".

What’s an Alexa skill?

A skill is roughly the equivalent of an app for Alexa. Just like Android and iOS have apps that you can download to your phone, Alexa has skills, and that's where we, the developers, come into play. An Alexa skill is not something you download to your smart speaker; it's something you activate, and everything lives in the cloud.

How does an Alexa skill work?

Alexa communicates with a skill over HTTP, sending POST requests with JSON bodies to the skill's endpoint and expecting JSON responses back. Given that principle, you can develop an Alexa skill in any backend language capable of running a web server. That said, the most popular language for developing Alexa skills right now is JavaScript (Node.js), and that's the language we are going to work in, using the Voxa framework.
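To make the request-response exchange concrete, here is a minimal sketch of a handler that takes the kind of JSON Alexa POSTs to your endpoint and builds a reply. The shapes are simplified (a real Alexa request carries session, context and many more fields), and the intent names reuse the examples from later in this article:

```javascript
// Sketch: turn a (simplified) Alexa request body into an Alexa-style
// JSON response. The response shape follows the Alexa Skills Kit
// structure: version, outputSpeech, shouldEndSession.
function handleAlexaRequest(requestBody) {
  // Intent requests carry the intent name; other request types
  // (e.g. LaunchRequest) only have a type.
  const intentName =
    requestBody.request.type === "IntentRequest"
      ? requestBody.request.intent.name
      : requestBody.request.type;

  const speech =
    intentName === "HelloIntent" ? "Hello there!" : "Goodbye!";

  return {
    version: "1.0",
    response: {
      outputSpeech: { type: "PlainText", text: speech },
      shouldEndSession: intentName !== "HelloIntent",
    },
  };
}

// Example: the kind of POST body Alexa would send for a HelloIntent.
const fakeRequest = {
  request: { type: "IntentRequest", intent: { name: "HelloIntent" } },
};

console.log(handleAlexaRequest(fakeRequest).response.outputSpeech.text);
// Hello there!
```

In a real skill this function would sit behind a web server (or an AWS Lambda handler) that receives the POST request and returns the JSON; frameworks like Voxa take care of that plumbing for you.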

What’s Voxa?

Voxa is a framework developed by Rain Agency. Rain is one of the agencies recommended by Amazon for Alexa skill development, and they have created an easy-to-use framework for building Alexa skills on Node.js. The framework is under active development, always improving and adding support for Alexa's latest features.

Before we start, some basic definitions

Voice development comes with some new terms. Because most of us have a background in building apps, whether mobile, web or desktop, voice interfaces introduce new concepts, but they are pretty easy to learn and understand.

Interaction Model

The interaction model is the voice interface that maps the user’s voice input into intents. For example, the interaction model for a skill can be something like this:


{
  "name": "HelloIntent",
  "samples": [
    "hi",
    "hello"
  ]
},
{
  "name": "ExitIntent",
  "samples": [
    "bye"
  ]
}

What the above means is that when the user says the phrase “hi” or “hello” to your Alexa skill, the interaction model will map that input to the HelloIntent. If the user says “bye”, it will map the request to the ExitIntent. The phrases or samples in an interaction model are also called utterances.
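On the real platform Alexa's speech models do this mapping for you, but as a toy illustration of the idea, the mapping can be sketched as a lookup over the model's sample phrases:

```javascript
// Toy version of what the interaction model does: map a spoken
// phrase (utterance) to an intent name. Real speech recognition is
// far more flexible; this exact-match lookup is only illustrative.
const interactionModel = [
  { name: "HelloIntent", samples: ["hi", "hello"] },
  { name: "ExitIntent", samples: ["bye"] },
];

function mapUtteranceToIntent(utterance) {
  const phrase = utterance.trim().toLowerCase();
  const match = interactionModel.find((intent) =>
    intent.samples.includes(phrase)
  );
  return match ? match.name : "UnknownIntent";
}

console.log(mapUtteranceToIntent("hello")); // HelloIntent
console.log(mapUtteranceToIntent("bye"));   // ExitIntent
```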

Utterances

Given the variation of language in the real world, there will often be many different ways to express the same request. Providing different phrases in your interaction model helps improve voice recognition for your skill. These phrases are called utterances, and the interaction model is in charge of mapping each utterance to its respective intent. But what's an intent?

Intents

Intents are how your skill knows what the user is requesting. Following our interaction model example, we know the user greeted our skill when we receive a HelloIntent; given that input, we can respond, for example by returning the greeting and saying hello back. Intents can optionally have arguments called slots.

Slots

Maybe you have used a voice assistant before; one of the things assistants are commonly used for is setting up an alarm or adding an event to your calendar. Taking the alarm example, your input can have variables like the hour or the days the alarm should go off. Those variables are called slots. If your skill consists of taking a user's pizza order, you could have an intent where the user specifies the ingredients, right?


{
  "name": "AddIngredientIntent",
  "samples": [
    "I want {ingredient} on my pizza"
  ],
  "slots": [
    {
      "name": "ingredient",
      "type": "INGREDIENTS"
    }
  ]
}

So as you can see, the ingredient is the slot in an utterance for a specific intent in your interaction model.
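When Alexa recognizes an utterance like “I want pepperoni on my pizza”, the resolved slot value arrives inside the intent request. A simplified sketch of reading it out (the request shape below is a pared-down version of what Alexa actually sends, where slot values live under request.intent.slots):

```javascript
// Sketch: read a slot value out of an incoming intent request.
// Simplified shape; a real request has many more fields.
function getSlotValue(requestBody, slotName) {
  const slots = requestBody.request.intent.slots || {};
  return slots[slotName] ? slots[slotName].value : undefined;
}

// Example request for "I want pepperoni on my pizza".
const addIngredientRequest = {
  request: {
    type: "IntentRequest",
    intent: {
      name: "AddIngredientIntent",
      slots: {
        ingredient: { name: "ingredient", value: "pepperoni" },
      },
    },
  },
};

console.log(getSlotValue(addIngredientRequest, "ingredient")); // pepperoni
```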

Now that you understand the basic concepts, let's go and create our first Alexa skill in the Alexa Developer Portal.
