Amazon Alexa: 5 tips to hear before starting development (in NodeJS)

Artur Skowroński
Published in Smart Up, Jan 15, 2017
Original from: https://flic.kr/p/EqwyFh, Licence https://creativecommons.org/publicdomain/mark/1.0/

In a world where nearly all of the most popular mobile apps are created by just two vendors and smartwatches never truly flourished, AI assistants seem to be virgin territory, a promised land with a chance to become “the next big thing.” Amazon Alexa, especially after the latest CES, appears to be the leader in that category, and it’s quite obvious why. Amazon has a solid reputation in the developer community and is considered a trusted partner who knows how to make programmers’ lives easier: Alexa itself was published with an extensive set of examples and samples.

However, just beyond the “hello world” stage, you will find there are a few things you need to know to be more productive, and you will probably learn them the hard way. I would like to help you stay ahead of those problems and give you a proper head start.

One important thing: I chose Node.js as my primary AWS Lambda language. I also assume that you have basic knowledge of Alexa and know terminology like utterances and slots. If not, before you start reading, please go through the essential Alexa tutorial: Build a Trivia Skill in under an Hour.

1. Deployment Process

A significant pain point when working with Alexa is setting up a solution locally. Although there are Lambda simulators that you can run on your machine, they are problematic to use because of the custom way Alexa triggers functions. If you are a more experienced Alexa developer, the alexa-app and alexa-app-server projects seem to be an efficient all-in-one option, including both a development framework and a local testing environment. However, I wouldn’t suggest using them for first projects, because those tools are highly opinionated and hide the lower-level Alexa SDK abstractions too deeply underneath.

Local development is also problematic because of the way users interact with smart assistants: they talk to them in natural language. Alexa is picky, and you need to check how your chosen utterances behave during field testing. That’s why you shouldn’t skip deployment to a real AWS Lambda instance: there is no option, or at least no easy one, to attach a local server to an actual Echo device. The Lambda web editor is good for basic MVPs, but having to upload zips with all external node modules is far from convenient. Amazon provides its own specialised toolset, but it’s overcomplicated in my opinion. The tool I consider most suitable for the job is node-lambda. With it, you can easily create and update functions from the local shell by running simple commands like:

node-lambda create 
node-lambda deploy

Configuration is also trivial: it is done through property files kept with the project source code.

Bonus

If you want to use node-lambda, you need to create a custom group in AWS IAM with the following policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1464440182000",
      "Effect": "Allow",
      "Action": [
        "lambda:InvokeAsync",
        "lambda:InvokeFunction",
        "lambda:CreateFunction",
        "lambda:UpdateFunctionCode",
        "lambda:UpdateFunctionConfiguration"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}

2. Gathering Logs

Once our application is deployed on Lambda, we can invoke our function through Alexa triggers. If everything is OK, Alexa will answer us with her synthesised voice. Unfortunately, in case of an error, we will just receive a gloomy

There was a problem with the requested skill’s response

I cannot say how many times I heard that during development. After listening to that irritating message, our only option is to search the logs to check what happened.
To accomplish that, we need to use AWS CloudWatch, the service where AWS Lambda stores its logs. However, they are accessible through a web interface that is less usable than one would like. It was really annoying to click through the web UI each time I wanted to check what had happened, and like I said, “something” happened a lot. As a developer, I appreciate being able to tail logs from CloudWatch, and cwtail does exactly that. We define which Lambda function to monitor, and new logs are delivered from subsequent function executions. It’s worth mentioning that cwtail is a tiny wrapper over the AWS CLI tool and needs it installed to work properly. It’s especially handy if we need to work with many different functions.
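
Whatever tool you use to read them, the logs are only as useful as what the function writes. Anything a Node.js Lambda prints with console.log or console.error ends up in CloudWatch, so a minimal sketch could log the incoming request and the stack trace of any failure. The handler body and the `buildAnswer` helper are hypothetical and stand in for real skill logic:

```javascript
'use strict';

// Hypothetical skill logic; it throws when the request has no intent.
function buildAnswer(event) {
  if (!event.request || !event.request.intent) {
    throw new Error('Request carries no intent');
  }
  return `You asked for ${event.request.intent.name}`;
}

// Lambda entry point. console.log / console.error output lands in CloudWatch.
function handler(event, context, callback) {
  console.log('Incoming request:', JSON.stringify(event));
  try {
    const text = buildAnswer(event);
    callback(null, {
      version: '1.0',
      response: {
        outputSpeech: { type: 'PlainText', text },
        shouldEndSession: true,
      },
    });
  } catch (err) {
    // Log the stack trace so the CloudWatch entry explains the failure.
    console.error('Skill failed:', err.stack);
    callback(err);
  }
}

exports.handler = handler;
```

With something like this in place, the gloomy spoken error at least has a matching log entry that says what went wrong.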

Bonus

Log streaming can have a small delay. CloudWatch is not populated in real time; you will probably need to wait a few seconds to see information about the current invocation.

3. Using Plurals

I used not to realise the importance of user experience. I intuitively knew that God is in the details, but small mistakes like

You have one messages

were completely invisible to me. I unconsciously taught myself not to notice them, and I was happy in that state. However, it truly bit me when I started to work on an interface that is presented to the user not by text, but by voice. When you hear “You have one messages,” something dies inside you: the illusion that you are talking with a real person simply fades away. In my experience, we are far more sensitive to errors in speech than in written text, and even small errors can tear the whole experience apart. The UX of voice assistants is altogether a fascinating topic, worth covering in the future.

My first approach to solving that problem was to provide two different versions of a given string, one singular and one plural.
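
A minimal sketch of that two-string approach, using a hypothetical `describeInbox` helper:

```javascript
// Hypothetical helper: hand-pick the singular or plural wording.
function describeInbox(messageCount) {
  if (messageCount === 1) {
    return 'You have one message';
  }
  return 'You have ' + messageCount + ' messages';
}
```

Every countable noun in the skill needs its own branch like this one.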

What worked in a basic example quickly became exhausting and impractical as the application grew. That’s why I decided it was time to help myself with a tool.

In the deep and profound ocean of npm, I found something that can assist us with that problem. pluralize is a simple but effective solution for the English language. Together with template strings, it reduces our previous example to a single expression, and an even more consistent version is possible as well.
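
A sketch of what both versions might look like, assuming pluralize's `pluralize(word, count, inclusive)` call. The function names are hypothetical, and the `catch` branch is only a naive stand-in so the snippet still runs where the package is not installed:

```javascript
'use strict';

// Assumes `npm install pluralize`. The catch branch is a naive stand-in
// (it only appends "s") so the sketch still runs without the package.
let pluralize;
try {
  pluralize = require('pluralize');
} catch (err) {
  pluralize = (word, count, inclusive) => {
    const form = count === 1 ? word : `${word}s`;
    return inclusive ? `${count} ${form}` : form;
  };
}

// Version 1: pluralize picks the noun form, the template string does the rest.
const withNounOnly = (n) => `You have ${n} ${pluralize('message', n)}`;

// Version 2: passing `true` lets pluralize prefix the count itself.
const withCount = (n) => `You have ${pluralize('message', n, true)}`;
```

Both functions produce the same sentence; the second keeps the number and the noun in one place, so they cannot drift apart.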

Although it’s a simple thing, it’s really helpful and worth remembering.

Bonus

Use template strings whenever you can. While creating Alexa skills, you will work with text a lot, and Node 4.3 running on Lambda fully supports this ECMAScript 2015 feature.

4. String Similarity Matching

It’s always worth reading the documentation before you start “happy coding.” It will stop you from introducing hard-to-find bugs or, far more dangerous, making wrong assumptions about how to code. A truism, but one often overlooked by many people, including myself.

When you start working with the Alexa SDK, sooner or later you will start using slots. They are a handy way to suggest to Alexa how she should understand your commands. Suggest is the key word here. Despite the documentation explicitly saying that slots shouldn’t be treated like enums, I did exactly that. It worked for simple cases because Alexa has sophisticated algorithms that “weight” solutions toward slot values. Unfortunately, for trickier examples I was receiving a lot of values that were unknown to my application, even though they were really similar to known ones. I didn’t expect such cases, which made me drop a lot of requests where my user slightly misspoke a command or Alexa misunderstood the pronunciation.

It’s not a bug, but a feature: Alexa doesn’t allow you to add slot values programmatically, only through the UI (which is a really excruciating approach). However, Alexa will try her best to understand you. If you have arbitrary, user-defined data you have no control over, like a list of the user’s friends or favourite songs, you can still rely on Alexa. You just need to be prepared for her best guess rather than an exact match.

With that knowledge, instead of making simple string equality checks on your slots, it’s always worth introducing string similarity algorithms to your code. string-similarity is a comprehensive library for that. Its killer feature is that it has convenient methods for cases where you ALWAYS expect the user to say one of the options. The easiest and least error-prone solution, thanks to the way Alexa inherently treats numbers, is to use indices:

Alexa: 
You have three nephews:
1. Huey
2. Duey
3. Luey
What nephew do you want to hear about?
You:
I want to hear about nephew one

but reading out all the possible options before the user can choose anything is a horrible way of interacting with them. A far more natural solution would be:

Alexa: 
What nephew do you want to hear about?
You:
I want to hear about Duey.

Now you just need to find the most suitable option. string-similarity will not only return a list of nephews with their similarity ratings but also conveniently suggest what it considers the best match.
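
A sketch of such a lookup, assuming string-similarity's `findBestMatch(mainString, targetStrings)`, which returns a `ratings` array and a `bestMatch` object with `target` and `rating` fields. The `pickNephew` helper and the `MIN_RATING` cut-off are hypothetical, and the `catch` branch is a simplified Dice-coefficient stand-in so the snippet still runs where the package is not installed:

```javascript
'use strict';

// Assumes `npm install string-similarity`. The catch branch is a simplified
// Dice-coefficient stand-in (not the library's code) used only as a fallback.
let findBestMatch;
try {
  findBestMatch = require('string-similarity').findBestMatch;
} catch (err) {
  const bigrams = (text) => {
    const pairs = [];
    for (let i = 0; i < text.length - 1; i += 1) pairs.push(text.slice(i, i + 2));
    return pairs;
  };
  const dice = (a, b) => {
    const left = bigrams(a);
    const pool = bigrams(b);
    const total = left.length + pool.length;
    if (total === 0) return a === b ? 1 : 0;
    let shared = 0;
    left.forEach((pair) => {
      const at = pool.indexOf(pair);
      if (at !== -1) {
        shared += 1;
        pool.splice(at, 1); // each bigram may be matched only once
      }
    });
    return (2 * shared) / total;
  };
  findBestMatch = (main, targets) => {
    const ratings = targets.map((target) => ({ target, rating: dice(main, target) }));
    const bestMatch = ratings.reduce((best, next) => (next.rating > best.rating ? next : best));
    return { ratings, bestMatch };
  };
}

const NEPHEWS = ['Huey', 'Duey', 'Luey'];
const MIN_RATING = 0.3; // hypothetical cut-off; tune it against real recordings

// Returns the nephew closest to what Alexa heard, or null if nothing is close enough.
function pickNephew(heard) {
  if (typeof heard !== 'string') return null;
  const lowered = NEPHEWS.map((name) => name.toLowerCase());
  const result = findBestMatch(heard.trim().toLowerCase(), lowered);
  if (result.bestMatch.rating < MIN_RATING) return null;
  return NEPHEWS[lowered.indexOf(result.bestMatch.target)];
}
```

Both sides are lower-cased before comparing so that capitalisation in the slot value does not affect the rating, and the cut-off keeps a completely unrelated word from being mapped to a nephew.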

This simple function will solve a lot of the problems you will face with your Lambda. It’s probably not a state-of-the-art academic solution, but it is sufficient in the majority of cases.

Bonus

let only seems to work in strict mode on AWS Lambda. It managed to bite me, so be careful about that.
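
A minimal sketch of the fix: put the strict-mode directive at the very top of the file, before any `let` declaration. The `countdown` function is hypothetical and only there to use `let`:

```javascript
'use strict'; // must be the first statement in the file

// Hypothetical function that relies on block-scoped `let`.
function countdown(from) {
  const steps = [];
  for (let i = from; i > 0; i -= 1) {
    steps.push(i);
  }
  return steps;
}
```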

5. SSML

If you want a good user experience in your Alexa skill, you need to learn SSML. Period. SSML is a prehistoric (by prehistoric I mean 2004) standard from the W3C that describes a text representation of speech. Yet even for simple skills, granular control over how your response will sound is invaluable (it’s worth mentioning that it was not available in the first version of the SDK, and even today only a restricted subset of SSML is supported). While Alexa is quite good at intonation nuances like questions or exclamations, a lot of responses will sound overly condensed without supporting breaks. Please remember that your end users need to process what Alexa says to them and decide what to do with it at the same time.

When I wrote about plurals, I pointed out that you will work with text a lot while developing for Alexa. Adding SSML tags to your string responses makes them less readable and harder to modify.
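
A sketch of what hand-written SSML tends to look like, using the standard `<speak>` and `<break>` tags. The wording and pause lengths are made up for illustration:

```javascript
'use strict';

// Hypothetical response text with SSML tags concatenated by hand.
const nephews = ['Huey', 'Duey', 'Luey'];

const ssml = '<speak>You have three nephews: <break time="300ms"/>'
  + nephews.join(', <break time="200ms"/>')
  + '. <break time="500ms"/>Which nephew do you want to hear about?</speak>';
```

Even in this short response the markup already outweighs the words, and moving a single pause means editing string fragments.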

That’s why it’s so important to find a nice DSL to help you with that task. No library covers all SSML tags supported by Alexa, but for most cases alexa-speech is ideal.

When you use it, the previous example will look like:

What’s more, alexa-speech helps you compose partial strings into a single sentence. That is especially important when working with longer chains, but in my opinion, it should be used in all cases as a sensible default.

Bonus

Remember that you need to inform Alexa’s engine that you are using SSML:

{
  type: "SSML",
  ssml: speech.render()
}

Otherwise, Alexa will read all your tags out as they are. Try it once to realise that it’s not what you want your user to hear.
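
For context, that object is the `outputSpeech` part of the JSON a skill returns. A minimal sketch of the surrounding envelope, with a hypothetical `buildSsmlResponse` helper:

```javascript
'use strict';

// Hypothetical helper: wraps an SSML string in the JSON a skill returns to Alexa.
function buildSsmlResponse(ssml, shouldEndSession) {
  return {
    version: '1.0',
    response: {
      outputSpeech: { type: 'SSML', ssml },
      // End the session unless the caller explicitly passes false.
      shouldEndSession: shouldEndSession !== false,
    },
  };
}
```

Calling `buildSsmlResponse('<speak>Hello</speak>', false)` would keep the session open for a follow-up answer.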

Parting bonus: Echo Simulator

After all this talk about libraries, services and development practices, we cannot forget that we are programming hardware. We need to carry this physical thing around whenever we want to try out a new feature, especially if we want to test it with a real voice (and I hope you understand the importance of that practice after reading this far). If you prefer, like me, the freedom of just opening your laptop and starting to code whenever you have free time, you should find an Alexa simulator.

I suggest trying a great one: Echosim. It can authorise itself against your Amazon account (gaining access to all your skills under construction) and works in a browser, so you don’t need any additional software. I hope you will find it a handy tool.

That’s all for now. I hope that thanks to this article you will have a gentler start in the world of voice assistants. I know I would have liked to read something like it before starting my own experiments.

Many things still bug me (especially working and testing in a local environment), and I need to learn what good UX means in the context of Alexa. That’s why you can surely expect more articles covering that topic in the future, so stay tuned! Meanwhile, have fun with this fascinating new category of consumer technology. The crazier the ideas that get implemented, the further the boundaries will be pushed, and that’s crucial for a device that is still in its infancy.
