Conversational Integration Tests for your Alexa Skills (Node/JS)

Expedia Group Technology
Feb 14, 2017

By Joan Gamell Farre

Expedia released the first version of our Alexa Skill last December at the AWS re:Invent event.

Since then, a team of developers in the San Francisco and Bellevue offices has been hard at work adding features, refactoring, and modernizing the code to adopt the new Amazon JS SDK. The main challenge during the refactoring was: how can we make sure we are not breaking existing functionality? In other words, how do we avoid introducing regressions in our code?

The answer, of course, was “we need some tests” (yes, we did the bad thing where your spike morphs into production code, so we didn’t end up with very good coverage…). For unit tests we knew what we had to do: we settled on the standard combination of Mocha plus assertion and stubbing libraries, given that we are developing in Node and most of our developers use IntelliJ (which supports Mocha out of the box).

The problem was the integration tests (also called functional tests). Manual testing was not practical: a full pass of our test plan took between one and two hours of someone’s time. Since a manual pass means talking to the device the whole time, that also meant a lot of hoarse voices. So we went looking for libraries to streamline our functional testing.

After some research, we found BST (bespoken tools), a set of tools to develop, test and deploy Alexa Skills, to be the closest thing to a gold standard at the moment. We gave it a try, but found a couple of drawbacks for our use case. First, it is too heavy for our taste, as we would probably not use many of its features. Second, it’s not easy to follow BDD practices out of the box, which we wanted for our tests since BDD is such a natural fit for the conversational model.

Given that we couldn’t find a functional test library to fit our needs, we decided to create our own: alexa-conversation (you can find the code in our GitHub repo).

Design and architecture

We had the following goals in mind when writing the alexa-conversation library:

  • Follow a conversational model of question-answer, like you would if you were testing the skill manually
  • Support BDD practices as much as possible to make the tests accessible to non-technical stakeholders, and people working on the voice user interface (VUI)
  • Avoid reinventing the wheel
  • Make it easy to integrate with any CI/CD pipeline
  • Use an easy, self-explanatory syntax

The result is a lightweight library that uses Mocha as its test runner. It lets Alexa skill developers write conversation-like functional tests by specifying ‘intents’ (and slots) as inputs and running assertions against their skill’s response. All of this happens without starting any server or proxy, so the tests can run in any Node environment.

Alexa functional test architecture diagram
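
To make the “no server or proxy” idea concrete, here is a minimal sketch of how a test can exercise a skill entirely in-process: build the request JSON for an intent and call the skill’s handler function directly. This is an illustration only, not the internals of alexa-conversation. The `handler`, `buildIntentRequest` and `invoke` functions, the `SearchIntent` intent and the `city` slot are all hypothetical names, and the toy handler assumes a Lambda-style `(event, context, callback)` signature.

```javascript
// Hypothetical stand-in for a skill's entry point, using the
// (event, context, callback) shape of an AWS Lambda handler.
function handler(event, context, callback) {
  const slots = event.request.intent.slots || {};
  const city = slots.city ? slots.city.value : 'an unknown city';
  callback(null, {
    version: '1.0',
    response: {
      outputSpeech: { type: 'SSML', ssml: `<speak>Searching hotels in ${city}</speak>` },
      shouldEndSession: false
    }
  });
}

// Builds a minimal IntentRequest envelope from an intent name and slot values.
function buildIntentRequest(appId, intentName, slotValues = {}) {
  const slots = {};
  for (const [name, value] of Object.entries(slotValues)) {
    slots[name] = { name, value };
  }
  return {
    version: '1.0',
    session: {
      new: false,
      sessionId: 'test-session',
      application: { applicationId: appId },
      attributes: {},
      user: { userId: 'test-user' }
    },
    request: {
      type: 'IntentRequest',
      requestId: 'test-request',
      timestamp: new Date().toISOString(),
      locale: 'en-US',
      intent: { name: intentName, slots }
    }
  };
}

// Calls the handler in-process; no server, proxy or network is involved.
function invoke(handlerFn, event) {
  return new Promise((resolve, reject) => {
    handlerFn(event, {}, (err, result) => (err ? reject(err) : resolve(result)));
  });
}

invoke(handler, buildIntentRequest('your-app-id', 'SearchIntent', { city: 'Paris' }))
  .then((result) => console.log(result.response.outputSpeech.ssml));
```

Because the handler is just a function call, a test runner such as Mocha can assert on the returned response object like on any other return value.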

Using the library

First, install it:

[shell light="true"]$> npm install --save-dev alexa-conversation[/shell]

Also install Mocha if you haven’t yet:

[shell light="true"]$> npm install --save-dev mocha[/shell]

Here is an example of how easily you can define integration tests with this framework for your Alexa Skill:

[code language="javascript"]
const conversation = require('alexa-conversation');
// your Alexa skill main file. app.handle needs to exist
const app = require('../../index.js');

// these will be used to generate the requests to your skill
const opts = {
  name: 'Test Conversation',
  app: app,
  appId: 'your-app-id'
};
// Other optional parameters are available. See readme.md

// initialize the conversation
conversation(opts)
  .userSays('LaunchIntent') // trigger the first Intent
    .plainResponse // this gives you access to the non-SSML response
      // assert that response and reprompt are equal to the given text
      .shouldEqual('Welcome back', 'This is the reprompt')
      // assert that response and reprompt are not equal to the given text
      .shouldNotEqual('Wrong answer', 'Wrong reprompt')
      // assert that the response contains the text
      .shouldContain('Welcome')
      // assert that the response matches the given Regular Expression
      .shouldMatch(/Welcome(.*)back/)
      // fuzzy match, not recommended for production use. See readme.md for more details
      .shouldApproximate('This is an approximate match')
  .userSays('IntentWhichRequiresSlots', { slotOne: 'slotValue' }) // next interaction, this time with a slot
    .ssmlResponse // access the SSML response
      // forward slashes inside a regex literal must be escaped
      .shouldMatch(/<say>(Hello|Bye)<\/say>/)
      .shouldNotMatch(/<say>Wrong answer<\/say>/)
  .end(); // this will actually run the conversation defined above
[/code]

To use this library you need to have Mocha installed, either globally or locally. If you have a global installation of Mocha you can just run tests with:

[shell light="true"]$> mocha test-conversation.js[/shell]

Or, if you prefer a local installation of Mocha, you can use npm’s package.json file to define a script that executes all the functional tests under a given folder (./funtests in our case). npm makes the local version of Mocha available in the path when it runs scripts, so you can just add this to your package.json:

[javascript light="true"]
"scripts": {
  "funtests": "mocha --recursive ./funtests"
}
[/javascript]

And run it like this:

[shell light="true"]$> npm run funtests[/shell]

Once the execution finishes, the process exits with status code 0 if all the tests passed, or with a non-zero code if there were any failures, following the UNIX convention. This makes it easy to plug into any existing pipeline.
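
As a rough illustration of what a pipeline step does with that exit status, the sketch below runs a child process and treats 0 as success and anything else as failure. `runStep` is a hypothetical helper, and the two child processes are stand-ins for a passing and a failing test run; in a real pipeline the command would be something like `npm run funtests`.

```javascript
const { spawnSync } = require('child_process');

// Runs a command and reports whether it succeeded, the way a CI step would:
// exit status 0 means success, anything else means failure.
function runStep(command, args) {
  const result = spawnSync(command, args, { stdio: 'inherit' });
  return { status: result.status, passed: result.status === 0 };
}

// Stand-ins for a passing and a failing test run.
// process.execPath is the path of the Node binary currently running.
const passing = runStep(process.execPath, ['-e', 'process.exit(0)']);
const failing = runStep(process.execPath, ['-e', 'process.exit(2)']);

console.log(passing); // { status: 0, passed: true }
console.log(failing); // { status: 2, passed: false }
```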

Drawbacks

I want to note that, even when using this library, manual testing is still highly recommended (if not necessary) to guarantee the quality of your skill. The only way to test how Alexa matches the user’s input (voice or text) to your intents is through the Amazon Developer console or on a real device.

Another drawback you might face is that, depending on how you build your outputSpeech, the spaces between words in the response can be hard to predict, which makes exact comparisons tricky. To address that we introduced fixSpaces as an instantiation option for the conversation object, but be advised that it’s far from perfect.
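
To show the kind of problem involved, here is a small sketch of whitespace normalization applied to a response assembled from fragments. This is not the library’s fixSpaces implementation; `normalizeSpaces` and the sample fragments are hypothetical and only illustrate why a cleanup step helps before comparing strings.

```javascript
// Collapses whitespace runs and removes spaces before punctuation.
// Hypothetical helper for illustration; not the library's fixSpaces code.
function normalizeSpaces(text) {
  return text
    .replace(/\s+/g, ' ')            // collapse runs of whitespace into one space
    .replace(/\s+([,.!?;:])/g, '$1') // drop spaces that precede punctuation
    .trim();
}

// A response assembled from fragments, with stray spaces between them.
const built = ['Welcome back', ' ', ' Joan', ' .', '  What would you like to do ?'].join('');

console.log(normalizeSpaces(built)); // Welcome back Joan. What would you like to do?
```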

Finally, if your output contains variable phrases (such as dates or times), an assertion against a hard-coded value can fail even though the skill is behaving correctly. As a workaround, the library lets you compare the output against Regular Expressions, so you can use wildcards for the variable parts.
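
For example, a pattern can pin down the fixed wording while leaving the date and time open. The response text and the `departurePattern` name below are hypothetical; the sketch uses a plain regular expression so it runs on its own.

```javascript
// Hypothetical skill output containing a date and time that change between runs.
const response = 'Your flight to Seattle departs on March 3 at 10:45 AM';

// Fixed wording is matched literally; the variable parts become patterns.
const departurePattern =
  /^Your flight to Seattle departs on [A-Z][a-z]+ \d{1,2} at \d{1,2}:\d{2} (AM|PM)$/;

console.log(departurePattern.test(response)); // true
console.log(departurePattern.test('Your flight to Seattle was cancelled')); // false
```

In a conversation test, a pattern like this would be passed to the `shouldMatch` assertion shown earlier instead of comparing against a hard-coded string.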

Feedback and contributions

This is a very early version of the library. There is plenty of room for improvement, so we would love to get some feedback and contributions from other Alexa Skill developers.

Please head to the issues section of the repo if you want to leave some ideas or comments.
