Lessons Learned from Shipping an Alexa Skill in ~60 Days

EG Tech
Expedia Group Technology
6 min read · Dec 1, 2016

On November 30th, Expedia launched our first skill for Alexa, representing another foray for us into the world of voice interfaces. In this post we’ll share a few thoughts about how we put this skill together, along with some tips and tricks for working with voice user interfaces.

Where We Started

Around the middle of August, serious conversations began regarding an opportunity to build an Alexa skill. If we were going to do it, we wanted to have it ready by Christmas. Because the deadline was tight and voice user interfaces are a new space, we knew we’d need to form a team that would gel very quickly, collaborate well, and iterate rapidly through a lot of ideas. The first thing we did after assembling that team in San Francisco was explore the APIs and techniques. We met every day, demoed to each other frequently, shared our learnings, and parallelized building several different options. Each team member ran with a feature that appealed to them. We built quick-and-dirty implementations of flight searching, hotel searching, and a content-rich “suggest a destination” feature in about a week and a half. Then we took a step back, tried them out with more real users, and…

…they were pretty terrible. But we expected that. We demoed these ideas and learnings to stakeholders and used their feedback (as well as user feedback; more on that below) to decide on the final feature set for our launch. Even though the deadline was tight, this time spent in exploration was absolutely critical.

Expect the Unexpected

The most interesting thing we learned is that users have high expectations for how they can interact with conversational voice assistants. Unlike WIMP and touchscreen interfaces, which use metaphors and visuals to show the user what’s possible, a voice interface provides very little prompting. There’s no menu of options to choose from, and we’re constantly training users to interact naturally with the assistant. This means you can absolutely expect your skill to receive a bevy of responses you hadn’t planned for, in ways that just aren’t possible in visual interfaces. Interface designers and developers are used to guiding and shaping what’s possible for the user to avoid dead ends, but most of those tools don’t apply to voice.

How your skill responds to unexpected input becomes incredibly important. How do you interpret something like “actually instead, I just want to know when my flight is” when a user is almost done with a booking? While it may sometimes be unavoidable, whether or not the user hears “I’m sorry, you can’t do that now” over and over again is the difference between a frustrating experience and a great one. So how do you find out where your dead ends are?
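
To make that concrete, here’s a minimal sketch (not our production code) of one way to route an off-script request: answer the question if you can and offer to resume, rather than refusing outright. The intent names, session attributes, and speech strings below are hypothetical.

[javascript]
// A sketch of routing an off-script request mid-flow. Intent names, session
// attributes, and speech here are hypothetical, not the production skill.
function routeIntent(intentName, session) {
  var attrs = (session && session.attributes) || {};
  var inBookingFlow = Boolean(attrs.bookingState);

  // "Actually, I just want to know when my flight is" in the middle of a
  // booking: answer it, then offer to pick the booking back up.
  if (intentName === 'FlightStatusIntent' && inBookingFlow) {
    return 'Your next flight leaves Friday at nine a.m. ' +
           'Would you like to continue with your booking?';
  }

  // Anything we truly cannot handle gets a helpful redirect, not a bare
  // "I'm sorry, you can't do that now."
  return 'I cannot do that yet, but I can check your next trip or ' +
         'finish your booking. Which would you like?';
}

// e.g. routeIntent('FlightStatusIntent', { attributes: { bookingState: 'hotel' } });
[/javascript]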

Test with Real Users. Test with Real Users.

So important it has to be said twice. One of the things we did early on was to start asking real people what they wanted out of an Alexa Expedia skill. This didn’t require any code; we just put an Echo in front of people and told them to say “Ask Expedia…” and then fill in the blanks with the travel questions they had. These learnings not only fueled our feature backlog but also formed the beginnings of our sample utterances.

We did this while providing minimal instructions to the testers; we found that if you give people a list of instructions and features, they tend to follow your script much more carefully. Real users won’t discover the skill’s features that way, so don’t color your testers’ expectations. We wanted to discover all the stumbles and mistakes people made naturally along the way, and what they expected from the skill. Every time we modified the voice user interface, we put it in front of real users and observed how they interacted with it.

We also got good results from pretending to be the Alexa skill ourselves. Again, put the Echo in front of someone for context and have them start interacting with your skill, with a developer reading out the responses from your interface flowchart (or equivalent). This allows much faster iteration cycles than actually writing code. It may feel silly, but the data you’ll gather is invaluable. At the end of the day, we’re building conversational interfaces, so make sure to develop them through conversation.

The Architecture Itself

At Expedia we already have a lot of AWS infrastructure. We also have other forays into conversational interfaces, like our Facebook chat bot, with more implementations coming soon. We wanted the Alexa-specific logic to live on its own in a Lambda, and to move common functionality, like logins and the code that answers common questions, into a shared service (think APIs like “get my next trip” or “did I book a trip to X”). That became a backend service built on our standard EC2 instance pipeline (Primer, which you’ll be hearing more about from us), using DynamoDB as a data source. Going forward this will let us create rich experiences for lots of voice and conversation platforms without having to re-invent the basic functionality for every new device or API model.
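
As a rough illustration of that split, here’s a sketch of a Lambda handler that holds only the Alexa-specific translation and delegates the real work to the shared service. The host name, path, and response fields are hypothetical stand-ins, not our actual API.

[javascript]
// Sketch only: the Lambda translates between Alexa's request/response JSON
// and a shared conversation service that any bot platform can call.
var https = require('https');

function getNextTrip(userToken) {
  return new Promise(function (resolve, reject) {
    var req = https.get({
      host: 'conversation-service.example.com', // hypothetical shared service
      path: '/v1/trips/next',                   // hypothetical "get my next trip" API
      headers: { Authorization: 'Bearer ' + userToken }
    }, function (res) {
      var body = '';
      res.on('data', function (chunk) { body += chunk; });
      res.on('end', function () { resolve(JSON.parse(body)); });
    });
    req.on('error', reject);
  });
}

exports.handler = function (event, context) {
  // The linked-account token arrives on the Alexa session.
  getNextTrip(event.session.user.accessToken)
    .then(function (trip) {
      // Only the Alexa-specific response shape lives here.
      context.succeed({
        version: '1.0',
        response: {
          outputSpeech: { type: 'PlainText', text: 'Your next trip is to ' + trip.destination + '.' },
          shouldEndSession: true
        }
      });
    })
    .catch(context.fail);
};
[/javascript]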

Testing and Velocity

With any new platform there are some basic logistics to work out. We’re pretty passionate about continuous integration and delivery, so we treat those as the first order of business on a new platform. Our internal Primer tool lets us create web applications with CI/CD pipelines, environments, and standard monitoring using Jenkins and CloudFormation with a few clicks, so we started there with a Node web application and a Lambda. To make the skill development itself a little easier, we used some early Alexa skill library code from Amazon as a starting point, but if you’re building in Node you should take a look at the Alexa Skills Kit SDK, which is very similar but newer and more full-featured.
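
If you go the SDK route, a skill handler ends up looking roughly like the sketch below, based on the library’s documented pattern at the time we wrote this; the intent name and speech strings are placeholders, so check the SDK’s own samples for the current API.

[javascript]
// A bare-bones handler in the style of the Alexa Skills Kit SDK for Node.js
// (alexa-sdk). GetNextTripIntent and the speech text are placeholders.
var Alexa = require('alexa-sdk');

var handlers = {
  'LaunchRequest': function () {
    this.emit(':ask', 'Welcome. What would you like to do?',
                      'You can ask about your next trip.');
  },
  'GetNextTripIntent': function () {
    this.emit(':tell', 'Your next trip is coming right up.');
  },
  'Unhandled': function () {
    // Catch anything we did not plan for with a helpful redirect.
    this.emit(':ask', 'Sorry, I did not catch that. You can ask about your next trip.',
                      'You can ask about your next trip.');
  }
};

exports.handler = function (event, context) {
  var alexa = Alexa.handler(event, context);
  alexa.registerHandlers(handlers);
  alexa.execute();
};
[/javascript]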

To shorten dev cycles for the Lambda, we created a simple runner script. There are modules out there that add some niceties, but conceptually, running a Lambda function locally is just a matter of a small Node command-line script that sets up your context and invokes your handler. This gives you a way to debug and lets you test changes without deployment. It’s as simple as something like this:

[javascript]
// run.js -- invoke the Lambda handler locally with a sample event
var index = require('./index');

// Fake just enough of the Lambda context for the handler to report back.
var context = {
  succeed: function(result) {
    console.log('Success:', result);
  },
  fail: function(failure) {
    console.log('Failure:', failure);
    console.log('Stack:', failure.stack);
  }
};

// Load the event JSON passed on the command line, or fall back to a default.
var eventPath = process.argv[2] || './samples/default-intent.json';
var event = require(eventPath);
event.session.application.applicationId = 'yourappid';

index.handler(event, context);
[/javascript]

This is a simplified variant, but it lets you run, for example, node run.js ./samples/launch-intent.json and execute a JSON intent request as delivered from Amazon (you can grab the event JSON from the Lambda console tester, which is how we built our library of sample requests). This let us functionally test the whole Lambda and back-end connection without constant deployments, while mocha unit tests built into the pipeline covered the individual features. You can get some similar tips for different frameworks from Kevin Epstein’s lightning talk from re:Invent on Monday (slides). That talk rang true for us.
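
On the unit-test side, a mocha test can drive the same handler with the same captured sample events. The sketch below assumes a hypothetical layout (index.js as the entry point, a samples directory of saved requests under the test folder’s parent) and a deliberately trivial assertion:

[javascript]
// test/launch.test.js -- a sketch of a mocha test driving the Lambda handler
var assert = require('assert');
var index = require('../index');

describe('LaunchRequest', function () {
  it('greets the user with plain text speech', function (done) {
    // A request captured from the Lambda console tester (hypothetical path).
    var event = require('../samples/launch-intent.json');

    // Mimic the Lambda context: succeed or fail ends the test either way.
    var context = {
      succeed: function (result) {
        assert.equal(result.response.outputSpeech.type, 'PlainText');
        assert.ok(result.response.outputSpeech.text.length > 0);
        done();
      },
      fail: function (err) {
        done(err);
      }
    };

    index.handler(event, context);
  });
});
[/javascript]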

We versioned our intent schema and sample utterances files by storing them in the repository (for now…). We’re hoping Amazon adds APIs for uploading those artifacts directly, because the least fun part of the pipeline is remembering to upload them manually through the browser (we’ll probably script this if an API solution isn’t on the horizon).
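
For reference, the artifacts we version are small, plain-text files. With a hypothetical intent they look roughly like this (the schema is JSON you paste into the developer console; the utterances file is one phrase per line, prefixed with the intent name):

[javascript]
// IntentSchema.json -- a hypothetical custom intent plus Amazon built-ins
{
  "intents": [
    { "intent": "GetNextTripIntent" },
    { "intent": "AMAZON.HelpIntent" },
    { "intent": "AMAZON.StopIntent" }
  ]
}

// SampleUtterances.txt -- one phrase per line, e.g.:
//   GetNextTripIntent when is my next trip
//   GetNextTripIntent what is my next trip
//   GetNextTripIntent tell me about my upcoming trip
[/javascript]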

Conclusion

It was an exciting ride from mid-August to our October submission. You’ll be hearing more from our team about our voice work in the future; we’re very excited about where this technology can go. Right now, the more natural and human you want your assistant to appear, the more work you have to do. The APIs are moving forward quickly, though: just this week, the new custom models and intents in the Amazon Lex service look like they’ll take away a lot of the drudgery of getting data from users, as does the new pre-built Alexa intent library. We’ll continue to share the insights we glean from building voice assistants here, and we hope you enjoy the skill the team put together. If you have feedback or questions about the dev process, get in touch with us on Twitter via @ExpediaEng or leave a comment below.

Photo: some of the Expedia Alexa team with Tony Donohoe and Amazon Alexa VP Mike George
