Creating stories using Amazon’s Alexa Interactive Adventure Game Tool

James Burke
13 min read · Jan 16, 2018


I recently created the Kitten Cafe Alexa skill using Amazon’s Interactive Adventure Game Tool, but I found the documentation a bit sparse so I wanted to write down my notes on how the tool works, and some of the things I discovered along the way. The sample story I’ll be referencing from here on is a fork of the BBC’s fork of the Amazon tool, containing some new features and a few bug fixes, and it’s available here:

https://github.com/james3burke/interactive-adventure-game-tool

If you just want to jump in to playing with the tool then you can clone the above repository, run “npm install” and then “npm start” to launch it, but if you need more detail than that I have a previous article with some tips.

Introduction

The Interactive Adventure Game Tool allows you to build branching narratives as a tree of scenes within a UI.

The two main elements of the story tree that you can see in the UI are “Scenes” and “Utterances”. At its most basic a scene is simply a block of text that Alexa will read out, along with some configuration details. An utterance is a phrase that Alexa will then listen for in between scenes.

“Open The Red Door” is an utterance, “Red Room” is a scene

Scenes

Scenes are created by clicking the plus symbol underneath an existing scene (or on the “START” circle at the top of the story). Clicking the plus will create the new scene as a child of the existing scene. By creating a new scene you automatically create a new utterance box above the new scene, which indicates the text the user must say to reach that part of the tree.

When you select a scene the sidebar will display the details of the scene that you can then edit. There are three sections in the sidebar: “Card”, “Voice” and “Advanced Configuration”. Cards are only visible in the Alexa app or on Alexa devices with displays. The title that you put in the card section is the name of the scene that will be displayed in the UI.

The Voice section is where you get to write the text that Alexa will read out when it reaches your scene.

Within the voice section the “Speech” text is the main text that Alexa will read when you enter the scene. “Reprompt” is an optional piece of text that Alexa will read out if the user doesn’t say anything within 7 seconds of the speech being read out. It’s optional because if you leave it blank the tool will simply generate some text to tell the user what options are available based on the story tree. For example, if a user reached the “Red Room” scene, heard the main block of speech but then didn’t say anything, the tool would prompt them with “You can say ‘open the box’ or ‘go back’” because those are the only two options available in that scene.
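You can picture the generated reprompt as a simple join over the child utterances. This is a minimal sketch of the idea, not the tool’s actual code, and `buildReprompt` is a hypothetical name:

```javascript
// Sketch: build a fallback reprompt from a scene's child utterances.
// (Hypothetical helper -- the tool's real generation logic may differ.)
function buildReprompt(childUtterances) {
  const quoted = childUtterances.map(u => `'${u}'`);
  return `You can say ${quoted.join(' or ')}`;
}

// The "Red Room" scene in the sample has two child options:
console.log(buildReprompt(['open the box', 'go back']));
// "You can say 'open the box' or 'go back'"
```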

My fork of the code adds two extra text options, both of which are optional. These have been added so that when you link back to an existing scene you have slightly more control over the speech you want delivered. “Reentry” text is read out when you return to a scene that you have already visited once. You would normally use this option if a scene is a central point in your story that you might return to several times, but where you want to avoid constantly repeating the same lengthy piece of description. Similarly, the “Exit” text allows you to specify a piece of text to say after the next utterance, and before the next scene’s speech. In the “Hidden Room” example this gives you slightly more description about how you return to the corridor.

When you say ‘go back’ from the hidden room Alexa will say “You walk out, closing the door behind you. You are back in the corridor”, combining the exit speech from the hidden room with the reentry speech from the corridor.
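Putting those pieces together, the speech for a transition could be assembled roughly like this. The field names (`exitSpeech`, `reentrySpeech`, `speech`) are my own assumptions for illustration, not the tool’s actual schema:

```javascript
// Sketch: assemble what Alexa says when moving between scenes.
// Field names here are assumed, not taken from the tool's source.
function transitionSpeech(fromScene, toScene, visitedIds) {
  const parts = [];
  if (fromScene.exitSpeech) parts.push(fromScene.exitSpeech);
  const revisiting = visitedIds.includes(toScene.id);
  if (revisiting && toScene.reentrySpeech) {
    parts.push(toScene.reentrySpeech); // shorter text for repeat visits
  } else {
    parts.push(toScene.speech); // full description on first visit
  }
  return parts.join(' ');
}

const hiddenRoom = { id: 'hidden-room', exitSpeech: 'You walk out, closing the door behind you.' };
const corridor = {
  id: 'corridor',
  speech: 'You are standing in a long corridor.',
  reentrySpeech: 'You are back in the corridor.'
};

console.log(transitionSpeech(hiddenRoom, corridor, ['corridor', 'hidden-room']));
// "You walk out, closing the door behind you. You are back in the corridor."
```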

The “Advanced Configuration” section allows you to change the configuration of a scene to control the flow of the story.

Hidden Scene

Hidden scenes are the same as non-hidden scenes, except they are displayed with a dotted line around them, and won’t be added to the list of reprompt options for their parent scene. This allows you to have hidden paths within your story that you can choose how to reveal, rather than having the tool make them immediately obvious to the user as part of a reprompt. For example, the “Hidden Room” in the sample is hidden from the user, unless they invoke the “Look Around” utterance to gather some more information.

Leaf Scene (“Prompt with Previous Scene’s Options”)

Leaf scenes are childless scenes that will return you to their parent scene after their voice speech is delivered, and so you have the same options as at the parent level. After you invoke the “Look Around” scene in the sample its speech is read out, but the options available to you are the same as if you were still in the corridor. It doesn’t make any sense for leaf scenes to have any children because there’s no way your story could reach them.

This type of scene is useful if you want to deliver some additional description in your story that might make your main description too long, or if you want to place some red herrings amongst the options that a user can take.

End Scene

End scenes mark the point at which your story/skill ends. When your story reaches them their speech will be read out, and then your skill will terminate. There’s no point adding reprompt text or child scenes to an end point.

End scenes do not cause the Global Command Response “Exit” text to be read out (see Commands below).

I haven’t mentioned state up to this point, but when your skill is running it will record the path that each user takes through your story as they go, so if they leave without completing they can come back to the same point on their next visit. End Scenes reset that state, so on your next visit you will simply start from the beginning.
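As a rough mental model of that behaviour (not the tool’s actual storage code, and the field names are mine), the per-user state is just the path of visited scenes, cleared when an End Scene is reached:

```javascript
// Sketch of per-user story state: the path of visited scene ids.
// Reaching an End Scene resets the path so the next session starts fresh.
function makeState() {
  return { path: [] };
}

function enterScene(state, scene) {
  state.path.push(scene.id);
  if (scene.isEndScene) state.path = []; // story over: start from the beginning next time
}

const state = makeState();
enterScene(state, { id: 'corridor' });
enterScene(state, { id: 'red-room' });
console.log(state.path); // ['corridor', 'red-room']

enterScene(state, { id: 'ending', isEndScene: true });
console.log(state.path); // []
```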

Linked Scene

Using the code from the BBC fork it is also possible to have child options of your current scene that link back to an existing scene in another part of the story tree. This is useful if you want to have several branches in your story all join back together. To create a linked child scene underneath your current scene click on the small arrow icon next to the plus icon at the bottom of the scene. You can then click on the other scene that you want to link to. Note that you must still specify an utterance that will take you down that path, but it doesn’t have to be the same utterance that already exists on the linked scene.

The “Instructions” scene links to the “Corridor” scene

Commands

As well as the utterances and scenes that you can create as part of your story the tool comes predefined with a number of global utterances and responses that can be used at any point in your story. For example a user can say “help” at any point in the story where the skill is waiting for an utterance, and the tool will then read out a general response that you can edit if you wish. To view these global commands click on the home icon in the top corner of the sidebar (the house icon in front of “alexa adventure maker”).

In the Global Commands Utterances section you can see the list of global commands, and also add or change the utterances that invoke them. For example, by default the “Repeat Scene” command is invoked by either of the utterances “repeat scene” or “repeat”. You can add to those utterances, or change them if they clash with something you need to say in your story.

Global Command Responses are a special type of scene that can be invoked by some of the global command utterances.

If you click on one of these global responses you are taken to the normal scene sidebar, where you can set the card and voice information as normal. Changing any of the advanced configuration will probably not have an effect because of the way the tool works. If you wish to change the speech that the tool will read out when a user asks for help then this is where you would do it.

The “help” command is useful if you need to explain how people should use your skill, and give examples.

The “stop” and “cancel” commands are useful if you want to give people a message as they leave your story, perhaps to encourage them to come back and use your skill again. Note that the message you write here will only be called when a user says stop or cancel, and is not connected to end scenes.

The remaining commands are largely concerned with restoring state if a customer returns to your skill, and allowing them to repeat the options or scene at their current location if they are unsure what to do at a point in the story. Because they largely result in repeating the speech of an existing scene you can’t customise the text related to these commands.

One extra thing available as part of the “Global Command Responses” is the “Unrecognized” response, which has no utterance. The speech that is defined as part of this response will be read out if a user says something that doesn’t match an utterance defined in your story, and you can customise it if you wish.

Utterances

As mentioned before utterances are simply pieces of text that Alexa is listening out for in between scenes. Global utterances will work whenever a user says them, but utterances that are connected to scenes will only invoke their attached scene if it is valid in the current context of the story. For example if you try to say “open green door” while you are inside the “Red Room” then the tool will read out the “Unrecognized” response because that is not a valid option for that point in the story (even though it is a valid utterance for the skill as a whole).
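Conceptually, resolving an utterance looks something like this sketch. The names (`resolveUtterance`, `children`, `sceneId`) are illustrative, not the tool’s actual API:

```javascript
// Sketch: a scene-level utterance only matches if it belongs to a child of
// the current scene; anything else falls through to the Unrecognized response.
// (Global command utterances would be checked separately, before this step.)
function resolveUtterance(currentScene, spoken) {
  const match = currentScene.children.find(c => c.utterances.includes(spoken));
  return match ? match.sceneId : 'unrecognized';
}

const redRoom = {
  children: [
    { sceneId: 'box', utterances: ['open the box'] },
    { sceneId: 'corridor', utterances: ['go back'] }
  ]
};

console.log(resolveUtterance(redRoom, 'go back'));         // 'corridor'
console.log(resolveUtterance(redRoom, 'open green door')); // 'unrecognized'
```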

You can however reuse the text of an utterance as often as you want, so in the sample story we have multiple instances of “go back”, and so that utterance is valid in all of those scenes. As it happens the sample story uses each of those “go back” utterances in the same way, but there would be no problem with each of them leading to completely separate scenes if we wanted them to.

Utterances can have multiple phrases defined for them. For example if you click on the utterance box for “open the red door” in the UI you will actually see the phrases “open the red door”, “open red door”, “the red door” and “red door”, all of which are treated as the same utterance.

You can define as many of these alternate phrases as you like, with each one on a new line. As a technical note it is the first line that is used to generate the name of the Intent that these utterances would invoke (i.e. OpenTheRedDoorIntent).
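That first-line-to-intent transformation is essentially a camel-casing of the phrase. This is my sketch of the rule based on the generated names I’ve seen; check the tool’s source for the exact implementation:

```javascript
// Sketch: derive an intent name from the first line of an utterance,
// e.g. "open the red door" -> "OpenTheRedDoorIntent".
function toIntentName(firstPhrase) {
  const camel = firstPhrase
    .trim()
    .split(/\s+/)
    .map(word => word[0].toUpperCase() + word.slice(1))
    .join('');
  return camel + 'Intent';
}

console.log(toIntentName('open the red door')); // "OpenTheRedDoorIntent"
console.log(toIntentName('go back'));           // "GoBackIntent"
```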

The reason for defining multiple phrases for the utterances is to allow Amazon to more accurately match what a user is saying to the intents that you are expecting. This means that you should try to define every utterance that you expect a user to say, and if the user says something that you weren’t expecting the tool will read the “Unrecognized” response speech.

The exception to this is when you set the “Default Option” parameter in the advanced configuration of a scene.

If you do this then a user can say anything, even utterances that you haven’t planned for, and the story will then progress as if they said the utterance that you have set as default. This is useful if you want to give the user the illusion of being able to say anything and still have your skill react to it. The downside of this approach is that if you have other options as well as the default one, then even a slight mishearing of what the user actually said will still cause the default option to kick in. Unless handled carefully this can make your skill look like it’s just ignoring the user.
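In terms of the earlier resolution sketch, the default option changes the fallback from the Unrecognized response to a scene of your choosing. Again, the field name `defaultSceneId` is an assumption for illustration:

```javascript
// Sketch: with a "Default Option" set, an unmatched utterance falls through
// to the default child scene instead of the Unrecognized response.
function resolveWithDefault(scene, spoken) {
  const match = scene.children.find(c => c.utterances.includes(spoken));
  if (match) return match.sceneId;
  return scene.defaultSceneId || 'unrecognized'; // defaultSceneId is an assumed field
}

const room = {
  defaultSceneId: 'continue',
  children: [{ sceneId: 'continue', utterances: ['go on'] }]
};

console.log(resolveWithDefault(room, 'tell me a story')); // 'continue'
```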

I said before that you can reuse utterances at multiple points within your story, and that the first line of your utterance is the one that the tool will use to generate the name of an intent. This leads to side effects that are both useful and somewhat dangerous if not handled properly. A dangerous side effect is that if you use the same phrase within two separate utterances, that phrase will only ever work for one of them while you are testing, and will be caught as a problem when you try to submit your skill for certification. For example, if I tried to add the phrase “open door” to both the red room and green room from the sample like this:

Then the tool would generate the following Sample Utterances for you to upload to the Alexa configuration:

OpenTheRedDoorIntent open the red door
OpenTheRedDoorIntent open red door
OpenTheRedDoorIntent the red door
OpenTheRedDoorIntent red door
OpenTheRedDoorIntent open the door
OpenTheGreenDoorIntent open the green door
OpenTheGreenDoorIntent open green door
OpenTheGreenDoorIntent the green door
OpenTheGreenDoorIntent green door
OpenTheGreenDoorIntent open the door

Because of the way Alexa uses utterances and intents this isn’t valid, and the best you can hope for is that the first definition is the one that will work. (My fork of the code will warn you about this in the console and ignore the second version of the utterance.) This problem can get much worse if you inadvertently use the same first line of an utterance in more than one place, but with different alternate phrases. For example:

Now you end up with a single intent that confuses opening a door with opening a box.

OpenIntent open
OpenIntent open the red door
OpenIntent open red door
OpenIntent the red door
OpenIntent red door
OpenIntent open the box
OpenIntent open box

This can easily cause bugs if you do it by accident, because it might drag in utterances to the current scene that you weren’t expecting, and they could clash with an actual option you wanted available, for example if there was another door in the room.

On the other hand, if you are careful about how you do this it becomes an easy way to reuse a set of phrases that you might need frequent access to. In the sample story I specifically use the utterances “positive” and “negative” instead of “yes” and “no”, and the full lists of phrases for those utterances will now be available any other time I use the utterance “positive” or “negative”.

This gives me a single point at which I can update all of the positive words I want to use within the story.

You should also bear in mind that utterances connected to commands, for example “help”, will always override utterances that you specify within the story.

Troubleshooting

A few quick gotchas that it might help to be aware of.

Utterances must be defined in lowercase, and with letters only. No exclamation marks!

If you create a very large story then at some point the UI will start drawing the tree beyond the left hand side of the browser window, and the scrolling will stop working. The simplest way to avoid this when it happens is to make your browser extremely wide before you launch the tool. You can do this by dragging the whole browser off to the left of your screen, dragging the right hand side of your browser to make it larger and then dragging the browser back off the screen and so on. Once the tool is open you can make your browser smaller again. It’s annoying so hopefully it will get fixed at some point.

A bug that consistently trips me up happens when you delete everything in an existing story so you just have the “Start” circle. At this point when you start adding child scenes to the “Start” those children will often have their “End Scene” flag set to true. Annoyingly you can’t then set that flag to false for those scenes in the UI; you need to manually edit the “src/skills/models/scenes.json” file. If you forget to do it then your story will abruptly end while you’re testing it with no indication that it has stopped!
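For reference, the manual fix is to find the affected scene entries in that file and flip the end-scene flag back by hand. The exact shape of the file and the field name may vary between versions of the tool, so treat this as an illustrative fragment rather than the real schema:

```json
{
  "id": "scene_1",
  "name": "Corridor",
  "endScene": false
}
```

After saving the file, restart the tool so the UI picks up the change.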

Hopefully this is a useful introduction to some of the finer points of using the tool. I’ll add another article in the future about how the underlying code works at runtime.
