I created an escape-the-room Action for the Google Assistant called “Magnificent Escape”. A few interesting issues were uncovered during the development and, in this post, I’ll discuss some solutions that you could use in your own Actions.
The game follows the classic escape-the-room format, but I wanted to make it work as a conversational experience. All the Actions on Google surfaces are supported, but the best experience is playing the game by voice.
The Action is built with Dialogflow, and the fulfillment is written in Node.js using the Actions on Google client library. The fulfillment is about 5000 lines of code and is hosted on App Engine. I chose App Engine because it gives me better control over scaling and supports multiple source files, whereas the Dialogflow inline editor supports only a single file. The fulfillment logic uses session storage and user storage for all the game state.
On devices with screens, the Action uses various rich responses such as lists, suggestion chips, basic cards, and images.
The Dialogflow agent has 96 intents and 11 entities.
Even though Actions on Google hosts a free sound library for developers, I couldn’t find the right kind of sounds for my game. So, I purchased royalty-free audio for the game.
The audio was quite affordable; many clips cost just a few dollars each. The game has 36 sounds, including sounds for earning hints, invalid moves, finding an easter egg, and escaping the rooms.
Each sound and music track was adjusted to the recommended loudness levels and then tuned so it sounded good on all surfaces (some surfaces, like in a car, would clip the sound effects or make the audio sound muffled). All of the sounds are hosted in Google Cloud Storage.
Since I wanted the game to work well with voice, it was important to get the VUI prompts right. This involved lots of experimentation with (and revisions of) the various prompts.
Here is a sample dialog for a returning user:
USER: Hey Google, talk to magnificent escape.
ACTION: Welcome back to Magnificent Escape. At the moment you’re in the lobby. There are 3 rooms: Office, Bedroom, and Garage. Which one would you like?
USER: how about the bedroom
ACTION: You teleport into the bedroom. Start by taking a good look around. The room has 4 walls: north, south, east, and west. You can also look up or down. Which direction do you want to look?
The Action has 245 different prompt types and 445 different prompts. Fine-tuning the prompts involved these steps:
- Write sample dialogs for all the typical use cases for happy paths and error paths
- Say the dialogs out loud and edit the prompts until they are as clear as possible
- Test the prompt in the Actions console audio tab to hear how the TTS voice would sound
- Adjust the prompts for any TTS pronunciation issues
- Adjust the prompts for any pacing issues (usually adding SSML <prosody> and <break> tags)
- Test the prompts on various surfaces including mobile, TV, auto, and Google Home
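The pacing adjustments in the steps above can be sketched with a small helper that assembles SSML. This is only an illustration, not the game's actual code: the function names `escapeSsml` and `buildPrompt` and the default pause length are assumptions.

```javascript
// Hypothetical helpers for assembling SSML prompts; the names are illustrative only.

// Escape characters that are not allowed in SSML text content.
function escapeSsml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Join sentences with a <break> between them and wrap the result in <prosody>.
// `rate` takes the usual SSML values such as 'slow', 'medium', or 'fast'.
function buildPrompt(sentences, { rate = 'medium', pauseMs = 300 } = {}) {
  const body = sentences
    .map(escapeSsml)
    .join(`<break time="${pauseMs}ms"/>`);
  return `<speak><prosody rate="${rate}">${body}</prosody></speak>`;
}

// Example: slow the speech down and pause between the two sentences.
console.log(buildPrompt(
  ['You teleport into the bedroom.', 'Which direction do you want to look?'],
  { rate: 'slow', pauseMs: 500 }
));
```

Keeping the markup in one helper means a pacing change can be tried across many prompts without editing each string by hand.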
I kept all of the versions of the Action in beta before publishing to production. I chose beta because I wanted to have control over when the Action goes live to production. This would let me control the timing of any PR and also ensure I would be ready to keep an eye on the Action when it went live. Another option to consider is to first use an alpha version of the Action to verify the design before going for a review.
I’m part of the Google Assistant developer community program, which includes accelerated reviews. Most of the submissions for reviews were approved in about a day.
I deployed the Action from beta to production on Oct 25 and received an email that the Action was being deployed. After receiving the email, it took about 2 hours before the status in the Action console was updated to fully deployed.
On devices with screens, I display a suggestion chip to rate the Action after a user has completed a room.
The request for rating is implemented with a Basic card with a button that loads the URL to the Assistant directory page for the Action. I keep track of whether the user has launched this screen and, if they have, don’t prompt for it again.
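The "ask only once" bookkeeping can be sketched as follows. The plain `userStorage` object stands in for the per-user storage, and the `ratingLaunched` field name is hypothetical.

```javascript
// `userStorage` is a plain object standing in for per-user storage.
// The `ratingLaunched` field name is an assumption made for this sketch.

// Offer the rating chip only on screens and only if it was never opened before.
function shouldOfferRating(userStorage, hasScreen) {
  return Boolean(hasScreen) && !userStorage.ratingLaunched;
}

// Call this when the user opens the rating card.
function markRatingLaunched(userStorage) {
  userStorage.ratingLaunched = true;
}
```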
Study your users
Studying the fulfillment logs and using the Dialogflow History feature are extremely valuable for understanding how users experience the gameplay (I wrote a blog post about this). This is a critical part of the post-launch phase and has resulted in significant improvements to the user experience.
One of the first issues I noticed in the history logs was conversations where the Dialogflow agent asked the same question over and over: “what is item1?”
It was the prompt provided by Dialogflow for slot filling. For the intents with parameters, I had made all of the parameters required. I had customized some of the prompts for slot filling but not all of them. This was a big mistake, since the conversation would go from carefully crafted VUI prompts provided by the fulfillment logic to occasionally handing off the user interaction to Dialogflow with its auto-generated prompts.
The quick fix was to customize the prompts for every single required parameter. However, that wasn’t enough. Dialogflow would randomly select a prompt if a parameter had multiple prompts. It is one of our main Actions design guidelines to ensure that sequential prompts vary and that the Action doesn’t say the same thing over and over. But, since Dialogflow selected the prompts randomly, it could select the same prompt over and over. A fix for this is to provide slot filling with fulfillment, which ensures that the prompts are not repeated.
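One way to guarantee that sequential prompts vary is to remember the last variant used for each prompt type and exclude it from the next pick. The sketch below assumes a plain `state` object standing in for session storage; `pickPrompt` and its parameters are hypothetical names.

```javascript
// Pick a prompt variant that differs from the one used last time for this type.
// `state` stands in for session storage and maps prompt type -> last index used.
// `random` is injectable so the behavior can be tested deterministically.
function pickPrompt(promptType, variants, state, random = Math.random) {
  const last = state[promptType];
  const candidates = variants
    .map((text, index) => ({ text, index }))
    // With a single variant there is nothing to alternate with, so keep it.
    .filter(({ index }) => variants.length === 1 || index !== last);
  const choice = candidates[Math.floor(random() * candidates.length)];
  state[promptType] = choice.index;
  return choice.text;
}

// Example usage with hypothetical prompt text.
const sessionState = {};
const variants = ['Which item?', 'What do you want to use?', 'Which one?'];
console.log(pickPrompt('askItem', variants, sessionState));
console.log(pickPrompt('askItem', variants, sessionState)); // never the same as the line above
```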
However, if the user doesn’t provide input that matches the required parameter type, Dialogflow would keep prompting the user. Since I wanted to keep the gameplay open and let the users change their mind at any time, I needed something more flexible. Therefore, I changed all of the parameters so they were not required and then implemented my own custom slot filling logic in fulfillment with dynamic contexts. The slot filling logic provides an error handling experience similar to no-input and no-match: 2 attempts at recovery, and then punting to a fallback prompt to get the user to try something else.
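The retry policy described above (an initial question, 2 recovery attempts, then a fallback) can be sketched as a small state machine. This is not the game's real implementation: `slotState` stands in for the parameters of a dynamic context, and all names and prompt texts are hypothetical.

```javascript
const MAX_RECOVERY_ATTEMPTS = 2;

// Decide what to do on each turn of a custom slot-filling exchange.
// `slotState` is null before the slot has been asked for, otherwise { attempts }.
// `value` is the parameter parsed from the latest user turn (undefined if missing).
// Returning `state: null` means the slot-filling context should be cleared.
function nextSlotTurn(slotState, value, prompts) {
  if (value !== undefined && value !== null && value !== '') {
    return { filled: true, value, state: null };
  }
  if (!slotState) {
    // First time: ask the question.
    return { filled: false, prompt: prompts.ask, state: { attempts: 0 } };
  }
  if (slotState.attempts < MAX_RECOVERY_ATTEMPTS) {
    // Recovery: use a progressively more helpful reprompt.
    return {
      filled: false,
      prompt: prompts.reprompts[slotState.attempts],
      state: { attempts: slotState.attempts + 1 },
    };
  }
  // Give up on this slot and steer the user toward something else.
  return { filled: false, prompt: prompts.fallback, state: null };
}

// Hypothetical prompts for a "use <item>" intent.
const usePrompts = {
  ask: 'Which item do you want to use?',
  reprompts: [
    'Sorry, which item?',
    'You can name anything you are carrying, like the key. Which item?',
  ],
  fallback: 'Let’s try something else. You can look around or check your inventory.',
};
```

Because the parameter is no longer required in Dialogflow, the user can abandon the question at any point by saying something that matches a different intent.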
The Dialogflow agent is designed with lots of American spelling, phrases, and slang. Occasionally, I would find intents that don’t match since they contained words from different locales (for example, “torch” instead of “flashlight”). I go through the logs every day and add additional training phrases for the Dialogflow intents and synonyms for the entity values to increase the vocabulary for intent matching.
Dialogflow does have a feature to automatically expand on entity values provided by developers, but in my game I got unexpected matches. Your mileage might vary with your Action, so it’s worth a try.
Users often say “OK Google…” in the Action. This is typically to stop the game or to invoke another Action. This happens so often that I’ve added logic to the fulfillment fallback to close the Action for the user with a message like, “Looks like you want to talk to the Assistant. Let’s end it here and you can try asking the Assistant again.”
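A check like this could run at the top of a fallback handler. The pattern and the function name are assumptions for illustration; the real logic may match more variations.

```javascript
// Matches utterances that start with the Assistant hotword, e.g. "OK Google, stop".
const HOTWORD_PATTERN = /^\s*(ok|okay|hey)[,\s]+google\b/i;

// Returns true when the raw user input looks like an attempt to reach the Assistant.
function isHotwordAttempt(rawInput) {
  return HOTWORD_PATTERN.test(rawInput);
}

console.log(isHotwordAttempt('OK Google, what time is it')); // true
console.log(isHotwordAttempt('look at the painting'));       // false
```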
As part of the game puzzles, the user has to provide a code to open a combination lock. The Dialogflow agent has an intent with a parameter type of ‘@sys.number-sequence’. However, occasionally the user input isn’t interpreted correctly. Here are some examples of misinterpreted input:
“4 to 90”
“1 to 3/8”
“4 to 3 Wan”
I created my own parser of the raw user input in the fulfillment fallback intent to translate these into number sequences.
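A parser along these lines might look like the sketch below. It assumes that “to” was a misheard “two” and “Wan” a misheard “one”; the homophone table and the name `parseCode` are illustrative and not taken from the game.

```javascript
// Words that speech recognition may return in place of digits (an assumed list).
const HOMOPHONES = {
  zero: '0', oh: '0',
  one: '1', wan: '1', won: '1',
  two: '2', to: '2', too: '2',
  three: '3',
  four: '4', for: '4',
  five: '5', six: '6', seven: '7',
  eight: '8', ate: '8',
  nine: '9',
};

// Turn raw user input into a string of digits, or null if it is not a code.
function parseCode(rawInput) {
  // Split on anything that is not a letter or digit, so "3/8" becomes "3", "8".
  const tokens = rawInput.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  let digits = '';
  for (const token of tokens) {
    if (/^\d+$/.test(token)) {
      digits += token;
    } else if (Object.prototype.hasOwnProperty.call(HOMOPHONES, token)) {
      digits += HOMOPHONES[token];
    } else {
      return null; // Unknown word: let the caller reprompt instead of guessing.
    }
  }
  return digits.length > 0 ? digits : null;
}

console.log(parseCode('4 to 90'));    // "4290"
console.log(parseCode('1 to 3/8'));   // "1238"
console.log(parseCode('4 to 3 Wan')); // "4231"
```

Returning `null` for any unrecognized word keeps the parser conservative, so ordinary sentences that happen to contain “to” or “for” are not mistaken for codes.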
In the lobby of the game, the user is presented with a list of rooms to choose from. However, the actions.intent.OPTION helper intent doesn’t handle all the various ways the user can select an option. For example, the user might say, “let’s try room 3” or “let’s try the middle one”. I’ve added several additional intents to handle all the variations found in the Dialogflow logs (see my blog post on this issue).
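The post handles these variations with extra Dialogflow intents. As an alternative illustration only, the same idea can be sketched in fulfillment code that resolves positional phrases against the list; `resolveOption` and the ordinal table are hypothetical.

```javascript
const ORDINALS = { first: 0, second: 1, third: 2, fourth: 3, fifth: 4 };

// Resolve free-form input such as "room 3" or "the middle one" to a list option.
// Returns the matching option name, or null when nothing matches.
function resolveOption(rawInput, options) {
  const text = rawInput.toLowerCase();

  // Direct mention of an option name, e.g. "how about the bedroom".
  const byName = options.find((name) => text.includes(name.toLowerCase()));
  if (byName) return byName;

  // Positional words. "middle" only makes sense for an odd number of options.
  if (/\bmiddle\b/.test(text) && options.length % 2 === 1) {
    return options[(options.length - 1) / 2];
  }
  if (/\blast\b/.test(text)) return options[options.length - 1];
  for (const [word, index] of Object.entries(ORDINALS)) {
    if (new RegExp(`\\b${word}\\b`).test(text) && index < options.length) {
      return options[index];
    }
  }

  // A 1-based number, e.g. "room 3".
  const match = text.match(/\b(\d+)\b/);
  if (match) {
    const index = Number(match[1]) - 1;
    if (index >= 0 && index < options.length) return options[index];
  }
  return null;
}

const rooms = ['Office', 'Bedroom', 'Garage'];
console.log(resolveOption('let’s try room 3', rooms));         // "Garage"
console.log(resolveOption('let’s try the middle one', rooms)); // "Bedroom"
```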
I use 3 analytics tools (see my blog post on analytics):
- Actions console
- Chatbase
- Google Analytics
Chatbase has a list of all the user messages not handled. This is useful for adding new phrases to the Dialogflow agent intents.
Chatbase also has a useful feature to track the user interactions by surface. I’ve created these labels:
- Audio only
- Screen with keyboard
- Screen with voice
Most users are playing the game by voice, followed by keyboard on screens and then voice on screens.
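The three labels can be derived from two facts about each request: whether the surface has a screen, and how the input arrived. The sketch below assumes the input type is available as a string such as 'VOICE', 'KEYBOARD', or 'TOUCH', and it treats touch input as keyboard; both are assumptions, as is the function name.

```javascript
// Map a request to one of the three surface labels used for analytics.
// `hasScreen` is a boolean; `inputType` is assumed to be 'VOICE', 'KEYBOARD', or 'TOUCH'.
function surfaceLabel(hasScreen, inputType) {
  if (!hasScreen) return 'Audio only';
  // Anything that is not voice (typing, tapping a chip) is grouped as keyboard here.
  return inputType === 'VOICE' ? 'Screen with voice' : 'Screen with keyboard';
}
```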
Google Analytics has a graph that shows the users by time of day. Most users are playing the game in the very early mornings, which implies most of the users are in another timezone (I’m in PST).
The Google Analytics Measurement Protocol is very useful to track events. I use it in the game to track the items examined, time spent winning a room, and the number of turns in the conversation. From this, I can tell that 38 users have escaped the bedroom so far and that some users took just over an hour to win. The longest session was 2.5 hours for escaping the most difficult room.
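For reference, an event hit in the Universal Analytics version of the Measurement Protocol is a form-encoded body sent to `https://www.google-analytics.com/collect`. The sketch below only builds that body; the tracking ID, client ID, and the category, action, and label values are placeholders, and `buildEventHit` is a hypothetical helper.

```javascript
// Build the form-encoded body for a Measurement Protocol (Universal Analytics) event hit.
function buildEventHit({ trackingId, clientId, category, action, label, value }) {
  const params = new URLSearchParams({
    v: '1',          // protocol version
    tid: trackingId, // property ID, e.g. "UA-XXXXX-Y"
    cid: clientId,   // anonymous client ID
    t: 'event',      // hit type
    ec: category,    // event category
    ea: action,      // event action
  });
  if (label !== undefined) params.set('el', label);
  // The event value must be an integer.
  if (value !== undefined) params.set('ev', String(Math.round(value)));
  return params.toString();
}

// Example: record that a room was escaped after a number of conversation turns.
// All values below are placeholders, not data from the game.
const body = buildEventHit({
  trackingId: 'UA-XXXXX-Y',
  clientId: 'anonymous-user-id',
  category: 'room',
  action: 'escaped',
  label: 'bedroom',
  value: 42,
});
console.log(body);
```

The body would then be sent as an HTTP POST to the collect endpoint from the fulfillment.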
Give users more time
One of the requirements to pass the Actions review for publication is that the Action is not allowed to keep the microphone open without the user knowing what to do next. This is typically solved by asking the user a question. Sometimes, however, the user needs more time to think about the next step. To support that, I use a Media Response to play some music when the user asks for more time. During the conversation, the Action will drop hints about the feature: “You see a desk against the wall. By the way, if you’re not ready, you can ask for more time. What will you do next?”
When the user is ready, she can just say “OK, Google” and then provide the next step for exploring the room to continue with the session.
The game implements the ‘Play Game’ built-in intent. It’s the largest source of users for the game based on the Actions console analytics.
Initially, I had the ‘Play Game’ intent configured to be the same as the default welcome intent. However, since the users don’t invoke the Action explicitly with “talk to magnificent escape”, they might not know much about the game. So, instead of just doing the normal welcome message, these users might need a special introduction.
As an experiment, I added a new pre-welcome message:
“<speak><emphasis level="strong">Here’s the top pick, just for you!</emphasis></speak>”
Also, when the user quits, they get a special goodbye message that tells them how to invoke the Action again: “OK. To play again, just say ‘talk to magnificent escape’. Let’s try this again later.”
It’s not clear yet if this makes a difference. I’m doing A/B testing with Google Analytics to track the user behavior and will report back on this in a future post.
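An A/B test like this needs each user to land in the same group on every visit. One common way to do that, shown here purely as an assumption about how it could work, is to hash a stable user identifier into a bucket. The variant names and `assignVariant` are hypothetical.

```javascript
const crypto = require('crypto');

// Deterministically assign a user to an experiment variant.
// The same userId and experiment name always produce the same variant.
function assignVariant(userId, experiment, variants = ['control', 'pre-welcome']) {
  const digest = crypto
    .createHash('sha256')
    .update(`${experiment}:${userId}`)
    .digest();
  // Use the first 4 bytes of the hash as an unsigned integer bucket number.
  return variants[digest.readUInt32BE(0) % variants.length];
}

// The chosen variant can be reported to analytics alongside each event.
console.log(assignVariant('anonymous-user-id', 'play-game-intro'));
```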
Based on analytics, users are successfully escaping rooms. Some users spend hours trying to solve the puzzles. Interestingly, once a user wins a room, they typically go straight to the next room. It’s good to see that the game mechanics are working and that die-hard escape-the-room fans are actually completing the rooms.
I’ve been working on this Action on and off for several months in my own time. Overall, it probably would have taken about a month if I worked on it full-time.
The game still has lots of room to improve, especially in increasing retention. Some things I’m considering are offering additional rooms as digital purchases and doing more A/B testing to improve the gameplay.
I’ll be spending many more hours looking at the logs to learn from the players how the game can be improved.
Want more? Head over to the Actions on Google community to discuss Actions with other developers. Join the Actions on Google developer community program and you could earn a $200 monthly Google Cloud credit and an Assistant t-shirt when you publish your first app.
Edit: Read how Magnificent Escape was updated after its release.