Interactive Fiction Actions (Part 1)

Google launched Actions on Google to allow you to write your own conversational actions for the Google Assistant, which users can invoke on devices such as Google Home.

A popular experience for these kinds of devices is the interactive story, similar to a text adventure game. I have fond memories of spending hours playing these games, and as a big fan, I couldn’t pass up the opportunity to build them as actions and show you how easy it is to port one of the thousands of existing text adventures to Actions on Google.

Background

Interactive Fiction (IF) grew out of early text adventures, which were some of the very first computer games ever created. IF is still thriving through a community of enthusiasts who are actively creating new bodies of work. There are thousands of stories to use as starting points at sites like IFDB and The Interactive Fiction Archive.

Text adventures use command-line interactions where the game provides descriptions of rooms or locations and the user navigates and interacts with the game by using common commands like “look”, “go north”, “open door”, “take treasure” and “inventory”.

IF games and stories are typically written in special-purpose languages and compiled to formats like Z-code and Glulx. Popular tools for creating IF include TADS and Inform.

The games are rendered by interpreters or virtual machines that run as native desktop apps, mobile apps, or even web pages. For Actions on Google, we’ll need an interpreter written in JavaScript, because the Actions SDK client library is written for Node.js.

Porting to Actions on Google

Porting Interactive Fiction to Actions on Google is relatively straightforward, because we rely on existing IF interpreters to do most of the work. IF interpreters typically display text and format the command-line UI, so it’s possible to extract the text an action needs for Text-To-Speech (TTS). Users can then interact with the game using their voices.

To begin, you’ll need the following components:

An interpreter written in JavaScript — I chose Parchment, an IF player for the web with open-source JavaScript interpreters for popular IF formats, including Z-code. Parchment uses ZVM, the ifvms.js Z-Machine, and adds its own API. To understand how to instantiate the interpreter, start with its basic bootstrap app: it loads the Z-code file into the interpreter and then uses a loop, or runner, to keep processing the codes and data the interpreter produces. However, to fully support the rendering of the game, Parchment’s UI uses StructIO, which covers most of the structural codes that need to be rendered for these kinds of games.

A game file — Let’s start with something easy and fun: “Lost Pig” by Admiral Jota. It even comes with a solution if you get lost.

A Conversation Action built with the Actions SDK — The action extracts the text from the interpreter and reads it back to users with a Node.js fulfillment app; it also passes the user’s input along to the interpreter. Since interpreters accept text input and the Actions SDK provides the raw text of what the user said, this is the best way to start development. A skeleton of such a fulfillment app is sketched below.
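To make the shape of that fulfillment app concrete, here is a minimal sketch, assuming the actions-on-google 1.x client library and Express. The helpers startGame and runTurn are hypothetical placeholders for the interpreter calls covered in the next sections, not this project’s exact code:

// index.js: a minimal Actions SDK fulfillment skeleton (a sketch, not the
// project's exact code). Assumes the actions-on-google 1.x client library.
const ActionsSdkAssistant = require('actions-on-google').ActionsSdkAssistant;
const bodyParser = require('body-parser');
const express = require('express');

const app = express();
app.use(bodyParser.json());

// Hypothetical helpers that wrap the ZVM interpreter (sketched in the next
// sections); stubbed here so the skeleton stands on its own.
const startGame = () => 'Lost Pig. Grunk think that pig probably go this way...';
const runTurn = (rawInput) => 'Grunk have: torch (on fire), pants...';

app.post('/', (request, response) => {
  const assistant = new ActionsSdkAssistant({request: request, response: response});

  // Invoked when the user says "talk to voice adventures".
  const mainIntent = (assistant) => {
    assistant.ask(startGame());
  };

  // Invoked for every later utterance in the conversation.
  const textIntent = (assistant) => {
    // getRawInput() returns the raw text of what the user said, e.g. "inventory".
    assistant.ask(runTurn(assistant.getRawInput()));
  };

  const actionMap = new Map();
  actionMap.set(assistant.StandardIntents.MAIN, mainIntent);
  actionMap.set(assistant.StandardIntents.TEXT, textIntent);
  assistant.handleRequest(actionMap);
});

app.listen(8080);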

Build the Action

For my action, I’ll use the Actions SDK sample as a starting point, and instantiate the ZVM interpreter in the Node.js app instance:

// Create the Z-Machine interpreter and load the story file into it.
const engine = new ZVM();
loadData('http://mirror.ifarchive.org/if-archive/games/zcode/LostPig.z8',
  (data) => {
    // Send the story data to the interpreter as a 'load' event.
    self.sendInput({
      code: 'load',
      data: data
    });
    try {
      engine.restart();
    } catch (e) {
      assistant.tell('Error: File format not supported.');
      return;
    }
    engine.run();
  });
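Note that loadData isn’t part of ZVM; it’s just a helper that fetches the story file and hands the raw bytes to a callback. Here is a minimal sketch of such a helper using Node’s built-in http module; the name and signature are simply what the snippet above implies:

// Fetch the story file over HTTP and pass the raw bytes to the callback.
// Real code should also handle request errors and non-200 responses.
const http = require('http');

function loadData (url, callback) {
  http.get(url, (res) => {
    const chunks = [];
    res.on('data', (chunk) => chunks.push(chunk));
    res.on('end', () => callback(Buffer.concat(chunks)));
  });
}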

To speak the adventure text, I extract the command-line formatted data from the interpreter and strip away any control characters, such as line feeds and other special characters. The incoming raw text of what the user said is passed directly to the interpreter.

step (response) {
  const orders = engine.orders;
  for (let i = 0; i < orders.length; i++) {
    const order = orders[i];
    const code = order.code;
    if (code === 'stream') {
      // Collect the text output to speak back to the user.
      ...
    } else if (code === 'read') {
      // The interpreter is waiting for input: pass along what the user said.
      ...
      order.response = response;
      engine.inputEvent(order);
      ...
    }
  }
}
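The elided 'stream' branch is where the spoken text comes from: each stream order carries a chunk of the interpreter’s command-line formatted output, which I concatenate and clean up before handing it to TTS. A minimal sketch of that cleanup step (the function name and the exact characters stripped are my own choices):

// Strip command-line formatting so the text reads well as speech: turn line
// feeds into spaces, drop prompt characters, and collapse repeated whitespace.
function cleanText (rawText) {
  return rawText
    .replace(/[\r\n]+/g, ' ')
    .replace(/[>*_]+/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}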

I then preview the action with the gactions tool:

./gactions preview --invocation_name "voice adventures"

The action is now available in the web simulator and on my Google Home device. I can also use the gactions simulator to play the game:

./gactions simulate
User TTS (CTRL-C to stop):
talk to voice adventures
Action: Sure, here's voice adventures
Lost Pig. Grunk think that pig probably go this way. It hard to tell at night time, because moon not bright as sun. There forest to east and north. It even darker there, and Grunk hear lots of strange animal. West of Grunk, there big field with little stone wall. Farm back to south.
User TTS (CTRL-C to stop):
inventory
Action: Grunk have: torch (on fire), pants (Grunk wearing them)

Retaining State

The action works fine for a single user at a time. However, I need to give each user their own instance of the running interpreter to keep track of where they are in their own story. Also, actions are invoked in an HTTP request/response flow, so I need to keep track of the current state of the game for each subsequent request.

Most interpreters define some sort of format to retain a game’s state. The ZVM interpreter supports a “save” command to save the current game state and a “restore” command to restore the saved state. The state is stored as an array of numbers.

To save and restore the state across multiple requests, I use a convenient data utility provided by the Node.js client library to store values between requests within the same session:

assistant.data.restore = this.data;

For each incoming request to the action (a code sketch of these steps follows the list):

  1. Create a new instance of the interpreter using a cached copy of the game file.
  2. Tell the interpreter to restore the game state using the incoming session data.
  3. Pass the raw user text to the interpreter.
  4. Parse the interpreter response and pass it back in the action HTTP response using assistant.ask to get the next user input. In addition, the game state is saved and passed back with the HTTP response.
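Putting those steps together, the handler for each user utterance ends up looking roughly like the sketch below. This is not the project’s exact code: loadGame, restoreGameState, runTurn, and saveGameState are hypothetical helpers wrapping the interpreter calls shown earlier, and the only client library features assumed are assistant.getRawInput, assistant.ask, and the assistant.data session object.

// Handler for each user utterance once the game is under way (a sketch).
const textIntent = (assistant) => {
  // 1. New interpreter instance from a cached copy of the game file.
  const engine = new ZVM();
  loadGame(engine); // hypothetical helper: sends the 'load' event and restarts

  // 2. Restore the game state carried in the session data.
  if (assistant.data.restore) {
    restoreGameState(engine, assistant.data.restore); // hypothetical helper
  }

  // 3. Pass the raw user text ("go north", "inventory", ...) to the interpreter.
  const reply = runTurn(engine, assistant.getRawInput()); // hypothetical helper

  // 4. Save the new game state into the session and ask for the next input.
  assistant.data.restore = saveGameState(engine); // hypothetical helper
  assistant.ask(reply);
};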

Next Steps

The Actions SDK version of the interpreter now works well for multiple users trying out the same hosted action. However, it became clear to me that spoken commands weren’t always being transcribed correctly and had to be repeated before the command text came through as intended. Also, to play the game properly, the user has to talk like a computer: “go west”, “look”, “push can against wall”, “go up”, “press play”.

I wanted to have better voice recognition, and I also wanted players to be able to talk more naturally when playing the game, in keeping with our VUI principles of unlocking the power of spoken language. In the second part of this series, I’ll explain how we can improve the Natural Language Understanding (NLU) by using API.AI.