Designing Playable Conversational Spaces
My colleagues and I at Spirit AI have been blogging for the past few months about the unique possibilities and challenges of authoring for Character Engine. While the system supports multiple models of input, including gestures and dynamic menus, one mode accepts full natural language input. Handling this kind of input is obviously a challenge, but not necessarily for the reasons you might think.
We’ve written previously about how our system works, and the technical side of natural language processing obviously presents a number of challenges. But by creating an entirely new paradigm for interacting with conversational characters, we also face a big design challenge. Just as the real point of a physics-driven 3D world is not to be an accurate simulation of space and motion but to be a fun sandbox for players to explore, the real point of a conversational game is not to pass the Turing test but to guide players through a rewarding conversation that tells a story. We’re only starting to come to grips with how to craft a well-designed conversational space, and I’d like to share some of the lessons we’ve learned so far.
The first is setting expectations. Put someone in front of a bot with a conversational interface, and invariably the first thing they’ll try to do is stump it. I suspect this is basic psychology, driven by our inherent suspicion of anything pretending to be human. So the key is to use some basic psychology back: rather than drop players into a completely free-form and open-ended conversation, it’s crucial to prepare them, training them as we would when introducing any other kind of new game system, to engage with this character in a productive and useful way.
This is similar to a design lesson I learned from my years writing text adventures: while a room description might seem on the surface like a standard bit of descriptive prose, in a well-designed game it’s actually crafted with incredible care to guide the player toward the nouns and verbs that will be useful to them and to avoid the ones that won’t. Similarly, in a natural language game, what your characters say, especially in their first few interactions, is vital to setting expectations. Consider these two possible opening lines:
Alice: Hello, and welcome to Hotel Jabberwock! I’m Alice. Please let me know how I can help you.
Alice: Welcome to Hotel Jabberwock, east Wonderland’s premiere resort! I hope the mome raths aren’t bothering you. You’re checking in to our VIP suite, yes?
The second opening line is much better. It sets the stage and provides a familiar social interaction (checking into a hotel) which the player will feel subtle pressure to play along with. By mentioning specific things in the world, it signals that it’s okay to ask questions about them. It implies a specific role for the player to inhabit in this scenario (rich tourist). Most importantly, it suggests several obvious follow-ups which authors can be sure they’re covering (saying “yes” or “I don’t have a reservation”; asking “What the heck are mome raths?” or “What else is in east Wonderland?” or “Is there a west Wonderland?”).
Other useful ways to set expectations in NPC lines include:
- Suggest (directly or indirectly) a response.
- Ask a question, ideally one with a tractable set of likely answers.
- Mention topics or bits of the world you want players to explore.
- Avoid stalling out (leaving the player hanging).
The last point is an important one. Replying with something that doesn’t give the player anything to work with (“I see.” “No problem!”) or simply answering a question (“Charles works for Valtrox”) doesn’t do much to help the player continue the conversation. Character Engine supports a useful system where certain bits of text can be tagged End Thought or New Thought. If a character speaks an End and can find a valid New to transition to, they try to do so. This means we can tag generic fallbacks or flat answers with End Thought, and write a series of contextual New Thoughts that keep giving the player new prompts, and lead the conversation back into terrain we’ve got content for:
“No problem [End]. So how’s your investigation going? [New]”
“Charles works for Valtrox [End]. He doesn’t get along with his boss Kim at all. [New]”
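Character Engine’s actual implementation isn’t public, so purely as a sketch of the End/New Thought idea — with all names and signatures invented for illustration — the chaining behavior might look something like this:

```python
# Hypothetical sketch of End Thought / New Thought chaining.
# None of these names reflect Character Engine's real API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Thought:
    text: str
    is_end: bool = False                              # an End Thought invites a transition
    condition: Callable[[dict], bool] = lambda state: True  # when a New Thought is valid

def speak(line: Thought, new_thoughts: list[Thought], state: dict) -> str:
    """Speak a line; if it's tagged End, chain the first valid New Thought."""
    if not line.is_end:
        return line.text
    for candidate in new_thoughts:
        if candidate.condition(state):
            return f"{line.text} {candidate.text}"
    return line.text  # no valid New Thought found: the line stands alone

flat_answer = Thought("Charles works for Valtrox.", is_end=True)
prompts = [
    Thought("He doesn't get along with his boss Kim at all.",
            condition=lambda s: s.get("knows_kim", False)),
    Thought("So how's your investigation going?"),
]

print(speak(flat_answer, prompts, {"knows_kim": True}))
```

The point of the contextual conditions is that the same flat answer can hand off to different prompts depending on what the player already knows, so the conversation keeps steering back toward authored content.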
Despite these cues, players can of course still say anything they want at any time — that’s a fundamental aspect (but also a huge potential win) of any conversational interface. Since we’re not trying to pass the Turing Test, we don’t actually need or even want NPCs to handle completely off-topic statements (“What do you think of Aristotle’s metaphysics?”). But we do need to respond in some way to anything the player says. A good solution is to have a series of overlapping fallbacks that try to redirect the player back to productive zones of conversation.
For instance, say we have a number of facts about Alice in our knowledge model, but aren’t able to answer a particular question like “Why does Alice wear that old-fashioned dress?” In a well-designed project, we might have a fallback line keyed on recognizing QUESTION:WHY and Alice in the player’s input, maybe something like “Who knows why Alice does anything? [End]” This is a deflection from the fact that we didn’t really understand, but we disguise it by immediately moving on to a new subject.
It’s unlikely that we’ll have something useful for every combination of inputs, of course, so we can also write less-specific fallbacks for broader coverage.
QUESTION:WHY > “I have no idea why. [End]”
Any QUESTION > “Interesting question, but look: [End]”
Alice > “Did you know Alice was recently seen at the Queen’s court?”
We might also have fallbacks for specific scenes, or emotional or tension levels we’re tracking:
Scene: At The Court > “Hang on, is that a flamingo over there?!”
Tension: High > “There’s no time for that right now! [End] Cut the green wire before the bomb goes off! [New when bomb_state=ticking]”
All of these are whiffs, of course — a human could probably improvise something better — but the idea in this kind of failure case is to acknowledge in some way that you heard at least part of the player’s input, and move as quickly as possible back into familiar territory.
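As a rough sketch of how such a cascade of overlapping fallbacks fits together — the classification and matching logic here are toy stand-ins I’ve invented, nothing like the real system’s NLP — the essential structure is just an ordered list checked from most to least specific:

```python
# Toy fallback cascade, most specific rules first. The feature
# extraction is deliberately crude; a real system classifies input
# with far richer cues than keyword spotting.

def classify(utterance: str) -> dict:
    """Extract a few toy features: question type and mentioned entities."""
    lowered = utterance.lower()
    return {
        "is_question": utterance.rstrip().endswith("?"),
        "is_why": lowered.startswith("why"),
        "mentions_alice": "alice" in lowered,
    }

FALLBACKS = [
    # (predicate over features and world state, response)
    (lambda f, s: f["is_why"] and f["mentions_alice"],
     "Who knows why Alice does anything? So how's your investigation going?"),
    (lambda f, s: f["is_why"],
     "I have no idea why. So how's your investigation going?"),
    (lambda f, s: f["is_question"],
     "Interesting question, but look: did you hear about the Queen?"),
    (lambda f, s: f["mentions_alice"],
     "Did you know Alice was recently seen at the Queen's court?"),
    (lambda f, s: s.get("tension") == "high",
     "There's no time for that right now!"),
    (lambda f, s: True,  # last-resort catch-all
     "Hang on, is that a flamingo over there?!"),
]

def fallback(utterance: str, state: dict) -> str:
    features = classify(utterance)
    for predicate, response in FALLBACKS:
        if predicate(features, state):
            return response
    return ""  # unreachable: the catch-all always matches

print(fallback("Why does Alice wear that old-fashioned dress?", {}))
```

Because every rule from QUESTION:WHY + Alice down to the flamingo catch-all gets a chance in order, the player always hears the most specific deflection we’ve authored.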
One final challenge which we didn’t anticipate before people started testing our games: experienced players would sometimes start interacting in single-word utterances, perhaps assuming the system is just looking for keywords. Ironically, because our system is designed to use so many other cues, this interaction style can make it fairly inept at accurately conversing, which reminds me of a certain scene from Young Frankenstein.
This is another design problem: how do we train players to speak the way we want them to? One technique is to make sure we’re responding to multiple channels of input. If we give a player a response that looks like it’s just reacting to a keyword:
Player: blah blah blah swordfish blah blah…
NPC: The swordfish is a predatory fish characterized by…
…players can’t be blamed for assuming everything other than that keyword is irrelevant. If, on the other hand, we let text recombine and vary based on multiple aspects of the player’s input and the ongoing state, more of that input starts seeming relevant:
Player: hey, where blah blah swordfish blah blah you jerk
NPC: Well there’s no need to be rude, but I do know the swordfish often migrates in…
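One way to picture that multi-channel response — again a toy sketch with invented word lists, not how the real system works — is to compose the reply from several independent readings of the same input, so changing the rude words or the topic word each changes the output:

```python
# Toy multi-channel reply composition: a crude politeness channel
# plus a topic-keyword channel. Word lists and facts are invented.

RUDE_WORDS = {"jerk", "idiot", "stupid"}
TOPICS = {"swordfish": "the swordfish often migrates thousands of miles"}

def respond(utterance: str) -> str:
    words = {w.strip(",.!?").lower() for w in utterance.split()}
    parts = []
    if words & RUDE_WORDS:                     # tone channel
        parts.append("Well, there's no need to be rude, but")
    topic_facts = [fact for topic, fact in TOPICS.items() if topic in words]
    if topic_facts:                            # topic channel
        lead = "I do know" if parts else "I know"
        parts.append(f"{lead} {topic_facts[0]}.")
    if not parts:
        return "Hmm, I'm not sure what you mean."
    return " ".join(parts)

print(respond("hey, where blah blah swordfish blah blah you jerk"))
```

Because the reply visibly reacts to both the insult and the keyword, the player learns that more of their input matters than just the one noun.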
These design lessons continue to shape the way we push the technical capabilities of the system, which in turn provokes new design challenges and new lessons. One direction we’re always striving toward is to help players feel like they’re playing along with a conversation rather than playing against it. Making conversational characters technically possible doesn’t get you much if you don’t also design an interaction space that helps players have productive conversations, and the techniques discussed here — setting expectations, prompting behavior, and redirecting failure — are some small steps in that direction.