Conversations in Skyrim VR

I’m standing just outside the gates of Whiterun in a snowstorm, huddled with a guard around the bright glow of a fire, and even though I know he can’t hear me I’m trying to have a conversation with him.

Against my better judgment, my partner has talked me into trying Skyrim VR. I loved Skyrim, but have complex feelings about VR: I’m all about being immersed in rich, explorable environments, but also get nauseous very easily, so most VR games with motion are non-starters. (The one exception for me to date has been The Climb; for some reason, pulling myself through a world with my hands feels completely fine.) So I ask my partner at first to just curate me some pretty viewpoints to stand at and look around. After a while, though, I risk trying the move-by-teleport scheme, and can handle that as long as I don’t have to do too many jumps in quick succession. (Basically, as soon as I aggro something it’s all over.)

This guy.

I’m mostly curious about what character interaction feels like, anyway, and after teleporting around the wilderness for a while, I approach civilization and this guard is the first NPC I come across. Like all VR characters, he has a surprising presence that’s not really about visual fidelity at all, but proprioception. I alternate between a deep, ingrained feeling that there’s actually a real person standing there, and a puckish urge to stick my hand through his chest or invade his personal space to study his face textures. The uncanny valley, for me, is entirely in his lack of reactions to these things: it’s not about his poly count or bump mapping at all.

Counterplay aside, I mostly want to pretend he’s real, to feel immersed in Tamriel. So we have a conversation, even though he’s a lowly background NPC with no unique dialogue of his own. Since I’m standing within his invisible proximity sphere, every ten seconds or so he spits out a random bark available to characters of his type: I don’t know whether he’s a MaleSoldier or a MaleEvenToned or a MaleCommoner or what, but his pool of lines includes rumors about dragons, opinions on the war, and general chit-chat. The silence between each line gives me enough time to say something back. I know he can’t actually hear me, and of course the next thing he says doesn’t follow on from what I said, or even from what he was saying a moment ago. But it does feel sort of like a conversation. An awkward one between two strangers killing time, sure, but still. He tells me the “arrow to the knee” story. I act like I’ve never heard it before, expressing sympathy: it feels rude to acknowledge it’s a cliche. “Glad you could find work here as a town guard after that,” I say, and try to interpret his silence as companionable assent.
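For the curious, the bark behavior I'm describing can be sketched in a few lines. This is a toy illustration and not how Skyrim's engine actually implements it: the `BarkingNPC` class, the pool contents, the three-unit radius, and the ten-second interval are all assumptions I've made up to mirror what I observed.

```python
import math
import random

# Placeholder pool standing in for one voice type's shared lines (hypothetical).
GUARD_POOL = [
    "<rumor about dragons>",
    "<opinion on the war>",
    "<general chit-chat>",
]


class BarkingNPC:
    """Toy NPC that speaks a random line while the player stays nearby."""

    def __init__(self, position, pool, radius=3.0, interval=10.0, rng=None):
        self.position = position    # (x, y, z) of the NPC
        self.pool = pool            # lines shared by characters of this type
        self.radius = radius        # assumed size of the proximity sphere
        self.interval = interval    # assumed seconds of silence between barks
        self.rng = rng or random.Random()
        self._cooldown = 0.0        # seconds until the next bark is allowed

    def update(self, player_position, dt):
        """Advance time by dt seconds; return a line to speak, or None."""
        self._cooldown = max(0.0, self._cooldown - dt)
        if math.dist(self.position, player_position) > self.radius:
            return None             # player is outside the sphere
        if self._cooldown > 0.0:
            return None             # still in the silent gap
        self._cooldown = self.interval
        # Each pick is independent, so nothing follows on from the last line.
        return self.rng.choice(self.pool)
```

Nothing in that loop takes the player's words as input, which is exactly why my half of the conversation never lands.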

Later, I talk to more fleshed-out NPCs. My partner’s got the Dragonborn Speaks Naturally mod installed, meaning I can just read the text of a dialogue option aloud to select it. This is less satisfying than I’d hoped. The freedom of being able to perform the given lines however I want (sarcastic; earnest; skeptical) makes being limited to a couple of dialogue options feel, conversely, more frustrating and artificial. I try ad-libbing some lines, changing words or adding codas, but this risks confusing the speech recognition and mis-selecting my response. This way I inadvertently insult someone I’d been building up a rapport with, and have no way to apologize, further breaking an immersion already challenged by a floating conversation menu that often slices straight through the speaker’s head.

I miss social cues. Smiling at someone to indicate I’m interested in what they’re saying, having them smile back. Making hand gestures for emphasis or to point at something: What’s that statue up there? is what I want to ask the elf lady who won’t let me into the mage’s college. I want to put my hands on her shoulders and look her in the eyes and say “Please, please let me in” and have that work because I’m being so sincere — or at least have her be offended I invaded her personal space and tell me to get the hell out. I do not want to spend six minutes trying and failing to navigate an awful series of menus to cast the correct spell to impress her, continually pressing the wrong button four or five submenus in and having to start all over, but this is in fact what I have to do. This game was not designed for VR.

This is even more obvious in the lack of physical interactions generally, despite a few (mostly combat-based) exceptions. VR-native games generally give you lots of interesting objects to pick up, turn over, and throw at things. As delightful as this would be in a tavern filled with Skooma bottles and cheese wheels, it just isn’t possible, which is a shame. I want to pull books off shelves, pick up a vase to study its lovely textures and reflections, open drawers, and, yeah okay, throw cheese wheels. I want to pluck alchemy ingredients off plants, mix them into potions with hand gestures, then lift those potions to my mouth to drink them. I want to pet the horses. I really, really do. I’m not even into petting horses in real life but for some reason I really want the Skyrim ones to notice I’m petting them and be appreciative.

VR does physicality well, and I naturally start thinking how this could be applied to social interactions. In addition to body language (standing too close or walking away is rude; making eye contact shows interest) what if I could select basic conversational moves, Mass Effect style, by pointing at them, while verbally performing the line any way I want to? Telling the merchant “Oh, what a great deal,” while pointing at Sarcastic or Flatter. Heck, bring back Oblivion’s Speechcraft minigame but make it tactile, combinatorial, performative. That’d be awesome.

Imagine this as a compass-sized device you held in your hands and deftly twiddled while speaking to an NPC. Screenshot courtesy The Unofficial Elder Scrolls Pages.

We can get beyond these pretty straightforward ideas without much more trouble. You may have noticed that speech recognition has gotten pretty good lately, and so has detecting intentions or emotional affect from that speech. At Spirit AI we’re working on making some of this stuff available to game designers via Character Engine. Dragonborn Speaks Naturally would be freaking awesome if I could improvise more around each line and the game could still recognize the conceptual similarity between what I said and one of the available options. Character Engine can actually take this further and support free-form conversations not arranged in pre-structured dialogue trees, or instead make this kind of expressive performativity available through dynamically generated menu options that change based on emotions and conversation topics. (My colleague Emily Short recently released a small interactive ghost story showcasing this.)
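To make the "improvise around each line" idea concrete, here is a minimal sketch of forgiving option matching. It is not how Character Engine or the Dragonborn Speaks Naturally mod works: it swaps in plain character-level similarity from Python's standard `difflib` as a crude stand-in for real conceptual similarity, and the `pick_option` function, its `threshold` value, and the sample options are all hypothetical.

```python
import difflib
import re


def normalize(text):
    """Lowercase the text and drop punctuation so only the words are compared."""
    return " ".join(re.findall(r"[a-z']+", text.lower()))


def pick_option(spoken, options, threshold=0.6):
    """Return the option closest to what was said, or None if nothing is close.

    Similarity here is lexical (shared characters in order), not semantic.
    """
    spoken_norm = normalize(spoken)
    best, best_score = None, 0.0
    for option in options:
        matcher = difflib.SequenceMatcher(None, spoken_norm, normalize(option))
        score = matcher.ratio()
        if score > best_score:
            best, best_score = option, score
    # Refusing to guess below the threshold avoids mis-selecting a response.
    return best if best_score >= threshold else None


options = [
    "What can you tell me about the College?",
    "I need to get inside.",
    "Goodbye.",
]
choice = pick_option("So, what can you tell me about this college?", options)
```

Returning `None` when nothing is close enough is the design choice that matters to me: I would rather be asked to repeat myself than accidentally insult someone.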

Screenshot from Restless, courtesy Emily Short.

But even small steps here would be radical improvements. The smallest hints that the characters were paying attention to me, noticing me, responding to me, would be massive wins. The smallest ability to socially perform would be greatly welcomed. I’m so invested. I want to believe in this world. I just need a way to do it, and to get a little bit back.

Meanwhile in VR Tamriel, I’ve found someone more interesting to talk to. Blowing off my first assignment as a student at the mage school (surprise! It’s go to a dungeon and retrieve something) I’ve stayed behind and met a socially awkward orc who I pretend is also playing hooky. (She may be from one of the various NPC-enhancing mods in my partner’s formidable list; I’m not sure.) Basically, she’s a cryptomage: she believes there are all kinds of rare, unexplored magics out there, and she wants to find them all and research them. She burbles on at length about her theories and the many fantastical inventions and schemes she’s working on. She’s exactly the kind of creative weirdo intellectual I tend to be drawn to in real life, and I find myself really wanting to be friends with her. There is no way to express this to the game. I start performing my dialogue options more genuinely, wanting her to hear the enthusiasm in my tone of voice, but she can’t (although the tech to do this also already exists). The only way I can really indicate enthusiasm is to exhaust all the options in her dialogue tree, so I do that. I tell her I hope to see her around (off-script) and wave goodbye, even though neither of these things is being tracked by the game. But it feels weird not to say goodbye to a new friend.

Character interaction in VR and AR has so much untapped potential, but encouragingly, very little of it relies on technical pipe dreams. Much as there are few technical reasons that Myst couldn’t have been a CD-ROM launch title in 1988, there are few technical reasons we can’t have dynamic conversational NPCs right now, in mixed reality or on screens. What takes time, as with any new medium, is drawing the right people to the platform, exploring the new affordances it offers, and experimenting and iterating to find what really works in each new design space. I hope the first mind-blowing conversational VR game comes out soon, in 2019 or 2020. Until then, my relationship with my weird friend the cryptomage will have to keep existing mostly in my head.