The Storyteller’s Guide to the Virtual Reality Audience

As VR storytellers, we are charged with molding experience itself into story, and none of our storytelling tools have prepared us fully for that. As we stumble our way into this new, mysterious medium, we ask ourselves, “How do we tell a story for the audience when the audience is present within it?”

Being bodily present in the story seeds the need to be active, to “do.” But how does the audience know what to do? And how do we take their needs and perspective into consideration? To even scratch the surface of these questions, we need to better understand the audience’s experience in VR — not just their experience of the technology, but the way that they understand story and their role within it.

Our early thoughts about what it means to be “bodily present” in the story. Photo: Katy Newton

To home in on the audience’s perspective, our team applied a human-centered design (HCD) lens. We call this lens “audience experience” (AX).

Over ten weeks, we conducted 3 sets of experiments with over 40 participants and interviewed experts from multiple fields: design thinking, theatre, gaming, architecture, journalism, science, and film. Here are some of the most illuminating points from our research, the ones we think will be helpful to other mobile VR storytellers as we navigate this new landscape together.

Over ten weeks, we conducted 3 sets of experiments with over 40 participants. All of our experiments were low-fidelity and analog, allowing us to adapt and respond to new questions as they arose. Photos: Karin Soukup and Alexandra Garcia

The Media Experiments

To explore the audience’s experience in VR, we partnered with Stanford’s Media Experiments, the National Film Board of Canada, and independent filmmaker Paisley Smith. To anchor the testing, we used scenes and locations from Paisley Smith’s VR documentary, Taro’s World. The documentary explores the death of her Japanese exchange student brother, Taro, and the impact his suicide had on the people around him. Taro’s World will be released in 2016 for mobile VR (Google Cardboard and the Samsung Gear VR).

All of our experiments were low-fidelity and analog, allowing us to adapt and respond to new questions as they arose. Adapting a technique called “experience prototyping,” we created physical experiences in the real world involving real people. These analog tests, rooted in an HCD approach, allowed us to learn about audience behavior quickly, inexpensively, and on a flexible timeline independent from that of VR developers and artists.

We mimicked the constraints of VR technology, restricting our participants’ movements and interactions to match the affordances of Google Cardboard. We created “magic goggles” (actually made of plastic, paper, tape and a front-facing camera) that limited the audience’s peripheral view while simultaneously recording their head movements.

When participants wore the magic goggles, their head movements replicated those of someone in a mobile VR headset, compelling them to “stitch” the scenes of the 360° story-world together themselves.

Two experience prototypes of a scene from Taro’s bedroom. Left: a lo-fi 3D model of Taro’s bedroom using dollhouse furniture; Right: a live version of the scene using props and theater blocking with actors and spatial sound. Photos: Karin Soukup

Audience Experience: Top Five Takeaways

Once the audience pokes a hole in reality, they have already fallen through it.

VR promises to create virtual worlds so real that the audience feels as if they are physically present in a digital space. That sensation of “being there” is called presence. Presence is partly achieved through the technology (the processing power, the graphics, the display), but it’s also achieved through the consistency and richness of the story-worlds we create. It’s up to us to convince the audience to suspend disbelief enough to feel present in mind, body and soul.

In the third experiment, participants were placed in the center of a room simulating Taro’s bedroom. While wearing headphones with 360° sound, they watched a scene play out in the room. In one condition of the test, the participant’s view was restricted to 90° of the room. These participants saw a desk where Taro sat playing video games.

In the third experiment, participants watched a scene play out in Taro’s bedroom while wearing headphones with 360° sound. The participants were divided into three groups, each with a different degree of restriction on what they could see. Photos: Alexandra Garcia

With this restricted vision, the audience paid super-close attention to each of the objects, trying to find meaning. Among the objects was a plate of uneaten cookies, an insignificant prop setting the scene. In the test debrief, the cookies — without warning — took over the audience’s understanding of the whole scene. Participants repeatedly asked questions like, “Why isn’t he eating his cookies?” and “Why are the cookies crumbled that way? Who does that?”

Because each object seemed deliberately placed, insignificant objects took on huge significance in the minds of the audience. This effect was prevalent in the 90° view, but we wonder if, even in 360°, having limited visual information puts more weight on the information that’s there, leading to a similar search for meaning. Once the audience fixated on the cookies, they retreated into their heads and began asking themselves questions — so many questions that in the debrief they described themselves as “detectives.”

Participants default to “detective mode” and spend extra energy trying to interpret the meaning of objects in the scene… even when there is none inherent to the object. Photos: Karin Soukup and Alexandra Garcia

There are three things to consider here:

  1. When the audience has limited visual information, they will work twice as hard to make meaning out of every detail they see.
  2. If something doesn’t jibe with their expectations, it takes them out of the experience.
  3. It sends them into detective mode, investigating the scene from a distance.

When you’re depicting the environments, ask yourself: “Is the story-world consistent and will the audience place the intended significance on the objects within it?”

There is no such thing as a neutral observer.

In our second test, some participants were seated in the front row of a classroom. While the goal of the scene was to observe Taro, without any prompting the participants felt the need to pay attention to the teacher and to decipher a note other students were passing. Based only on the environment and their position within it, participants took on the social script of “student.”

Our second test, which explored how audience position affected interpretation of the story. Here we see an audience member in three different positions in a classroom scene with Taro. Photos: Alexandra Garcia

Similarly, in the first “D&D”-style test, when participants’ avatars were placed at the front of the classroom, they either took on the role of “lecturer” or expressed anxiety, as if they were actually standing in the front of the class. One participant was so uncomfortable she asked that her avatar be moved: “Can I stand in the back against the wall?”

In the “D&D” prototype, a blank figurine was used to represent the embodiment of the audience. Here, the audience stands at the front of a classroom scene to amplify the emotional distance from Taro (in the back). Being at the front prompted questions like “Can the students see me?”… and “Can I stand in the back against the wall?” Photo: Karin Soukup
The “D&D” test borrowed mechanics and the role of the “dungeon-master” from the game Dungeons and Dragons. We acted as selective narrators — hiding or revealing real-time story information to emulate some of the constraints of VR. Photos: Karin Soukup

In the experiment set in Taro’s bedroom, participants stood in the middle of the room, without the divider, replicating the position of a typical 360° camera. Unlike the classroom, which has strong behaviors associated with it, standing in the middle of a stranger’s bedroom prompts few appropriate social scripts. However, multiple participants ascribed themselves roles, describing themselves as a “voyeur” or like a “fly on the wall.” Others felt vulnerable and ill-at-ease standing in the middle of the room.

We suspect this is partly attributable to not knowing how to act in this setting, a setting that is typically private, and partly to feeling physically exposed in the middle of the room. To feel bodily present, these tests suggest, the audience should understand who they are in the scene (even if who they are is a “fly on the wall”) as much as where they are.

Consider how the audience will respond to the physical and social dimensions of the space. Ask yourself, “How can we give the audience enough context to feel comfortable being present in this environment?”

For better or worse, the audience directs their own gaze.

Because it is impossible for humans to see in 360°, they must actively choose what to look at and when. Looking gives the audience agency, not to change or affect the story in VR, but to choose which pieces of the story they take in, make meaning out of and combine with other information to form a story in their minds. In this way, no two individuals experience the exact same story, because no two individuals look at the exact same things in the exact same order.

When the scene has more than one focal point, the audience is forced to make a choice about where to put their attention. In our tests, some audiences expressed FOMO (what the kids are calling “fear of missing out”). For example, in the classroom scene, we observed participants working really hard to read a note some of the students were passing. Participants were so curious about the note that they brought it up repeatedly in the debrief: “I still want to know what’s in that note!” FOMO could definitely distract, taking the audience out of the experience, but it may also be a storytelling tool to create suspense or elicit curiosity. Remember when Bill Murray whispered in Scarlett Johansson’s ear at the end of Lost In Translation? Sometimes, not knowing is a powerful thing.

“Lost in Translation”, Photo: Focus Features

When thinking about how the story unfolds, ask yourself, “How can you draw attention to the most important story points?” And “Can you use FOMO to your advantage?”

The more there is to see, the less the audience remembers.

In our third test, audiences with a 90° range of vision could recall nearly every event in the story, whether the information was physically in the room or relayed through the audio. However, audiences in the 360° view recalled fewer details of the story and the environment. For example, in the 90° scene, all of the participants in the debriefing referred to Taro by name. In the 180° scene, Taro was sometimes referred to by name, but was more often given descriptors like “young man.” By the 360° scene, few remembered Taro’s name; instead, they referred to him offhandedly as “the kid at the computer.”

Much of the story information, including character names, was delivered through the audio. The fact that participants in the 360° scene couldn’t remember Taro’s name (among other story details) suggests that they were focusing less on the audio in 360° than in the 180° or 90° scenes. Perhaps there was too much information in 360° for the audience to process. When telling a story in 360°, we need to consider how to combine audio and visual elements without overloading the audience.


The more complete the environment, the more it resonates.

Audiences in the 360° scene were more aware of the tone of the piece, which they attributed to the pacing and shifts in the lighting. They were so attuned to the tone that when asked who was in control of the story, they described the storyteller as the mise-en-scene itself, or used some abstraction, like the storyteller was the “rhythm” of the scene.

Audiences in the 360° scene were also more attuned to Taro’s feelings. They could clearly and unequivocally identify that Taro was feeling “lonely,” and sometimes felt that Taro’s feelings were reflected in the mise-en-scene itself. Those in the 90° and 180° scenes, by contrast, really struggled to characterize Taro, claiming that they did not have enough information to draw conclusions about him.

There’s something interesting happening here. It may be that when you feel present in an experience, you are more likely to rely on abstractions and pick up on feelings, and when you are in “detective mode” you are more likely to pick up on story details, but have difficulty accessing feelings. Perhaps being present and retaining story details are fundamentally at odds.

With each new bit of information you add to the VR storytelling experience, you should ask yourself, “Does this information contribute to feeling present, or will it send the audience into their heads? And which mode do I want them in right now?”

Alexandra Garcia demonstrates the 360° live prototype state, which explored the tension between two focal points and the audience’s sense of “missing out” on story information. Photo: Karin Soukup

The Storyteller as Matador

Initially, we believed VR technology would usher in a new role for the audience, moving them from simple “observer” to the more active state of “influencer” (having impact on the story, but not changing the outcome of the narrative). Instead, we discovered that observing is an active state. Looking is doing, and it requires a lot of work from the audience.

It’s actually not the audience that feels the need to influence the story; they have enough to “do.” Instead, the storyteller needs to shift how they think of themselves, moving away from “director” and towards the role of “influencer.” After all, influencing the audience is all that directors can do: we can’t frame the shot for them; we can’t cut away. Instead, storytellers have to behave like matadors, waving the red cape in the direction they want the audience to run, knowing that the power ultimately lies in the audience’s hands to see what they want to see, hear what they want to hear and form their own stories about what they have experienced.

We can borrow techniques from other media (theatre, art, film and design) to draw the audience’s focus. But in order to choose whether to show a color, break the fourth wall, etc., we have to first put ourselves in the audience’s shoes and understand their cognitive, emotional and physical experience. We need to embrace the human-centered design lens of “audience experience,” and let that guide our choices.

At its most basic level, good AX design means mastering this foundational interaction first. If we do our job well, if we can influence the audience’s choice of where to focus without overburdening them with that choice, then the audience will feel as though they are living inside the moments we create. And as storytellers, we may become as invisible to the audience as the technology itself.

Thank You

The extended team: Paisley Smith, Judeth Oden Choi, Alexandra Garcia, Amy Santamaria, Joseph Lim, Maya Hawke and Emily Goligoski. As well as the test participants & actors.

Our partners: Tran Ha and Stanford’s Media Experiments; Vincent McCurley, Milan Koerner-Safrata, Paisley Smith, Janine Steele, Loc Dao, from NFB Interactive.

The experts we consulted: Clint Beharry, Sun Joo Ahn (Grace), Ayana Baraka, Brendon Trombley, Saschka Unseld, Ida Benedetto, Chloe Varelidi, Amanda Ramos, Katie Flemming, Ebba Petren, David Dennis, Dr. Jon Freeman, Davey Wreden, Susan O’Connor, Cory Howard, Pam Maples, Zena Barakat, Debra Anderson, Phuong Ly, Amanda R. Welsh and IDEO NY.

We are grateful to the “being there” folks that we interviewed during the discovery phase of the research: the lucid dreamer, the mom who just gave birth, the kidnapping survivor, the extreme sports photographer, the two war photographers, the fighter pilot, the critical care nurse and the Canadian wilderness guide.

Our overly accommodating workspace: Friends Work Here (Thanks Tina & Hayley).

And finally, thanks: Neil Nisbet, Sean Connelley and Tiny-boss.


Learning shared by the Stanford community


VR/AR Media Experiments

Written by

Katy Newton is a filmmaker and experience designer; Karin Soukup is an experience designer and creative director. Find them at @katywnewton + @designcurio

