What Is Virtual Reality?

Brenda Laurel, PhD

Virtual Reality is everywhere again, and that’s a problem. Almost immediately after the new trend began, people started shopping 360° immersive video as VR. It is not. “Surround” movies are marketed as VR. They are not. “VR Storytelling” is a misnomer; it is not structurally VR. “Second Life” is described as VR. It is not. When the term is appropriated, its meaning disintegrates. Last time around, the same effect spread out across media types that are not VR. There is no such thing as “desktop VR.” Application of the term “VR” to a CAVE experience is questionable. When we use the term just because it’s sexy, its meaning spreads like an oil slick over our media and dilutes it to such a degree that we no longer know what it means — think “turbo.”

The first example of immersive VR that I saw was in 1987 at NASA Ames Research Center, where Scott Fisher, my colleague from the Atari Sunnyvale Systems Laboratory, was working on VR environments for training future astronauts. In those days we saw lime-green vector graphics on a black background. But even without polygons, texturing, or sophisticated modeling, it was really VR: I felt myself to be immersed in a virtual world in which I could take action. The definitive goal of achieving sensory immersion in a virtual environment, including a sense of presence, relies on a combination of factors working in concert.

Let’s get clear about the form of Virtual Reality. Here are some of its core characteristics:

1. Complete surround environment. The range of view may vary according to the viewing device, but a participant must be able to turn around, look up and down and see a complete environment.

2. Affordances for depth perception and motion parallax. In normal vision, these are enabled by a variety of systems. One is the musculature around the eye and the deformation of the lens of the eye. From this information, the brain “calculates” the point of convergence in the views of both eyes — the “focal point.” Another is the perception of size and perceived motion of near vs. far images. The Teatro Olimpico in Vicenza, Italy (constructed 1580–1585) used “forced perspective” through diminishing the sizes of objects toward a “vanishing point” at the back of the scene; the actors could not walk upstage because they would become perceptually bigger.[1] Another trick used in Renaissance Theatre — ground rows — forced a sense of depth and motion parallax by varying the scale and movement properties of layers of scenery. In early animation, the same trick was used by Disney, for example, with the “virtual” ground rows in Fantasia. One can imagine how computationally generated stereopsis[2] enables VR participants to have comparatively stunning first-person binocular views of the environment.[3]
 
 If VR is to be more than a solo-participant medium, then stereopsis becomes more important. The view presented to each eye for each participant is different, enabling multiple participants to have first-person binocular views of the environment. Typically the difference between the images presented to each eye represents human interocular distance, but major scale effects may be achieved by increasing the virtual interocular distance. For example, one can become a giant in the landscape if the virtual interocular distance is, say, ten feet. By the way, the lack of depth perception and motion parallax for more than one person at a time is one reason why a single CAVE does not quite reach the threshold of Virtual Reality; however, networked CAVEs can accomplish a similar effect.[4]

3. Spatialized audio, not just stereo. Obviously, people move their heads around when they are engaged in a virtual world. The auditory cues must match the participant’s movements as well as the effects produced by the visual system. With spatialized audio, the sounds in the world are always coming from the correct place in the model and at the correct distance from the participant.[5] It was the inclusion of spatialized audio that gave me the greatest “aha!” moment in VR.[6]

4. Affordances for tracking the participant’s direction of motion distinct from the direction of gaze. In early systems these were locked together, decreasing the body’s freedom of movement. Eyes reveal gaze, and the pelvis almost never lies about direction of movement.

5. The participant’s sensorium as the camera. “Out of body” experience defeats the purpose of immersion. Enabling a participant to fly around like a bird is fine, but cutting to a different shot apart from the participant’s embodied point of view will certainly blow the illusion — in other words, VR is a first-person medium for every participant in the same world.

6. Natural gesture and movement. Early systems such as those at NASA employed sets of coded gestures that invoked various actions (including pulling down a virtual menu in VR, which made me giggle a bit). The more natural the action, the greater the sense of presence. This is why game-controller UI is unacceptable for VR.

7. Affordances for narrative construction. Of the many uses to which VR may be put, explicit narrative storytelling is one of the least effective. By engaging in an immersive virtual world with various affordances and themes, a participant creates a story, or many stories, by traversals of the world. The author(s) of the world must design cues and affordances that encourage the participant to make dramatically interesting choices.[7]

8. The principle of action. A participant must have affordances for moving about in the scene (kinesthesia and proprioception). A participant must be able to take action in the world and perceive the effects. This is part of the larger sense of personal agency. If agency is to be robust, a designer cannot maintain a strict storyline, and 360° video is a non-starter.
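The scale effect described in point 2 can be sketched in a few lines of code. This is a hypothetical illustration of my own (the function names and values are assumptions, not drawn from any system mentioned here): a simple stereo rig offsets each eye’s viewpoint along the head’s right vector by half the interocular distance, and widening that virtual distance makes the participant a giant in the landscape.

```python
# A minimal sketch of computationally generated stereopsis with a
# configurable virtual interocular distance. Not production code from
# any VR system; an illustration of the scale effect described above.

def eye_positions(head_pos, right_vector, interocular_m=0.063):
    """Return (left_eye, right_eye) positions, each offset from the
    head position along the head's right vector by half the
    interocular distance (in meters)."""
    half = interocular_m / 2.0
    left = tuple(p - half * r for p, r in zip(head_pos, right_vector))
    right = tuple(p + half * r for p, r in zip(head_pos, right_vector))
    return left, right

# Human-scale view: eyes roughly 6.3 cm apart.
human_left, human_right = eye_positions((0.0, 1.7, 0.0), (1.0, 0.0, 0.0))

# "Giant" view: a ten-foot (~3.05 m) virtual interocular distance
# shrinks the perceived world accordingly.
giant_left, giant_right = eye_positions((0.0, 1.7, 0.0), (1.0, 0.0, 0.0),
                                        interocular_m=3.05)
```

Each eye’s view is then rendered from its own position; the brain does the rest, fusing the disparity into depth.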

All of these characteristics except the principle of action were well understood by the hardcore VR veterans in the 1980s and 1990s. In 1986 I wrote of frequency, range, and significance as elements of “first-person interaction.”[8] Based on his previous work at Xerox PARC, Rob Tow added to these criteria by formulating what he calls the “principle of action” for VR in 1994 in our article on Placeholder in the ACM Computer Graphics Quarterly.[9]

In his article “Virtual Second Thoughts” (Wall Street Journal, April 3, 2016), Wagner James Au (author of The Making of Second Life) complained that “[m]ost of the new VR technology, like the highly anticipated Oculus Rift that started shipping this week, requires a bulky and expensive headset that literally blinds a user to the outside world.”[10] Well, yes, VR headsets are supposed to blind a user to the outside world. That is because the intent of VR is to present an immersive “virtual” experience that convinces the participant through sensory and technical means that they are in fact in another “place.” Mr. Au seems confused about the definition of VR. If you want to see the outside world as you enjoy some of the qualities of VR, the proper medium is Augmented Reality (AR), foreseen as early as the 1990s and mocked up as early as 2004 by my students at Art Center College of Design (one of whom is now directing content for the Microsoft HoloLens)[11].

In his New York Times review of the Oculus Rift and the HTC Vive (April 5, 2016), author Brian X. Chen mentioned a key difference between the Oculus and the Vive — the Oculus uses a game controller for interaction while the Vive uses two “motion controllers.” That’s better — except for the little camera that lets you peek into the “real” world, thereby exploding the illusion of presence. Chen also bemoans the expense of these two systems and their accessories. He warns potential customers that the Cadillac price may be as high as $1,500 for the whole setup.[12] In 1993, our setup cost about $1,000,000 (roughly 670 times that price), which might explain why VR didn’t catch fire as a “consumer product” at the time.

We learned some stuff in the old days.

In 1993, Rachel Strickland and I designed Placeholder, a VR experience at the Banff Centre for the Arts, with additional support from Interval Research. Several Interval researchers were also members of the team; Rob Tow wrote most of the code and Michael Naimark invented and implemented some very cool landscape capture techniques. Our work was intended foremost as a design statement: VR could be used for things besides training. It might extend our imaginations or let us play and find delight in new ways. Placeholder was also an experiment, digging into issues of presence and representation. Our virtual environment consisted of three places connected by portals and it accommodated two participants simultaneously. It asked participants to take on the bodies and some of the sensory-motor characteristics of animals. The worlds had affordances for voice communication between participants with some vocal distortion to distinguish the various animals.[13]

During the experience of building Placeholder and offering the piece to the public, we learned some things that had not yet been articulated about how VR achieves the goal of sensory immersion. Here are a few of them.

1. Jump cuts don’t work. They induce shock, cognitive dissonance and sometimes nausea into VR experiences. We learned that when participants were traveling between places via portals, they needed a few seconds of darkness “in transit.” During those times, we noticed that participants regularly looked at the points of light that represented their hands. We also learned that fading sound from where one was leaving and fading up sound from where one was going had a strong positive effect on making transitions graceful.

2. People need more than one hand. The famous “Dataglove” invented by Tom Zimmerman only instrumented one hand for participation in virtual worlds. For a world such as ours, articulation of fingers was not important, but having both hands was essential to a feeling of total embodiment. Steve Saunders of Interval developed some simple and inexpensive hand trackers that really did the job.
 
 On the other hand, sometimes people don’t need hands at all. Char Davies’ immersive work “Osmose” — while not VR as either of us understood the term — was produced the following year and exhibited at the Museum of Contemporary Art in Montreal, Canada, in 1995. Instead of manual manipulation, Char employed breath and balance to allow participants to float and breathe their way around many worlds. Breath was measured by a band around the immersant’s chest. As in diving, inhaling took you up and exhaling took you down. Head-tracking and balance were used in combination to influence both direction of movement and gaze. The principle of action was not achieved through manual manipulation but rather through the dense and native navigational qualities of breath and balance. True, the world was not changed by the immersant’s journey through it, but the participants were changed by the journey.[14]

3. The system needs to remember movements and actions. For example, when we decided how a participant embodied as Crow would move around, it turned out to be by wing-flaps.[15] Motion memory was needed to recognize them. We also needed memory of the participants’ movements in order to land the participant back in the place where they started flying. This may change with newer systems that can move the model to accommodate the landing. Our “walking distance” was locked to the magic circle of the Polhemus sensing device, as the project had a physical as well as a virtual environment.
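The “motion memory” idea above can be sketched in code. This is purely a hypothetical illustration of mine (it is not Placeholder’s actual implementation, and the class, window size, and threshold are assumptions): the system keeps a short buffer of recent hand heights and scans it for the vertical travel of a wing-flap.

```python
from collections import deque

# A hypothetical sketch of motion memory for gesture recognition:
# recognize a wing-flap from a short history of hand-sensor heights.
# An illustration of the idea, not Placeholder's actual code.

class FlapDetector:
    def __init__(self, window=8, threshold_m=0.15):
        self.heights = deque(maxlen=window)  # recent hand heights (meters)
        self.threshold_m = threshold_m       # minimum vertical travel per flap

    def update(self, hand_height_m):
        """Record a new sample; return True when the buffer is full and
        contains enough vertical travel, ending on a downstroke."""
        self.heights.append(hand_height_m)
        if len(self.heights) < self.heights.maxlen:
            return False  # not enough motion memory yet
        hs = list(self.heights)
        travel = max(hs) - min(hs)
        return travel > self.threshold_m and hs[-1] < hs[0]

detector = FlapDetector()
# A rise followed by a downstroke: one flap.
samples = [1.0, 1.1, 1.25, 1.4, 1.45, 1.3, 1.1, 0.9]
flapped = any(detector.update(h) for h in samples)
```

The same remembered trajectory could also be replayed in reverse, which hints at how a system might land a participant back where they started flying.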

In conclusion: this article is not meant to disparage any of the vibrant exploration of immersive media going on today. Great technical progress is being made on, e.g., frame rate, visual convergence, 3D modeling, and fine-tuned body tracking through video or other means. My intent is to describe in specific terms the formal and structural aspects of a particular form that was and is called Virtual Reality. I also want to warn younger folks of the consequences of stretching a name too thin. Back in the 1990s, that’s exactly what happened — and the form, along with the discoveries of those who created it, largely disappeared. Let’s be mindful of that this time around.

__________________________________________________________________

End Notes

[1] Something like the inverse of this method was used in the staging of the film versions of The Lord of the Rings trilogy.

[2] Computationally generated stereopsis emulates the physicality of vision by providing slightly differing views of the scene to each of a participant’s eyes. Usually, these views are derived from human interocular distance; however, one can become a giant in a place if the interocular distance is, say, 10 feet.

[3] The widely acknowledged inventor of VR, Ivan Sutherland (1967), famously lacked vision in one eye. Stereopsis would have done him no good.

[4] See, for example, work with KeckCAVEs Remote Collaboration projects at the UC Davis VR Lab, http://keckcaves.org.

[5] In days of yore, one of our friends taped Grateful Dead shows from two head-mounted microphones. Later, listening to the recording, we noted that every time he bent his head to eat or drink, the band would suddenly seem to travel up into the sky. No, really.

[6] The technical source was something called the “Convolvotron”, developed by Scott Foster at Crystal River Engineering with help and inspiration from Beth Wenzel. See http://interface.cipic.ucdavis.edu/sound/tutorial/hrtfsys.html. Retrieved April 18, 2016.

[7] Laurel, B. Computers as Theatre, Second Edition. Pearson, 2014. pp. 202–209.

[8] Laurel, B. “Interface as Mimesis.” In Norman, D.A. and Draper, S. W., User-Centered System Design, Lawrence Erlbaum & Associates, 1986, p. 79.

[9] Laurel, Strickland and Tow, “Placeholder: Landscape and Narrative in Virtual Environments.” ACM Computer Graphics Quarterly, Vol. 28, No. 2, May 1994, p. 124. Article is available online at http://tauzero.com/Brenda_Laurel/Placeholder/Placeholder.html.

[10] http://www.wsj.com/articles/virtual-second-thoughts-1459722055. Retrieved May 1, 2016.

[11] D. Scott Nazarian, 2004 graduate of the Media Design Program at Art Center College of Design, mocked up an awesome demonstration of how Augmented Reality would work by setting up a ring of gauze with installations of the stages of making a peanut butter sandwich arrayed outside of the circle. When a participant turned, the next installation lit up. Scott is now Senior Creative Director at Microsoft Studios working on the HoloLens project. Matthew McBride (also 2004) designed a mock-up of an augmented reality mapping system on a transparent tablet as his thesis project. He is now Creative Director at Possible.

[12] http://www.nytimes.com/2016/04/06/technology/personaltech/virtual-reality-check-rating-the-htc-vive-and-the-oculus-rift.html?ref=technology&_r=0. Retrieved April 5, 2016.

[13] Video: https://vimeo.com/27344103. Also see Laurel, Strickland and Tow, “Placeholder: Landscape and Narrative in Virtual Environments.” ACM Computer Graphics Quarterly, Vol. 28, No. 2, May 1994. Available online at http://tauzero.com/Brenda_Laurel/Placeholder/Placeholder.html.

[14] http://immersence.com/osmose/. Retrieved February 1, 2016. As an aside, Char’s Osmose team included John Harrison (software) and Dorota Blaszczak (sonic architecture and programming) — both contributors to Placeholder as well.

[15] I conducted much informal qualitative research on this point. Some people wanted “Superman fist” and others wanted to hydroplane. Finally, it occurred to me watching birds that everyone understands that many birds flap their wings to get around. Achievement unlocked.

Photo CC with Attribution Brenda Laurel 2016.

Brenda Laurel, PhD, is an independent scholar with over 40 years in higher education and computer games. She is author of Utopian Entrepreneur (2004) and Computers as Theatre, 2nd Ed. (2014).