Mixed Reality or Spatial Computing

Louis Rosenberg, PhD
Published in Predict · 10 min read · Feb 16, 2024

The Language of Immersive Media — Past and Present

Virtual Reality as depicted by NASA (circa 1992)

Journalists keep asking me the same question these days: What’s the difference between Spatial Computing and Mixed Reality? As soon as I answer, it triggers even more questions about the language of immersive media and the various phrases in use today. To reduce confusion, I believe it’s helpful to review the history and perceptual meaning of core language.

Personally, I like Apple’s use of the phrase spatial computing and agree the Vision Pro is a spatial computer. Still, this has added to the confusion. After all, just about everyone calls the Vision Pro a mixed reality headset except for Apple. Also, developers are not allowed to describe Vision Pro apps as enabling virtual reality, mixed reality, or augmented reality. I appreciate the marketing reasons for this, and suspect these language restrictions will fade over time, but it seems like overkill for Apple to ban certain words.

Maybe I’m just nostalgic, but when I first started working in the field, the phrase virtual reality was only a few years old and was already generating major excitement. I was a young researcher conducting VR experiments at NASA and the photo above was a poster in the lab where I was working. To me, it was a deeply inspiring image, capturing the full promise of the field.

Even more important, the human experience depicted in that NASA photo has been called virtual reality for almost 40 years. If you are a developer for the Vision Pro and you create a fully simulated immersive experience, is it really a sin to describe it as virtual reality? After all, the VR headset shown above is now in the Smithsonian. This is our history and culture.

Of course, the Vision Pro is far more sophisticated than the NASA headset above, not just because of its amazing fidelity but because it adds entirely new capabilities. The most significant is the power of the Vision Pro to seamlessly combine the real world with spatially projected virtual content to create a single unified experience, a single perceptual reality. This is called augmented reality or mixed reality, depending on the capabilities, and both phrases have a long history in academia and industry.

So, what’s the difference between AR and MR?

This is probably the most misunderstood issue in the world of immersive media, so it’s worth taking a trip back in time to explain how today’s divide came to be. In the early days, only one phrase was needed, augmented reality, but its definition was diluted in the 2010s as marketers pushed simpler and simpler systems under that banner. I suspect the pendulum will swing back in the future, but for now both phrases are helpful. To appreciate why, we need to dig into the perceptual requirements for augmenting a user’s reality in a convincing and authentic way.

As background, I began working on merging the real and the virtual in 1991 before we had language to describe such a combined experience. My early research at Stanford, NASA and the US Air Force aimed to explore the basic psychophysical requirements needed to create a unified perceptual reality of the physical and virtual. I called this new pursuit “design for perception” (not very catchy) and called the virtual objects “perceptual overlays” (also not catchy). This is how I described the core perceptual requirements for creating an authentic and believable mixed reality back then:

Perceptual Overlays: “I do not want to impose any restrictions on the nature or content of a perceptual overlay, other than to require that a natural and predictable relation exists between an operator’s neuromotor activities (efference) and the subsequent changes in the sensations included in the perceptual overlay (afference). This single restriction upon perceptual overlays is sufficient to encourage distal attribution to occur. As a result of distal attribution, the operator will accept the overlaid percepts as real and tangible parts of the ambient reality. In other words, the operator will achieve a sense of presence with respect to the overlaid perceptual information. Thus, the virtual percepts will be accepted as genuine features or properties of the environment in which the operator is working.” (Stanford, Rosenberg, 1993)

That excerpt is quite a mouthful, but the key phrase is “distal attribution.” It refers to the process by which your brain receives a new piece of perceptual information and integrates it into your reality as an authentic part of your surroundings. Thus, the guidance above states that for a virtual object to be perceived as a genuine part of your ambient reality, we need to enable interactive experiences. Passive viewing is not sufficient because it does not close the mental loop between afference and efference (i.e., action and reaction). This is why 360-degree VR movies filmed from a fixed location don’t quite seem like reality, even if you have depth perception and can turn your head. Without the ability to interact by moving or reaching, your brain categorizes it as a façade over your field of view, not your true reality.
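
To make that loop concrete, here is a minimal sketch in Python. The function name, data layout, and 15 cm proximity radius are my own illustrative assumptions rather than details of any real system; the point is only that the user’s motor actions (efference) deterministically change what the overlay presents back (afference), which is the closed loop that distal attribution depends on.

```python
import numpy as np

def render_overlay(head_pose, hand_pose, overlay_anchor):
    """One frame of a hypothetical overlay loop: the user's own movements
    drive a natural, predictable change in the overlaid sensations."""
    # Express the anchored virtual object in the user's current view frame,
    # so head motion (efference) immediately changes what is seen (afference).
    view_relative = head_pose["rotation"].T @ (overlay_anchor - head_pose["position"])
    # React to the user's reach: highlight when the hand nears the object.
    near = np.linalg.norm(hand_pose["position"] - overlay_anchor) < 0.15  # meters (assumed radius)
    return {"view_relative_position": view_relative, "highlighted": near}

# One frame of the loop: action in, changed sensation out.
head = {"position": np.zeros(3), "rotation": np.eye(3)}
hand = {"position": np.array([0.5, 0.0, -0.4])}
print(render_overlay(head, hand, overlay_anchor=np.array([0.5, 0.0, -0.5])))
```

A passive 360-degree movie, by contrast, would ignore the hand pose (and any reaching or walking) entirely, which is exactly why it fails this test.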

Psychophysical Model for merging the Real and Virtual (link)

Identifying distal attribution as a core requirement for creating a unified mixed reality was a helpful insight, but I soon confronted additional perceptual requirements that had to be met to merge the real and virtual in a convincing way. These requirements can be summarized as (a) accurate spatial alignment in full 3D, (b) simultaneous interactivity with both the real and the virtual, and (c) bidirectional interactivity between the real and the virtual. Let me address each:

First and foremost, the real and the virtual need to be spatially aligned in 3D space with sufficient precision that any flaws fall below the limits of human perception (called the “Just Noticeable Difference,” or JND, in the field of psychophysics). Even subtle flaws destroy the illusion: your brain will perceive the real and virtual as separate percepts, not one reality. The two realms also need to be aligned in time, which sounds simple but, given computer lag (especially in the old days), is often just as difficult.
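
As a rough illustration of what that requirement implies for an engineer, the sketch below expresses registration error as a visual angle and compares it, along with motion-to-photon latency, against just-noticeable-difference thresholds. The threshold values, names, and structure here are assumptions made for illustration, not figures from the original research or any shipping device.

```python
import numpy as np

ANGULAR_JND_ARCMIN = 5.0   # assumed perceptual threshold, in arc-minutes
LATENCY_JND_MS = 20.0      # assumed threshold for noticeable lag

def registration_error_arcmin(real_point, rendered_point, eye_position):
    """Angle between where a real feature sits and where its virtual
    counterpart is drawn, as seen from the user's eye."""
    a = real_point - eye_position
    b = rendered_point - eye_position
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))) * 60.0

def alignment_is_convincing(real_point, rendered_point, eye_position, latency_ms):
    # Both the spatial and the temporal error must stay below perception.
    spatial_ok = registration_error_arcmin(real_point, rendered_point, eye_position) < ANGULAR_JND_ARCMIN
    temporal_ok = latency_ms < LATENCY_JND_MS
    return spatial_ok and temporal_ok

eye = np.zeros(3)
print(alignment_is_convincing(np.array([0.0, 0.0, -1.0]),      # real feature, 1 m away
                              np.array([0.001, 0.0, -1.0]),    # rendered 1 mm off
                              eye, latency_ms=12.0))            # True: both errors imperceptible
```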

Next, both realms need to be simultaneously interactive. This means users need to be able to engage naturally with the real and the virtual at the same time, reinforcing the illusion that the digital content is an authentic part of the physical surroundings. It turns out that physical engagement (especially manual interaction) may be the most convincing of all. For example, if you reach out and grab a real object and move it naturally with respect to a virtual object, the illusion is firmly cemented. I suspect this is because our eyes are easy to fool, so our brains are more skeptical, but when you can manipulate both worlds manually and maintain perceptual consistency, your brain snaps the two worlds together as one.
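
A toy way to picture simultaneous interactivity in code: treat the tracked real object and the virtual object as poses in one scene, and verify every frame that their spatial relationship stays consistent while the user manipulates the real one. The class names and the 2 cm tolerance below are my own illustrative choices, not any system’s actual API.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TrackedRealObject:
    position: np.ndarray   # updated by the tracker as the user moves the physical object

@dataclass
class VirtualObject:
    position: np.ndarray   # updated by the application or its physics

def relation_is_consistent(real, virtual, expected_offset, tolerance=0.02):
    """True if the virtual object still sits where it should relative to the
    real object being manipulated (e.g., a virtual label fixed above a real tool)."""
    actual_offset = virtual.position - real.position
    return float(np.linalg.norm(actual_offset - expected_offset)) < tolerance

real = TrackedRealObject(position=np.array([0.2, 0.0, -0.5]))
virtual = VirtualObject(position=np.array([0.2, 0.05, -0.5]))   # 5 cm above the tool
print(relation_is_consistent(real, virtual, expected_offset=np.array([0.0, 0.05, 0.0])))
```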

And finally, the real and virtual need to engage each other (bi-directionally interactive), because without that consistency the illusion is compromised. If you grab a virtual book and place it on a real table and it passes through, it’s not perceived as a single reality and suspension of disbelief is lost. But if you maintain the illusion, your brain buys in and you find yourself in a single reality, a mixed reality that is deeply convincing — so convincing, you stop thinking about what’s real and what’s virtual. It’s just one reality.
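
Sketched in code, bi-directional interactivity amounts to letting the real world act as physics for the virtual one. In the toy example below (an illustration of the idea, not any product’s API), a plane recovered from the real table stops a falling virtual book instead of letting it pass through.

```python
import numpy as np

TABLE_HEIGHT = 0.75   # meters, as if recovered by spatial mapping of the real table
GRAVITY = -9.8        # m/s^2

def step_virtual_book(position, velocity, dt=1 / 90):
    """Advance the virtual book one frame, colliding it with the real surface."""
    velocity = velocity + np.array([0.0, GRAVITY * dt, 0.0])
    position = position + velocity * dt
    if position[1] <= TABLE_HEIGHT:     # contact with the real table
        position[1] = TABLE_HEIGHT      # rest on it...
        velocity = np.zeros(3)          # ...rather than fall through it
    return position, velocity

pos, vel = np.array([0.0, 1.0, -0.5]), np.zeros(3)
for _ in range(200):                    # a bit over two seconds at 90 Hz
    pos, vel = step_virtual_book(pos, vel)
print(pos)                              # the book ends up resting at table height
```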

The Virtual Fixtures platform developed at Wright Patterson Air Force Base (1991–1994) met these perceptual requirements for the first time, enabling users to experience a unified reality where virtual objects (called virtual fixtures) were perceived as authentic additions to the physical world. It was very crude by today’s standards, but because it enabled distal attribution, manual interactivity, bi-directional interactivity, and real-time feedback with 3D haptics and 3D audio, it created the first mixed reality experiences for users (i.e., a unified perceptual reality). The system is shown below during an experiment in 1993 in which spatially registered perceptual overlays were employed to help users perform dexterous real-world tasks:

Virtual Fixtures Platform — mixed reality research (USAF 1993, Rosenberg)

I bring up these perceptual requirements (distal attribution, 3D registration, simultaneous interactivity, and bi-directional interactivity) because they help explain why the Vision Pro is such a remarkable device. It’s the first product to achieve these capabilities with extreme precision, fostering a unified perceptual reality and maintaining the illusion for extended periods without flaws taking users out of the experience. The Air Force prototype shown above put many constraints on the user and could only maintain the illusion for specific tasks.

Now, back to language: how did we end up with two phrases, augmented reality and mixed reality, that refer to merging the real and virtual? When I shot the video above, no language yet existed to refer to this field of research. Fortunately, the phrase augmented reality soon emerged from Boeing, and it clearly reflected the goal: to add virtual content to a real environment so accurately aligned and naturally integrated that the two worlds merge together in your mind. And for two decades, that’s what AR meant (while simpler devices that merely embellished or annotated your field of view without distal attribution were called head-up displays).

Then in 2013 Google Glass happened. I deeply respect the product and believe it was ahead of its time. Unfortunately, the media incorrectly referred to it as augmented reality. It was not. It didn’t enable virtual content to be placed into the real world in a way that was immersive, spatially registered, or interactive. Instead, it was what we now call smart glasses, which are profoundly useful and will become even more useful as AI gets integrated into these products, but it wasn’t AR.

Still, the phrase augmented reality got watered down during the 2010s, not only because of Google Glass but because the makers of smartphones were pushing simple visual overlays as “augmented reality,” even though they were not immersive and lacked genuine 3D registration with the real world. They also lacked user interactivity and bi-directional interactivity between the real and virtual. This was before LiDAR and other spatial mapping technologies were added to phones, enabling increasing levels of 3D spatial registration and interactivity. Today’s phones are much better, but the phrase AR got diluted.

As a result, when Microsoft launched the first commercial product that fostered a unified perceptual reality, the HoloLens, they likely felt it was necessary to distance the device from the language of prior systems. I suspect this is why Microsoft focused its marketing on the phrase mixed reality. The phrase had been around since the 1990s, but it was with the HoloLens launch that the language really took off. It basically came to mean creating a genuine augmented reality, or, as I would put it more rigorously, merging the real and the virtual into a unified perceptual reality.

And so, we now have two terms that describe different levels of augmenting a user’s surroundings with spatially registered virtual content. To clarify the difference between AR, MR, and VR, we can look at definitions published in 2022 by the U.S. Government Accountability Office (GAO). I assume the GAO cares about these differences to clarify whether government contracts are paying for VR, AR, or MR devices. To address this, the GAO put out a public document that featured the simple image below. I like their framing because it gets at the key issue of interactivity, which is so important to distal attribution:

VR vs MR vs AR as defined by U.S. GAO (GAO-22-105541)

It’s worth noting that the difference between AR and MR has nothing to do with the hardware and everything to do with the perceptual experience. I bring this up because many people incorrectly believe that AR hardware refers to glasses with transparent screens you can peer through, while MR hardware refers to headsets that use “passthrough cameras” to capture the real world and display it to the user on internal screens. This is not true. For example, passthrough cameras were used back in 1992 in developing the Virtual Fixtures platform. That design choice was made because it allowed the real and virtual to be registered with higher precision, not because it changed the user experience. And besides, simple phone-based AR also uses cameras, so that is not the differentiator.

This leads me back to the Apple Vision Pro: it is a mixed reality headset, not because it uses passthrough cameras, but because it enables users to experience a unified perceptual reality of the real and virtual. And because mixed reality is the superset capability, the Vision Pro can also provide simpler augmented reality experiences and fully simulated virtual reality experiences. And for all three (VR, AR, and MR), the Vision Pro can amaze consumers with immersive experiences that far exceed those of any device previously built, at any price. It’s a true innovation for Apple and an achievement for the team of engineers who made it happen.

Left: Rosenberg researching mixed reality at Air Force Research Laboratory in 1992. Right: Rosenberg interacting with mixed reality thirty years later with the Meta Quest Pro in 2022.

The Vision Pro also enables new abilities that are entirely unique, including a spatial operating system (visionOS) that breaks new ground by using a user’s gaze direction for input. In other words, I agree that the Vision Pro is not only a mixed reality headset, but also a spatial computer and, frankly, a work of art. I also believe that spatial computing is a great overarching term for AR, MR, and VR, along with other immersive experiences such as 3D movies and telepresence. My only recommendation is that all companies embrace the historic and accepted language of the field. Spatial computing is a useful term, but so are augmented reality, mixed reality, and virtual reality, all three of which are part of our history and culture.

Louis Rosenberg, PhD is a longtime researcher in the fields of virtual reality, augmented reality, and artificial intelligence. He is known for founding Immersion Corporation (Nasdaq: IMMR) in 1993 and Unanimous AI in 2014, and for developing mixed reality at Air Force Research Laboratory. His new book, Our Next Reality, is available for preorder from Hachette.
