Mental models for AR: Concept, Technology, and Product
Ways to think about AR, and a three-part model for technological change
AR is going to be really big. At least, that’s what Tim Cook thinks (or says to shareholders). Is he right? It’s clear that AR is developing at a good clip and that technologists should think about it. In the meantime, I don’t plan on being bug-eyed Shaq anytime soon, so Magic Leap’s vision is just going to have to wait.
I do think AR is going to be super important, and have since I first heard about VR and Oculus (just before it was bought by Facebook). But how can I even project that with any kind of confidence? To think about a technology that’s this early, you need a good mental model. The mental model helps you organize your thoughts, project forward, and consider how big something can be.
So, what’s our mental model for AR? Here are three models you can use: concept, technology, and product.
AR as Concept
First, let’s think about AR as a concept. The point of this exercise is to be abstract and think about what exactly AR is trying to accomplish.
Think about Disney’s Haunted Mansion ride. The ride is a typical ghost train, where you’re in a car and carried past some spooky things. What makes this ride particularly interesting is the way the ghosts interact with the environment. The sets are entirely physical, from the furniture to the cobwebs. The ghosts, however, seem almost animated. It’s just fancy lighting and incredible Imagineered robotics, but it feels like you’re in a living animation cel. Nowadays you might forget that you aren’t wearing AR glasses.
While the Haunted Mansion is not an AR experience (though it totally could be), it is a good example of what I think AR is about: the breakdown of the barrier between the physical and digital worlds. After all, what even is AR? It’s the overlay of a digital experience over a physical one, but it isn’t just that. Otherwise, you could just impose any image over a live video feed and call it a day. The idea of AR is that the digital denizens we import into the physical world have some notion of object permanence, know where they are, have a sense of place and space and time. They interact with the environment they’re in, and eventually might be able to “move” objects or adjust their appearance to the light source and white balance of the room they’re in. One of my favorite early examples, from before anyone even knew about AR, is a translation app called WordLens, which not only translated signs but tried to superimpose the translated text over the original sign in the live feed. They even tried to replicate the original font! It was super cool, even though it didn’t work fantastically.
The digital and physical worlds are already being brought closer together. Today, the news we read online directly impacts decisions we make offline. We carry little oracles in our pockets that play videos while we’re on the subway. Hell, my phone is now my thermostat! The barrier between physical and digital is already broken — the floodgates are open, but so far the internet has mostly pulled us in. AR is a further blurring of the lines (and viscerally so, since it fools our strongest sense) but it’s a way to reverse the historical course by bringing digital experiences out of their domain and into ours. In a sufficiently good AR experience, it should be difficult to differentiate between the physical and the virtual. This doesn’t require graphics so good that digital wood, for instance, looks like real wood. It just requires that the digital objects or beings are so constant and integrated with their surroundings that they feel natural by virtue of prolonged exposure and an absence of behavioral aberrations.
Where would VR fit into this? People usually think of AR and VR as competitors. I used to think that way, but now I tend to think of AR and VR as cousins, different sides of the same concept. They simply approach it from different angles. VR is about bringing the physical world into the digital one. It’s an immersive experience where you’re enveloped in a virtual world and the only baggage allowed is yourself and anything else future headsets will let you bring with you. AR, on the other hand, uses the physical world as a canvas, bringing digital goods and beings into it with varying degrees of realism. The all-encompassing experience of VR makes it something really special, even for storytelling, a case Jon Favreau makes in a TED talk that’s well worth watching. Your senses are totally overtaken even if you haven’t forgotten what you’re doing, as I learned when I played a rock climbing VR game and felt my stomach lurch when I “fell”. But that pulled-in experience feels like a sometimes-indulgence or a power tool, not something I could use casually or that would cover most of the use cases I’d want. For that, I’d want AR.
AR as Technology
Another way to think about AR is as a technology. That is, as a collection of tools and inventions that are thematically or technically related in some way.
I’d think of AR as a more general platform on which some interesting apps are built — in this case, presumably some kind of headset technology encompassing machine vision, cameras, and high-quality displays. (Or, perhaps, AR is the OS on some other platform.)
I’m intrigued by a parallel in self-driving. Let me explain.
Some of the earliest self-driving technologies are actually quite old. Consider, for example, self-parking. Tesla is fond of teasing self-parking features as its first “true” self-driving experience, but self-parking was first introduced in 2003 as Toyota’s Intelligent Parking Assist feature — news reports at the time listed the feature alongside other snoozers like gas mileage. If we’re getting even more basic, think about cruise control. While some notion of cruise control existed circa 1910, the first real version was invented in 1948. This might seem like a stretch, but adaptive cruise control is now thought of as a canonical Level 1 feature.
What’s the point of all this? Well, no one thought about these technologies as self-driving at the time. The first cruise control patent was for a “constant speed regulator” rather than a self-driving car. And remember that the first self-parking feature was described as an “assist” because, presumably, the assumption was that cars would always have drivers. Even the idea of “levels” of autonomy didn’t exist until 2014, when the Society of Automotive Engineers published their AV taxonomy, J3016. Why then? Once we got serious about autonomous vehicles we tried to orient ourselves, which meant creating context. Hence, we came to realize that things like cruise control were actually elementary steps in the evolution of AVs.
For those unfamiliar, here are the levels of self-driving cars (level 0, no automation at all, is the implicit baseline):
1. Driver assistance, where the automation enhances the human driving experience.
2. Partial automation, where specific tasks are automated but human control is usually required and oversight is always necessary, like Tesla Autopilot.
3. Conditional automation, where humans can disengage for non-critical activities but must be ready to take over when the car asks.
4. High automation, where cars can basically run the show but might not work in certain environments or conditions.
5. Full automation, where you don’t even need a pedal or steering wheel.
What I rather like about the levels-of-autonomy schema is what it implies about models of technological change. There are a lot of such models (they aren’t exclusive and, indeed, are subject to perspective): new technologies can be obsoletive, ecosystem-building, sui generis, cyclical, or one of many other development patterns. Some technologies, though, are evolutionary, because there is a clear end goal and logical technical steps that must be completed in a predictable sequence. Self-driving cars are a really, really great example of that. However AVs progress — and progress they will — they’ll develop along a clear evolutionary path. I think this fits AR pretty well, unlike, say, the smartphone market, as there’s a pretty clear path to what we want AR to do.
So what might those levels be? I think it makes most sense to tackle this from a software perspective, as the ideal form factor is yet to be determined and I don’t think, say, expanding the field of view is a categorical change (technically challenging as it may be). Here’s my crack at it (I’ve also sketched the levels in code just after the list):
0. Overlay: just a dumb overlay on a live video feed of the physical world.
1. Permanence: objects have a consistent location, size, and perspective. The digital objects are 3D.
2. Place: objects have some notion of where they are in the environment, especially with depth.
3. Characteristics: objects have consistent characteristics that transfer across the interaction space. So a needle can be sharp, and a heavy anvil can have more than just a falling animation.
4. Awareness: digital entities know enough about the world to incorporate it into their behaviors in a convincing way.
5. Disruptiveness: an AR system can convincingly edit or interact with the physical world in a way the user can perceive.
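To make the taxonomy concrete, here’s a hypothetical sketch of the levels in Swift. Nothing here is a real SDK’s API (the names just mirror my labels above), but encoding the levels as an ordered enum is a handy way to think about gating features on the capabilities an AR system actually has:

```swift
/// A hypothetical encoding of the AR levels above. Not any real SDK's API;
/// just the taxonomy, expressed as an ordered checklist.
enum ARLevel: Int, Comparable, CaseIterable {
    case overlay = 0        // a dumb overlay on a live video feed
    case permanence         // 3D objects keep consistent location, size, perspective
    case place              // objects know where they are, including depth
    case characteristics    // properties like sharpness or weight persist
    case awareness          // entities fold real-world knowledge into behavior
    case disruptiveness     // the system convincingly edits the physical world

    static func < (lhs: ARLevel, rhs: ARLevel) -> Bool {
        lhs.rawValue < rhs.rawValue
    }
}

// Example: gate a feature on the level the experience actually requires.
let systemLevel: ARLevel = .place
if systemLevel >= .permanence {
    print("Safe to pin 3D objects to real-world positions.")
}
```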
One subtle point: the amount the AR system needs to know about the physical world probably increases with each level. At level 1, the system just needs to know the shape of some surfaces. Level 2’s sense of place requires depth perception, and that’s tricky, since it requires knowledge of objects and their physical relationships to each other; at level 2 we’d also start to see real interaction with the environment, and digital objects could even respond to the lighting around them. Level 3 requires knowledge not only of digital objects but of physical ones (note that I don’t distinguish between the two in my definitions). And level 4 is where the physical and digital worlds basically melt away; at that point it’s arguably just an AI problem. Another way to think about this is as an evolution of the AR system’s understanding of objects (space -> multiple other objects -> objects themselves -> internalized planning). Also note that VR doesn’t appear anywhere here; as I noted in the previous section, I don’t think of VR as just level 0 AR but as a different technology in pursuit of similar, but different, ends.
Got a better idea of what the levels should be? Send it to me!
AR as Product
So we’ve thought a lot about AR as a concept and as a technology. But that’s not what people buy or what they use. Ultimately, what people touch is a product. How should developers think about AR as a product?
One early area where we’re seeing some cool experiences is tours. I’ve been pretty skeptical of the idea of VR place experiences, like where you put on an Oculus or something at a mall. I’ve tried them all over the world, even at the Lafayette Mall during Christmas, which should rock(!), but they just suck. AR, on the other hand, has been pretty enjoyable every time I’ve tried it as a place enhancer. People have made some pretty cool examples of AR, including a level 0 AR experience in Gaudi’s Casa Batlló which I personally found really fun — seeing people go around every room twice was quite an experience. This makes sense, since the point of going somewhere is to see it; digital-physical world blending should be an enhancement, not an escape, at destinations. Level 1 and level 2 AR experiences are about to become super commonplace because every major platform has a free SDK that arguably makes creating level 2 AR experiences easier than many level 0 experiences, which are still somewhat handrolled. (A sketch of just how little code that takes follows below.)
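For a sense of how low that barrier is, here’s a minimal sketch of a level 1 to level 2 experience using ARKit, Apple’s free SDK (ARCore is the Android analog). The view controller and marker-dropping behavior are my own toy example, not anything from the apps above, but world tracking (permanence) and plane detection (a basic sense of place) come nearly for free:

```swift
import UIKit
import SceneKit
import ARKit

// A toy "level 1-2" experience: ARKit's world tracking pins virtual objects
// to real-world positions (permanence), and plane detection gives them a
// basic sense of place.
class ARTourViewController: UIViewController, ARSCNViewDelegate {
    let sceneView = ARSCNView()

    override func viewDidLoad() {
        super.viewDidLoad()
        sceneView.frame = view.bounds
        sceneView.delegate = self
        view.addSubview(sceneView)
    }

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        // World tracking plus horizontal plane detection is the whole setup.
        let configuration = ARWorldTrackingConfiguration()
        configuration.planeDetection = [.horizontal]
        sceneView.session.run(configuration)
    }

    override func viewWillDisappear(_ animated: Bool) {
        super.viewWillDisappear(animated)
        sceneView.session.pause()
    }

    // ARKit calls this when it detects a new plane; as a demo, drop a small
    // sphere that stays put as you walk around it.
    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        guard anchor is ARPlaneAnchor else { return }
        let marker = SCNNode(geometry: SCNSphere(radius: 0.05))
        marker.geometry?.firstMaterial?.diffuse.contents = UIColor.systemBlue
        node.addChildNode(marker)
    }
}
```

Compare that to a level 0 overlay, where you’d be hand-rolling the camera capture and compositing yourself.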
Another early place we’re seeing some innovation is more interactive experiences, like tools and especially games. The early demos for Magic Leap are basically all games. In fact, many have called Pokemon Go the first popular AR experience (I agree). 2017 was basically the Year of a Thousand Measuring Apps on the App Store, which Apple then unceremoniously killed (proving that…tape measure is the new flashlight? does that sound good?) by debuting its own system-level app. Very cool, Apple. But there are other, lesser-known examples. The first real AR game was arguably Ingress, Pokemon Go’s predecessor; Just a Line, a fun Google app, is a nice example of AR’s ability to make art; Wonderscope is simply breathtaking (but the ad for it has a kid using an iPad, which kills me almost as badly as when I see tourists taking pictures with their iPads at the Met); and Housecraft shows how AR could change shopping as we know it (and left me wondering how long until people have purely digital decorative furniture once level 2 is perfected). I think AR will be awesome for art in particular, and it’s heartening to see that Brian Eno, who wowed us when the App Store first opened with his app Bloom, has created an AR successor. It makes me excited for tools like Asteroid Zone, which are still very much in their early days, as sustenance for creators.
Some of the most fun examples may be surprising. Adam Silver, the NBA commissioner, said at a Recode talk that Magic Leap will let you see an NBA game projected on your table. This was somewhat surprising the first time I heard it, but the more I thought about it the more I liked it. One’s first thought is, why bother with a contrived tabletop visualization when you can already see basketball live on your TV? But then I remembered that my favorite AR app to date is App in the Air, which uses ARKit to make a globe with lines for every place you’ve travelled. It’s pretty trivial (petty?) but it’s also a really fun visualization if you travel a lot. I could totally see tabletop Lakers being just as awesome. Suddenly, the real-life tabletop integration of spectator sports seemed compelling. This could be really fun once there’s a level 3 AR experience, so the basketball knows that it’s bouncy, for example, and can be dribbled across the room.
I’ve managed to avoid talking about form factor throughout this blog post, but product is where it becomes really important. I suspect that we can get by with phones much longer than most people think. Twitter or Facebook or whatever has gotten us very used to walking around with our phones held out like drones delivering Amazon packages. We’ve all walked around with our phones using Google Maps to find where we’re going. Why not occasionally hold one up to follow an arrow? (Though personally I find wrist-based, continuous, passive navigation à la Apple Watch to be more pleasant.) There are many tools that are situational or amenable to being seen through a single view. So it isn’t difficult to imagine that people will use the extremely powerful little viewports they already carry around for many of the first AR applications.
A direct field-of-vision device, whether that’s a headset or a contact lens or some apocalyptic nightmare brain implant, will eventually be needed for AR to realize its full potential. The challenge for AR is that many of the most compelling applications for it are casual and unexpected, so you won’t be able to get people to wear clunky headsets at home the way you can with VR. As a result, phones will be key for a while, since we already always have them with us. For some applications, this will be prohibitive; for others, it will be a local maximum. For example, anyone who’s played Pokemon Go for a long period of time will tell you that you don’t enjoy holding your phone but rather come to tolerate it in service of the rest of the experience (though based on usage numbers, tolerate it they do). We’re willing to hold up our phones for circumscribed level 0 to level 2 experiences because they’re limited in time, but I imagine that once we hit the kinds of games or work apps — like an AR guide for shippers, for instance — we’ll find non-field-of-vision devices tedious at best or impractical at worst. For many people, their first on-head AR experience may be a non-AR experience entirely, like Intel’s Vaunt glasses (which are mostly a glorified notification device, clever though they may be). But for now this is as far as I’m willing to speculate.
What’s interesting about the earliest use cases for AR is that they seem…quite big. Entertainment, the main market for VR so far, is a big industry, but it’s not quite as big as you think. Movies make about $40b a year, and video games make around $110b, nearly three times as much. Travel, in contrast, is way larger: tourism is a trillion-dollar industry in the US alone. AR tends to touch these huge markets, like travel experiences, commerce, and transportation. As a product, even the early experiences make you wonder why everything isn’t built this way. I’m reminded that what made phones the biggest computer market ever is their omnipresence and their greater ease of use in casual situations, and so I wonder what that means for AR. I certainly don’t think VR will be 1000x the size of AR (that’s recency bias). Pokemon Go, the first popular AR game and a rudimentary one at that (it’s basically level 0), has probably been played by more people than have ever used VR even once.
Wrap-up
So. That’s it.
I hope this helped you think about AR as well as your work generally, as I find these mental models very instructive. It’s clear, I think, that there’s a lot of work left to do on the fundamental aspects of AR (and you don’t need to be employed by Apple or Google to do good work, so get on it). It’s also clear that the tools that currently exist are enough to make some interesting products and maybe even a business or two.
And that Shaq totally got played by Magic Leap.