Lots of people seem excited about something called AR, or “Augmented Reality,” lately. There are tons of cool demos floating around on Twitter. Facebook launched its Camera Effects Platform. Apple has an Augmented Reality developer kit and demoed a flashy AR game on stage at its iPhone X event. Google launched ARCore and a suite of experiments. Magic Leap is still secretly chugging along, releasing demo-like teasers. A dancing hot dog is apparently “the world’s first augmented reality superstar.”
Mostly for my own sanity, I’ll be using “AR” or “Augmented Reality” from here on out to mean Things You Can Do Via Your Smartphone Camera Other Than Take Photos or Videos.
Most of the cool AR demos we see today superimpose objects, characters, filters, or other effects onto the world around us. They do this mostly by drawing things onto a smartphone camera feed, or in some cases a headset (HoloLens, Magic Leap).
These “superimposing demos” are super rad and very shareable because they show you something impossible. In “real life,” you can’t see Mario Kart driving down the street, or a breakdancing hot dog on your desk.
This type of Augmented Reality, which you might call “Additive AR,” is entrenched in the photographic & cinematic tradition — not only because the way most of us consume AR is by watching videos of recorded demos, but also because we still regard the camera the same way we did a century ago. We still think of the camera as a way of recording media. Cameras first took photos by exposing chemicals to light; then came film; then they recorded videos; then digital photos and videos, which keep getting better and better. Their function was to record the world around them.
Additive AR is the logical next step in a lineage of enhancing our photos & videos. This lineage goes back to colored lens filters, practical effects, polarizers, subtitling, digital CGI, and more recently, Snapchat filters. It basically compresses the entire CGI pipeline into something that happens in real time, at the tap of your finger. From your perspective, your phone does all the AR work through the camera. This is one reason additive AR seems intuitive: it’s the shortest possible feedback loop. The camera sees stuff, adds stuff to what it sees, and then shows it to you.
On top of this, the share of all media that’s either photo or video has risen dramatically over the past decade. The culprits are platforms that make hosting and consuming photo/video content insanely easy: Facebook, Instagram, Snapchat, YouTube, etc. They’re all enabled by the fact that screens and cameras keep getting cheaper to produce. Unsurprisingly, people are taking more photos year after year, and “pivoting to video” is something multiple media executives are convinced is a good idea.
All of this is to say — it’s a pretty safe bet that you’ll be taking photos and videos for the foreseeable future. Adding AR filters, characters, stickers, etc. to the things we already do (Instagram filters, Snapchat stickers) is the most frictionless way to get AR content seen and used. New ways to make photos & videos cooler are the clearest and closest use case for AR. (Plus, this approach plugs right into the platforms we already use.)
I have a hunch that there’s a less exciting, more boring, but actually-might-be-useful-in-day-to-day-life application of Augmented Reality. Instead of treating AR as just the next step in the tradition of photography, we might think of it as the next step in computing platforms. AR could be the fastest way to bring the tradition and practice of computation to the world around us and the objects in it. (By computation here I mean roughly what my dad might mean: something that lets you do nifty things with technology & the internet that you couldn’t do without it, like share a photo across an ocean in a millisecond, or automatically correct your typos.)
Additive AR is possible now because cameras are finally smart(ish) enough to know what they’re looking at. At the most basic level, they can find the floor and know where walls and corners are. Other camera apps can figure out what book, shoes, or car they’re looking at, and it’s not hard to imagine a future where the camera knows most products, places, and even situations.
So, what happens if you skip the exciting, impossible additive AR stuff and focus just on the camera seeing things? Not “seeing a table and adding a Minecraft map on top of it” — just the “seeing a table” part.
You don’t need Augmented Reality to tell you a table is a table. But the cool thing about your camera knowing what’s a table might have little to do with the camera at all — the biggest opportunity might be that the table now has access to computation, and a network. Your camera sees “table” and your phone has an internet connection; ergo, that table now has an internet connection. Your camera just acts as a go-between for the app/internet complex and that table — or that book, or couch, or whatever. The photo or video you took of the table is just an artifact of the process. What you’re doing is closer to clicking on the table than taking a picture of it.
So what happens once a table, or book, or living room has access to computation? Why is this exciting? What happens when you point a camera (or, in the future, who knows, a magic wand) at objects, or places, or situations? I’m not smart enough to predict it, but we can look at other categories of objects that, at one point, gained access to a network. When speakers got access to a network, they started talking to us, listening to us, and telling us when to leave for work. When cars got access to a network (Uber, Lyft), an entire transportation industry was transformed in a few years. When houses gained access to a network (Airbnb), you could travel anywhere in the world and have a place to stay. When your friends got access to a network (Facebook, Instagram), how we communicated changed radically overnight (teens don’t go outside anymore, right?).
When your table has access to a network, maybe pointing at it orders dinner; maybe it compares itself to other tables; maybe it sells itself. Pointing your phone at an oven or a lamp gives you access to all the actions and information you’d want in that situation. Would this recipe fit in the oven? How much would I save if I switched those lightbulbs? Augmented Reality is the closest imaginable technology for bringing the useful (and also totally un-useful, because otherwise, where’s the fun?) aspects of computation to the world around us.
There are a few people already experimenting with the “recognize objects — run computation — show you results” model. Brad Dwyer’s Magic Sudoku is an app that recognizes when you’re pointing at a sudoku puzzle, and solves it in context. Judith Amores and Anna Fusté’s Paper Cubes recognizes certain objects and uses them as interactive anchors for little people running around your camera.
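The “recognize objects — run computation — show you results” model can be sketched as a three-stage loop. Everything below is a hypothetical stand-in, not the code behind Magic Sudoku or any real AR SDK; in a real app, `recognize` would be an on-device vision model and `show` would draw over the camera feed.

```python
# Hypothetical sketch of the "recognize -> compute -> show" loop.
# None of these functions correspond to a real AR API.

def recognize(frame):
    """Stand-in classifier: return a label for what the camera sees."""
    # A real app would run an on-device vision model on the frame here.
    return "sudoku_grid" if frame.get("has_grid") else "unknown"

def compute(label, frame):
    """Run whatever computation makes sense for the recognized object."""
    if label == "sudoku_grid":
        # e.g. Magic Sudoku's step: solve the puzzle it just recognized.
        return {"action": "overlay_solution", "cells": frame["cells"]}
    return {"action": "none"}

def show(result):
    """Render the result back over the camera feed (here: just describe it)."""
    return f"drawing {result['action']}"

frame = {"has_grid": True, "cells": 81}
result = compute(recognize(frame), frame)
print(show(result))  # drawing overlay_solution
```

The point of the sketch is that the photo-like `frame` is only an input at the first stage; what the user actually experiences is the computation in the middle.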
There’s one big challenge in reaching the “recognize — compute — show you” model at broad scale. For the most part, computer programs have been built so that for every new category of thing we could compute on (words, songs, photos, friends), we manually built all the actions and information and things we could do with it. When photographs came to computers, we built photo-editing apps. When your phone knew where you were via GPS, we built mapping apps.
But if cameras keep getting better at recognizing things, new categories of things we can act on with computers will arrive radically faster. Is it possible to program an app for every category of object you could point a phone at? Even if it is, you’re not going to download a different app for each one.
For AR to work as a broad computational (read: nifty and useful) platform, and not just a bunch of cool individual apps, a lot of pieces need to line up. Context and discovery seem like the biggest hurdles. You don’t want to hunt down a different AR app for every context and object, because there are thousands of them, and you don’t know what there is and isn’t an AR app for. What you want is to point your phone at something and receive contextual suggestions about what you’re pointing at and things you can do in that situation. So what does your phone need to know to let you do that?
- Object recognition. Your phone needs to know what it’s looking at. There have been massive strides here in the past few years, and the big players are constantly trying to improve it.
- Global context. Your phone needs to know where you are in the world. (Luckily, GPS has been around for a while.) You might want different things from the world in your hometown than on vacation.
- Local context. Your phone needs to know where you are in local space, so that local content appears consistently to everyone who interacts with it.
- Personal signals. My phone needs to know that I just moved apartments, so that’s why I’m in Target pointing my camera at a bunch of couches. Personal signals let your phone narrow your intent from “you could want anything possible” to “you probably are interested in the bike here, because you bike everywhere.”
(I also acknowledge that one device or company knowing all of those things together is a recipe for a train-wreck privacy dystopia. Let’s talk about that some other time.)
Setting aside the technical hurdles, would AR as a computing platform even be useful? What is there to gain by letting people compute the world around them? I dunno. But for every category of thing that gains access to computation, the potential applications and connections grow exponentially.
It took us a good five years to realize that a GPS combined with a camera in the same device makes for really, really good dating apps. If your phone knows the world around you, what are the applications we can’t imagine yet? What’s the Tinder for coffee cups? What happens when you click on a fridge, or a fish? What happens when Augmented Reality becomes just another boring way I use a computer? I don’t know, but to me, it’s radically more exciting than adding dancing hot dogs to my photos.
Thanks to Prit Patel, Amit Pitaru, Heather Luipold, Ryan Harms and Teo Soares for amazing feedback.