Why AR Apps will struggle for engagement without the AR Cloud
AR-Native apps need real-time connections to People, Places & Things
If you were asked what the single most valuable asset in the tech industry is today, you’d probably answer that it’s Google’s search index, or Facebook’s social graph, or maybe Amazon’s supply chain system. I believe that in 15 years’ time there’ll be another asset at least as valuable as these, one that doesn’t exist today. Probably more valuable, if you look at what Microsoft’s Windows OS asset (easily the most valuable tech asset in the 1990s) is worth in 2017 vs 1997.
Will one company eventually own (a huge, profitable part of) it? History says probably. Will it be a new company? Also probably. Just as it was hard to imagine Microsoft losing its position in 1997, it’s hard to imagine in 2017 Google or Facebook losing theirs. But nothing is guaranteed. I’ll try to lay out the arguments supporting each of the three sides playing here (incumbents, startups, open web) in the last post of this series.
My last couple of posts talked about how ARKit and ARCore work — basically what’s available today and how we got here. This post (well, series of posts) is going to get into what’s missing from ARKit and ARCore, and how those missing pieces will work.
So just what is this ARCloud?
To get beyond ARKit and ARCore we need to start thinking bigger than ourselves. How do other people on other types of AR devices join us & communicate with us in AR? How do our apps work in areas bigger than our living room? How do our apps understand & interact with the world? How can we leave content for other people to find & use? To deliver these capabilities we need cloud based software infrastructure for AR. I’ve been hearing people (including my SV partner Ori Inbar) refer to all this stuff as the ARCloud, and I like that name.
The ARCloud can be thought of as a machine-readable 1:1 scale model of the real world. Our AR devices are the real-time interface to this parallel virtual world which is perfectly overlaid onto the physical world.
Why all the “meh” from the press for ARKit & ARCore?
When ARKit was announced at WWDC this year Apple Chief Executive Tim Cook touted augmented reality, telling analysts: “This is one of those huge things that we’ll look back at and marvel on the start of it.”
A few months went by. Developers worked hard on the next big thing, but the reaction to ARKit at the iPhone launch keynote was “meh”. Why was that?
It’s because ARKit & ARCore are currently at version 1.0. They only give developers three very simple AR tools:
- the phone’s 6DoF pose, with new coordinates each session;
- a partial & small ground plane;
- a simple average of the scene lighting.
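To make concrete just how small that surface area is, here’s a minimal sketch in Python of the per-frame data an ARKit/ARCore-style session hands a developer. The names and types here are illustrative, not the real ARKit or ARCore API:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Pose6DoF:
    # Tool 1: where the phone is, in session-local coordinates
    # (the origin resets every time the app launches)
    translation: Tuple[float, float, float]       # metres
    rotation: Tuple[float, float, float, float]   # unit quaternion x, y, z, w

@dataclass
class ARFrame:
    pose: Pose6DoF
    # Tool 2: a partial, growing ground plane (just an extent, in metres)
    ground_plane_extent: Tuple[float, float]
    # Tool 3: a single averaged ambient light value for the whole scene
    ambient_intensity: float

# Everything a v1.0 session gives you, per frame:
frame = ARFrame(
    pose=Pose6DoF((0.0, 1.4, 0.0), (0.0, 0.0, 0.0, 1.0)),
    ground_plane_extent=(1.2, 0.8),
    ambient_intensity=0.75,
)
```

No persistent coordinates, no geometry beyond one plane, no idea what anything in the scene actually is — which is the gap the rest of this post is about.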
In our excitement over seeing one of the hardest tech problems solved (robust 6DoF pose from a solid VIO system) and Tim Cook saying the words “augmented” and “reality” together on stage, we overlooked that you really can’t build anything too impressive with just those three tools. The biggest problem is that people expected amazing apps before the full set of tools to build them existed. Note that it’s not the if but the when that we’ve gotten wrong.
What’s missing, to make a great AR app?
Clay Bavor referred to the missing pieces of the AR ecosystem as connective tissue which I think is a great metaphor. In my post on AR product design I highlighted that the only reason for any AR app to exist (vs a regular smartphone app) is if it has some interaction or connection with the real world. With physical people, places or things.
For an AR app to truly connect to the world, there are three things it has to be able to do. Without this connection, it can never really be AR-Native. These capabilities are only possible with the support of the ARCloud:
How do people connect through AR?
How do we support multiple users sharing an experience? How do we see the same virtual stuff at the same time, no matter what device we hold or wear, whether we’re in the same place or not? You can choose a familiar term to describe this capability based on what you already know, e.g. “multi-player” apps for gamers, or “social” apps, or “communicating” apps. It’s all the same infrastructure under the hood, built on the same enabling technology. Really robust localization, streaming of the 6DoF pose & system state, 3D mesh stitching & crowdsourced mesh updating are all tech problems to be solved here. Don’t forget the application-level challenges like access rights, authentication etc (though those are mostly engineering problems now).
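The “streaming of the 6DoF pose” piece, at least, is easy to picture. Here’s a hedged sketch of what one pose update between peers might look like — a timestamped message in a shared coordinate frame. The message schema is my own invention for illustration, and it quietly assumes the genuinely hard part has already happened: every device has relocalized into the same coordinate frame.

```python
import json
import time

def encode_pose_update(user_id, translation, rotation_quat, timestamp=None):
    """Pack one 6DoF pose sample into a JSON message for peers.

    Assumes all devices have already relocalized into a shared
    coordinate frame -- that's the hard computer-vision problem.
    """
    return json.dumps({
        "user": user_id,
        "t": timestamp if timestamp is not None else time.time(),
        "translation": list(translation),   # metres, shared frame
        "rotation": list(rotation_quat),    # unit quaternion x, y, z, w
    })

def decode_pose_update(message):
    return json.loads(message)

# One sample from a hypothetical peer:
msg = encode_pose_update("alice", (0.1, 1.5, -0.3), (0, 0, 0, 1), timestamp=12.0)
update = decode_pose_update(msg)
```

Note what’s missing: at 60 fps per user this naive approach floods the network, so real systems interpolate, predict and prioritize — the “real-time infrastructure” problem I come back to below.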
How do AR apps connect to the world & know where they really are?
GPS just isn’t a good enough solution, even the forthcoming GPS that’s accurate to 1 foot. I’ll explain why in the future post on this topic. How do we get AR to work outside in large areas? How do we determine our location both in absolute coordinates (Lat/Long) & also relative to existing structures, to sub-pixel precision? How do we achieve this both indoors & out? How do we ensure content stays where it’s put, even days or years later? How do we manage so much data? Localizing against absolute coordinates is the really hard tech problem to solve here.
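A quick back-of-the-envelope illustrates why raw Lat/Long isn’t enough on its own. A few millionths of a degree sounds tiny, but converted to metres it’s still roughly half a metre of placement error — and GPS gives you no orientation and no relationship to the walls and furniture around you. A hedged sketch, using the standard small-offset (equirectangular) approximation:

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, metres

def latlong_delta_to_metres(dlat_deg, dlon_deg, at_lat_deg):
    """Approximate east/north displacement in metres for a small
    lat/long offset, using the equirectangular approximation."""
    north = math.radians(dlat_deg) * EARTH_RADIUS_M
    east = math.radians(dlon_deg) * EARTH_RADIUS_M * math.cos(math.radians(at_lat_deg))
    return east, north

# A fix error of 5 millionths of a degree, at latitude 37:
east, north = latlong_delta_to_metres(5e-6, 5e-6, 37.0)
# north is ~0.56 m, east is ~0.44 m -- enough for virtual content
# to visibly drift off the table it was placed on
```

Even a perfect absolute fix wouldn’t tell your app which way you’re facing or how far the wall is, which is why visual localization against the world itself is the hard problem here.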
How do AR apps understand and connect to things in the real world?
How do our apps understand both the 3D structure, or geometry, of the world (the shape of things, e.g. that’s a big cube-like structure my Pokemon can hide behind or bounce into) and identify what those things actually are (e.g. that blob is actually a couch & my virtual cat should stay off couches)? Real-time on-device dense 3D reconstruction, real-time 3D scene segmentation, 3D object classification (don’t worry, I’ll explain what all these terms mean in the post on this subject), and backfilling local processing with cloud-trained models are the challenges here.
Like much in AR, it’s not that hard to build something that demos well, but it’s very hard to build something that works well in real-world conditions.
I’d hoped to fit all this into one post… but it would have been an epic, even in relation to my other posts. So I’ll do one post on each of the above three points. What I hope to achieve is to communicate both how important and how difficult it is to build this infrastructure to deliver a consumer-grade AR UX.
You will probably hear about the ARCloud a lot in coming months: If you’re confused, it’s not you, it’s them
Just when you thought you were getting your head around the difference between AR, VR and MR, it all goes another level deeper! Vendors will use identical terms that mean completely different things, like:
- “Multiplayer AR” could refer to a purely game-level way of tracking what each player does in the game itself with zero computer vision or spatial awareness. Or it could refer to a way to solve some very hard computer vision localization problems. Or both of the above. Or they may mean something else entirely.
- “Outdoors AR” might just mean an ARKit app that has large content assets that look best outside, or it could mean something verging on a global autonomous vehicle 3D mapping system.
- “Recognition” might mean manually configuring a single marker/image that your app can recognize, or it might mean a real-time general-purpose machine-learning powered global 3D object classification engine…
Is today’s cloud up to the job?
When I worked in telecom infrastructure, there was a little zen-like truism that said “there is no cloud, it’s just someone else's computer”. We always ended up working with the copper pairs or fibre strands (or radio spectrum) that physically connected one computer to another, even across the world. It’s not magic, just difficult. What makes ARCloud infrastructure different from the cloud today, powering our web and mobile apps, is that AR (like self-driving cars & drones & robots) is a real-time system. Anyone who has worked in telecom (or on fast-twitch MMO game infrastructure) deeply understands that real-time infrastructure and asynchronous infrastructure are two entirely different beasts.
So while many parts of the ARCloud will involve hosting big data and serving web APIs and training machine learning models, just like today’s cloud, there will need to be a very big rethink of how we support real-time applications and AR interactions at massive scale. Basic AR use-cases like: streaming live 3D models of our room while we “AR Skype”; updating the data & applications connected to things I pass on public transport; streaming (rich graphical) data to me that changes depending on where my eyes are looking, or who walks near me; maintaining & updating the real-time application state of every person & application in a large crowd at a concert. Without this type of UX, there’s no real point to AR. Let’s just stick with smartphone apps. Supporting this for eventually billions of people will be a huge opportunity. 5G networks will play a big part & are designed for just these use-cases. If history is any guide, some if not most of today’s incumbents, who have massive investments in the cloud infrastructure of today, will not cannibalize those investments to adapt to this new world.
Is ARKit (or ARCore) useless without the ARCloud?
Ultimately it’s up to the users of AR apps to decide this. “Useless” was a provocative word choice. So far, one month in, based on early metrics, users are leaning towards “almost useless”. My personal belief is that useful apps can be built on today’s ARKit, but they will only be useful to some people, occasionally. They might be a fun novelty that makes you smile when you share it. Maybe if you are buying a couch you’ll try it in advance. But these aren’t the essential daily-use apps that define a new platform. For that we need AR-Native apps: apps that are truly connected to the real world. And to connect our AR apps to each other & the world, we need the infrastructure in place to do that. We need the ARCloud.
Next: AR and connecting people (coming soon)