The Road To Consumer Augmented Reality
With the trough of disillusionment well underway for Virtual Reality (VR), our industry has turned to Augmented Reality (AR) as the next great hope. AR is even more technically challenging to productize than VR, but offers potentially greater rewards. What is it going to take to build a robust, vibrant AR ecosystem on par with the App Store for iPhone? We need four components to create the platform for widespread AR adoption: Spatial Services, Artificial Intelligence (AI), User Interface (UI), and Hardware. With these four components together in the form of a Software Development Kit (SDK), developers can begin to build truly useful and compelling AR applications.
Truly useful AR requires the device to understand your context in the world; not just a two-dimensional location on the surface of Earth such as Google Maps provides, but a full three-dimensional understanding of your position and orientation, as well as the features and objects around you. This requires a detailed 3D map of interiors: homes, businesses, and public spaces. There is currently no available service-based dataset of interior 3D features that devices can use to localize against. Ori Inbar refers to this as the AR Cloud. Read his blog post for interesting technical details on how this might work.
While it is technically challenging to build a 3D Spatial Service, it’s clear what needs to happen to build this from a developer standpoint: platforms need to offer developers a high-resolution 3D version of iPhone’s Location Services, a fusion of GPS, camera, and inertial measurement unit (IMU) data that determines your exact position and orientation anywhere in the world, inside or outside. You can think of the end result as similar to a massively multiplayer online role-playing game, but instead of synchronizing a game world, we’re synchronizing 3D points in physical space (features) between users.
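To make the idea of synchronizing features between users a little more concrete, here is a minimal sketch of what the shared store behind such a service might look like. Everything in it is an assumption for illustration: the `Feature` and `SpatialService` names, the string descriptor standing in for a real visual descriptor, and the in-memory grid standing in for a cloud backend. It is a toy, not how any existing platform works.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A 3D feature point in a shared world frame, in meters (hypothetical type)."""
    x: float
    y: float
    z: float
    descriptor: str  # placeholder for a real visual feature descriptor


class SpatialService:
    """Toy in-memory stand-in for a cloud feature store (hypothetical)."""

    def __init__(self, cell_size: float = 1.0):
        self.cell_size = cell_size
        self.cells = defaultdict(set)  # grid cell -> features inside it

    def _cell(self, x: float, y: float, z: float):
        s = self.cell_size
        return (math.floor(x / s), math.floor(y / s), math.floor(z / s))

    def upload(self, features):
        """Called by a device after it has mapped part of a space."""
        for f in features:
            self.cells[self._cell(f.x, f.y, f.z)].add(f)

    def query(self, x: float, y: float, z: float, radius: float):
        """Return the stored features within `radius` meters of a position."""
        reach = math.ceil(radius / self.cell_size)
        cx, cy, cz = self._cell(x, y, z)
        found = []
        for dx in range(-reach, reach + 1):
            for dy in range(-reach, reach + 1):
                for dz in range(-reach, reach + 1):
                    for f in self.cells.get((cx + dx, cy + dy, cz + dz), ()):
                        if math.dist((x, y, z), (f.x, f.y, f.z)) <= radius:
                            found.append(f)
        return found
```

One device would call `upload` as it maps a room, and a second device entering the same room would call `query` with its rough position to fetch nearby features to localize against. A real service would also have to handle descriptor matching, coordinate alignment between devices, and privacy, none of which this sketch attempts.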
Perhaps more challenging than building the technology for a planet-scale Spatial Service is building the business strategy for generating and collecting this data set in the first place. Unlike Google Maps, where Google sent employees with mapping equipment into the world to map roads by driving them, a company can’t simply send employees into your home or your workplace to map the interior. While this might be possible for public interiors like malls and theme parks, it won’t work for privately owned businesses, much less private residences. We have to find compelling reasons to convince consumers and businesses to scan their interiors and store this data on somebody’s cloud service.
For consumers, the most likely route will be entertainment applications such as games. As you move through your space playing an AR game, you’ll map and refine the features of your home. Currently this map is saved on your device, but it needs to be stored on a server to allow for an unlimited number of mapped spaces, as well as synchronization between different AR devices in the same physical space. For businesses, it seems likely there will be a mix of marketing and advertising use cases geared at consumers in the space, as well as enterprise use cases within the business itself.
This is a great opportunity for startups, as this problem is largely unsolved: the big tech companies are only just gearing up to solve it, and most importantly, the value proposition of why people should use AR is just starting to be explored via ARKit and ARCore. It’s only a matter of time until a startup creates a business advertising platform that uses ARKit/ARCore to make it easy for businesses to offer 3D/AR promotions, and in exchange, receives valuable interior mapping data. The next Pokémon Go-style viral sensation that can be played inside will generate massive amounts of interior residential data that will be incredibly valuable.
For AR to operate at its full potential, it’s not enough for devices and applications to know which place they are in and what general features are in the space. We have to have a more detailed understanding of the individual objects in the space, how they relate to each other, and how they relate to the user. This will be facilitated by Artificial Intelligence (AI) systems that excel at object detection and recognition. The latest AI techniques have made it possible for machines to outperform humans in certain visual recognition tasks.
If I’m shopping in the grocery store, I may want to run a visual product search while I shop, to let me know if there are better prices available elsewhere in the store, in another store, or online. For this to work, my device has to quickly and reliably recognize which product I am looking at. Amazon has made tremendous progress in this area.
A more general visual search is a highly ambitious problem that companies including Google, Pinterest, and Blippar are working on. This would allow the device to recognize all types of objects. It could recognize a city bus, and automatically perform a search against bus routes to see which bus this is and if it’s the bus you need to get to the next event in your calendar on time. It could recognize the door to your house, and unlock the door for you as you approach. This is the type of highly context-aware AR we’ve been imagining in books and movies for years now.
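As a rough illustration of the bus example, here is a small sketch of the step that would come after recognition: taking a route number the recognizer has read off the bus and checking it against a timetable and your next calendar event. The `BusArrival` and `CalendarEvent` types, the `bus_gets_me_there` function, and the five-minute walking allowance are all made up for this sketch; a real system would pull this data from transit and calendar services.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BusArrival:
    """One timetable entry: when a route reaches a given stop (hypothetical)."""
    route: str
    destination_stop: str
    arrives_at: datetime


@dataclass
class CalendarEvent:
    """The next event in the user's calendar (hypothetical)."""
    title: str
    nearest_stop: str
    starts_at: datetime


def bus_gets_me_there(recognized_route, timetable, event,
                      walk=timedelta(minutes=5)):
    """Return True if the recognized bus reaches the event's stop early
    enough to walk to the event before it starts."""
    for arrival in timetable:
        if (arrival.route == recognized_route
                and arrival.destination_stop == event.nearest_stop):
            return arrival.arrives_at + walk <= event.starts_at
    # The route does not serve the stop we need.
    return False
```

The hard part, of course, is the recognizer that produces `recognized_route` in the first place. Once the device knows what it is looking at, the contextual lookup itself is fairly ordinary application logic.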
This is an opportunity for companies with expertise in machine learning that have access to, or can generate, a large visual training data set to build these sorts of recognizers. In addition, we will see more general AI that takes the inputs from an AR wearable and combines them with personal data (calendars, preferences) to determine your current context in real-time and push relevant data to you as you need it. This sort of on-demand virtual assistant will further add to the seamless and natural interaction with our digital worlds and selves.
The next key component for an AR platform is a next-generation user interface. In Natural Computing, I discussed how AR/VR technologies are about removing the artificial barriers and abstractions between us and the digital world, allowing us to interface with it in the same way we interface with the natural world.
The ideal AR interface does not require a controller, wand, or other mechanical input device. It should utilize the controllers we all instinctively know how to use: our hands. Apple pioneered skeuomorphic design, which related 2D user interfaces to physical analogies we were already familiar with. AR will be best when we can interact using the 3D analogies of touch and gesture we are already familiar with. This is a major drawback of ARKit/ARCore; while they offer a glimpse into the AR world, we still interact using yesterday’s mobile computing gestures.
A great example of this kind of interaction comes from Leap Motion. Their device allows natural hand tracking, even when your hands are out of view. Furthermore, they’ve demonstrated some human-computer interface paradigms that make a lot of sense for AR.
There is a great opportunity here for companies to build cross-platform user interface frameworks (think Qt or Cocoa Touch for AR) that discover and evangelize the user interaction paradigms that will become standard in the future. Some user interfaces will be dependent on the specifics of the hardware platform and its sensing capabilities, but others will be universal across platforms. Finding these universal 3D interaction paradigms, and packaging them in an SDK ready for developers to utilize, could be valuable. Using a Leap Motion tracker in VR would be a way to start building the AR interface systems of tomorrow, today.
Lastly, we need a form factor or device better than the “magic window” provided by our smartphones. AR hardware has long been envisioned as a wearable of some kind, usually depicted as glasses. We currently have first-generation wearable AR displays from Microsoft, DAQRI, and Meta. These are expensive, consumer-unfriendly, and in some cases require a tethered computer to run. For the most part, the current generation of AR wearables is geared for enterprise usage.
ARKit and ARCore are attempting to fill the role of a consumer AR wearable by unlocking a subset of AR use cases with today’s smartphone. This is a smart approach, as it allows developers previously experienced with 2D development (web and mobile) to get used to 3D development (Unity, Unreal), and allows them to start thinking about what kind of applications make sense in this paradigm, and what kind of services are needed to enable those applications.
Magic Leap has raised over a billion USD and has long been rumored to be developing a more practical wearable that could be used by consumers.
The hardware has a very long way to go, and hardware is also notoriously expensive to develop, making this area a challenging target for startups.
Once the hardware, spatial services, AI/object recognizers, and natural user interfaces are combined into one device and SDK, the computing world as we know it will change forever. It’s at this point that everyone can start experimenting and building the new use cases and applications that will bring true value to users.
Given the hype around consumer VR and its subsequently slow development, we should be careful to ensure that AR is not a “solution in search of a problem”. Rather than banking on the novelty of new hardware capabilities (“Wow, I can look around” or “Wow, that animal/person/object is so close to my face!”), we should, as Stephen Covey advises, “begin with the end in mind”. What is the value proposition that makes AR so useful? What kinds of applications or content make sense in this new computing paradigm? What things can we do only in AR, that just don’t make sense in any other format?
There will surely be new types of game experiences that are analogous to older game types; at Emergent VR we prototyped what a world-scale AR first-person shooter would look like using HoloLens and ARKit.
But there will also be new game types we haven’t thought of yet. That’s one of the most fundamentally exciting opportunities of any new platform or computing paradigm: the ability to find the use cases that haven’t yet been discovered. Beyond entertainment and gaming, the utility of AR is likely to be its biggest contribution.
Having a platform that is always on and continuously aware of your spatial context and the objects in it will enable powerful AI that understands you well enough to push the data and information you need in real-time. Advanced hardware displays will render this information seamlessly and naturally into the real world around you. Finally, you can interact with this information in an intuitive and natural way. This platform will be transformative for businesses and consumers alike, and offers huge opportunities for startups and later stage companies willing to take a risk developing the systems required to make this happen.