My Ideal AR System

The AR space, while still nascent, has gotten some big boosts recently. Apple and Google released ARKit and ARCore to power AR experiences on their phones. Snapchat and Facebook released their AR tools to the public. Magic Leap announced that its hardware would be available this year, and Intel revealed Vaunt, a smart glasses product that looks like a regular pair of glasses. And of course there are products like the Meta 2 and HoloLens that are here now and should be more powerful and capable by their next iteration.

These products all have to solve some very hard problems — getting displays that are small and comfortable to wear, accurately tracking 3D positions of headsets and phones, rendering realistic images on top of the real world, inventing new methods for interaction and much more. It will be many years before we have good and affordable solutions to all of the challenges AR faces.

Once AR technology matures into something small and comfortable to wear, it will have a huge impact on our lives. It could be even bigger than the introduction of the smartphone. But it can be hard to imagine how we get from Snapchat filters and Pokémon Go to where we might be in a few decades, since a lot of the technology hasn’t reached maturity yet. So, I took some time to try to put together a coherent picture of what AR might look like in our day-to-day lives. Here’s how I think such a system could work:


First, AR requires its own operating system. iOS and Android are tailored around the capabilities, use cases and interface affordances of mobile devices, and AR has unique needs too. AR isn’t just an app on a phone; it’s a new method of computing that combines aspects of wearables, ambient computing, immersive computing and digital assistants.

The OS would of course handle running applications and provide basic services to them, such as 3D pose tracking for the headset or phone, eye tracking, hand tracking, gesture recognition, face and image recognition, and wayfinding. It would also deliver notifications to the user in an unobtrusive and customizable way. Eye tracking and hand tracking are not strictly required for AR, but they would make it possible to navigate more complex UIs and turn the device from a passive display into something you can do real computing on.
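
To make that concrete, here is a rough sketch (in TypeScript, with entirely hypothetical names; no such API exists today) of the kind of service surface such an OS might expose to applications:

```typescript
// Hypothetical OS-level AR services. Every name and shape here is an
// assumption for illustration, not a real API.
interface Pose {
  position: [number, number, number];            // meters, world space
  orientation: [number, number, number, number]; // quaternion (x, y, z, w)
}

interface ARSystemServices {
  // Continuous 6-DoF pose of the headset or phone.
  onPoseUpdate(handler: (pose: Pose) => void): void;
  // What the user is looking at, resolved to an object ID when possible.
  onGaze(handler: (targetId?: string) => void): void;
  // Recognized hand gestures ("click", "pinch", "swipe", ...).
  onGesture(handler: (gesture: string) => void): void;
  // Unobtrusive, user-configurable notification delivery.
  notify(message: string, priority: "ambient" | "urgent"): void;
}
```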

The job of an AR device is to provide information and experiences instantly and to help people be more effective and productive. There is a balance to strike: it must provide just the right amount of information at the right time, without being overwhelming or distracting.

AR applications should be much different from native mobile applications, and more like using Google Maps. Most of the time people would not be using native applications at all. Getting information should be as frictionless as possible, and it should not require installing an application for every business you want to interact with or task you want to accomplish. Instead, the AR OS should be spatially and contextually aware. I picture four main ways to find ‘AR content’: through image recognition, through geo-location, through search, or through suggestions from a digital assistant.
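
As a sketch of how that might fit together, all four discovery paths could feed a single content resolver. Everything below (the type names, the resolver function) is invented for illustration:

```typescript
// Hypothetical: the four ways a piece of AR content could be surfaced.
type DiscoveryMethod =
  | { kind: "image"; recognizedObject: string }  // the user looked at something
  | { kind: "geo"; lat: number; lon: number; radiusMeters: number }
  | { kind: "search"; query: string }
  | { kind: "assistant"; inferredIntent: string }; // proactive suggestion

interface ARContentRef {
  id: string;
  title: string;
  location?: [number, number, number]; // anchored position, if spatial
}

// All four paths route through one resolver, so apps, search and the
// assistant share the same index of published content.
declare function resolveContent(how: DiscoveryMethod): Promise<ARContentRef[]>;
```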

For instance, if you want more information about an object or a place of business, you would be able to simply look at it (using eye tracking) and make a clicking gesture with your hand (like on the HoloLens). Imagine seeing a piece of furniture you like at someone’s house — you could pull up prices and purchase it without ever setting foot in a store.
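
In code, that gaze-and-commit flow might look something like the following hypothetical sketch, where eye tracking continuously updates the target and a hand gesture confirms it:

```typescript
// Hypothetical gaze-and-commit flow: eye tracking picks the target,
// a hand "click" gesture confirms it.
let gazeTarget: string | undefined;

function onGazeChanged(targetId: string | undefined): void {
  gazeTarget = targetId; // updated continuously by the eye tracker
}

function onGesture(gesture: string): void {
  if (gesture === "click" && gazeTarget !== undefined) {
    openInfoPanel(gazeTarget);
  }
}

function openInfoPanel(targetId: string): void {
  // e.g. prices, reviews and a "buy" button for that armchair
  console.log(`Showing info panel for ${targetId}`);
}
```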

AR content should be frictionless. Just as any business can have a presence on Google Maps and anyone can publish their own website, I would like a future where anyone can publish AR content for people to find via search or proximity. All content has a location in space, so information is contextual by default, and getting walking or driving directions is a core part of the operating system.
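
For illustration, publishing a piece of AR content might amount to submitting a small manifest, much like registering a business on Google Maps. The shape below is purely hypothetical:

```typescript
// Hypothetical manifest a publisher would submit: the AR analogue of
// registering on Google Maps or putting up a web page.
interface ARContentManifest {
  title: string;
  anchor: { lat: number; lon: number; altitudeMeters?: number };
  visibility: "public" | "unlisted"; // unlisted = reachable by link only
  payloadUrl: string;                // the declarative content itself
}

// Example data, invented for illustration.
const example: ARContentManifest = {
  title: "Corner Coffee: menu & ordering",
  anchor: { lat: 40.7128, lon: -74.006 },
  visibility: "public",
  payloadUrl: "https://example.com/ar/corner-coffee.json",
};
```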

Now I’m sure you’re thinking that a system where anyone can place digital objects will quickly become filled with advertisements and spam. So this content should only be displayed as the result of a user action, or when suggested by your AI assistant based on your own preferences and behavior. Algorithms could help surface popular or informative landmarks, but again only with your permission, and they could be hidden or turned off easily. A moderation system where people can report inappropriate content or spam would also be necessary. In the end, these are the same sorts of problems social networks are already learning to deal with.

AR content could be 3D objects or applications authored with something akin to HTML and JavaScript: a safe, sandboxed language for developers to define spatially located objects, text, buttons and interfaces. You could interact with them and do all the things you can do with a website — place orders for products, get information and phone numbers, read reviews, etc. Native applications would still be possible where an app needs heavy processing or has to take over the full experience, but I imagine these would be the exception rather than the norm.
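
Here is a minimal sketch of what such a declarative format could look like, written as TypeScript types. The element set and field names are assumptions, not a real standard:

```typescript
// Hypothetical declarative AR format, the spatial analogue of an HTML
// page. Sandboxed like the web: no direct hardware access, only declared
// elements and event handlers that the OS mediates.
type ARElement =
  | { type: "text"; content: string; position: [number, number, number] }
  | { type: "model"; src: string; position: [number, number, number] }
  | { type: "button"; label: string; onSelect: () => void };

const menuBoard: ARElement[] = [
  { type: "text", content: "Today's specials", position: [0, 1.6, -1] },
  { type: "model", src: "croissant.glb", position: [0.3, 1.3, -1] },
  {
    type: "button",
    label: "Order",
    onSelect: () => console.log("Order placed"), // would call a web API
  },
];
```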

Spatial localization would have to be orders of magnitude better than what GPS gives us now. The AR of the future needs millimeter-level accuracy and should extend inside buildings. Businesses would be able to create waypoints and experiences inside their shops and offices, and to tell you where in the store to find a product and exactly where on the shelf it sits. Figuring out the correct terminal at the train station would be easy.
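
A store’s index might then be as simple as a mapping from product to an indoor waypoint, positioned in the building’s own local map rather than in raw GPS coordinates. A hypothetical sketch:

```typescript
// Hypothetical indoor waypoint: positions live in a building's own map,
// since GPS is far too coarse (and unreliable) indoors.
interface Waypoint {
  buildingId: string;
  label: string;                      // "Aisle 7, shelf 3"
  position: [number, number, number]; // meters within the building map
}

// A store could publish an index from product SKU to waypoint.
const storeIndex = new Map<string, Waypoint>([
  ["sku-1138", {
    buildingId: "store-42",
    label: "Aisle 7, shelf 3",
    position: [12.4, 0.9, 31.2],
  }],
]);

const target = storeIndex.get("sku-1138");
console.log(target?.label); // from here, hand off to the OS wayfinding
```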

Voice would be another way to interact with your AR device, so you could search for local restaurants or set reminders at any time and in any place, without taking your phone out of your pocket. With accurate eye tracking, typing with your gaze may also be possible for situations where voice isn’t an option. Bone-conduction speakers built into the frame of the glasses could carry sound to your inner ear without the need for headphones or earbuds.

Of course, if your AR device has voice input and also does image recognition and location tracking, you can much more easily make complex requests like “Remind me to ask my wife how her appointment went when I see her” or “Notify me when a nearby Krispy Kreme has fresh donuts”. We can do some of these interactions now with assistants, but the AR device would always be on, and it would be aware of what you’re seeing and looking at too.
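
One way to model these requests is as contextual triggers that the OS evaluates against sensor events instead of clock times. The shapes below are again assumptions for illustration:

```typescript
// Hypothetical contextual triggers: reminders fire on conditions the
// device can observe, not on clock times.
type Trigger =
  | { kind: "personSeen"; personId: string } // via face recognition
  | { kind: "nearPlace"; placeQuery: string; radiusMeters: number };

interface ContextualReminder {
  message: string;
  trigger: Trigger;
}

const reminders: ContextualReminder[] = [
  {
    message: "Ask how the appointment went",
    trigger: { kind: "personSeen", personId: "contact:wife" },
  },
  {
    message: "Fresh donuts nearby",
    trigger: { kind: "nearPlace", placeQuery: "Krispy Kreme", radiusMeters: 500 },
  },
];

// The OS evaluates the list as sensor events arrive:
function onPersonRecognized(personId: string): void {
  for (const r of reminders) {
    if (r.trigger.kind === "personSeen" && r.trigger.personId === personId) {
      console.log(`Reminder: ${r.message}`);
    }
  }
}
```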

AR would be a great way to interact with Internet of Things devices. Turning off lights by gesturing at them, or browsing the energy consumption of your water heater by looking at it, would be incredibly cool. Of course, in this fantasy world we are imagining, IoT manufacturers have all agreed on common standards for querying and interacting with devices.
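
Granting that fantasy standard, the interaction could be as simple as resolving the gazed-at object to a device handle and mapping gestures to commands. Another hypothetical sketch:

```typescript
// Hypothetical, assuming the (sadly fictional) universal IoT standard:
// gaze resolves the object to a device handle, gestures map to commands.
interface IoTDevice {
  id: string;
  query(attribute: string): Promise<number>; // e.g. current power draw
  send(command: string): Promise<void>;      // e.g. "toggle"
}

async function onGestureAtDevice(device: IoTDevice, gesture: string): Promise<void> {
  if (gesture === "flick") {
    await device.send("toggle");             // lights off
  } else if (gesture === "dwell") {
    const watts = await device.query("power-draw");
    console.log(`Current draw: ${watts} W`); // overlaid next to the device
  }
}
```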

Notifications for text messages and emails could pop into your vision and be dismissed or inspected using eye or head gestures. People are already finding utility in smartwatches for this exact thing. Freeing your hands would be safer and would let people use technology in more scenarios.

It also goes without saying that people are addicted to social media, and our ‘digital selves’ are becoming an important part of our identities. Being able to request access to someone’s public social profile or LinkedIn page via facial recognition could be a really powerful way to network with the people we meet.

AR gaming would also be amazing. For these types of experiences, the application would take over your device, much like desktop games can run in full-screen mode. Imagine going to your local park and playing an MMO like Final Fantasy Online with your friends, where you all take part in the same experience and see the same thing. Support for synchronizing multiple players in the game world, and for joining games with nearby friends, should be provided at the OS level as well.
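
At the OS level, that shared-world support might boil down to a session service that owns the common coordinate frame, so everyone’s rendering agrees on where things are. A hypothetical sketch:

```typescript
// Hypothetical OS-level multiplayer service: the OS owns the shared
// coordinate frame, so every player sees objects in the same place.
interface SharedSession {
  sessionId: string;
  players: string[];     // co-located players who have joined
  worldAnchorId: string; // maps game coordinates onto the shared space
}

// Provided by the OS, not reimplemented by each game individually.
declare function joinNearbySession(gameId: string): Promise<SharedSession>;
declare function broadcastState(session: SharedSession, state: unknown): void;
```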

Businesses will be able to use AR to improve their CRM systems. For example, a front desk person at a hotel could pull up your information through facial recognition and get you checked into your room quickly. They could also see that the last time you stayed you requested extra pillows, and suggest having some delivered to your room. This information appears quietly in their vision during the course of your interaction, helping them provide better service without breaking the social interaction to look at a computer screen.

Privacy would of course need to be a strong part of the operating system. Since AR requires a myriad of sensors, it also provides a stronger means for people to spy on you than any system in the past. Personal data should be processed and kept on the device wherever possible, with strong safeguards for whatever data does go to the cloud. Your device should include an AI assistant that works to find the most relevant and personalized content for you, not just the most relevant ads or products to sell you.
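
One way to express that principle is a per-capability policy where sensitive streams default to on-device processing. The capability names and fields below are invented for illustration:

```typescript
// Hypothetical per-capability policy: sensitive streams default to
// on-device processing, and moving data off-device is opt-in.
type Capability = "camera" | "gaze" | "location" | "faceRecognition";

interface DataPolicy {
  capability: Capability;
  allowed: boolean;
  residency: "on-device" | "encrypted-cloud"; // where processing may happen
}

const defaults: DataPolicy[] = [
  { capability: "camera", allowed: true, residency: "on-device" },
  { capability: "gaze", allowed: true, residency: "on-device" },
  { capability: "faceRecognition", allowed: false, residency: "on-device" },
  { capability: "location", allowed: true, residency: "encrypted-cloud" },
];
```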

Obviously, all of this is going to take some pretty huge cloud services to power it. Someone like Google, which already has a very strong offering in the maps space, could do it, but tying all of this together into an easy-to-use product requires strong end-to-end hardware and software chops, like Apple’s. Both companies are clearly working on AR technology, so time will tell who comes to dominate this space.

Hopefully this helps explain why people are hyped about the potential of AR. If you’d like even more examples of use cases for AR, check out this great article by Sarah Downey: https://uploadvr.com/augmented-reality-use-cases-list