Understanding Reality: Part 1
Setting the stage for Augmented Reality
This is part 1 of a two-part series examining how Augmented Reality recognizes and interprets the world around you. You can find part 2 here.
Augmented Reality, or AR, gets a lot of hype in the tech industry today. With the ability to add digital content to the real world, it is no wonder that AR is quickly finding new and interesting markets to enter. On-the-job training, marketing, and entertainment are just a few of the major industries being disrupted. With this disruption comes a litany of ways to place content in the real world in order to add value, but what do we place? Where does it go? Where can it go?
I like to think of an AR application as a performance. For a good performance, you need a stage, actors, and a compelling story. Each AR application needs to understand what stage and actors it can work with; only then can it tell a great story. That is where the value is created, and if the stage and actors don't actually matter to the story, the experience may not need to be an AR application at all.
So, how does AR actually interpret your world? When the stage and actors keep changing, how do we recognize what matters enough to put on a good performance? Let's take a high-level look at some of the key areas of interest for an AR application designer.
Visual Data
Visual data processing is likely the first thing you think of when examining how AR understands the world. There is a lot going on in your world, and the camera can interpret a great deal of it. The application can understand the environment you are in, track objects located within it, and even integrate content so that it feels like you share your world with the digital one. So, where do we start?
Setting the Stage
When an AR application is started, it needs to understand where it is: the stage. Have we seen this setting before? If not, where are the floors, walls, and ceiling? Are there any? Are there other boundaries, such as furniture or trees, in the way? The user's environment can have a big effect on how they can consume content, and it needs to be understood at least minimally in order to create a good experience. Let's take a look at what current camera and AR technology can identify today, and when we might care.
Surfaces
The most commonly used mapping approach is detecting horizontal surfaces. In the simplest form, this is used to set a basic area on which content is placed. However, tracking more than one surface can create a more immersive experience. For example, an object can fall from one surface and be “stopped” when it collides with another one.
When you add in vertical surfaces, the application can create spatial boundaries for your environment. When used correctly, augmentations can begin to interact with your world the same way their physical equivalents would. For example, we can hang a painting on the wall and have a ball bounce off the furniture.
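To make this concrete, here is a minimal sketch of surface detection using Apple's ARKit, one of several SDKs that offer it (Vuforia and ARCore expose equivalents). The class and variable names are illustrative only, not taken from any particular project.

```swift
import UIKit
import ARKit

// A minimal sketch: ask the AR session to detect horizontal and vertical surfaces.
class SurfaceDemoViewController: UIViewController, ARSCNViewDelegate {
    let sceneView = ARSCNView()

    override func viewDidLoad() {
        super.viewDidLoad()
        sceneView.frame = view.bounds
        view.addSubview(sceneView)
        sceneView.delegate = self

        // Request plane detection for floors, tables, walls, and so on.
        let configuration = ARWorldTrackingConfiguration()
        configuration.planeDetection = [.horizontal, .vertical]
        sceneView.session.run(configuration)
    }

    // Called each time ARKit anchors a newly detected surface.
    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        guard let plane = anchor as? ARPlaneAnchor else { return }
        let kind = plane.alignment == .horizontal ? "horizontal" : "vertical"
        print("Detected a \(kind) surface, roughly \(plane.extent.x) x \(plane.extent.z) meters")
    }
}
```

Content attached to those plane anchors inherits the surface's position, which is what lets a ball "stop" when it collides with the floor or a painting sit flush against a wall.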
Spatial Mapping
Thanks to more advanced devices such as the HoloLens and Magic Leap, there are tools built to effectively manage all surface detections and use them to build a "spatial map": a mesh spread across your environment that captures the surfaces within it as well as how they relate to one another. This creates a more cohesive feel for the "stage" your content lives in, and it can even be saved so that an experience in this location can be loaded later (see below).
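As a rough sketch of what this looks like in code, ARKit's scene reconstruction on LiDAR-equipped devices builds a comparable mesh; HoloLens and Magic Leap expose their own spatial mapping APIs. The function name below is an illustrative assumption.

```swift
import ARKit

// A minimal sketch: turn on mesh reconstruction so the session builds a spatial map
// of the environment (only available on devices that support it, e.g. with LiDAR).
func startSpatialMapping(session: ARSession) {
    let configuration = ARWorldTrackingConfiguration()
    if ARWorldTrackingConfiguration.supportsSceneReconstruction(.mesh) {
        configuration.sceneReconstruction = .mesh   // stream a triangle mesh of the surroundings
    }
    configuration.planeDetection = [.horizontal, .vertical]
    session.run(configuration)
}

// Elsewhere, an ARSessionDelegate can watch the map grow as mesh chunks arrive:
// func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
//     for case let mesh as ARMeshAnchor in anchors {
//         print("Mesh chunk with \(mesh.geometry.faces.count) faces added to the spatial map")
//     }
// }
```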
Recognized Spaces
Sometimes an AR application is built to add value in a specific location. This is common in scenarios such as a factory floor or a guided route through a building. When this is the case, it is possible to have an initial user create a spatial map and save it for future use. This allows an application to leverage a static stage and can therefore create a more repeatable experience for all of its users.
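One way to sketch this "save the stage, reload it later" flow is with ARKit's ARWorldMap. The helper names and file handling below are illustrative assumptions, not a full persistence layer.

```swift
import ARKit

// The first user captures the space and saves it; later users relocalize against it.
func saveCurrentSpace(session: ARSession, to url: URL) {
    session.getCurrentWorldMap { worldMap, _ in
        guard let map = worldMap,
              let data = try? NSKeyedArchiver.archivedData(withRootObject: map,
                                                           requiringSecureCoding: true) else { return }
        try? data.write(to: url)   // persist the mapped space for a future session
    }
}

func loadSavedSpace(session: ARSession, from url: URL) {
    guard let data = try? Data(contentsOf: url),
          let map = try? NSKeyedUnarchiver.unarchivedObject(ofClass: ARWorldMap.self, from: data)
    else { return }

    let configuration = ARWorldTrackingConfiguration()
    configuration.initialWorldMap = map   // relocalize into the previously mapped space
    session.run(configuration, options: [.resetTracking, .removeExistingAnchors])
}
```

Because every user loads the same saved map, content anchored during the first session reappears in the same physical spots for everyone who follows.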
Finding the (Physical) Actors That Matter
The second major category of environmental aspects that AR can understand is trackable content, or to return to my previous metaphor, the (physical) “actors” in your performance. These are images, models, faces or anything else the application can recognize and track. By recognizing trackables, an application can position content in relation to parts of the real world as opposed to individual surfaces. This allows for content that moves with objects in the real world and can add context to something more specific than a surface.
Images
The most common form of trackable anchor today is the image. When treated as a trackable target, images rely on uniqueness and detail to be recognizable by software such as Vuforia. This recognition lets the camera track where the image is as long as it stays in view, so content can be placed in relation to a real-world image. When leveraged correctly, this allows an application to track a movable object and add content in relation to it, no matter where it is in the current environment. For example, a product could be displayed on a flyer that a potential customer scans to envision the product in their work space.
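Here is a minimal sketch of that flyer scenario using ARKit's image tracking rather than Vuforia (the workflow is similar in both): the asset group name "AR Resources", the reference image, and the function name are assumptions made for illustration.

```swift
import ARKit

// A minimal sketch: track a known printed image (e.g. a product flyer) in the camera view.
func startImageTracking(session: ARSession) {
    // Reference images are bundled ahead of time with their real-world physical size.
    guard let referenceImages = ARReferenceImage.referenceImages(
        inGroupNamed: "AR Resources", bundle: nil) else { return }

    let configuration = ARImageTrackingConfiguration()
    configuration.trackingImages = referenceImages
    configuration.maximumNumberOfTrackedImages = 1
    session.run(configuration)
}

// When the flyer is recognized, the session adds an ARImageAnchor; content placed on that
// anchor follows the printed image wherever it moves in the environment.
// func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
//     for case let imageAnchor as ARImageAnchor in anchors {
//         print("Tracking image: \(imageAnchor.referenceImage.name ?? "unnamed")")
//     }
// }
```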
3D Model Recognition
Another form of trackable anchor that is becoming more popular at the enterprise level is the model target. This allows an AR application to recognize a 3D model of an object and use it as the base for a trackable anchor. It offers the same benefits as an image target, but lets the application identify the object itself rather than an image placed on top of it. The benefit is that the object can be recognized more effectively and from more angles. This allows a 3D digital twin to be seamlessly overlaid on the object for use cases such as walking a technician through a repair sequence.
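ARKit's closest analogue is detecting a previously scanned reference object; Vuforia's Model Targets start from a CAD model instead, but the application flow looks much the same. The object name "PumpAssembly", the group name, and the function name below are assumptions for illustration.

```swift
import ARKit

// A minimal sketch: recognize a physical object from a previously scanned reference object
// and use it as a trackable anchor for overlay content.
func startObjectDetection(session: ARSession) {
    guard let referenceObjects = ARReferenceObject.referenceObjects(
        inGroupNamed: "AR Objects", bundle: nil) else { return }

    let configuration = ARWorldTrackingConfiguration()
    configuration.detectionObjects = referenceObjects   // e.g. the scanned "PumpAssembly"
    session.run(configuration)
}

// A detected object surfaces as an ARObjectAnchor, giving a stable frame of reference
// for overlaying a digital twin or a step-by-step repair sequence on the real thing.
```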
Faces
If you have ever used Snapchat, you likely know about face tracking. This technology draws a mesh over a user's face in order to recognize their facial expressions and track where their face actually is. That mesh can then be used to add digital content such as bunny ears or sunglasses. While this is primarily used for entertainment at the moment, use cases are emerging that leverage this functionality for more practical ideas, such as helping users with prosopagnosia (face blindness). A great tool for building these quickly today is Facebook's SparkAR.
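For a sense of what sits underneath tools like SparkAR, here is a minimal face tracking sketch using ARKit; the commented delegate snippet shows how the tracked face reports both its mesh and its expressions.

```swift
import ARKit

// A minimal sketch: start face tracking on a device with a supported front camera.
func startFaceTracking(session: ARSession) {
    guard ARFaceTrackingConfiguration.isSupported else { return }
    session.run(ARFaceTrackingConfiguration())
}

// Each tracked face arrives as an ARFaceAnchor: its geometry is the mesh drawn over the
// face (where bunny ears or sunglasses attach), and its blend shapes report expressions.
// func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
//     for case let face as ARFaceAnchor in anchors {
//         let smile = face.blendShapes[.mouthSmileLeft]?.floatValue ?? 0
//         print("Smile amount: \(smile)")
//     }
// }
```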
Object Recognition
With computer vision growing rapidly, another potential trackable feature is generic object recognition. While I have not seen this leveraged much in the AR industry yet, machine learning can already help an application identify aspects of the environment through computer vision. I expect this to become more common as both AR and computer vision become more accessible technologies. It could help users recognize and avoid obstacles, find something within an environment, and much more; a sketch of what this pairing might look like follows.
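As one possible shape for this, each camera frame from an AR session can be handed to a general-purpose classifier. Apple's Vision framework is used below purely for illustration; a custom Core ML model could be swapped in for domain-specific objects, and the function name is an assumption.

```swift
import ARKit
import Vision

// A sketch: classify what the AR camera currently sees by feeding the session's
// captured frame into a Vision classification request.
func classifyObjects(in frame: ARFrame) {
    let request = VNClassifyImageRequest { request, _ in
        guard let observations = request.results as? [VNClassificationObservation] else { return }
        // Report the most confident labels found in the current view.
        for observation in observations.prefix(3) where observation.confidence > 0.3 {
            print("Saw \(observation.identifier) (confidence \(observation.confidence))")
        }
    }
    let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage, options: [:])
    try? handler.perform([request])
}
```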
So, how do I decide what matters?
With so many aspects of reality that a camera can track, what really matters? It is rare that you will have the time or resources to implement all of them. Each application has different needs and each team has different resources, so it is important to narrow your requirements down to what matters most for the application's use case. Chances are you will have at least a few visually tracked components. Try to figure out the 2–3 most important aspects of the user's "reality", start there, and test it out. In the end, it always takes iteration to get the right answer, but keeping the possibilities for understanding reality in mind can help you get there quicker.
I can see clearly now, but what if there is more to the picture?
Sometimes there is more going on than a camera can actually detect. The unseen data that flows through your reality can be just as useful when deciding what content to place and where to put it. In Part 2, I dive into other aspects of reality that can be understood, and why IoT pairs so well with AR.