How SLAM technology is changing Augmented Reality

Eric Kollegger
4 min read · Jan 16, 2018


Augmented Reality is one of the hottest fields in technology right now, despite its extremely limited rate of wider adoption. This is largely due to its classic implementation through marker tracking and the high barrier to entry that comes with it.

Image Recognition and Marker Tracking

Aside from 3D models and animation, the foundation of augmented reality is built upon two closely related technologies: image recognition and something called marker tracking. When paired together, they give a camera the ability to recognize the data of an image, trigger an associated experience off of it, and track its position relative to the camera eye in three-dimensional space. By uploading an image to a processing server, a developer can associate one or multiple images with a single augmented reality experience. This then allows the camera on your device to understand and track the overlaid digital content.

There are two core concepts to understand here. The first is Marker Detection: recognizing an image (often referred to as a marker) through the camera lens and making a connection to its counterpart on the server to trigger an experience. The second is Marker Tracking: maintaining the real-time orientation of a physical object or marker and continuously updating the digital content to mimic it.
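To make those two concepts concrete, here is a minimal sketch of the detection-and-tracking loop using Apple's ARKit image-tracking API. This is an on-device illustration rather than the cloud-recognition pipeline described above, and the asset group name "Markers" is a placeholder:

```swift
import ARKit

// Minimal sketch: detect a known marker image and track its pose frame-to-frame.
class MarkerTrackingDemo: NSObject, ARSCNViewDelegate {
    let sceneView = ARSCNView(frame: .zero)

    func start() {
        sceneView.delegate = self
        let configuration = ARImageTrackingConfiguration()
        // "Markers" is a placeholder asset-catalog group holding the reference images.
        if let markers = ARReferenceImage.referenceImages(inGroupNamed: "Markers", bundle: nil) {
            configuration.trackingImages = markers
            configuration.maximumNumberOfTrackedImages = 1
        }
        sceneView.session.run(configuration)
    }

    // Marker Detection: called once when the camera first recognizes a marker.
    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        guard anchor is ARImageAnchor else { return }
        // Attach the digital content (3D model, video, animation) to `node` here.
    }

    // Marker Tracking: called continuously as the marker's pose updates,
    // keeping the overlaid content aligned with the physical object.
    func renderer(_ renderer: SCNSceneRenderer, didUpdate node: SCNNode, for anchor: ARAnchor) {
        guard let imageAnchor = anchor as? ARImageAnchor, imageAnchor.isTracked else { return }
        // `node.simdTransform` now mirrors the marker's real-world position and orientation.
    }
}
```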

An example of a marker-based AR experience I built while working at Blippar

The quality of detection and tracking is determined by a number of parameters, primarily how many strong contrast points a marker has and how few repeating patterns it contains. I could literally write an entire other blog post on the various intricacies of what makes for a great marker versus a terrible one. While marker tracking still provides a powerful experience for users, it comes with significant limitations that have hamstrung the industry since its inception. From a user experience perspective, not only do I have to download a specific app, I also need the physical object on hand to experience it. With SLAM, users only need their phone and the environment around them to gain access to content.
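As a side note, ARKit can give a rough up-front verdict on marker quality. The sketch below assumes iOS 13 or later, and the image name "posterImage" and its physical width are placeholder values:

```swift
import ARKit
import UIKit

// Ask ARKit whether a candidate marker has enough contrast and detail
// to be detected and tracked reliably (iOS 13+).
let cgImage = UIImage(named: "posterImage")!.cgImage!  // placeholder image
let marker = ARReferenceImage(cgImage, orientation: .up, physicalWidth: 0.3)  // width in meters

marker.validate { error in
    if let error = error {
        print("Poor marker candidate: \(error.localizedDescription)")
    } else {
        print("Marker has enough detail for reliable detection and tracking")
    }
}
```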

Simultaneous Localization And Mapping (SLAM)

Enter the sexily-abbreviated technology known as SLAM. It has only recently become a viable option as more devices adopt the secondary depth camera traditionally required to make use of it (although, as of this writing, Google and Apple have found ways around the need for a dedicated secondary camera).

At a basic level, the technology actively recognizes walls, floors, and other physical barriers in space. Currently, most apps leveraging SLAM only use floor recognition and position tracking to place AR objects on surfaces around the user. A select few platforms are capable of processing additional spatial information (walls, ceilings, furniture, etc.) to build a deeper understanding of the surrounding environment. The two major players that heavily utilize these features at the moment are Apple's ARKit and Google's ARCore SDKs.
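Here is roughly what surface recognition looks like through ARKit's plane-detection API (ARCore exposes a similar Plane concept on Android). This is a minimal sketch, not production code:

```swift
import ARKit

// Minimal sketch: ask ARKit to detect floors and walls while tracking device position.
class PlaneDetectionDemo: NSObject, ARSCNViewDelegate {
    let sceneView = ARSCNView(frame: .zero)

    func start() {
        sceneView.delegate = self
        let configuration = ARWorldTrackingConfiguration()
        configuration.planeDetection = [.horizontal, .vertical]  // floors, tabletops, walls
        sceneView.session.run(configuration)
    }

    // Called whenever SLAM recognizes a new surface and anchors it in world space.
    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        guard let plane = anchor as? ARPlaneAnchor else { return }
        let orientation = plane.alignment == .horizontal ? "horizontal" : "vertical"
        print("New \(orientation) surface, roughly \(plane.extent.x) x \(plane.extent.z) meters")
        // Place or update digital content relative to `node` here.
    }
}
```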

Dense point cloud reconstruction: Pointillism for the 3D world

This offers a level of variety and flexibility in how developers create AR experiences that was previously impossible. SLAM actively makes hundreds of rough estimations per second of where it believes surfaces exist and anchors those micro-locations with points or vertices. By building a (say it with me now) dense point cloud reconstruction, a device camera can not only recognize the physical space it sees but also remember the relative positioning of objects when it turns away from them. Here's a quick demo of this technology processing information in real time:
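On the API side, ARKit exposes the raw feature points it anchors each frame (a sparse cloud rather than a full dense reconstruction, but the same underlying idea). A quick sketch of reading them, assuming a running ARSession:

```swift
import ARKit

// Sketch: read the feature points ARKit has anchored for the current frame.
// Each point is a 3D position in world space that stays put as the camera moves,
// which is what lets the device "remember" surfaces it is no longer looking at.
func logFeaturePoints(session: ARSession) {
    guard let frame = session.currentFrame,
          let cloud = frame.rawFeaturePoints else { return }
    print("Tracking \(cloud.points.count) feature points this frame")
    if let point = cloud.points.first {
        print("Sample point at x: \(point.x), y: \(point.y), z: \(point.z)")
    }
}
```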

The advantage marker-based experiences hold is a big one: context. When a user picks up an object, the digital content that activates feels like it's part of the object. If I trigger an AR experience off a box of Honey Nut Cheerios and the cartoon mascot flies off the package, that makes sense in its execution. This behavior gives apps a sense of being a 'browser' of the physical world, to a point. It's much harder to accomplish this with SLAM, where the computer can better understand the form of the world around it but can't provide substantive feedback based on what it's seeing.

With SLAM, a user can conceivably navigate their favorite cartoon plumber across their office collecting coins and avoiding carnivorous plants, redecorate a living room with furniture and posters that aren't actually there, or drop a series of arrows on the street directing them to the nearest coffee shop using GPS. When these two divergent methods of interaction begin to seamlessly merge is where I believe we will see the most powerful and widely adopted implementation of augmented reality. The possibilities of this merged technology are endless.

There are wider-ranging opportunities for SLAM technology beyond augmented reality as well. As a whole, SLAM gives computers an eye in a more literal sense, contextually understanding what's around them through visual input. That understanding is already being expanded into other fields like robotics, self-driving cars, machine learning, and artificial intelligence.
