Behind the Scenes of AR | How does it work?

So if you haven’t been living under a rock in the past year, chances are you’ve played and experienced the phenomenon that is Pokemon Go. I know I have. As a game built primarily on Augmented Reality, it proved just how engaging AR experiences can be. But it wasn’t all smooth sailing. Anyone who has played the game will remember spotting a favorite character and having to throw a ball to catch it, and the frustration that follows when time and again you miss the mark, and your Pokemon.

Why does this happen?

Pokemon Go uses simple GPS-based tracking (accurate to only about 4–5 meters) plus your device’s compass and gyroscope for orientation. This is why we cannot move closer to a Pokemon (it seems to move along with us) and also why Pokemon always appear to float over the ground or hover above whatever horizontal surface we’re pointing at, almost defying gravity. This may be passable in Pokemon Go, but when it comes to something like positioning furniture or getting a realistic idea of how something could look in your space, it just doesn’t cut it.
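
To make that concrete, here is a rough sketch (in Python, and definitely not Pokemon Go’s actual code) of what GPS-plus-compass placement boils down to: the creature is pinned to a latitude/longitude, and the app only knows the bearing and rough distance from your noisy GPS fix. There is no notion of the ground, walls, or depth, which is why things float.

```python
import math

# Rough sketch of GPS + compass placement (hypothetical, for illustration only):
# the virtual creature is pinned to a lat/lon and rendered along the compass
# bearing from the device. There is no notion of surfaces or depth.

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def flat_offset_m(device_lat, device_lon, target_lat, target_lon):
    """Approximate east/north offset (meters) from device to target using an
    equirectangular projection, which is fine over tens of meters."""
    d_lat = math.radians(target_lat - device_lat)
    d_lon = math.radians(target_lon - device_lon)
    north = d_lat * EARTH_RADIUS_M
    east = d_lon * EARTH_RADIUS_M * math.cos(math.radians(device_lat))
    return east, north

def object_relative_to_device(device_lat, device_lon, heading_deg,
                              target_lat, target_lon):
    """Right/forward position of the object relative to the device, using only
    GPS and compass. The GPS fix is only good to roughly 4-5 meters, so this
    jitters by meters, and there is no height estimate, hence the floating."""
    east, north = flat_offset_m(device_lat, device_lon, target_lat, target_lon)
    heading = math.radians(heading_deg)  # compass heading, clockwise from north
    forward = north * math.cos(heading) + east * math.sin(heading)
    right = east * math.cos(heading) - north * math.sin(heading)
    return right, forward

# Example: a creature pinned roughly 20 meters north of a noisy GPS fix,
# with the phone facing 30 degrees east of north.
print(object_relative_to_device(12.9716, 77.5946, 30.0, 12.97178, 77.5946))
```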

The good news is, these problems can be fixed — with something called SLAM.

Nope, I’m not talking about a slam dunk. The SLAM I’m talking about stands for ‘Simultaneous Localization and Mapping.’ SLAM solves the problem of tracking the ‘pose’ (position and rotation) of a device relative to an environment while simultaneously mapping that same environment. Starting from an assumed map of the environment, SLAM alternates between tracking and mapping: it estimates the pose against the current map, uses that pose to correct the map, and then re-estimates the pose against the corrected map. This keeps happening in a loop, frame after frame.
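
In pseudocode, that loop looks something like the sketch below. This isn’t any particular SLAM system; the Pose, Map, track, and update_map names are placeholders for whatever a real implementation (feature-based, direct, or otherwise) would provide.

```python
from dataclasses import dataclass, field

@dataclass
class Pose:
    position: tuple = (0.0, 0.0, 0.0)        # x, y, z
    rotation: tuple = (1.0, 0.0, 0.0, 0.0)   # quaternion w, x, y, z

@dataclass
class Map:
    landmarks: list = field(default_factory=list)  # 3D points mapped so far

def track(frame, current_map, previous_pose) -> Pose:
    """Estimate the camera pose for this frame against the current map
    (e.g. by matching image features to mapped landmarks)."""
    ...

def update_map(frame, current_map, pose) -> Map:
    """Refine the map using the new frame taken from the estimated pose
    (e.g. triangulate new landmarks, correct existing ones)."""
    ...

def slam_loop(camera_frames):
    slam_map = Map()   # start with an assumed (here: empty) map
    pose = Pose()      # and an arbitrary starting pose
    for frame in camera_frames:                       # ideally 30+ fps
        pose = track(frame, slam_map, pose)           # localization step
        slam_map = update_map(frame, slam_map, pose)  # mapping step
        yield pose, slam_map
```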

Historically, most of the research in SLAM comes from the robotics community. It stemmed from the need for robots like the Mars rovers to navigate unknown environments where remote control is not possible.

Robots use multiple sensors along with RGB cameras to solve the SLAM problem, and typically have odometry information available as well. ADAS cars, drones, and AUVs all use some form of SLAM for navigation.

A smartphone differs from these robots in that it usually has only a single RGB camera and an IMU (Inertial Measurement Unit). This boils the problem down to a specific domain: Monocular Visual SLAM. Here, the mapping has to be done from the available RGB data alone, while tracking can be aided by the IMU.
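
To give a flavor of what ‘aided by the IMU’ means, here is a heavily simplified, hypothetical predict/correct sketch: the IMU (sampling at hundreds of Hz) dead-reckons the motion between camera frames, and each visual SLAM estimate pulls the drifting prediction back into line. Real systems use proper sensor fusion, such as a Kalman filter or tightly coupled visual-inertial optimization, rather than this crude blend.

```python
import numpy as np

def imu_predict(position, velocity, accel_world, dt):
    """Dead-reckon position and velocity from one IMU sample. accel_world is
    assumed to already be rotated into the world frame with gravity removed.
    IMU-only integration drifts within seconds, which is why vision corrects it."""
    velocity = velocity + accel_world * dt
    position = position + velocity * dt
    return position, velocity

def visual_correct(predicted_position, visual_position, weight=0.98):
    """Pull the IMU prediction toward the visual SLAM estimate. A crude
    complementary-filter stand-in for real Kalman/optimization-based fusion."""
    return weight * visual_position + (1.0 - weight) * predicted_position

def fuse(samples):
    """samples is a hypothetical stream of (accel_world, dt, visual_position)
    tuples, where visual_position is None except when a camera frame arrives."""
    position = np.zeros(3)
    velocity = np.zeros(3)
    for accel_world, dt, visual_position in samples:
        position, velocity = imu_predict(position, velocity, accel_world, dt)
        if visual_position is not None:   # a tracked camera frame is available
            position = visual_correct(position, visual_position)
        yield position
```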

What we are attempting at Whodat is to bring together some of the most cutting-edge research in visual SLAM and make it run in real time (greater than 30 fps) on standard smartphones and tablets. We provide robust, accurate tracking by fusing IMU information with visual SLAM. Our solution works cross-platform, across operating systems (Android and iOS) and mobile devices with different hardware capabilities. The platform we’re building also accommodates additional sensors (such as stereo cameras and depth sensors) that can be used to provide better tracking and mapping output. If you’re a developer, we want to give you an easy and simple way to make your AR vision come alive.

So what’s your Pokemon GO?
