What Makes Google’s Project Tango So Accurate?
Lenovo’s upcoming smartphone, the Phab 2 Pro, has a lot of hype building behind it, and I think it’s fair to say most of that hype is being driven by its “Tango”-enabled functionality.
The Phab 2 Pro is the first official smartphone to have Tango built into its hardware. In case you need a refresher, Tango is an Augmented Reality platform that combines depth-sensing 3D cameras, motion tracking, and area learning to scan and memorize the layout of your surroundings and generate Augmented Reality images on top of them.
The beauty of the Tango system is its accuracy. Let’s be honest, when most consumers think of Augmented Reality, they picture Pokémon Go. The app did a great job introducing millions to AR (as of August 1st, downloads had reached 75 million). But, as any Pokémon Go player will tell you, the character placement was pretty floaty. This is because the game’s algorithms were approximating depth without actually being able to see it, leading to Pokémon perched on non-existent surfaces, as well as Pokémon that seemed to move or float around an environment as the user moved their camera.
Project Tango improves on this by using a depth-sensing camera (something no phone currently on the market has) and motion tracking to accurately assess depth and “learn” the layout of the user’s environment. Once the system has learned that layout, it can appropriately react to the placement of an object or character within it.
So, to break that down, it all starts here:
First, the system utilizes a 3D camera, which casts out an infrared dot pattern (like the one seen above) to illuminate the contours of your environment. This is known as a point cloud. As these dots of light get further away from their original source (the phone), they become larger. The sizes of the dots are measured by an algorithm, and their varying sizes indicate their relative distance from the user, which is then interpreted as a depth measurement. This measurement allows Tango to understand all of the 3D geometry that exists in your space (in the image above, it sees the chest, the items on the chest, and the back wall all as different pieces of 3D geometry). This raw data is combined with information from the more standard 16 MP camera, which brings the actual images of the real world into your phone. With this point cloud Tango can now see and understand the world, but it still needs to be able to move through it.
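The dot-size-to-depth idea can be sketched in a few lines. This is a toy illustration, not Tango’s actual pipeline: the linear size-to-depth model and the scale constant are assumptions made purely to show the principle that larger observed dots mean greater distance.

```python
# Toy sketch (not Tango's real code): mapping the apparent size of
# projected infrared dots to depth estimates, then building a point cloud.
# Assumes (hypothetically) that dot diameter grows linearly with distance.

def dot_size_to_depth(dot_diameter_px, scale=100.0):
    """Map an observed dot diameter (pixels) to a depth estimate (cm)."""
    return scale * dot_diameter_px  # larger dot => farther from the phone

def build_point_cloud(dots):
    """dots: list of (x, y, diameter_px) detections from the IR camera.

    Returns (x, y, depth) triples, i.e. a minimal point cloud.
    """
    return [(x, y, dot_size_to_depth(d)) for x, y, d in dots]

cloud = build_point_cloud([(120, 80, 2.0), (300, 95, 3.5), (210, 400, 1.2)])
```

In the real system the mapping would be calibrated per device, but the core idea is the same: each detected dot becomes one depth sample in the cloud.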
As you can see above, Tango is capable of capturing point cloud information while simultaneously moving through the world. This is accomplished with a motion tracking camera on the front of the device. At its core, motion tracking is a relatively simple concept. You start in one spot, which the system recognizes as your starting location, then — as you move around — your Tango-enabled hardware utilizes accelerometers and gyroscopes to calculate how far you’ve traveled from that location, and in what direction. When combined with the information gathered from the 3D-depth sensing camera, your system ends up with a clear image of the world around it.
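That “calculate how far you’ve traveled, and in what direction” step is classic dead reckoning, which can be sketched as a double integration of accelerometer readings. This is a simplified illustration under stated assumptions (2D only, fixed time step, no gyroscope or sensor fusion), not Tango’s actual tracker:

```python
# Minimal dead-reckoning sketch (illustrative, not Tango's implementation):
# integrate acceleration twice over fixed time steps to estimate how far
# the device has moved from its known starting location.

def dead_reckon(accel_samples, dt=0.01):
    """accel_samples: list of (ax, ay) accelerometer readings in m/s^2.

    Returns the estimated (x, y) displacement from the starting point.
    """
    vx = vy = x = y = 0.0
    for ax, ay in accel_samples:
        vx += ax * dt   # first integration: acceleration -> velocity
        vy += ay * dt
        x += vx * dt    # second integration: velocity -> position
        y += vy * dt
    return x, y

# One second of constant 1 m/s^2 acceleration along x, sampled at 100 Hz:
pos = dead_reckon([(1.0, 0.0)] * 100, dt=0.01)
```

Because every step builds on the previous estimate, any tiny error in a reading is carried forward forever, which is exactly what sets up the drift problem described next.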
So, in effect, 3D depth sensing and motion tracking make up the basics needed for an Augmented Reality experience, but there are still some problems with using these techniques alone.
One of the biggest problems is that over time (often meaning in just minutes) digital objects in a real space have a tendency to float away from their starting locations. This is the result of very small motion-tracking errors that are individually harmless but accumulate over time into an issue known as drift. You can see this drift happening with the chair to the left. It starts in the correct location, but over time shifts to a wholly incorrect one. To combat this, Tango employs a technique known as area learning.
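The compounding nature of drift is easy to see with some back-of-the-envelope arithmetic. The per-update error figure below is a made-up number for illustration only, but it shows why errors that are invisible on any single frame become obvious after minutes of tracking:

```python
# Toy illustration of drift (hypothetical numbers, not measured Tango error):
# each tracking update carries a tiny error, and with no external reference
# those errors simply add up.

def drift_after(updates, per_update_error_m=0.0005):
    """Worst-case accumulated position error after a number of updates,
    assuming every update is off by the same tiny amount in one direction."""
    return updates * per_update_error_m

single = drift_after(1)                 # half a millimetre: invisible
two_minutes = drift_after(2 * 60 * 60)  # 2 min of 60 Hz updates: metres off
```

In practice errors partly cancel rather than all pointing one way, but the qualitative point stands: without an external reference, accumulated error only grows.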
Area learning, while certainly hard to pull off, is relatively simple to understand. Essentially, it is Tango’s way of memorizing a space, and utilizing that memory to keep virtual objects grounded in the correct locations.
For a deeper look at area learning, check out the image below, which was taken from Project Tango’s conference at Google I/O 2016:
On the left side of the image is an audience, stage, podium, and stage background. Notice the yellow dots. These dots represent the landmarks (and potential landmarks) that Project Tango is using to memorize the space. In the right image you can see all these points represented in a grid pattern. The points that have a line extending to them from the Tango device (represented by the white rectangle just right of center) are dedicated landmarks: points that Project Tango has determined to be the most useful for memorizing the environment it is in.
Area learning takes these points and dedicates them to memory. To demonstrate this, the Tango speaker at I/O 2016 first showed the podium covered in location markers. He then cupped the camera with his hand so that the podium itself could no longer be seen. However, even though the cameras were totally blind, the image of the podium and the backdrop were still visible in marker form.
The ability to learn and memorize the location of these markers allows Tango to maintain awareness of its surroundings even when the user turns away from them or leaves the general area. It also provides real-world reference points that can be used to lock virtual objects in place. So, if the speaker presenting on stage were to place the virtual chair from earlier next to the podium, the location of the chair would be directly connected to the location markers of the real-world podium, thus rooting the virtual chair in real-world geometry and preventing it from drifting.
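The “rooting” idea boils down to storing an object’s position relative to a recognized landmark rather than relative to the device. The sketch below is a hypothetical 2D simplification (real Tango poses are full 3D transforms), meant only to show why a landmark-anchored object cannot drift with the device’s pose estimate:

```python
# Hypothetical sketch of landmark anchoring: store a virtual object's
# offset from a fixed real-world landmark, then recompute the object's
# position from the re-observed landmark each frame. Drift in the
# device's own pose estimate then no longer moves the object.

def anchor_object(object_pos, landmark_pos):
    """Record the object's offset from a landmark (both (x, y) in metres)."""
    return (object_pos[0] - landmark_pos[0], object_pos[1] - landmark_pos[1])

def resolve_object(landmark_pos, offset):
    """Recompute the object's world position from the landmark each frame."""
    return (landmark_pos[0] + offset[0], landmark_pos[1] + offset[1])

# Place a virtual chair half a metre to the right of the podium landmark:
offset = anchor_object((2.5, 1.0), (2.0, 1.0))

# Later, the landmark is re-observed; the chair resolves to the same spot
# regardless of how far the device's dead-reckoned pose has drifted:
chair = resolve_object((2.0, 1.0), offset)
```

The design point is that the landmark, not the drifting pose estimate, is the source of truth for where the object lives.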
Area learning also affects applications where the virtual assets are moving. For instance, imagine a game such as Legacy Games’ Crayola Worlds, where you’re being chased by zombies. Without area learning to keep the zombies on track, drift would eventually impact the direction of their running animations, making it look like they’re running into walls rather than running towards the player. But with area learning in place, the zombies will always have reference points that help them react appropriately to the geometry around them and also keep them focused on you, the player.
These three techniques — 3D depth sensing, motion tracking, and area learning — combine to make Google’s Project Tango a tour de force of Augmented Reality tech. Tango’s use of real-world geometry as the guiding infrastructure for placing virtual assets makes for a far more realistic-feeling AR experience than anything on the consumer market today (note that the Microsoft HoloLens, which is not yet out for commercial sale, uses very similar systems to similar effect as Project Tango).
Using Tango, you can set a virtual object on a real table and it actually feels like it is sitting on top of that table.
Or use the location markers to accurately measure distances for carpentry work or sizing up furniture.
Or you could even use it to help navigate indoor spaces that have been learned and memorized by Google’s servers.
There are a ton of possibilities for this new technology: education, training, carpentry, online shopping (where you can actually see what the furniture looks like in your home), and good old-fashioned entertainment. By finding a way to ground virtual creations in the real world, Project Tango and other tools like it are rapidly changing the way we interact with our devices and our world.
We’re looking forward to seeing what else it can do when Lenovo releases its Phab 2 Pro, the first Tango-enabled smartphone, this fall.
Written by Nathan Hoffmeier