4 Solutions for Implementing 6DoF Tracking on Lightweight AR Glasses

Rokid
May 3, 2019

The best augmented reality experiences are interactive and blend the real environment with virtual content. As the technology grows rapidly, the quality of AR experiences keeps improving. Early-stage AR applications used markers to track the 6DoF pose (3DoF rotation and 3DoF position) of the camera so that virtual content could be overlaid on the real-world scene. Then Simultaneous Localization and Mapping (SLAM) enabled position tracking without pre-trained markers. SLAM made it possible to mix virtual objects with any part of the real scene, producing more immersive and realistic experiences. Together, these two solutions continue to shape experiences in the augmented reality space.
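To make the term concrete, here is a minimal sketch of a 6DoF pose as three rotation angles plus a three-component position, and of how such a pose places a world-anchored point in the camera's view. The class name `Pose6DoF`, the yaw/pitch/roll (Z-Y-X) convention and the units are assumptions chosen for illustration; real tracking systems often use quaternions or other conventions.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose6DoF:
    # 3DoF rotation in radians (assumed Z-Y-X yaw/pitch/roll convention)
    yaw: float
    pitch: float
    roll: float
    # 3DoF position of the camera in world coordinates, in meters
    x: float
    y: float
    z: float

    def rotation_matrix(self):
        """Camera-to-world rotation R = Rz(yaw) * Ry(pitch) * Rx(roll)."""
        cy, sy = math.cos(self.yaw), math.sin(self.yaw)
        cp, sp = math.cos(self.pitch), math.sin(self.pitch)
        cr, sr = math.cos(self.roll), math.sin(self.roll)
        return [
            [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
            [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
            [-sp, cp * sr, cp * cr],
        ]

    def world_to_camera(self, point):
        """Express a world-space point in the camera frame: R^T * (p - t)."""
        r = self.rotation_matrix()
        d = [point[0] - self.x, point[1] - self.y, point[2] - self.z]
        # Multiplying by the transpose of R undoes the camera rotation.
        return [sum(r[k][i] * d[k] for k in range(3)) for i in range(3)]


# A virtual object anchored 2 m along the world Y axis from the camera.
pose = Pose6DoF(yaw=math.pi / 2, pitch=0.0, roll=0.0, x=1.0, y=0.0, z=0.0)
anchor_in_camera = pose.world_to_camera((1.0, 2.0, 0.0))
print([round(c, 6) for c in anchor_in_camera])
```

With only the three rotation angles (3DoF), the anchor could be kept at the right direction when the head turns, but it would not shift correctly when the wearer walks; the position terms are what allow that.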

Right now, many top-tier AR devices use 6DoF tracking to deliver accurate and reliable tracking in almost any scene. Doing this well requires a high-performance tracking system driven by multiple fisheye cameras, a time-of-flight (ToF) camera and other sensors. Unfortunately, all of these together make the devices heavy and expensive. Even the most feature-rich devices with exceptional business applications will not be accepted in the marketplace if they’re uncomfortable and cost too much.

To succeed in the wearable AR device space, products must combine comfort, affordability and technology in innovative ways. Rokid Vision is at the forefront with lightweight AR glasses that embrace 6DoF tracking in a way that requires few sensors and limited computing resources.

Let’s compare the following four popular 6DoF tracking solutions based on cost and performance:

Single Video Camera

● Lowest cost of hardware

● Easy to incorporate into many designs

● Requires more specialized optimization and customization

● Works well for static scenes

Video cameras are common components of AR glasses, used to take high-resolution photos and videos. Keeping a single camera on the glasses is relatively simple in terms of both appearance and usability. However, video cameras usually operate at a low frame rate (<60fps, mostly <30fps) and cannot capture high-quality images in motion. The “jello effect” of a rolling shutter and other distortions cause position tracking to fail. Adding IMU sensors can improve the results, but this solution still doesn’t measure up to the others.
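A rough back-of-the-envelope calculation shows why frame rate and shutter type matter for tracking. The sketch below estimates how far image content moves while the head rotates, using a simple pinhole model near the image center. The head speed, focal length and rolling-shutter readout time are illustrative assumptions, not measurements from any particular device.

```python
import math


def pixel_shift(angular_speed_deg_s, interval_s, focal_px):
    """Image shift in pixels (near the image center) caused by rotating
    the camera at a constant angular speed for `interval_s` seconds."""
    angle_rad = math.radians(angular_speed_deg_s * interval_s)
    return focal_px * math.tan(angle_rad)


# Illustrative assumptions, not measured values.
HEAD_SPEED_DEG_S = 120.0  # a brisk head turn
FOCAL_PX = 500.0          # pinhole focal length in pixels
READOUT_S = 0.02          # rolling-shutter readout time, first row to last row

for fps in (30, 90):
    shift = pixel_shift(HEAD_SPEED_DEG_S, 1.0 / fps, FOCAL_PX)
    print(f"{fps} fps: about {shift:.1f} px of motion between frames")

# With a rolling shutter, the last row is read out READOUT_S later than the
# first, so the bottom of the image is displaced relative to the top.
skew = pixel_shift(HEAD_SPEED_DEG_S, READOUT_S, FOCAL_PX)
print(f"rolling-shutter skew within one frame: about {skew:.1f} px")
```

Under these assumptions, a 30fps camera sees roughly three times as much motion between consecutive frames as a 90fps one, which makes feature matching harder, and the skew inside a single rolling-shutter frame is what shows up as the jello effect.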

Monocular Fisheye

● Sensor reserved for 6DoF tracking

● Continuously updated 6DoF tracking thanks to the high frame rate

● Some scale drift

Some AR glasses use a single fisheye camera reserved for 6DoF tracking. This solution consumes some additional power but often gives the device a better 6DoF tracking result. Thanks to the high frame rate (>90fps) and the global shutter, SLAM keeps tracking position even when the scene is in motion. The single fisheye camera should be placed on the front of the glasses, a requirement that imposes some design restrictions but not enough to stray far from the look of typical sunglasses. Where the monocular fisheye fails is in measuring the scale of the environment. Distances measured in the SLAM map will drift and cause virtual objects to move unexpectedly in the scene. There is still potential for a good user experience if developers design VR-like applications in which the virtual content and the real world are not tightly coupled.
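The scale problem comes from the geometry of a single camera: if the whole scene and the camera's motion are scaled by the same factor, the images do not change. The sketch below illustrates this with a plain pinhole projection and no rotation. The helper names and all numeric values are hypothetical, and a real fisheye lens follows a different projection model, but the same ambiguity applies.

```python
def project(point, cam_pos, focal_px=500.0):
    """Pinhole projection of a 3D point for a camera at `cam_pos` that looks
    along +Z with no rotation. Returns pixel offsets from the image center."""
    x, y, z = (p - c for p, c in zip(point, cam_pos))
    return (focal_px * x / z, focal_px * y / z)


def scaled(vec, s):
    """Multiply every component of a 3D vector by s."""
    return tuple(s * v for v in vec)


landmark = (0.4, -0.2, 3.0)  # a point in the scene, in meters
cam_start = (0.0, 0.0, 0.0)  # camera position at the first frame
cam_moved = (0.1, 0.0, 0.0)  # camera position at the second frame


def observations(s):
    """Pixels seen in both frames when the scene and the camera motion
    are both scaled by the factor s."""
    p = scaled(landmark, s)
    return [project(p, scaled(c, s)) for c in (cam_start, cam_moved)]


for s in (0.5, 1.0, 2.0):
    print(s, [(round(u, 3), round(v, 3)) for u, v in observations(s)])
```

Every scale factor prints the same pixel coordinates, so the images alone cannot tell a small scene viewed with a small movement from a large scene viewed with a large one. A monocular SLAM system has to pick a scale, and small errors in it accumulate as drift.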

Stereo Fisheye Camera

● High power consumption

● Good accuracy in tracking and scale measurement

● Challenging for industrial design

The stereo fisheye-based 6DoF tracking system has been proven marketable by several AR/VR headset makers. Qualcomm has already demonstrated high-quality 6DoF position tracking on its VR headset using stereo fisheye vision. The increased power cost of the additional camera is a worthwhile trade-off because it provides immediate map initialization, robust tracking and accurate measurement of the environment. Compared to the monocular solution, stereo fisheye can extend the scene and track much more quickly. Even though the number of sensors is doubled, the computational complexity is not much higher than that of a monocular vision system. With high-quality optimization and customization, a stereo system can perform close to a heavier SLAM system.
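The reason a stereo pair can measure the environment directly is that the distance between the two cameras (the baseline) is known. For an idealized, rectified pinhole stereo pair, depth follows from disparity as Z = f * B / d. The sketch below applies that textbook relation; the focal length and baseline are hypothetical values, and real fisheye images would first need to be undistorted and rectified.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Metric depth of a point seen by a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px


# Hypothetical camera parameters for illustration.
FOCAL_PX = 500.0   # focal length in pixels
BASELINE_M = 0.10  # distance between the two cameras in meters

for d in (50.0, 25.0, 5.0):
    z = depth_from_disparity(d, FOCAL_PX, BASELINE_M)
    print(f"disparity {d:4.1f} px -> depth {z:.2f} m")
```

Because a single stereo frame already yields metric depth, the map can be initialized immediately and at the correct scale, without the camera having to move first as in the monocular case.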

SLAM Running on Edge

Intel T265

● Highest cost

● High-quality 6DoF tracking

● Stable performance for different platforms

Moving computation to the edge is becoming a trend in AI-related devices. HoloLens, a state-of-the-art AR HMD, already runs its SLAM function on edge hardware so that the CPU and the OS can work more efficiently on user applications. For lightweight AR glasses, running SLAM on the edge is not just a way to reduce the computation load but also a way to make the glasses compatible with different kinds of host platforms. The strongest argument for this solution is that it equalizes performance across host platforms without the need to custom-optimize algorithms. However, it is not easy for lightweight AR glasses to run 6DoF tracking on the edge. The chips in these glasses are usually designed only to drive the display and transmit sensor data, with no resources left for other computation. One answer to this issue is to integrate a mature 6DoF tracking module, such as the Intel T265, onto the board of the AR glasses.

There is no consensus yet on which solution is “perfect” for lightweight glasses. Designers will need to keep defining the features and intended usage of their products to make the best hardware and software choices. But there is no doubt that AR glasses with 6DoF tracking will continue to be the most attractive to consumers and remain the most competitive in the future AR market. Wondering which solution is implemented on Rokid Vision? Follow our journey and stay tuned for the latest updates at https://glass.rokid.com/project

Author: Zhiyu Huo, Research Scientist at Rokid (https://glass.rokid.com), PhD in Electrical and Computer Engineering from the University of Missouri, working on computer vision algorithms for AR.
