Augmented Reality Robotics

Ryan Hickman
9 min read · Jun 19, 2018


[Note: This is Part 5 in a five-part series on TickTock’s robots: P1, P2, P3, P4, P5]

Mars rovers and your smartphone have the same problem: figuring out where they are without GPS.

Panorama from the Spirit rover at Bonneville Crater. Image Credit: NASA/Jet Propulsion Laboratory/Cornell

Whether you’re inside a mall, a deep parking garage, or an urban canyon, your cellphone’s GPS isn’t much use when it can’t see the sky. The same is true for robots on other planets where techniques such as Visual Odometry are used to track movement over time.

Images from Spirit’s camera with tracked features superimposed. Image Credit: “Two Years of Visual Odometry on the Mars Exploration Rovers” by Mark Maimone, Yang Cheng, and Larry Matthies of JPL

Visual Odometry has been around for decades but is really taking off with mobile augmented reality. Take a look at all the references from Larry Matthies on using Visual Inertial Odometry on the Mars Exploration Rovers. Well, it should come as no surprise that his lab at NASA JPL was an early participant in Google’s Project Tango effort to give smartphones that same capability. Steve Goldberg, also with JPL and now full time at Google, is one of the few people who can claim to have optimized the VO pipeline for a rover on Mars, and the one powering AR on your smartphone!

The higher-level technology that JPL’s rovers and Tango (now known as ARCore) use is called SLAM, or Simultaneous Localization and Mapping. More specifically, they use VSLAM and VIO, where cameras and motion sensors come together to create Visual Inertial Odometry. Just like your eyes and inner ears work together, robots and augmented reality devices use cameras and motion sensors.
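To make that concrete, here is a minimal sketch of the visual half of such a pipeline: monocular, frame-to-frame visual odometry using OpenCV. The camera matrix and video file below are placeholders, and a real VIO system would additionally fuse gyroscope and accelerometer data to recover metric scale and survive fast motion or featureless frames.

```python
import cv2
import numpy as np

# Intrinsic camera matrix (placeholder values; use your camera's real calibration).
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0,   0.0,   1.0]])

def relative_pose(prev_gray, curr_gray):
    """Estimate rotation and (unit-scale) translation between two consecutive frames."""
    # Detect corner features in the previous frame.
    prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                       qualityLevel=0.01, minDistance=8)
    # Track those features into the current frame with sparse optical flow.
    curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, prev_pts, None)
    good_prev = prev_pts[status.flatten() == 1]
    good_curr = curr_pts[status.flatten() == 1]

    # Recover the essential matrix and decompose it into rotation + translation.
    E, mask = cv2.findEssentialMat(good_curr, good_prev, K,
                                   method=cv2.RANSAC, prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, good_curr, good_prev, K, mask=mask)
    return R, t  # translation is only known up to scale without an IMU or stereo

# Accumulate the incremental motion over a video to get a trajectory.
cap = cv2.VideoCapture("walkthrough.mp4")  # placeholder input
pose_R, pose_t = np.eye(3), np.zeros((3, 1))
ok, frame = cap.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    curr_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    R, t = relative_pose(prev_gray, curr_gray)
    pose_t = pose_t + pose_R @ t   # chain the frame-to-frame motion
    pose_R = R @ pose_R
    prev_gray = curr_gray
print("final position (up to scale):", pose_t.ravel())
```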

A poorly behaved virtual cat

While many robots can accomplish a lot by simply knowing where they are, more complex systems such as self-driving cars require a much richer understanding of the world. The same is true for augmented reality if we’re going to get experiences beyond simply placing virtual objects on the ground.

In fact, it was a misbehaving virtual cat that really drove home for me how similar the futures of AR and robotics are, and it goes way beyond mapping.

I walked my virtual cat over to a glass door and it went right through it.

Tango’s “AR Pets Experiment” app with a virtual cat going right through a glass window pane in a door. The infrared time-of-flight depth sensor in the Tango phone simply doesn’t detect the glass and there’s no AI to assume it should be there in a door frame.

Someone else walked into the scene and stepped right on my virtual pet.

Another demo with Tango’s “AR Pets Experiment” shows a cute cat in the middle of its idle animation when another person walks into the frame. There’s no realtime people tracker to recognize the incoming person, estimate their future path, and get out of the way to avoid ruining the AR illusion.

Try asking a virtual pet to “go lay down”, and watch it do nothing. These virtual characters that are supposed to be the future of our digital lives still have zero understanding of the world. They rely on imperfect mobile sensors that struggle with “optically uncooperative surfaces”. AR devices can’t yet recognize, track, and predict the movement of people or animals.
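To illustrate what “predict the movement of people” could mean in practice, here is a toy sketch of constant-velocity path prediction over a person’s recent tracked positions, the kind of logic a virtual pet (or a robot) would need to step aside in time. The class, thresholds, and coordinates are all hypothetical.

```python
import numpy as np

class PersonTrack:
    """Keeps the last few observed positions of a person (floor coordinates, meters)."""
    def __init__(self, history=5):
        self.history = history
        self.positions = []   # list of (x, y) observations
        self.timestamps = []  # seconds

    def update(self, xy, t):
        self.positions.append(np.asarray(xy, dtype=float))
        self.timestamps.append(t)
        self.positions = self.positions[-self.history:]
        self.timestamps = self.timestamps[-self.history:]

    def predict(self, dt):
        """Constant-velocity prediction dt seconds into the future."""
        if len(self.positions) < 2:
            return self.positions[-1] if self.positions else None
        v = (self.positions[-1] - self.positions[0]) / (self.timestamps[-1] - self.timestamps[0])
        return self.positions[-1] + v * dt

def should_move_pet(pet_xy, track, horizon=1.0, clearance=0.5):
    """True if the predicted person position comes within `clearance` meters of the pet."""
    predicted = track.predict(horizon)
    if predicted is None:
        return False
    return np.linalg.norm(predicted - np.asarray(pet_xy)) < clearance

# Hypothetical usage: feed in detections from some realtime people tracker.
track = PersonTrack()
track.update((2.0, 0.0), t=0.0)
track.update((1.5, 0.0), t=0.5)   # person walking toward the pet
if should_move_pet(pet_xy=(0.8, 0.0), track=track):
    print("Move the virtual cat (or robot) aside before the person arrives.")
```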

The illusion of reality can be ruined in hundreds of different ways, and the same problems affect robots too. I want a fast-moving home robot that doesn’t crash through my glass door. I’d like to walk into a room and not trip over a robot that failed to get out of the way. I want to use all the verbal cues I use with a real dog (e.g. “sit”, “stay”, “heel”, “lay down”, “come here”) to tell my robot where to be.

This requires a level of spatial reasoning AI that doesn’t exist today. It was this insight that ultimately led to forming TickTock, where we wanted to give virtual and physical systems a better understanding of their world.

It still starts with position

Don Dodge listed dozens of companies tackling this space back in 2013. Fast forward five years, and we’re not seeing those beacon or WiFi-only solutions taking off. The big winners are going to be those embracing ARKit and ARCore, which make heavy use of cameras. Those two platforms alone are expected to power more than a billion devices in 2018!

As the “AR Cloud” grows to include more maps and semantics about the world, you can expect product-level navigation with centimeter-level accuracy. That’s right: your phone will finally take you exactly to that screw you need in the hardware store.

Yet we’re still using legacy map formats for robots.

Take the example of the Schnucks grocery store in St. Louis. There’s a robot running around to collect inventory, and another platform used for augmented reality shopping experiences. Each uses a different mapping format for the same location, leading to incompatibility between systems and redundant maintenance to keep the maps current.

A new batch of startups are getting into the AR mapping space, with 8th Wall, Escher (now mapping the world for Niantic), 6D.ai, and Fantasmo all offering location services and cloud APIs to help users understand where they are. Google is in the unique position of offering both LiDAR-based mapping with Cartographer and VSLAM, which underlies Project Tango / ARCore / VPS.

TechCrunch video covering the announcement of Google’s Visual Positioning Service at I/O 2017.

Cartographer was open sourced in 2016 and quickly became the predominant 2D and 3D mapping solution for robots, from Marble delivering packages on sidewalks to Lyft’s Level 5 lab automating self-driving cars. Unlike LiDAR-based SLAM, Google’s VPS and Cloud Anchors use simple 2D cameras, which means SLAM works right on your mobile phone without additional hardware. Despite both being from Google, Cartographer and VPS are currently incompatible, and that’s the problem: TickTock needed a robot navigation system that also worked with a smartphone.

TickTock bet on Android-based VSLAM

We set out to build a custom robotics stack that would run on Android, using the same Project Tango / ARCore / VPS maps that a user would. This shared frame of reference is critical to the user experience: robots and phones need the same understanding of their position in space.

The robot’s point of view on the left shows how it locates itself in the world by recognizing unique visual features on walls and objects. The user’s handheld phone does the same from the opposite direction, allowing the user to give precise location commands to the robot in augmented reality.
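A rough sketch of what that shared frame of reference buys you: if the robot and the phone each know their pose relative to the same persistent anchor (the role ARCore’s Cloud Anchors and VPS play), then a point the user picks in their AR view can be turned into a navigation goal in the robot’s frame with a couple of matrix multiplies. The poses and helper functions below are made up for illustration.

```python
import numpy as np

def make_pose(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def invert(T):
    """Invert a rigid transform."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti

# Hypothetical poses, each expressed relative to the same shared anchor:
#   T_anchor_robot: where the robot's VSLAM says it is, relative to the anchor.
#   T_anchor_phone: where the phone's AR session says it is, relative to the same anchor.
T_anchor_robot = make_pose(np.eye(3), np.array([2.0, 0.0, 0.0]))
T_anchor_phone = make_pose(np.eye(3), np.array([0.0, 0.0, 1.5]))

# The user taps a point 1 m in front of the phone; it arrives in the phone's frame.
goal_in_phone = np.array([0.0, 0.0, -1.0, 1.0])  # homogeneous coordinates

# Phone frame -> anchor frame -> robot frame.
goal_in_anchor = T_anchor_phone @ goal_in_phone
goal_in_robot = invert(T_anchor_robot) @ goal_in_anchor
print("drive to (robot frame):", goal_in_robot[:3])
```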

We started TickTock in early 2017, just as the Project Tango effort was being re-branded as ARCore and VPS was still in beta. Despite being in flux, it was the ideal path forward, and we became an Android OEM so we could build robots using the same components that a smartphone maker would.

This early demo shows how AR can be used to see points of interest in the world, the position of the robot, and the path it plans to travel.

An early demo of the TickTock software running on a small mobile robot at home with kids commanding it. The smartphone is showing what the robot knows about the world through an AR companion app.

AR benefits from decades of prior robotics work

In total, there were five robotics labs that were part of the Tango program. Just look at all of the Tango references on the University of Minnesota’s MARS lab page, nestled among work on drones and Mars rovers. ETH Zurich’s Visual Computing lab has likewise made huge advances in visual mapping, and it’s no surprise that Marc Pollefeys is now the Director of Science for Microsoft’s HoloLens. Andrew Davison’s team at Imperial College London also advised Tango early on, and his background touches on all of these technologies coming together.

Johnny Lee (right) leading a brainstorming session in the early days of Tango in the Motorola ATAP office in 2013. Visiting researchers included Larry Matthies from JPL, Stergios Roumeliotis of UMN, and Marc Pollefeys from ETH Zurich.

Augmented reality, robotics startups, and self-driving cars continue to have significant overlap. Gabe Sibley’s labs at both George Washington University and the University of Colorado were big on AR and working with Tango in 2013. One of Gabe’s students, Nima Keivan, went on to found Canvas Technology, one of the few robotics companies not using ROS or LiDAR-based mapping.

Gabe Sibley and Nima Keivan, from GWU’s Autonomous Robotics & Perception Lab, visited the Project Tango team in early 2013 to hash out ideas on visual SLAM.

Gabe later started Zippy, which is now part of Cruise Automation and is giving spatial understanding to GM’s autonomous vehicles. In related news, Zippy was co-founded with Chris Broaddus and Alex Flint, who both came from Flyby Media, a pioneering augmented reality company that pre-dates Tango. The Flyby team helped Google assess VSLAM solutions in 2013 before being acquired by Apple, where they went on to build ARKit for iOS.

OK, I get it. Robotics has greatly helped AR. What has AR done for robotics?

AR starts to give back

It’s hard to find better evidence of AR helping robotics than the Movidius Myriad chip. We used the v1 Myriad on the original Project Tango 5" phone prototype, and collaborated on new ideas that made their way into the Myriad Vision Processing Unit (VPU). These chips now power DJI’s amazing drones that hold their position steady and avoid hitting trees. Google and Movidius (now part of Intel) remain good buddies and recently launched a new DIY, er…AIY, kit that makes it even easier to tap into this technology.

Google’s “Vision Bonnet” circuit board is part of the AIY project, and features a Movidius Vision Processing Unit (VPU).

Microsoft was an even earlier leader here in applying AR tech to robotics. The breakthrough Kinect sensor for the Xbox 360 was announced in 2009 and quickly hacked by robotics labs everywhere to provide low-cost depth sensing. It was used to make 2D and 3D maps, track people (as it was intended to), and aid object recognition on the Turtlebot and many other robots.

The Microsoft Kinect 3D depth sensor was popular with the Turtlebot’s 1st and 2nd generation, as shown here in a breakout diagram. It typically plugged into a Linux netbook for processing (not shown).

Augmenting our living rooms with 3D sensors that tracked our movement didn’t take off and Microsoft cancelled the Kinect, only to bring it back specifically for AI. I’m sure roboticists would also love to get their hands on the new HPU (Holographic Processing Unit) that’s going to allow HoloLens to map rooms while recognizing people and things — all the stuff of dreams for robots.

Startups are blending the two worlds

Vitaliy Goncharuk has also been driving this transition from AR to robotics with Augmented Pixels. Notice their press coverage in 2014, which focused solely on AR experiences for shopping and real estate. By 2017, they had announced a partnership with LG for a module that provides SLAM for both robots and wearable AR/VR glasses. Same tech, two entirely different platforms.

LG’s compact 3D camera module, developed in partnership with Augmented Pixels.

We’re also only months away from the Misty I release, which was announced at CES earlier this year and has an Occipital Structure Core sensor in its head. It’s no surprise that Occipital’s $13M Series B raise in 2015 included support from Grishin Robotics, a VC firm focused on robotics and IoT. Their Structure Core sensor is now being pitched as equally valuable for robotics as it is for AR/VR.

The head of a Misty prototype shows the Occipital Structure Core sensor at the top.

Paracosm, another startup bridging AR and robotics, also supported the Tango effort before its acquisition by Occipital. Most recent is Fantasmo, which just came out of stealth with its decentralized mapping platform and highlights both robotics and AR as use cases in its launch announcement.

I suspect that in just a couple of years we’ll start seeing high school students making autonomous robots with nothing more than an iPad and some wheels and motors. It’s only going to require a few hours of programming in Unity on top of ARCore or ARKit, and we’ll see both virtual and physical characters running around together.
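A toy version of that “wheels and motors” layer might be nothing more than the go-to-goal controller below, which turns an AR-derived robot pose and a target point into left and right wheel speeds for a differential-drive base. The gains and wheel geometry are made-up values; a real robot would need tuning and obstacle avoidance on top.

```python
import math

def go_to_goal(x, y, heading, goal_x, goal_y,
               k_linear=0.5, k_angular=1.5, wheel_base=0.2):
    """Compute left/right wheel speeds (m/s) that drive a differential-drive robot toward a goal.

    (x, y, heading) is the robot pose from the AR/VSLAM tracker, in meters and radians.
    """
    dx, dy = goal_x - x, goal_y - y
    distance = math.hypot(dx, dy)
    # Angle from the robot's current heading to the goal, wrapped to [-pi, pi].
    angle_error = math.atan2(dy, dx) - heading
    angle_error = math.atan2(math.sin(angle_error), math.cos(angle_error))

    v = k_linear * distance          # forward speed
    w = k_angular * angle_error      # turn rate
    left = v - w * wheel_base / 2.0
    right = v + w * wheel_base / 2.0
    return left, right

# Example: robot at the origin facing +x, goal 1 m ahead and 1 m to the left.
print(go_to_goal(0.0, 0.0, 0.0, 1.0, 1.0))
```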

Bring on the platform wars

“Alexa, turn on the hallway lights”, and “TickTock, go to the hallway”, are commands that will require our voice enabled gadgets and our mobile robots to understand where the hallway is. Maybe we need robots to map our homes and understand where things are instead of managing 2D lists of the IoT gadgets that make up our ever-growing “home graph”.
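A spatial home graph could start out as simply as named places anchored in the same map the robot localizes against, so a voice command resolves to a navigation goal instead of an entry in a 2D device list. The labels and lookup below are purely illustrative.

```python
from dataclasses import dataclass

@dataclass
class Place:
    name: str
    x: float   # meters, in the shared map frame
    y: float

# A spatial "home graph": named places anchored in the robot's map.
HOME_GRAPH = {
    "hallway": Place("hallway", 4.2, 1.0),
    "kitchen": Place("kitchen", 7.5, 3.3),
    "living room": Place("living room", 2.0, 5.1),
}

def resolve_command(utterance: str):
    """Turn a command like 'go to the hallway' into a navigation goal, if the place is known."""
    for name, place in HOME_GRAPH.items():
        if name in utterance.lower():
            return (place.x, place.y)
    return None

goal = resolve_command("TickTock, go to the hallway")
print("navigation goal:", goal)   # -> (4.2, 1.0)
```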

A 3D map of my home made with the “Constructor Developer Tool” for Tango. Robots will capture similar data and process it to recognize objects and their locations within the home. Standards for sharing that kind of data do not exist yet but will be core to all IoT platforms in the future.

I’m sure Amazon, Apple, Facebook, and Microsoft will also get into the cloud mapping game and there’s definitely going to be a battle to own this data. It would be terrific for open standards to emerge so my Roomba can talk to my Kuri and Misty robots, perhaps while interacting with augmented reality play experiences on both Android and iOS.

Despite decades of the tech overlapping, the product experiences coming to robotics and augmented reality are all brand new. Keep an eye out for more on what the “AR Cloud” enables, because it won’t be used just for silly games — robots will need augmented reality and AR needs robotics!

[This is Part 5 in a five-part series on TickTock’s AR-powered robots. Be sure to check out Amazon’s Echo Show on Wheels, Consumer Robot Concepts, Low Cost Mobile Robots Using Android, Robots for Retailers, Augmented Reality and Robotics Overlap]
