Why we make our robots shop for groceries

How challenge tasks drive the development of our mobile manipulation systems

Toyota Research Institute · Oct 26, 2022

By The TRI Mobile Manipulation Team, including James Borders, Richard Cheng, Dan Helmick, Lukas Kaul, Dan Kruse, John Leichty, Carolyn Matl, Chavdar Papazov, Mark Tjersland

The latest iteration of TRI’s fully custom, dual-arm mobile manipulation robot

At TRI, we are developing robotic capabilities with the goal of improving the quality of everyday life for all. To reach this goal, we define “challenge tasks” that are exciting to work on, that drive our development towards general-purpose robot capabilities, and that allow for rigorous quantitative testing.

Autonomous order fulfillment in grocery stores is a particularly good way to drive our development of mobile manipulation capabilities because it encompasses a host of difficult challenges for robots, including perceiving and manipulating a large variety of objects, navigating an ever-changing environment, and reacting to unexpected circumstances. A single shopping run can contain a long list of items, so this task requires the system to be reliable and encourages a focus on overall execution speed. We use several intuitive metrics to measure progress. How many items did the robot correctly retrieve? How many did it mistakenly grab? How long did it take? Best of all, we are able to recreate representative shopping aisles right inside our robotics labs, allowing us to iterate quickly in between tests at real grocery stores.
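
To make these metrics concrete, here is a minimal sketch, in Python, of how a single shopping run could be scored. The field names, example items, and timing are hypothetical and only illustrate the three questions above; they are not TRI's actual bookkeeping.

```python
from dataclasses import dataclass

@dataclass
class ShoppingRunResult:
    """Score for one autonomous shopping run (illustrative field names only)."""
    requested_items: set   # items on the generated shopping list
    retrieved_items: set   # items actually placed in the basket
    duration_s: float      # wall-clock time for the full run

    @property
    def correct_picks(self) -> int:
        return len(self.requested_items & self.retrieved_items)

    @property
    def mistaken_picks(self) -> int:
        return len(self.retrieved_items - self.requested_items)

# Hypothetical run: three items requested, one wrong grab.
run = ShoppingRunResult(
    requested_items={"oat milk", "pasta", "olive oil"},
    retrieved_items={"oat milk", "olive oil", "canola oil"},
    duration_s=742.0,
)
print(run.correct_picks, run.mistaken_picks, run.duration_s)  # 2 1 742.0
```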

Today, we are providing insight into the work that goes into enabling our robots to shop for groceries autonomously. We reveal the current iteration of our mobile manipulation robot platform and highlight some of the key technologies and techniques we have developed. These developments apply to far more than grocery shopping.

One use case in particular that motivates us is helping older people with physical tasks. We envision that our developments will lead to advanced robots that can autonomously support the elderly in their homes and enable them to live more fulfilling lives, without supplanting their decision-making altogether [1].

Our Robot Platform

We are continuously improving our dual-arm mobile manipulation robots. Since building our first robot platform (described in [2]), we have developed a fully custom robot platform for advanced mobile manipulation. This robot is based on TRI’s custom robot actuators of different sizes, as well as the software ecosystem to support them. These actuators are the components that move every joint of the robot — they steer the wheels, move the 5-degree-of-freedom (DOF) torso, move the two 7-DOF arms, and control the neck. Our actuators encapsulate most of the complexity of the mechanical system and let us quickly iterate over different robot designs. They have a uniquely high torque density that enables us to build very slim arms that are powerful enough to carry even the heaviest everyday objects. The full robot is self-contained, with a high-performance compute system and over 2 kWh of fast-charging, hot-swappable battery capacity to support extensive testing without the need for a power tether. We can remove and reinsert battery modules while the system remains powered.

Cross-section of one of our custom joint actuators
One of our robots grasping an oil jug that weighs eight pounds

Mapping Aisles and Items in 3D

Our robots rely on a pre-generated map of the grocery store to find the items they are looking for. The map contains a detailed 3D geometric reconstruction of the store as well as the locations of a large number of items. To this end, we manually move a custom-built data collection cart through the store and log a stream of images captured by several stereo cameras. Based on this data, our system creates a map in two main steps. First, it performs a detailed 3D geometric reconstruction of the entire space. Next, it detects the objects in the captured images and compares them to a set of items stored in a database. If the system recognizes an object, it adds the object to the map at the correct location in space given by the 3D geometry computed in the first step. To successfully solve these challenging tasks, we developed a pipeline that combines the robustness of modern deep learning approaches with the precision of classical geometric algorithms.

Overhead view of a 3D map of a grocery store that we test our systems in
Detailed view of a 3D map of the grocery shelves in our lab, with coordinate frames at every mapped item location
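
As a rough illustration of the two-step mapping pipeline described above, the sketch below wires the steps together. The reconstruction, detection, matching, and localization components are passed in as callables because those components are not public; every name here is a placeholder, not TRI's actual API.

```python
def build_store_map(frames, catalog, reconstruct, detect, match, localize):
    """frames: logged stereo images from the data collection cart.
    catalog: database of known grocery items.
    The remaining arguments are stand-ins for the real pipeline components."""
    # Step 1: detailed 3D geometric reconstruction of the entire store.
    geometry = reconstruct(frames)

    # Step 2: detect objects in the images, match them against the item
    # database, and anchor each recognized item at its 3D location.
    items = []
    for frame in frames:
        for detection in detect(frame):
            item_id = match(detection, catalog)
            if item_id is None:
                continue  # unrecognized object: it stays in the geometry only
            items.append({"id": item_id, "pose": localize(detection, frame, geometry)})
    return {"geometry": geometry, "items": items}
```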

Navigating the Real World

Equipped with a map of the grocery store, the robot is ready for testing: we generate a random shopping list and tell the robot to bring back as many items from it as possible. From here on, the robot is fully autonomous and untethered, running all required computations on board. It plans an efficient path that visits every item on the shopping list and then starts driving toward the first one. Because the layout of real grocery stores is constantly changing, the robot uses its stereo vision system [3,4] to navigate around special displays, wet-floor signs, or other obstacles that might have appeared since the map was generated.
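
The post does not specify how the visiting order is computed, so the sketch below stands in with a greedy nearest-neighbor tour over the mapped item locations; the coordinates and item names are invented for the example.

```python
import math

def plan_visit_order(start, item_locations):
    """Order shopping-list items by repeatedly driving to the nearest one.
    item_locations maps item name -> (x, y) position in the store map frame."""
    remaining = dict(item_locations)
    order, position = [], start
    while remaining:
        nearest = min(remaining, key=lambda name: math.dist(position, remaining[name]))
        position = remaining.pop(nearest)
        order.append(nearest)
    return order

# Hypothetical aisle coordinates in meters:
items = {"pasta": (12.0, 3.5), "oat milk": (2.0, 8.0), "olive oil": (11.0, 7.5)}
print(plan_visit_order(start=(0.0, 0.0), item_locations=items))
# ['oat milk', 'olive oil', 'pasta']
```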

Object Detection and Classification in the Wild

Once the robot arrives at the mapped item location, it uses its stereo cameras to verify that the item is still there and to determine its exact position. A variety of conditions can arise, and our detection algorithms and robot behaviors need to handle all of them robustly. For instance, the item might be out of stock, or it might have been moved to a different shelf. Perhaps the packaging changes seasonally or is similar to that of a new item. Items may not sit squarely in the camera's view if they are rotated, placed on the top or bottom shelf, or pushed far toward the back. All of these variations are great test cases for our perception methods. One way we improve our chances of good object recognition is by running the same neural networks on data from two stereo camera pairs, one in the head and one in the mobile base of the robot. This dramatically increases coverage across the entire height of the grocery shelf.

Architecture of our item classification system
A grocery shelf from the robot’s points of view (it has two!)
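
One way to picture the two-camera redundancy is the simple fusion rule sketched below: for each candidate item, keep the more confident of the head-camera and base-camera detections. This is an illustrative simplification, not TRI's actual fusion logic, and the item IDs and scores are made up.

```python
def merge_detections(head_dets, base_dets):
    """Fuse (item_id, confidence) detections from the two stereo camera pairs,
    keeping the higher-confidence hit for each item."""
    best = {}
    for item_id, confidence in head_dets + base_dets:
        if confidence > best.get(item_id, 0.0):
            best[item_id] = confidence
    return best

head = [("olive_oil_1l", 0.92), ("pasta_500g", 0.40)]   # head camera: top shelves
base = [("pasta_500g", 0.81)]                           # base camera: bottom shelves
print(merge_detections(head, base))
# {'olive_oil_1l': 0.92, 'pasta_500g': 0.81}
```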

Grasping a Large Variety of Items

Once the robot successfully locates the item, it plans how to grasp it. Given the diversity of items in a grocery store (in weight, shape, size, and stiffness), we equipped the robot with both a custom suction gripper and an off-the-shelf two-finger parallel gripper.

The robot uses an off-the-shelf parallel gripper and a custom suction tool for grasping a wide variety of items

During operation, the robot uses its stereo cameras to obtain a rich 3D geometric representation of the item and infers properties like its dimensions, pose, and surface curvature. We leverage a PointNet-based neural network to determine which tool to use and what type of grasp to apply. The system combines the output of this network with the processed 3D geometric information to grasp the item. It might grip the cap of a bottle rather than the body, or position the suction cup on a flat region of a jar rather than a curved one.

The robot automatically identifies the best spot on an item for a successful suction grasp
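
In the system described above, the tool and grasp type come from a PointNet-based network over the item's point cloud. The sketch below replaces that network with a hand-written rule on two inferred properties so the example runs; the thresholds and property names are assumptions, not measured limits of the hardware.

```python
def choose_tool(item):
    """Pick the suction tool or the parallel gripper from inferred item
    properties. A rule-based stand-in for the learned grasp classifier."""
    flat_enough = item["max_flat_patch_cm2"] > 4.0   # room to seat the suction cup
    light_enough = item["est_weight_kg"] < 1.0       # assumed suction payload limit
    return "suction" if flat_enough and light_enough else "parallel_gripper"

jar = {"max_flat_patch_cm2": 9.0, "est_weight_kg": 0.6}
oil_jug = {"max_flat_patch_cm2": 2.5, "est_weight_kg": 3.6}
print(choose_tool(jar), choose_tool(oil_jug))  # suction parallel_gripper
```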

To quickly and reliably position the tool at the optimized grasp pose, we developed a highly capable, custom motion planner. Our planner combines concepts from Dynamic Probabilistic Roadmaps with GPU acceleration and custom inverse-kinematics solvers to quickly generate motion plans for our 29-DOF robot, even in tight grocery aisles. Because tool placement is extremely important for grasp success, the robot verifies and, if necessary, corrects the relative position of the tool and the item using an Iterative Closest Point algorithm before closing the gripper or turning on the suction pump. If the sensors in the tool signal that the object is successfully grasped, the robot places it in its shopping basket and moves on to the next item on its path. If not, it tries again.

A collection of picks from randomized shopping lists during field testing at a local grocery store (3x Speed)
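
The control flow of a single pick attempt, as described above, might look like the sketch below. Motion planning, ICP refinement, and the gripper interface are injected as callables because those components are internal to the system; the retry count and every function name are assumptions for illustration.

```python
def attempt_pick(item, plan_motion, execute, refine_with_icp,
                 grasp_succeeded, place_in_basket, max_attempts=2):
    """One pick attempt: plan to the grasp pose, correct it with ICP,
    grasp, and retry if the tool sensors report a failed grasp."""
    for _ in range(max_attempts):
        execute(plan_motion(item["grasp_pose"]))      # 29-DOF plan to the pre-grasp pose
        corrected_pose = refine_with_icp(item)        # verify/correct tool vs. item pose
        execute(plan_motion(corrected_pose))          # final approach before grasping
        if grasp_succeeded():                         # gripper closed / suction engaged
            place_in_basket(item)
            return True
    return False                                      # give up and move to the next item
```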

Continuous Progress

Over the last year, we have run multi-day field tests every three months in a local grocery store in Mountain View, CA. In these field tests, members of TRI’s Prototyping and Research Operations (PROPS) team send the robot shopping for several hours each night, collecting invaluable data that allow us to quantify our progress, learn from detailed failure analysis, and quickly test new ideas on real-world data. As a result, our robots are continually getting better and faster at handling a growing variety of items.

TRI’s JC Hancock and her colleagues on the PROPS team conduct rigorous testing and data collection that enable rapid capability development

Grocery order fulfillment continues to challenge and inspire us to invent new approaches to difficult problems facing mobile robots, and we believe that it has already brought us closer to our vision of a practical and reliable robot companion that can improve quality of life. We have made breakthroughs in robust perception, manipulation, and motion planning methods that advance the field of robotics in meaningful ways. We are excited to apply our techniques to other domains to keep innovating rapidly and maximize our impact.

You can learn more about our work in this CNET exclusive video. And if our goals and methods sound like something you would enjoy working on, consider joining our team!
