Why we make our robots shop for groceries
How challenge tasks drive the development of our mobile manipulation systems
By The TRI Mobile Manipulation Team, including James Borders, Richard Cheng, Dan Helmick, Lukas Kaul, Dan Kruse, John Leichty, Carolyn Matl, Chavdar Papazov, Mark Tjersland
At TRI, we are developing robotic capabilities with the goal of improving the quality of everyday life for all. To reach this goal, we define “challenge tasks” that are exciting to work on, that drive our development towards general purpose robot capabilities, and that allow for rigorous quantitative testing.
Autonomous order fulfillment in grocery stores is a particularly good way to drive our development of mobile manipulation capabilities because it encompasses a host of difficult challenges for robots, including perceiving and manipulating a large variety of objects, navigating an ever-changing environment, and reacting to unexpected circumstances. A single shopping run can contain a long list of items, so this task requires the system to be reliable and encourages a focus on overall execution speed. We use several intuitive metrics to measure progress. How many items did the robot correctly retrieve? How many did it mistakenly grab? How long did it take? Best of all, we are able to recreate representative shopping aisles right inside our robotics labs, allowing us to iterate quickly in between tests at real grocery stores.
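The metrics above are easy to formalize. Here is a minimal Python sketch of how a run's scorecard might be computed; the structure and field names are illustrative examples, not our actual tooling:

```python
from dataclasses import dataclass

@dataclass
class ShoppingRunResult:
    """Outcome of one autonomous shopping run (illustrative structure)."""
    requested: int      # items on the shopping list
    retrieved: int      # items correctly placed in the basket
    wrong_grabs: int    # items mistakenly grabbed
    duration_s: float   # total run time in seconds

    @property
    def success_rate(self) -> float:
        return self.retrieved / self.requested if self.requested else 0.0

    @property
    def seconds_per_item(self) -> float:
        return self.duration_s / self.retrieved if self.retrieved else float("inf")

run = ShoppingRunResult(requested=20, retrieved=17, wrong_grabs=1, duration_s=1800.0)
print(f"success rate: {run.success_rate:.0%}")  # → success rate: 85%
```

Tracking these few numbers per run is enough to compare field tests over time and spot regressions quickly.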
Today, we are providing insight on the work that goes into enabling our robots to do autonomous grocery shopping. We reveal the current iteration of our mobile manipulation robot platform and highlight some of the key technologies and techniques that we have developed. These developments can be used for many more purposes than grocery shopping.
One use case in particular that motivates us is helping older people with physical tasks. We envision that our developments will lead to advanced robots that can autonomously support the elderly in their homes and enable them to live more fulfilling lives, without supplanting their decision-making altogether[1].
Our Robot Platform
We are continuously improving our dual-arm mobile manipulation robots. Since building our first robot platform (described in [2]), we have developed a fully custom robot platform for advanced mobile manipulation. This robot is based on TRI’s custom robot actuators of different sizes, as well as the software ecosystem to support them. These actuators are the components that move every joint of the robot — they steer the wheels, move the 5 degree-of-freedom (DOF) torso, move the two 7 DOF arms, and control the neck. Our actuators encapsulate most of the complexity of the mechanical system and let us quickly iterate over different robot designs. They have a uniquely high torque density that enables us to build very slim arms that are still powerful enough to carry even the heaviest everyday objects. The full robot is self-contained, with a high-performance compute system and over 2 kWh of fast-charging, hot-swappable battery capacity to support extensive testing without a power tether. We can remove and reinsert battery modules while the system remains powered.
Mapping Aisles and Items in 3D
Our robots rely on a pre-generated map of the grocery store to find the items they are looking for. The map contains a detailed 3D geometric reconstruction of the store as well as the locations of a large number of items. To this end, we manually move a custom-built data collection cart through the store and log a stream of images captured by several stereo cameras. Based on this data, our system creates a map in two main steps. First, it performs a detailed 3D geometric reconstruction of the entire space. Next, it detects the objects in the captured images and compares them to a set of items stored in a database. If the system recognizes an object, it adds the object to the map at the correct location in space given by the 3D geometry computed in the first step. To successfully solve these challenging tasks, we developed a pipeline that combines the robustness of modern deep learning approaches with the precision of classical geometric algorithms.
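The second, semantic step of this pipeline — matching detections against the item database and placing recognized items at their reconstructed 3D locations — can be sketched as follows. The data structures, labels, and threshold are illustrative stand-ins for our actual detection and matching code:

```python
# Sketch of the item-mapping step: keep detections that match a known
# catalog item and record them at the 3D position supplied by the
# geometric reconstruction. All names here are illustrative.
from dataclasses import dataclass

@dataclass
class Detection:
    label: str          # class predicted by the image detector
    confidence: float   # detector confidence in [0, 1]
    xyz: tuple          # 3D position from the geometric reconstruction

def build_item_map(detections, item_database, min_confidence=0.8):
    """Map each recognized item label to its observed 3D locations."""
    item_map = {}
    for det in detections:
        if det.label in item_database and det.confidence >= min_confidence:
            item_map.setdefault(det.label, []).append(det.xyz)
    return item_map

catalog = {"pasta_sauce", "cereal_box"}
dets = [
    Detection("pasta_sauce", 0.95, (1.2, 0.4, 1.1)),
    Detection("unknown_item", 0.90, (2.0, 0.1, 0.5)),  # not in the catalog
    Detection("cereal_box", 0.60, (3.1, 0.2, 1.6)),    # below threshold
]
print(build_item_map(dets, catalog))  # → {'pasta_sauce': [(1.2, 0.4, 1.1)]}
```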
Navigating the Real World
Equipped with a map of the grocery store, we test our system by generating a random shopping list. Then, we tell the robot to bring back as many items from the shopping list as possible. From here on, the robot is fully autonomous and untethered, running all required computations on board. It plans an efficient path to visit all the items on the shopping list and then starts driving to the first item on the path. As the layout of real grocery stores is constantly changing, the robot uses its stereo vision system [3,4] to navigate around special displays, wet-floor signs, or other obstacles that might have appeared since the map was generated.
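Ordering the stops on the shopping list is a classic routing problem. Our planner is more sophisticated than this, but a greedy nearest-neighbor tour conveys the basic idea; the positions below are illustrative:

```python
# Greedy nearest-neighbor tour: always drive to the closest remaining
# item. A simple stand-in for a real route planner, for illustration only.
import math

def nearest_neighbor_tour(start, stops):
    """Return the stops ordered by repeatedly picking the nearest one."""
    tour, remaining, current = [], list(stops), start
    while remaining:
        nxt = min(remaining, key=lambda p: math.dist(current, p))
        tour.append(nxt)
        remaining.remove(nxt)
        current = nxt
    return tour

aisle_positions = [(5.0, 2.0), (1.0, 0.0), (3.0, 1.0)]  # item locations (m)
print(nearest_neighbor_tour((0.0, 0.0), aisle_positions))
# → [(1.0, 0.0), (3.0, 1.0), (5.0, 2.0)]
```

Greedy tours can be noticeably longer than optimal ones in the worst case, but they are fast to compute and easy to replan when the store layout changes.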
Object Detection and Classification in the Wild
Once the robot arrives at the mapped item location, it uses its stereo cameras to verify that the item is still there and to determine its exact position. A variety of conditions can arise, and our detection algorithms and robot behaviors need to robustly handle all of them. For instance, the item might be out of stock, or it might have been moved to a different shelf. Perhaps the packaging changes seasonally or is similar to that of a new item. Items may also be hard to see if they are rotated, placed on the top or bottom shelf, or pushed far toward the back of the shelf. All of these variations are great test cases for our perception methods. One way we improve our chances of good object recognition is by running the same neural networks on data from two stereo camera pairs, one in the head and one in the mobile base of the robot. This dramatically increases coverage across the entire height of the grocery shelf.
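Combining the results from the two camera pairs can be as simple as keeping the higher-confidence detection per item. The following sketch is illustrative and not our production fusion logic:

```python
# Fuse per-item detection confidences from the head and base stereo
# pairs, keeping the better view of each item. Illustrative only.
def fuse_detections(head_dets, base_dets):
    """head_dets/base_dets: dicts mapping item label -> confidence."""
    fused = dict(head_dets)
    for label, conf in base_dets.items():
        if conf > fused.get(label, 0.0):
            fused[label] = conf
    return fused

head = {"soup_can": 0.91}                          # head camera sees top shelves well
base = {"soup_can": 0.55, "juice_bottle": 0.88}    # base camera sees bottom shelves
print(fuse_detections(head, base))
# → {'soup_can': 0.91, 'juice_bottle': 0.88}
```

An item visible only from the base camera (like the bottom-shelf juice bottle above) still makes it into the fused result, which is exactly the coverage benefit of running both pairs.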
Grasping a Large Variety of Items
Once the robot successfully locates the item, it plans how to grasp it. Given the diversity of items (regarding weight, shape, size, and stiffness) represented in a grocery store, we equipped the robot with both a custom suction gripper and an off-the-shelf two-finger parallel gripper.
During operation, the robot uses its stereo cameras to obtain a rich 3D geometric representation of the item and infers properties like the item’s dimensions, pose, and surface curvature. We leverage a PointNet-based neural network model to determine which tool and which type of grasp to use. The system combines the output of this network with the processed 3D geometric information to grasp the item. It might grip the cap of a bottle rather than the body, or position the suction cup on a flat region of a jar rather than a curved one.
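The learned model itself cannot be reproduced in a few lines, but the kind of decision it makes from inferred geometry can be sketched with a rule-based stand-in. All names, fields, and thresholds here are hypothetical:

```python
# Rule-based stand-in for the learned tool/grasp selector: pick suction
# for flat, low-curvature surfaces, otherwise the parallel gripper if
# the item fits between the fingers. Thresholds are made up for this sketch.
from dataclasses import dataclass

@dataclass
class ItemGeometry:
    width_mm: float       # estimated graspable width
    flat_surface: bool    # has a flat, sealed region suitable for suction
    curvature: float      # surface curvature at the candidate suction point

GRIPPER_MAX_OPENING_MM = 85.0  # assumed parallel-gripper opening

def choose_tool(geom: ItemGeometry) -> str:
    if geom.flat_surface and geom.curvature < 0.1:
        return "suction"
    if geom.width_mm <= GRIPPER_MAX_OPENING_MM:
        return "parallel_gripper"
    return "suction"  # fall back for items too wide to pinch

# A wide, flat-lidded jar is a natural suction target:
print(choose_tool(ItemGeometry(width_mm=120.0, flat_surface=True, curvature=0.02)))
# → suction
```

A learned model replaces hand-tuned rules like these with decisions trained from data, which generalizes much better across the long tail of grocery items.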
To quickly and reliably position the tool at the optimized grasp pose, we developed a highly capable, custom motion planner. Our planner combines concepts from Dynamic Probabilistic Roadmaps with GPU acceleration and custom inverse-kinematics solvers to quickly generate motion plans for our 29 DOF robot, even in tight grocery aisles. Because tool placement is critical for grasp success, the robot verifies and, if necessary, corrects the relative position of the tool and the item using an Iterative Closest Point algorithm before closing the gripper or turning on the suction pump. If the sensors in the tool signal that the object is successfully grasped, the robot places it in its shopping basket and moves on to the next item on its path. If not, it tries again.
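The spirit of the Iterative Closest Point correction can be illustrated with a single, translation-only alignment step; real ICP also estimates rotation, uses nearest-neighbor correspondences, and iterates until convergence. The points and function names below are illustrative:

```python
# One translation-only correction step in the spirit of ICP: shift the
# planned tool pose by the offset between the model's and the observed
# item's centroids. Illustrative sketch, not a full ICP implementation.
def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def translation_correction(model_points, observed_points):
    """Return the (dx, dy, dz) offset to apply to the planned tool pose."""
    cm = centroid(model_points)
    co = centroid(observed_points)
    return tuple(co[i] - cm[i] for i in range(3))

model = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0)]
# The item sits 2 cm further along x than the map predicted:
observed = [(0.02, 0.0, 0.0), (0.12, 0.0, 0.0), (0.02, 0.1, 0.0)]
print(translation_correction(model, observed))
```

Applying this small last-moment correction before closing the gripper or engaging suction compensates for mapping and localization error that would otherwise cause a missed grasp.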
Continuous Progress
We have run multi-day field tests in a local grocery store in Mountain View, CA every three months over the last year. In these field tests, members of TRI’s Prototyping and Research Operations team (PROPS) send the robot shopping for several hours each night, collecting invaluable data that allow us to quantify our progress, learn from detailed failure analysis, and quickly test new ideas on real-world data. As a result, our robots are continually getting better and quicker at handling an increased variety of items.
Grocery order fulfillment continues to challenge and inspire us to invent new approaches to difficult problems facing mobile robots, and we believe that it has already brought us closer to our vision of a practical and reliable robot companion that can improve quality of life. We have made breakthroughs in robust perception, manipulation and motion planning methods that advance the field of robotics in meaningful ways. We are excited to apply our techniques to other domains to keep innovating rapidly and maximize our impact.
You can learn more about our work in this CNET exclusive video. And if our goals and methods sound like something you would enjoy working on, consider joining our team!