TRI at ICRA 2023

Toyota Research Institute
May 25, 2023

The International Conference on Robotics and Automation (ICRA) is one of the top international venues in robotics. ICRA 2023 will be a hybrid conference, with both in-person and virtual attendance options, and will take place from May 29 to June 2 in London, England.

This year, Toyota Research Institute (TRI) is once again a Silver Sponsor and will be presenting new research findings and participating in three workshops, including an award-finalist paper on autonomous drifting. Check out the talks, workshops, and main conference papers below to see where TRI researchers will be presenting. We look forward to seeing and talking with you online and in person at this year’s ICRA; you can find us at booth #G10!

Note: Abstracts are pulled from the papers, and not all listed authors are TRI employees.

Talks

Robots for Society

Date and Time: May 30th, 2023 13:00–14:30 BST

John Leonard, Technical Advisor to TRI, will give a keynote titled “Towards Human-Centered Embodied Intelligence” during the first keynote session.

Abstract:

There are, of course, countless ways in which robots could help humans, and this keynote session, Robots for Society, examines some of the issues involved and how recent progress has expanded the breadth of possibilities. Advances in the development of intelligent algorithms now allow us to operate robots in environments populated by humans, while mitigating the risks involved in doing so. Robots are being designed to help with household tasks such as vacuum cleaning and gardening, while smart cars are advancing at a tremendous pace to support human mobility. On the factory floor, co-bots are becoming more widely deployed, and seamless human-robot cooperation is increasingly being worked into manufacturing systems. Key to this is the fact that today’s robots can operate reliably and safely in and around humans. The session will delve into the role of robots in society and the myriad ways in which they are making our lives easier by performing tasks even more efficiently than we can.

Workshops

Workshop on Scalable Autonomous Driving

Date: June 2nd, 2023

Location: ICC Capital Suite 11

Website: https://sites.google.com/view/icra2023av/home

Adrien Gaidon, Director in the Machine Learning Division at TRI, will give a talk titled “Geometric Foundation Models.”

Workshop on Robot Execution Failures and Failure Management Strategies

Date: June 2nd, 2023

Location: South Gallery Room 24

Website: https://robot-failures.github.io/icra2023/

Masha Itkina, Research Scientist in the Machine Learning Division at TRI, will give a talk titled “Interpretable Self-Aware Neural Networks for Robust Trajectory Prediction.”

Abstract: Although neural networks have seen tremendous success as predictive models in a variety of domains, they can be overly confident in their predictions on out-of-distribution (OOD) data. To be viable for safety-critical applications in human environments, like autonomous vehicles or assistive robotics, neural networks must accurately estimate their epistemic or model uncertainty, achieving a level of system self-awareness.

In this talk, I will present an approach based on evidential deep learning to estimate the epistemic uncertainty over a low-dimensional, interpretable latent space in a trajectory prediction setting. We introduce an interpretable paradigm for trajectory prediction that distributes the uncertainty among the semantic concepts: past agent behavior, road structure, and social context. We validate our approach on real-world autonomous driving data, demonstrating superior performance over state-of-the-art baselines. Looking to the future, by enabling uncertainty-aware spatiotemporal inference in robotic systems, I hope to engender safe and socially cohesive human-robot interactions.
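
For readers unfamiliar with evidential deep learning, the sketch below shows, in a generic classification-style setting, how a network that outputs Dirichlet evidence yields a closed-form epistemic uncertainty estimate. It is a minimal illustration of the underlying idea only; the EvidentialHead module, feature dimension, and number of latent modes are hypothetical and do not reflect the trajectory-prediction architecture described in the talk.

```python
# Minimal sketch of evidential deep learning over a discrete latent space.
# Generic illustration of the idea (Dirichlet evidence -> epistemic
# uncertainty), not the architecture from the talk; all names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EvidentialHead(nn.Module):
    """Maps features to non-negative Dirichlet evidence over K latent modes."""
    def __init__(self, feat_dim: int, num_modes: int):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_modes)

    def forward(self, features: torch.Tensor):
        evidence = F.softplus(self.fc(features))      # e_k >= 0
        alpha = evidence + 1.0                        # Dirichlet concentration
        strength = alpha.sum(dim=-1, keepdim=True)    # total evidence S
        probs = alpha / strength                      # expected mode probabilities
        k = alpha.shape[-1]
        epistemic_uncertainty = k / strength          # in (0, 1]; high on OOD inputs
        return probs, epistemic_uncertainty

# Example: 64-dim scene/agent features for a batch of 8 samples, 5 latent modes.
head = EvidentialHead(feat_dim=64, num_modes=5)
probs, u = head(torch.randn(8, 64))
print(probs.shape, u.squeeze(-1))  # (8, 5) mode weights and per-sample uncertainty
```

The appeal of this family of methods is that uncertainty falls out of a single forward pass, with no sampling or ensembling at inference time.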

Workshop on Compliant Robot Manipulation: Challenges and New Opportunities

Date: June 2nd, 2023

Location: South Gallery Room 18

Website: https://sites.google.com/yale.edu/icra2023-compliantmanipulation/

Naveen Kuppuswamy and Eric Cousineau, Research Scientists on the Manipulation team at TRI, will give a talk titled “May the Force be with You: Towards Compliant and Contact-Aware Visuomotor Policies.”

Main Conference

Paper Award Finalist: “Autonomous Drifting with 3 Minutes of Data via Learned Tire Models”

Authors: Franck Djeumou, Jon Goh, Ufuk Topcu, Avinash Balachandran

Details: Tuesday, May 30th, 8:30–10:10 BST, Poster Session, Room T8, with an additional talk on Wednesday, May 31st, 15:30–15:40 BST in the Auditorium

Abstract: Near the limits of adhesion, the forces generated by a tire are nonlinear and intricately coupled. Efficient and accurate modelling in this region could improve safety, especially in emergency situations where high forces are required. To this end, we propose a novel family of tire force models based on neural ordinary differential equations and a neural parameterization. These models are designed to satisfy physically insightful assumptions while also having sufficient fidelity to capture higher-order effects directly from vehicle state measurements. They are used as drop-in replacements for an analytical brush tire model in an existing nonlinear model predictive control framework. Experiments with a customized Toyota Supra show that scarce amounts of driving data (less than three minutes) are sufficient to achieve high-performance autonomous drifting on various trajectories at speeds up to 45 mph. Comparisons with the benchmark model show a 4x improvement in tracking performance, smoother control inputs, and faster and more consistent computation time.
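
As a rough illustration of what a physically structured, learned tire model can look like, the sketch below enforces two properties the abstract alludes to (odd symmetry in slip angle and force saturation at the friction limit) with a small neural parameterization. It is a toy stand-in, not the paper's neural-ODE formulation or its integration into the NMPC framework, and all quantities are synthetic.

```python
# Toy sketch of a physically structured, learned lateral tire force model.
# Odd symmetry in slip angle and saturation at the friction limit are built
# in; this is not the paper's formulation, and names/values are illustrative.
import torch
import torch.nn as nn

class LearnedTireModel(nn.Module):
    def __init__(self, hidden: int = 32):
        super().__init__()
        # Small MLP predicting a positive "stiffness-like" gain from |slip|.
        self.mlp = nn.Sequential(
            nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 1), nn.Softplus()
        )
        self.mu = nn.Parameter(torch.tensor(1.0))  # learned friction coefficient

    def forward(self, slip_angle: torch.Tensor, normal_force: torch.Tensor):
        gain = self.mlp(slip_angle.abs().unsqueeze(-1)).squeeze(-1)
        # tanh gives saturation; the signed slip inside it gives odd symmetry.
        return -self.mu * normal_force * torch.tanh(gain * slip_angle)

# Fit to a handful of synthetic (slip, force) samples, standing in for the
# "minutes of driving data" mentioned in the abstract.
model = LearnedTireModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
slip = torch.linspace(-0.3, 0.3, 64)          # rad
fz = torch.full_like(slip, 4000.0)            # N, normal load
target = -4000.0 * torch.tanh(8.0 * slip)     # synthetic "measurements"
for _ in range(200):
    opt.zero_grad()
    loss = ((model(slip, fz) - target) ** 2).mean()
    loss.backward()
    opt.step()
```

Baking structure like saturation into the parameterization is one way a learned model can stay well behaved in an optimization-based controller while still fitting data-driven effects.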

Paper: “SGTM 2.0: Autonomously Untangling Long Cables using Interactive Perception”

Authors: Kaushik Shivakumar, Vainavi Viswanath, Anrui Gu, Yahav Avigal, Justin Kerr, Jeffrey Ichnowski, Richard Cheng, Thomas Kollar, Ken Goldberg

Details: Tuesday, May 30th, 15:00–16:40 BST, Poster Session, Room T8

Abstract: Cables are commonplace in homes, hospitals, and industrial warehouses and are prone to tangling. This paper extends prior work on autonomously untangling long cables by introducing novel uncertainty quantification metrics and actions that interact with the cable to reduce perception uncertainty. We present Sliding and Grasping for Tangle Manipulation 2.0 (SGTM 2.0), a system that autonomously untangles cables approximately 3 meters in length with a bilateral robot using estimates of uncertainty at each step to inform actions. By interactively reducing uncertainty, SGTM 2.0 significantly reduces run-time. Physical experiments with 84 trials suggest that SGTM 2.0 can achieve 83% untangling success on cables with 1 or 2 overhand and figure-8 knots, and 70% termination detection success across these configurations, outperforming SGTM 1.0 by 43% in untangling accuracy and 200% in completion time. Supplementary material, visualizations, and videos can be found at sites.google.com/view/sgtm2.
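
The core loop of interactive perception can be illustrated with a simple uncertainty gate: act to reshape the cable when the perception estimate is too uncertain, otherwise act on the estimate. The entropy metric, threshold, and action names below are hypothetical and are not the specific uncertainty quantification metrics used in SGTM 2.0.

```python
# Illustrative uncertainty-gated interactive-perception loop: if the
# perception estimate is too uncertain, take an action that reshapes the
# cable to gather information; otherwise act on the current estimate.
import numpy as np

def keypoint_entropy(heatmap: np.ndarray) -> float:
    """Shannon entropy of a normalized keypoint heatmap (higher = less certain)."""
    p = heatmap.flatten()
    p = p / p.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def choose_action(heatmap: np.ndarray, entropy_threshold: float = 6.0) -> str:
    if keypoint_entropy(heatmap) > entropy_threshold:
        return "perturb_cable"      # interactive move to reduce uncertainty
    return "untangle_at_keypoint"   # confident enough to act on the estimate

# Example: a diffuse heatmap triggers an information-gathering action.
diffuse = np.ones((64, 64))
peaked = np.zeros((64, 64)); peaked[32, 32] = 1.0
print(choose_action(diffuse), choose_action(peaked))
```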

Paper: “Real-time Solutions to Multimodal Partially Observable Dynamic Games”

Authors: Oswin So, Paul Drews, Thomas Balch, Velin Dimitrov, Guy Rosman, Evangelos Theodorou

Details: Tuesday, May 30th, 15:00–16:40 BST, Poster Session, Room T8

Abstract: Game theoretic methods have become popular for planning and prediction in situations involving rich multi-agent interactions. However, these methods often assume the existence of a single local Nash equilibrium and are hence unable to handle uncertainty in the intentions of different agents. While maximum entropy (MaxEnt) dynamic games try to address this issue, practical approaches solve for MaxEnt Nash equilibria using linear-quadratic approximations, which are restricted to unimodal responses and unsuitable for scenarios with multiple local Nash equilibria. By reformulating the problem as a POMDP, we propose MPOGames, a method for efficiently solving MaxEnt dynamic games that captures the interactions between local Nash equilibria. We show the importance of uncertainty-aware game theoretic methods via a two-agent merge case study. Finally, we demonstrate the real-time capabilities of our approach with hardware experiments on a 1/10th scale car platform.
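
To make the role of intent uncertainty concrete, the sketch below maintains a belief over discrete opponent "modes" with a Bayes update and picks the ego plan that minimizes the belief-weighted cost. This is a simplified illustration of uncertainty-aware planning, not the MPOGames solver; the modes, likelihoods, and costs are invented for the example.

```python
# Simplified sketch: keep a belief over discrete opponent modes (e.g., yield
# vs. go in a merge), update it from observed motion, and choose the ego plan
# minimizing expected cost under that belief. Not the MPOGames algorithm.
import numpy as np

def update_belief(belief: np.ndarray, likelihoods: np.ndarray) -> np.ndarray:
    """Bayes update: p(mode | obs) is proportional to p(obs | mode) * p(mode)."""
    posterior = belief * likelihoods
    return posterior / posterior.sum()

def best_plan(belief: np.ndarray, cost_table: np.ndarray) -> int:
    """cost_table[i, m] = cost of ego plan i if the opponent follows mode m."""
    expected = cost_table @ belief
    return int(np.argmin(expected))

# Two opponent modes ("yield", "go") and two ego plans ("merge now", "wait").
belief = np.array([0.5, 0.5])
cost = np.array([[1.0, 10.0],   # merge now: cheap if they yield, costly if not
                 [3.0, 3.0]])   # wait: moderate either way
belief = update_belief(belief, likelihoods=np.array([0.2, 0.8]))  # obs: they accelerate
print(belief, best_plan(belief, cost))  # belief shifts to "go"; chosen plan: wait
```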

Paper: “Simple-BEV: What Really Matters for Multi-Sensor BEV Perception?”

Authors: Adam W. Harley, Zhaoyuan Fang, Jie Li, Rares Ambrus, Katerina Fragkiadaki

Details: Tuesday, May 30th, 15:00–16:40 BST, Poster Session, Room T8

Abstract: Building 3D perception systems for autonomous vehicles that do not rely on high-density LiDAR is a critical research problem because of the expense of LiDAR systems compared to cameras and other sensors. Recent research has developed a variety of camera-only methods, where features are differentiably “lifted” from the multi-camera images onto the 2D ground plane, yielding a “bird’s eye view” (BEV) feature representation of the 3D space around the vehicle. This line of work has produced a variety of novel “lifting” methods, but we observe that other details in the training setups have shifted at the same time, making it unclear what really matters in top-performing methods. We also observe that using cameras alone is not a real-world constraint, considering that additional sensors like radar have been integrated into real vehicles for years already. In this paper, we first attempt to elucidate the high-impact factors in the design and training protocol of BEV perception models. We find that batch size and input resolution greatly affect performance, while lifting strategies have a more modest effect; even a simple parameter-free lifter works well. Second, we demonstrate that radar data can provide a substantial boost to performance, helping to close the gap between camera-only and LiDAR-enabled systems. We analyze the radar usage details that lead to good performance and invite the community to reconsider this commonly neglected part of the sensor platform.
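
A "parameter-free lifter" of the kind the abstract mentions can be as simple as projecting 3D cell centers into the image and bilinearly sampling features there. The sketch below shows that operation for a single camera; shapes, helper names, and the toy camera are illustrative assumptions and are not taken from the Simple-BEV codebase.

```python
# Minimal sketch of a parameter-free "lifter": project 3D voxel centers into
# the image with known camera geometry and bilinearly sample image features.
import torch
import torch.nn.functional as F

def lift_to_bev(feats, points_cam, intrinsics, image_size):
    """
    feats:      (C, H, W) image feature map from a backbone
    points_cam: (P, 3) voxel centers expressed in the camera frame
    intrinsics: (3, 3) pinhole camera matrix
    image_size: (H_img, W_img) of the image the features correspond to
    returns:    (P, C) sampled features (zeros where points fall off-image)
    """
    h_img, w_img = image_size
    uvw = points_cam @ intrinsics.T                 # perspective projection
    z = uvw[:, 2:3].clamp(min=1e-5)
    uv = uvw[:, :2] / z                             # (P, 2) pixel coordinates
    grid = torch.stack(                             # normalize to [-1, 1]
        [uv[:, 0] / (w_img - 1) * 2 - 1, uv[:, 1] / (h_img - 1) * 2 - 1], dim=-1
    ).view(1, 1, -1, 2)                             # (1, 1, P, 2)
    sampled = F.grid_sample(
        feats.unsqueeze(0), grid, mode="bilinear",
        padding_mode="zeros", align_corners=True,
    )                                               # (1, C, 1, P)
    return sampled[0, :, 0, :].T                    # (P, C)

# Example with random features, three "voxel centers", and a toy camera.
feats = torch.randn(64, 56, 120)
K = torch.tensor([[500.0, 0.0, 480.0], [0.0, 500.0, 224.0], [0.0, 0.0, 1.0]])
pts = torch.tensor([[0.0, 0.0, 10.0], [2.0, 0.5, 15.0], [-3.0, 1.0, 20.0]])
print(lift_to_bev(feats, pts, K, image_size=(448, 960)).shape)  # torch.Size([3, 64])
```

Because the lifter has no learned parameters, any performance differences in a study like this one come from the rest of the design and training protocol.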

Paper: “AutoBag: Learning to Open Plastic Bags and Insert Objects”

Authors: Lawrence Yunliang Chen, Baiyu Shi, Daniel Seita, Richard Cheng, Thomas Kollar, David Held, Ken Goldberg

Details: Wednesday, May 31st, 9:00–10:40 BST, Poster Session, Room T8

Abstract: Thin plastic bags are ubiquitous in retail stores, healthcare, food handling, recycling, homes, and school lunchrooms. They are challenging both for perception (due to specularities and occlusions) and for manipulation (due to the dynamics of their 3D deformable structure). We formulate the task of “bagging”: manipulating common plastic shopping bags with two handles from an unstructured initial state to an open state where at least one solid object can be inserted into the bag and lifted for transport. We propose a self-supervised learning framework where a dual-arm robot learns to recognize the handles and rim of plastic bags using UV-fluorescent markings; at execution time, the robot does not use UV markings or UV light. We propose the AutoBag algorithm, where the robot uses the learned perception model to open a plastic bag through iterative manipulation. We present novel metrics to evaluate the quality of a bag state and new motion primitives for reorienting and opening bags based on visual observations. In physical experiments, a YuMi robot using AutoBag is able to open bags and achieve a success rate of 16/30 for inserting at least one item across a variety of initial bag configurations. Supplementary material is available at https://sites.google.com/view/autobag.
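
The self-supervised labeling idea can be illustrated with a toy pipeline: capture paired UV-lit and normal-light images, threshold the fluorescent response to obtain a mask, and use that mask as a free segmentation label for the normal-light image. The channel choice and threshold below are illustrative assumptions, not the paper's exact procedure.

```python
# Toy sketch of turning paired UV-lit / normal-light images into free
# segmentation labels; channel choice and threshold are assumptions.
import numpy as np

def uv_mask(uv_image: np.ndarray, threshold: float = 0.6) -> np.ndarray:
    """uv_image: (H, W, 3) float RGB in [0, 1] captured under UV light.
    Returns a binary (H, W) mask where the fluorescent markings light up."""
    green = uv_image[..., 1]   # assume the fluorescent paint responds strongly here
    return (green > threshold).astype(np.uint8)

def make_training_pair(normal_image, uv_image):
    """The label comes 'for free' from the UV capture; at execution time only
    the normal-light image is used, mirroring the self-supervised setup."""
    return normal_image, uv_mask(uv_image)

# Example with random stand-in images.
rgb = np.random.rand(128, 128, 3)
uv = np.random.rand(128, 128, 3)
image, label = make_training_pair(rgb, uv)
print(image.shape, label.shape, label.dtype)
```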

Paper: “Cloth Funnels: Canonicalized-Alignment for Multi-Purpose Garment Manipulation”

Authors: Alper Canberk, Cheng Chi, Huy Ha, Benjamin Burchfiel, Eric Cousineau, Siyuan Feng, and Shuran Song

Details: Wednesday, May 31st, 9:00–10:40 BST, Poster Session, Room T8

Abstract: Automating garment manipulation is challenging due to extremely high variability in object configurations. To reduce this intrinsic variation, we introduce the task of “canonicalized-alignment,” which simplifies downstream applications by reducing the space of possible garment configurations. This task can be considered a “cloth state funnel” that manipulates arbitrarily configured clothing items into a predefined deformable configuration (i.e., canonicalization) at an appropriate rigid pose (i.e., alignment). In the end, cloth items are funneled into a compact set of structured and highly visible configurations, which are desirable for downstream manipulation skills. To enable this task, we propose a novel canonicalized-alignment objective that effectively guides learning to avoid adverse local minima. Using this objective, we learn a multi-arm, multi-primitive policy that strategically chooses between dynamic flings and quasi-static pick-and-place actions to achieve efficient canonicalized-alignment. We evaluate this approach on a real-world ironing and folding system that relies on this learned policy as the common first step. Empirically, we demonstrate that our task-agnostic canonicalized-alignment can enable even simple manually-designed policies to work well where they were previously inadequate, thus bridging the gap between automated non-deformable manufacturing and deformable manipulation.

Paper: “Depth Is All You Need for Monocular 3D Detection”

Authors: Dennis Park, Jie Li, Dian Chen, Vitor Guizilini, Adrien Gaidon

Details: Wednesday, May 31st, 15:00–16:40 BST, Poster Session, Room T8

Abstract: A key contributor to recent progress in 3D detection from single images is monocular depth estimation. Existing methods focus on how to leverage depth explicitly, by generating pseudo-pointclouds or providing attention cues for image features. More recent works leverage depth prediction as a pretraining task and fine-tune the depth representation while training it for 3D detection. However, the adaptation is limited in scale by manual labels. In this work, we propose further aligning the depth representation with the target domain in an unsupervised fashion. Our methods leverage commonly available LiDAR or RGB videos during training time to fine-tune the depth representation, which leads to improved 3D detectors. Especially when using RGB videos, we show that our two-stage training, which first generates depth pseudo-labels, is critical because of the inconsistency in loss distribution between the two tasks. With either type of reference data, our multi-task learning approach improves over the state of the art on both KITTI and NuScenes, while matching the test-time complexity of its single-task sub-network. Source code and pre-trained models are available at https://github.com/TRI-ML/DD3D.
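
The two-stage recipe described above can be sketched schematically: a frozen depth network pseudo-labels unlabeled RGB frames, and the detector's shared backbone is then fine-tuned with a depth loss on those pseudo-labels alongside the detection loss. The tiny modules below are placeholders for illustration only and are not the DD3D architecture or training setup.

```python
# Schematic of two-stage training with depth pseudo-labels. All modules are
# tiny placeholders; this is not the DD3D architecture or training recipe.
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Conv2d(3, 16, 3, padding=1)       # shared feature extractor (placeholder)
depth_head = nn.Conv2d(16, 1, 1)                # dense depth prediction head
det_head = nn.Conv2d(16, 7, 1)                  # per-pixel 3D box regression (placeholder)
teacher = nn.Conv2d(3, 1, 3, padding=1).eval()  # frozen pretrained depth net (placeholder)

# Stage 1: pseudo-label unlabeled frames with the frozen teacher.
unlabeled = torch.randn(4, 3, 64, 64)
with torch.no_grad():
    pseudo_depth = teacher(unlabeled)

# Stage 2: multi-task fine-tuning on pseudo-depth plus (toy) detection targets.
labeled = torch.randn(4, 3, 64, 64)
det_target = torch.randn(4, 7, 64, 64)
opt = torch.optim.Adam(
    list(backbone.parameters()) + list(depth_head.parameters()) + list(det_head.parameters()),
    lr=1e-4,
)
opt.zero_grad()
depth_loss = F.l1_loss(depth_head(backbone(unlabeled)), pseudo_depth)
det_loss = F.l1_loss(det_head(backbone(labeled)), det_target)
(depth_loss + det_loss).backward()
opt.step()
```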

Toyota Research Institute

Applied and forward-looking research to create a new world of mobility that's safe, reliable, accessible, and pervasive.