Learning to Balance the Safety-Efficiency Trade-offs in Interactive Crowd-aware Robot Navigation

Published in OMRON SINIC X, Jul 8, 2020

Our project on interactive navigation in highly congested environments has been accepted to the International Conference on Intelligent Robots and Systems (IROS) 2020. 🌈

A robot navigating towards its goal in a highly crowded environment

Mai Nishimura and Ryo Yonetani, L2B: Learning to Balance the Safety-Efficiency Trade-off in Interactive Crowd-aware Robot Navigation, [arXiv][project page]

Overview

Imagine a future mobile robot system that can navigate crowded places, such as busy shopping malls and airports, as naturally as we do. Developing such an intelligent navigation system would enhance several practical applications including automated delivery services and guidance at airports. To achieve this goal, we present a deep reinforcement learning (RL) framework for crowd-aware navigation, which enables agents to interact with a crowd not only by finding a bypass safely but also by actively clearing a path to arrive at their destinations efficiently.

Active Path Clearing

Recently, deep RL approaches have achieved promising results in congested scenarios by jointly performing path planning and collision avoidance. In all of these prior works, however, an agent can easily get trapped in the “freezing robot problem” [1], where the agent struggles to find a bypass and, as a result, takes unnecessary maneuvers (Fig.1).

Fig.1* A robot agent (orange) trained with a collision-avoidance only policy.

To address this problem, we develop a deep reinforcement learning framework called Learning to Balance (L2B). The proposed L2B framework enables crowd-aware navigation agents to learn a hybrid policy that chooses to either 1) passively avoid potential collisions with a crowd or 2) actively address nearby persons, e.g., by emitting a beeping sound. In contrast to conventional collision-avoidance only approaches, our agents can reach a goal stably and efficiently (Fig.2).

Fig.2* An agent with a balanced path-clearing and collision-avoidance policy. When the robot executes a path-clearing action, all pedestrians within the yellow circle are prompted to give way to the robot.
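To make the hybrid action more concrete, here is a minimal Python sketch of what such an action and its area of effect could look like. It is an illustration only, not the implementation from the paper: the `Action` class, the `affected_pedestrians` helper, and the radius value are all hypothetical names and numbers chosen for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical hybrid action: a velocity command plus a beep flag."""
    vx: float
    vy: float
    beep: bool  # True = actively clear a path, False = passively avoid


def affected_pedestrians(robot_xy, pedestrians_xy, radius):
    """Return indices of pedestrians within `radius` of the robot.

    Assumption: a beep affects everyone inside a circle around the robot
    (the yellow circle in Fig.2); the radius itself is a made-up value.
    """
    rx, ry = robot_xy
    return [
        i
        for i, (px, py) in enumerate(pedestrians_xy)
        if math.hypot(px - rx, py - ry) <= radius
    ]


# Usage: only the pedestrians close enough to the robot hear the beep.
action = Action(vx=0.5, vy=0.0, beep=True)
crowd = [(1.0, 0.0), (3.0, 0.0), (0.0, -1.5)]
nearby = affected_pedestrians((0.0, 0.0), crowd, radius=2.0) if action.beep else []
```

In this sketch, the first and third pedestrians fall inside the circle, while the second one is too far away to be affected.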

Learning to Balance Safety-Efficiency Trade-offs

We observe that the safety and efficiency requirements in crowd-aware navigation have a trade-off in the presence of social dilemmas [2] between the agent and the crowd. On the one hand, intervening in pedestrian paths too much to achieve instant efficiency will disrupt the natural crowd flow and may eventually put everyone, including the agent itself, at risk of collisions. On the other hand, staying silent to avoid every single collision will make the agent’s travel inefficient.

With this observation, our L2B (Learning to Balance) framework augments the reward function used in learning an interactive navigation policy to penalize both frequent active path clearing and frequent passive collision avoidance, which substantially improves the balance of the safety-efficiency trade-off. We evaluated our L2B framework in a challenging crowd simulation (Fig.1 and Fig.2) and demonstrated its superiority, in terms of both navigation success and collision rate, over a state-of-the-art navigation approach.
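As a rough illustration of this idea, the following Python sketch shows a per-step reward that penalizes both kinds of behavior. It is not the reward function from the paper: the function name `balanced_reward`, the `avoided` flag (how an avoidance maneuver is detected is left open), and every numeric weight are assumptions made up for this example.

```python
def balanced_reward(
    reached_goal,
    collided,
    beeped,
    avoided,
    *,
    goal_reward=1.0,         # hypothetical value
    collision_penalty=0.25,  # hypothetical value
    beep_penalty=0.05,       # hypothetical cost of active path clearing
    avoid_penalty=0.02,      # hypothetical cost of a passive avoidance maneuver
):
    """Per-step reward that discourages overusing either behavior."""
    if collided:
        return -collision_penalty
    reward = goal_reward if reached_goal else 0.0
    if beeped:
        reward -= beep_penalty   # beeping too often disturbs the crowd flow
    if avoided:
        reward -= avoid_penalty  # detouring too often makes travel inefficient
    return reward


# Usage: an agent that beeps at every one of 10 steps accumulates a lower
# return than one that beeps only once before reaching the goal.
always_beep = sum(balanced_reward(t == 9, False, True, False) for t in range(10))
beep_once = sum(balanced_reward(t == 9, False, t == 0, False) for t in range(10))
```

Because both penalties are small relative to the goal reward, the agent in this sketch is still encouraged to reach the goal, but it pays a cost each time it beeps or detours, so a learned policy has to balance the two.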

Future Work

One possible direction for future work is to formulate this crowd-aware navigation task as a multi-agent RL problem, where each pedestrian in a crowd is also allowed to improve its policy to better cooperate with the robot. Such a direction would be beneficial for practical robotics applications such as swarm robotics and multiple vehicle control.

For more details, please come and visit our presentation at IROS 2020 (October 25–29, 2020)!

References

[1] Trautman, Pete, et al. “Robot navigation in dense human crowds: Statistical models and experimental studies of human–robot cooperation.” The International Journal of Robotics Research 34.3 (2015): 335–356.
[2] Leibo, Joel Z., et al. “Multi-agent Reinforcement Learning in Sequential Social Dilemmas.” Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems. 2017.
[*] Our demo results utilized a pedestrian model by mixamo.com and a robot model by Tomás Laulhé, modifications by Don McCurdy. CC0 1.0.

Post based on:
Mai Nishimura and Ryo Yonetani, L2B: Learning to Balance the Safety-Efficiency Trade-off in Interactive Crowd-aware Robot Navigation, In Proc. IROS, 2020. [arXiv][project page]
Relevant Project:
Hiroaki Minoura, Ryo Yonetani, Mai Nishimura, and Yoshitaka Ushiku, Crowd Density Forecasting by Modeling Patch-based Dynamics [arXiv]