# CTRMs: Learning to Construct Cooperative Timed Roadmaps for Multi-agent Path Planning in Continuous Spaces (AAMAS 2022)

We are excited to announce that our new work on data-driven path planning for multiple agents has been accepted to the 21st International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2022).

Keisuke Okumura, Ryo Yonetani, Mai Nishimura & Asako Kanezaki.

“CTRMs: Learning to Construct Cooperative Timed Roadmaps for Multi-agent Path Planning in Continuous Spaces.”AAMAS 2022 (In Press). [arXiv] [Project] [Code]

The AAMAS has been recognized as the leading scientific conference for research in the areas of artificial intelligence, autonomous agents, and multi-agent systems. This work was done collaboratively with Keisuke Okumura (first author, intern from Tokyo Institute of Technology) and our technical advisor, Prof. Asako Kanezaki.

In this work, we address **multi-agent path planning (MAPP) in continuous spaces**. Especially, we are interested in constructing **roadmaps**, which are graphs approximating the spaces to apply path planning algorithms. Our approach learns a generative model from planning demonstrations to build **cooperative timed roadmaps (CTRMs)**, a newly developed concept to derive collision-free paths efficiently. We demonstrate that CTRMs significantly reduce the planning effort with acceptable overheads.

We will explain our study below.

## Table of Contents

- Motivation: Multi-Agent Path Planning & Roadmaps
- CTRMs: Cooperative Timed Roadmaps
- Methodology to Construct CTRMs
- Evaluations
- Highlights
- Learn More about CTRMs

# Motivation: Multi-Agent Path Planning & Roadmaps

**Multi-agent path planning (MAPP)** in continuous spaces is a fundamental problem in multi-agent systems with attractive applications such as warehouse automation, coordination of self-driving cars, and drone shows, to name just a few.

A problem is defined by a set of start-goal locations for each agent (or robot) and static obstacles. The objective is to find a set of collision-free paths, which brings each agent to their respective destinations without colliding.

One possible approach to MAPP consists of two steps*:

- approximating the spaces by constructing graphs called roadmaps, and then
- using pathfinding algorithms for multiple agents on those roadmaps to derive a set of collision-free paths.

While such approaches based on roadmaps have widely been used for single-agent planning, **doing the same for multiple agents is not trivial at all**.

This is due to the necessity of constructing sparse roadmaps. If not so, it will be dramatically hard to find a combination of plausible paths by managing a higher number of inter-agent collisions. Nevertheless, there is generally a **trade-off between roadmap density and solution quality**; roadmaps should be sufficiently dense to ensure high planning success rate and better solutions.

This begs our key questions:

What are the characteristics to consider in order to effectively construct roadmaps for MAPP?

*Note: Roadmaps are typically constructed via

iteratively sampling pointsfrom the space. Such a kind of approach is calledsampling-based motion planning. The second phase is known as multi-agent pathfinding (MAPF), where significant progress has been made in recent years.

# CTRMs: Cooperative Timed Roadmaps

To address the aforementioned issue, we propose a novel graph representation named **cooperative timed roadmaps (CTRMs)**, specialized for multi-agent scenarios. CTRMs consist of directed acyclic graphs constructed in the following fashion:

**Agent-specific**: each CTRM is specialized for individual agents to focus on their important locations.**Cooperative**: each CTRM is also aware of the behaviors of other agents so as to make it easy for the subsequent pathfinding algorithms to find collision-free paths.**Timed**: each vertex in CTRMs is augmented in time direction to represent not only “where,” as commonly doing so in conventional roadmaps, but also “when,” because solutions of MAPP are a set of “timed” paths.

By taking these properties together, **CTRMs aim to provide a small search space that still contains plausible solutions for pathfinding algorithms**.

# Methodology to Construct CTRMs

To construct CTRMs, we develop a **machine learning (ML)-based approach**:

- Suppose that we are given a collection of MAPP demonstrations (a pair of problem instances and plausible solution paths) obtained from offline and intensive computation with conventional roadmaps. For instance, such roadmaps can be obtained with uniform random sampling.
- The proposed ML model learns from how agents behave cooperatively at each unit time (i.e., timestep), which is implicitly included in the demonstrations, to predict how they move in the next timesteps.
- For a new, previously unseen problem instance, the learned model can then be used to sample a small set of agent-specific vertices (i.e., space-time pairs) for generating multiple solution path candidates, and eventually, constructing CTRMs by compositing those candidates.

As a result, the CTRMs can serve a smaller but promising search space and enable the pathfinding algorithms to derive a solution more efficiently than when using conventional roadmaps. Technically, we extend a class of generative models called a conditional variational autoencoder (CVAE) [Sohn et al., NeurIPS-15] to learn a conditional probability distribution of vertices of CTRMs for each agent, given the observations of multiple agents. **Our model can work with an arbitrary size of space, an arbitrary number of agents, and even with a heterogeneous setting**.

In what follows, we explain (1) offline training: learning from MAPP demonstrations and (2) online inference: constructing CTRMs with the learned model.

## 1. Offline Training: Learning from MAPP Demonstrations

In this work, **we cast roadmap construction as a machine learning (ML) problem**, described as follows.

*Training Data*: Collection of MAPP demonstrations (problem instances and their solutions).*Objective*: Given new configurations, predict the next locations for each agent.

The training data can be obtained by offline intensive computation with sufficiently dense roadmaps constructed with uniform random or grid sampling. As an ML model, we used CVAE because it can learn the conditional probability distribution of next locations *y* (i.e., vertex in roadmaps) given the current observation *x*, i.e., *p(y|x)*. CVAE has been a popular choice for ML-based motion planning studies, e.g., [Ichter et al., ICRA-18]. **The trained model then be used as a “vertex sampler”** to construct CTRMs that are effective for solving a new, previously unseen problem instance.

The problem is now **extracting informative features** to form observation *x*. An inherent challenge in MAPP is that **each agent needs to move toward its goal while avoiding collisions with other agents** that are also in motion. For this purpose, our method uses the following features:

*Goal-driven features (x_goal)*: relative positions to goal and previous location, spatial sizes of agents, kinematic constraints, as well as surrounding configuration embodied as field-of-view (FOV).*Communication features (x_comm)*: other nearby agents’ information, accumulated by attention architecture for multi-agent interaction modeling [Hoshen, 2017].*Indicator feature (x_ind)*: high-level choices of which direction an agent will move (such as left, straight, and right).

Finally, we concatenate all features to form *x*. Note that the neural networks used for the feature extraction can be trained end-to-end with the CVAE.

## 2. Online Inference: Constructing CTRMs with Learned Model

The next step is to construct CTRMs for a new instance. The concept of the proposed algorithm is described below.

- Generate a sequence of vertices (i.e., path) by iteratively predicting the next locations for each agent.
- Repeat the above path generation multiple times.
- Composite these generated paths to obtain CTRMs.

From the nature of the training, **each sample obtained from the trained model is expected to have a nice property, that is, the sample brings each agent to its goal while avoiding inter-agent collisions**. In the first step, random walks are also mixed with certain probabilities, which contributes to improving the expressiveness of CTRMs beyond what has been learned from the training data.

# Evaluations

We extensively evaluate our proposed approach on a variety of MAPP problems with several different setups in terms of the number of agents (21–40), the presence of obstacles, and the heterogeneity in agent sizes and motion speeds. Our results demonstrate that, compared to other standard roadmap construction strategies, **planning by learning to construct CTRMs is order-of-magnitude efficient in the planning effort** (e.g., assessed by search node expansions or runtime), while maintaining comparable planning success rate and solution quality, with acceptable overheads.

Herein, we introduce part of our results. For details, please check our paper. An example of a problem instance is shown at the top of this article (gif image).

## Visualization of Constructed roadmaps for One Agent

As you can see below, CTRMs produce small but effective roadmaps specific to each agent, compared to standard roadmap construction methods.

Baselines for roadmap construction methods were:

**Random**sampling samples agent locations uniformly at random from the space. This is equivalent to a simplified version of probabilistic roadmaps [Karaman & Fraoli, IJRR-11].**Grid**sampling, as used in conventional multi-agent pathfinding studies [Stern et al., SOCS-19].**SPARS**, an algorithm for roadmap construction that attempts to reduce both vertices and edges, developed for single-agent motion planning [Dobson & Bekris, IJRR-14].**Square**sampling, a variant of random sampling focusing on the square region.

## Performance on Basic Scenario

Since the objective of MAPP is to plan conflict-free paths, the effectiveness of roadmap construction methods should be evaluated via the subsequent planning results. By using standard prioritized planning (e.g., [Silver, AIIDE-05], [Van Den Berg & Overmars, IROS-05]) as a reasonable choice of the planner, we computed the following metrics. Note that other planning algorithms can also be applicable to CTRMs such as conflict-based search [Sharon et al., AIJ-15].

The main observations are summarized below:

- CTRMs contain solutions in a small search space.
- While keeping solution quality, CTRMs contribute to reducing the planning effort by orders of magnitude compared to the alternatives.
**CTRMs achieve efficient MAPP solving from the end-to-end perspective**.

# Highlights

- Cooperative timed roadmaps (CTRMs), a novel graph representation tailored to multi-agent path planning in continuous spaces.
- CTRMs provide a smaller but promising search space for planning algorithms, which significantly reduces planning effort.
- Data-driven approach to construct CTRMs, which learns from MAPP demonstrations.

# Learn More about CTRMs

- The project page provides links to the supplementary material, the code repository, the presentation, etc.
- The authors are welcome to questions, discussions, and new collaboration.

# Call for Interns

This project was done as a project in the Perception Group at OMRON SINIC X. We and our other groups (Robotics Group and Interaction Group) are looking for talented students for our internships. The first author, K. Okumura, was also our intern.

Check below if you are interested in our internship opportunities.