Counterfactual Fairness Filter for Fair-Delay Multi-Robot Navigation (AAMAS2023)

Hikaru Asano
OMRON SINIC X
May 26, 2023

We are excited to present our new work on fairness-aware multi-robot navigation at the 22nd International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2023)!

Hikaru Asano, Ryo Yonetani, Mai Nishimura & Tadashi Kozuno. “Counterfactual Fairness Filter for Fair-Delay Multi-Robot Navigation.” AAMAS 2023 [arXiv] [Project]

In this work, we formulate a fairness problem for multi-robot navigation and develop an algorithm that balances the tradeoff between efficiency and fairness.

First, to define fairness in navigation, we consider how much each robot deviates from its ideal path during a trip. Moreover, in complex navigation problems it is extremely difficult to understand how one robot’s actions contribute to the fairness experienced by its neighbors. We therefore propose an algorithm that solves this credit-assignment problem with counterfactual reasoning combined with hierarchical reinforcement learning, achieving both fairness and efficiency.

Motivation

Efficiency and fairness are both essential for the practical deployment of autonomous robots. For example, it is problematic if most robots in a food-delivery or taxi fleet operate efficiently while only certain robots suffer significant delays. However, forcing all robots to arrive at their destinations at the same time is also problematic: a robot with a long way between its start and its destination can work the whole time, whereas a robot that is already close to its destination would be forced to wait, even though it could have arrived much earlier.

Thus, while fairness in multi-robot navigation is important, how to define it is a non-trivial question.

Fairness in Multi-robot Navigation Problem

In navigation, fairness must be defined so that each robot’s goal location is taken into account. In other words, we need a measure under which a robot is neither advantaged nor disadvantaged by initial conditions such as its start and goal positions. Defining fairness directly on arrival times violates this requirement, so it cannot handle efficiency well.

In our study, to handle efficiency better, we define fairness by focusing on how much each robot loses and equalizing that loss across robots.

In particular, we define a robot’s Delay as the difference between its actual arrival time, which is longer because the robot stops or detours for safety, and the best possible arrival time it could achieve if it ignored the presence of other robots. Fairness then means equalizing the delays across robots.
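In symbols (our own shorthand, condensed from the definitions above rather than notation taken verbatim from the paper): if T_i is robot i’s actual arrival time and T_i* its best possible arrival time when ignoring all other robots, then

```latex
% Delay: the time robot i loses by cooperating (stopping, detouring).
d_i = T_i - T_i^{*}

% Efficiency objective: keep the overall makespan small.
\min \max_i T_i

% Fairness objective: keep delays close to one another, e.g. low variance.
\min \mathrm{Var}(d_1, \dots, d_N)
```

Equalizing d_i rather than T_i means a robot with a distant goal is not penalized for its longer trip; only the losses caused by interaction are balanced.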

Navigation with Counterfactual Fairness Filter

We developed Navigation with Counterfactual Fairness Filter (NCF2) to achieve both fairness and efficiency. The algorithm addresses the following difficulties in fairness-aware multi-robot navigation.

  1. Credit-assignment problem
    In a multi-agent environment, each agent’s fairness and efficiency depend on both its own actions and those of the other agents. Quantifying the contribution of one’s own actions is therefore difficult (see the formalization after this list).
  2. Complex dynamics
    Navigation has a continuous action space with a vast number of action options. Learning behavior that is both fair and efficient in such a space is challenging.
  3. Cooperation with safety in mind
    To achieve fairness, robots with higher delay need to behave more aggressively. Realizing this in a safety-critical environment is very difficult.
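To make the first difficulty concrete, one standard way to write the counterfactual contribution of agent i’s stopping (our illustration, not a formula from the paper) is

```latex
c_i = \mathbb{E}\left[ R_{\mathcal{N}(i)} \mid a_i = \text{stop} \right]
    - \mathbb{E}\left[ R_{\mathcal{N}(i)} \mid a_i = \text{go} \right]
```

where R_{N(i)} is the return obtained by agent i’s neighbors. The second term asks about a world that did not happen, which is exactly what a counterfactual module must estimate.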

Algorithm

(Figure: overview of NCF2)

NCF2 is a decentralized, counterfactual algorithm designed for fair-delay multi-robot navigation tasks. The key idea is to equip each agent with a policy comprising a navigation module and a counterfactual fairness filter (CF2) module. The navigation module enables agents to take continuous actions to reach their destinations efficiently and safely. At the same time, the CF2 module is learned via counterfactual inference to decide whether the agent should advance toward its goal or stay still to let others pass, improving the trade-off between fairness and efficiency.

To learn the CF2 module, we employ the following steps.

  1. The navigation module computes a cooperative action that accounts for all neighboring agents’ actions, which results in a conservative action.
  2. The CF2 module decides whether the agent should advance toward its goal or stay still to let other agents pass, and sends this stop/go decision as a message to its neighbors. In the figure, the orange agent’s CF2 module decides to stop and make space for the blue agent.
  3. The navigation module recalculates its action, taking the received messages into account. If a neighboring agent’s CF2 module decided to stop, that neighbor’s next action can be ignored, allowing a more aggressive action. Here, the blue agent can ignore the orange agent and move forward. (A minimal sketch of one such round follows below.)

Through these steps, the CF2 module can measure how much its own stopping contributes to the actions of neighboring agents, thereby effectively solving the complex credit-assignment problem.
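As a rough illustration of this three-step round, here is a minimal, self-contained sketch in Python. It is not the authors’ implementation: `nav_action`, `cf2_should_stop` (a hand-written stand-in for the learned CF2 module), and the message format are all assumptions made for readability.

```python
# Minimal sketch of one NCF2-style decision round (illustrative, not the
# authors' code). Names and thresholds are assumptions for readability.
import numpy as np

def nav_action(pos, goal, neighbor_positions, speed=0.05):
    """Move toward the goal, yielding if an active neighbor is too close."""
    direction = goal - pos
    dist = np.linalg.norm(direction)
    if dist < 1e-8:
        return np.zeros_like(pos)  # already at the goal
    step = speed * direction / dist
    # Conservative rule: stop if the next step would get near a neighbor.
    for q in neighbor_positions:
        if np.linalg.norm((pos + step) - q) < 0.1:
            return np.zeros_like(pos)
    return step

def cf2_should_stop(my_delay, neighbor_delays):
    """Toy stand-in for the learned CF2 module: stop when neighbors are
    more delayed than we are, letting them recover some of their delay."""
    return len(neighbor_delays) > 0 and my_delay < np.mean(neighbor_delays)

def ncf2_round(positions, goals, delays):
    """One synchronous round: plan, exchange stop/go messages, replan."""
    n = len(positions)
    # Step 2: each agent's CF2 module announces stop (True) or go (False).
    stops = [
        cf2_should_stop(delays[i], [delays[j] for j in range(n) if j != i])
        for i in range(n)
    ]
    actions = []
    for i in range(n):
        if stops[i]:
            actions.append(np.zeros_like(positions[i]))
            continue
        # Step 3: replan while ignoring neighbors that announced a stop.
        active = [positions[j] for j in range(n) if j != i and not stops[j]]
        actions.append(nav_action(positions[i], goals[i], active))
    return actions

positions = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
goals = [np.array([1.0, 1.0]), np.array([0.0, 1.0])]
delays = [0.0, 0.4]  # the second agent has accumulated more delay
print(ncf2_round(positions, goals, delays))
```

In NCF2 the stop/go decision is learned rather than the hand-written heuristic above, but the message flow — plan conservatively, exchange stop/go decisions, then replan while ignoring stopped neighbors — is the part the sketch tries to convey.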

Experiment

To test the effectiveness of the proposed method, we compared it against existing fairness-aware multi-agent reinforcement learning algorithms: Fair-Efficient Network (FEN) and Self-Oriented Team-Oriented Network (SOTO).

The results of the experiment are shown above. In environments with few robots (the case shown above), FEN and SOTO learned successfully but did not outperform our proposed method on success rate, makespan, or variance of delay. In more difficult environments, FEN and SOTO failed to learn, achieving low or even zero success rates.

In contrast, our proposed method achieves both efficiency (low makespan) and fairness (low variance of delays), and it can learn fairness-promoting behaviors even in difficult environments.

Call for Interns

This work was carried out in the Perception Group at OMRON SINIC X. Together with our other groups (the Robotics Group and the Interaction Group), we are looking for talented students for our internships. The first author, H. Asano, was also one of our interns.

Check below if you are interested in our internship opportunities.
