MULTIPOLAR: Multi-Source Policy Aggregation for Transfer RL between Diverse Environmental Dynamics (IJCAI’20)

Ryo Yonetani
OMRON SINIC X
May 8, 2020

We are thrilled to share that our project on transfer RL has been accepted to the International Joint Conference on Artificial Intelligence (IJCAI) 2020! Our talk will appear in the main-track session [Machine Learning] Reinforcement Learning 1/3, scheduled from 9:00 am to 10:20 am (GMT) on Jan. 13.

Mohammadamin Barekatain, Ryo Yonetani, and Masashi Hamaya, “MULTIPOLAR: Multi-Source Policy Aggregation for Transfer Reinforcement Learning between Diverse Environmental Dynamics”, accepted to IJCAI’20 [Project] [arXiv preprint]

This project was done while the first author, Amin from the Technical University of Munich, was doing an internship at OMRON SINIC X.

Overview

We envision a future scenario in which a variety of robotic systems, each trained or manually engineered to solve a similar task, provide their policies so that a new robot can learn a relevant task quickly. For example, imagine various pick-and-place robots working in factories all over the world. Depending on the manufacturer, these robots will differ in their kinematics (e.g., link length, joint orientation) and dynamics (e.g., link mass, joint damping, friction, inertia). They could provide their policies to a new robot, even though the dynamics factors on which those policies are implicitly conditioned are typically not available. Moreover, we cannot rely on a history of their individual experiences, as such histories may be unavailable due to a lack of communication between factories or prohibitively large dataset sizes.

In such scenarios, a key capability to develop is the ability to transfer knowledge from a collection of robots to a new robot quickly by exploiting only their policies, while remaining agnostic to their differing kinematics and dynamics. In other words, we aim to explore a new challenge in transfer RL, where only a set of source policies collected under diverse, unknown dynamics is available for learning a target policy efficiently. Because no information about the source environmental dynamics is available, we cannot adopt existing work on transfer RL between different dynamics, as such methods require access to source environment instances or their dynamics.

Overview of MULTIPOLAR

As a solution to this problem, we propose a new transfer RL approach named MULTI-source POLicy AggRegation (MULTIPOLAR). As shown in the figure above, our key idea is twofold:

  1. In a target policy, we adaptively aggregate the deterministic actions produced by a collection of source policies. By learning the aggregation parameters to maximize the expected return in a target environment instance, we can better adapt the aggregated actions to the unseen dynamics of that instance without knowing the source environmental dynamics.
  2. We also train an auxiliary network that predicts a residual around the aggregated actions, which is crucial for ensuring the expressiveness of the target policy even when some source policies are not useful (see the sketch after this list).
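
To make these two components concrete, below is a minimal PyTorch-style sketch of a MULTIPOLAR target policy for continuous actions. The class and variable names (MultipolarPolicy, theta, aux) and the network sizes are our illustrative assumptions, not the paper's exact implementation; in practice the aggregation parameters and the auxiliary network are trained end-to-end with a standard RL algorithm such as PPO.

```python
import torch
import torch.nn as nn


class MultipolarPolicy(nn.Module):
    """Sketch of a MULTIPOLAR target policy for continuous actions.

    The source policies are frozen; only the aggregation parameters
    (theta) and the auxiliary residual network (aux) are trained to
    maximize return in the target environment instance.
    """

    def __init__(self, source_policies, state_dim, action_dim, hidden_dim=64):
        super().__init__()
        self.source_policies = nn.ModuleList(source_policies)
        for p in self.source_policies.parameters():
            p.requires_grad_(False)  # source policies stay fixed
        k = len(source_policies)
        # One learnable weight vector per source policy (K x action_dim).
        self.theta = nn.Parameter(torch.full((k, action_dim), 1.0 / k))
        # Auxiliary network predicting a residual around aggregated actions.
        self.aux = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, action_dim),
        )

    def forward(self, state):  # state: (batch, state_dim)
        with torch.no_grad():
            # Deterministic actions of all sources: (K, batch, action_dim).
            src = torch.stack([p(state) for p in self.source_policies])
        f_agg = (self.theta.unsqueeze(1) * src).sum(dim=0)  # adaptive aggregation
        f_aux = self.aux(state)                             # learned residual
        return f_agg + f_aux  # e.g., mean of a Gaussian action distribution
```

A training step would then treat the returned vector as, for example, the mean of a Gaussian action distribution and update theta and the auxiliary network with the usual policy-gradient loss, while the source policies remain frozen throughout.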

As another advantage, MULTIPOLAR can be used for both continuous and discrete action spaces with few modifications, while allowing a target policy to be trained in a principled fashion.
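
For the discrete case, one plausible adaptation, again an illustrative assumption reusing the MultipolarPolicy sketch above, encodes each source policy's deterministic action as a one-hot vector and treats the aggregated vector plus the residual as the logits of a categorical distribution:

```python
import torch
import torch.nn.functional as F


def discrete_action_dist(policy, state, num_actions):
    """Hedged discrete-action variant, reusing MultipolarPolicy above.

    Assumes each source policy returns per-action scores; its deterministic
    action is the argmax, encoded as a one-hot vector before aggregation.
    Requires the policy to be built with action_dim == num_actions.
    """
    with torch.no_grad():
        one_hot = torch.stack([
            F.one_hot(p(state).argmax(dim=-1), num_actions).float()
            for p in policy.source_policies
        ])  # (K, batch, num_actions)
    logits = (policy.theta.unsqueeze(1) * one_hot).sum(dim=0) + policy.aux(state)
    return torch.distributions.Categorical(logits=logits)
```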

We demonstrated the effectiveness of MULTIPOLAR through an extensive experimental evaluation across six simulated environments ranging from classic control problems to challenging robotics simulations.

Examples of source and target agents with different dynamics and kinematics
We confirmed the sample efficiency of MULTIPOLAR across six simulated environments under both continuous and discrete action spaces.

A video summary presented at the ICLR 2020 workshop is also available here.
