Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: A Survey

Atul Shah
Published in The Startup · Nov 1, 2020

Deep Reinforcement Learning is an effective way to train robots to adapt to the real world because it sidesteps the sample inefficiency and cost of collecting real-world data. It provides a potentially unlimited source of experience as the agent explores the environment and exploits the knowledge gained from that exploration. However, a marked degradation in performance is observed when transitioning from a simulated environment to the real world, which warrants a deeper look into efficient policy transfer methods that close the gap between simulation and reality.

This survey article summarizes the fundamentals of sim-to-real transfer and gives an overview of the main methods applied in this area: domain randomization, domain adaptation, imitation learning, meta-learning, and knowledge distillation. It also highlights some of the most recent works and their application scenarios. Finally, we look at the open challenges and areas of future research in the domain.

Introduction

In this article, we will study various methods for efficiently transferring knowledge learned during simulations to real-world robotic environments.

There are wide-ranging applications of Deep Reinforcement Learning (DRL), yet its success in the target environment remains an active area of research.

Simulation-based learning provides a cost-effective way of sourcing data through exploration, but the differences between simulations and real-world scenarios pose challenges for the learning process.

An agent operating in the real world is exposed to experiences that were not accounted for in the simulated environment and is expected to adapt its learned policies to a wide variety of tasks.

Some approaches depend on perturbations and randomization introduced into the environment, while others try to bridge the gap between the real world and simulation through meta-learning and continual learning.

Simulators play an important role in sim-to-real transfer learning. They strive to provide training data and experiences to the agent while minimizing the differences between the real world and the simulation.

Here is how sim-to-real learning relates to Transfer Learning, Reinforcement Learning, and Meta-Learning.

1. Deep Reinforcement Learning

In reinforcement learning, the agent moves from one state to another by taking actions in an environment in search of maximum cumulative reward. In Deep Reinforcement Learning (DRL), the policy (or value function) that drives this reward maximization is approximated by a deep neural network. DRL has shown remarkable success in simulated environments, but transferring this success to real-world environments is still a challenge.
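
As a minimal sketch of this loop, the snippet below rolls out one episode with a stochastic policy. The linear `policy` function and the simplified environment interface are illustrative stand-ins for the deep network and Gym-style environment a real DRL setup would use:

```python
import numpy as np

def policy(state, weights):
    """Stand-in for a deep policy network: maps a state to action probabilities."""
    logits = weights @ state
    exp = np.exp(logits - logits.max())              # numerically stable softmax
    return exp / exp.sum()

def run_episode(env, weights, max_steps=200):
    """One rollout: act, observe the reward, accumulate the return."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        probs = policy(state, weights)
        action = np.random.choice(len(probs), p=probs)  # stochastic exploration
        state, reward, done = env.step(action)          # simplified env interface
        total_reward += reward
        if done:
            break
    return total_reward   # the quantity DRL algorithms try to maximize
```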

2. Sim-to-Real Transfer

Sim-to-Real Transfer in robotics, the focus of this article, must deal with two dimensions of the problem. The first is sensing, which relies on raw data from the robot’s sensors: the input data used for training in simulation should be as close to the real world as possible, a requirement common to most machine learning models. The other is actuation: the commands output by the DRL network, whose robustness depends on the quality of the simulator and the unpredictability of the agent’s dynamics. To facilitate the transfer of policies from simulation to the real world, we will look at randomization from the point of view of both agent dynamics and sensor input data.
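
As a rough illustration of why both dimensions matter, one common trick is to perturb both what the policy sees and what it commands. The wrapper below is a hypothetical sketch: the `get_observation`/`apply_action` methods and the Gaussian noise scales are assumptions, not part of any specific simulator API:

```python
import numpy as np

class NoisyRobotInterface:
    """Wraps a simulated robot so that observations (sensing) and commands
    (actuation) are perturbed, mimicking imperfect sensors and actuators."""

    def __init__(self, env, obs_noise=0.01, act_noise=0.02):  # assumed scales
        self.env = env
        self.obs_noise = obs_noise
        self.act_noise = act_noise

    def observe(self):
        obs = self.env.get_observation()          # hypothetical simulator call
        return obs + np.random.normal(0.0, self.obs_noise, size=obs.shape)

    def act(self, command):
        noisy = command + np.random.normal(0.0, self.act_noise, size=command.shape)
        self.env.apply_action(noisy)              # hypothetical simulator call
```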

3. Transfer Learning and Domain Adaptation

Sim-to-real transfer can be viewed as a special case of transfer learning, which focuses on transferring knowledge learned in one or more related source domains to a target domain. Domain adaptation is, in turn, a special case of transfer learning in which the task is the same in source and target, but there is very limited or no labeled target data. Domain adaptation techniques prove very useful for transferring simulation-trained models well.

4. Knowledge Distillation

Policy distillation, a method of efficiently transferring the expertise of one network to another, typically smaller, network, can help bridge the gap between simulation and the real world. The two networks are called the teacher and the student: the teacher produces softened output labels, and the student is trained on those soft labels in a supervised manner. DisCoRL, a scalable and modular pipeline, is an example of how knowledge distillation can be applied in continual DRL.
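
A minimal sketch of the distillation objective, assuming both networks output action logits (PyTorch here; the temperature value is an illustrative choice):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the teacher's softened action distribution
    (the 'soft labels') and the student's, so the student mimics the teacher."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    student_logp = F.log_softmax(student_logits / temperature, dim=-1)
    # 'batchmean' averages the KL over the batch; T^2 is the usual rescaling
    return F.kl_div(student_logp, soft_targets, reduction="batchmean") * temperature**2
```

Training the student on these soft targets lets a compact network inherit the behavior of a larger, simulation-trained teacher.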

5. Meta Reinforcement Learning

Meta-Learning, or learning to learn, is all about adapting or generalizing to new tasks and environments never encountered during training. The model is trained on a series of training tasks so that, at test time, it can adapt to a new task from only a small amount of experience.

When applied to reinforcement learning, this methodology is referred to as Meta Reinforcement Learning (MetaRL). MetaRL methods often use recurrent architectures such as Long Short-Term Memory (LSTM) networks to carry knowledge across past training episodes and apply it to sim-to-real problems.
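
A sketch of this idea in the style of RL^2-like recurrent meta-policies; the layer sizes and the exact inputs (previous action and reward concatenated with the observation) are illustrative assumptions:

```python
import torch
import torch.nn as nn

class RecurrentMetaPolicy(nn.Module):
    """LSTM policy whose hidden state persists across episodes of the same
    task, letting the agent adapt within a trial ('learning to learn')."""

    def __init__(self, obs_dim=8, act_dim=4, hidden=64):   # assumed sizes
        super().__init__()
        # Feeding back the previous action and reward lets the LSTM infer
        # the current task from its own experience.
        self.cell = nn.LSTMCell(obs_dim + act_dim + 1, hidden)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs, prev_action, prev_reward, state):
        x = torch.cat([obs, prev_action, prev_reward], dim=-1)
        h, c = self.cell(x, state)
        return self.head(h), (h, c)   # action logits + carried memory
```

The hidden state is reset between tasks but kept across episodes of a single task, which is where the cross-episode knowledge lives.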

6. Robust RL and Imitation Learning

Robust Reinforcement Learning (Robust RL) treats reward maximization as a worst-case optimization problem: the agent is trained to perform well even in the presence of input disturbances and modeling errors.

In imitation learning, instead of relying on a fixed reward function, the agent either learns a mapping from observations to actions (behavior cloning) or estimates the reward function underlying given demonstrations (inverse reinforcement learning). Imitation learning and Robust RL are both useful tools for sim-to-real transfer.
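
Behavior cloning, in particular, reduces to supervised learning on expert data. A minimal PyTorch sketch, assuming discrete actions and a `policy` network mapping observations to action logits:

```python
import torch.nn.functional as F

def behavior_cloning_step(policy, optimizer, demo_obs, demo_actions):
    """One supervised update fitting the policy to expert
    (observation, action) pairs from the demonstrations."""
    optimizer.zero_grad()
    logits = policy(demo_obs)
    loss = F.cross_entropy(logits, demo_actions)   # mimic the expert's choices
    loss.backward()
    optimizer.step()
    return loss.item()
```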

Now let’s look at some of the methodologies for sim-to-real transfer.

1. Zero Shot Transfer

In zero-shot learning, an agent is shown samples from test classes that were not available during the training phase and must still predict the correct classes for those unseen samples. In the sim-to-real setting, this amounts to deploying a policy trained entirely in simulation directly on the real robot, without any additional training on real data. It is regarded as an extreme case of domain adaptation.

2. System Identification

System identification aims to make simulators more realistic by capturing the physical system in a precise mathematical model. Simulators also need to be properly calibrated for the agent to learn effectively.
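
In practice this often means fitting a handful of physical parameters so that simulated trajectories match logged real ones. The sketch below uses a deliberately toy one-dimensional dynamics model; only the fitting pattern, not the physics, is the point:

```python
import numpy as np
from scipy.optimize import minimize

def sim_trajectory(params, controls, dt=0.01):
    """Toy stand-in for the simulator: rolls out positions under assumed
    mass and friction. A real version would call the simulator's dynamics."""
    mass, friction = params
    x, v, traj = 0.0, 0.0, []
    for u in controls:
        v += (u - friction * v) / mass * dt   # crude Euler integration step
        x += v * dt
        traj.append(x)
    return np.array(traj)

def identify(real_traj, controls, init=(1.0, 0.1)):
    """Fit (mass, friction) so simulated motion matches the real robot's."""
    error = lambda p: np.mean((sim_trajectory(p, controls) - real_traj) ** 2)
    return minimize(error, init, method="Nelder-Mead").x
```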

3. Domain Randomization

To represent the real world accurately, we randomize the simulated environment; this minimizes the bias between the real and simulated worlds and covers the distribution of data found in the real world. It takes two main forms, with a small sampling sketch after the descriptions below.

Visual randomization: generalizes to real-world visual data by varying visual parameters, such as textures, lighting, and camera pose, during the training phase.

Dynamics randomization: aims to provide a realistic simulation environment by accounting for the real world’s friction coefficients, object dimensions, force gains, and robot joint damping coefficients.
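
A minimal sketch of how both forms are typically driven: sample a fresh configuration per training episode and apply it to the simulator. All parameter names and ranges below are illustrative assumptions, not values from the survey:

```python
import numpy as np

def sample_randomized_config(rng):
    """Draw one randomized simulation configuration per episode."""
    return {
        # visual randomization: vary appearance so vision models generalize
        "light_intensity": rng.uniform(0.5, 1.5),
        "texture_id":      rng.integers(0, 100),
        "camera_jitter":   rng.normal(0.0, 0.01, size=3),
        # dynamics randomization: vary physics so control generalizes
        "friction":        rng.uniform(0.5, 1.2),
        "object_mass":     rng.uniform(0.8, 1.2),
        "joint_damping":   rng.uniform(0.9, 1.1),
    }

rng = np.random.default_rng(seed=0)
config = sample_randomized_config(rng)   # applied to the simulator each episode
```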

4. Domain Adaptation Methods

As noted above, domain adaptation is a method of transferring knowledge from a source domain to a target domain when target data is limited. To facilitate this transfer, the feature spaces of the two domains need to be unified.

Success with domain adaptation in vision-related tasks is a precursor to applying it to reinforcement learning tasks for an agent.

Discrepancy-based methods: measure the distance between source and target features using predefined statistical metrics, and minimize that distance to unify the feature spaces.
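
Maximum Mean Discrepancy (MMD) is one such metric; a small NumPy sketch with an assumed RBF-kernel bandwidth:

```python
import numpy as np

def rbf_mmd2(source, target, sigma=1.0):
    """Squared MMD between two feature batches under an RBF kernel:
    a statistical distance that discrepancy-based methods minimize."""
    def kernel(a, b):
        sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq_dists / (2 * sigma ** 2))
    return (kernel(source, source).mean()
            + kernel(target, target).mean()
            - 2 * kernel(source, target).mean())
```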

Adversarial-based methods: train a discriminator to classify features as belonging to the source or target domain, while the feature extractor learns to make the two indistinguishable.
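
Sketched below as the usual two-player objective (PyTorch, with an assumed binary discriminator); in practice a gradient-reversal layer often fuses the two losses into one backward pass:

```python
import torch
import torch.nn.functional as F

def domain_adversarial_losses(discriminator, src_feats, tgt_feats):
    """The discriminator learns to separate domains; the feature extractor
    is trained against g_loss so features become domain-invariant."""
    src_pred = discriminator(src_feats)   # should predict 1 (source)
    tgt_pred = discriminator(tgt_feats)   # should predict 0 (target)
    d_loss = (F.binary_cross_entropy_with_logits(src_pred, torch.ones_like(src_pred))
              + F.binary_cross_entropy_with_logits(tgt_pred, torch.zeros_like(tgt_pred)))
    # adversarial objective for the feature extractor: fool the discriminator
    g_loss = F.binary_cross_entropy_with_logits(tgt_pred, torch.ones_like(tgt_pred))
    return d_loss, g_loss
```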

5. Learning with Disturbances

Domain randomization and dynamics randomization introduce perturbations into the simulation environment so that the learned policy generalizes to the data distribution of the real world and becomes less sensitive to mismatches between simulation and reality.

6. Simulation Environments

Simulators play an important role in successful sim-to-real transfer. Among the most widely used are Gazebo, Unity3D, PyBullet, and MuJoCo.
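
For a flavor of how such a simulator is driven programmatically, here is a minimal headless PyBullet session using its bundled assets; the friction value is an arbitrary example of a dynamics-randomization hook:

```python
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)                                    # headless physics server
p.setAdditionalSearchPath(pybullet_data.getDataPath()) # bundled URDF assets
p.setGravity(0, 0, -9.81)

plane = p.loadURDF("plane.urdf")
robot = p.loadURDF("r2d2.urdf", basePosition=[0, 0, 0.5])

# Dynamics randomization hooks directly into the simulator:
p.changeDynamics(plane, -1, lateralFriction=0.8)       # assumed friction value

for _ in range(240):                                   # ~1 simulated second at 240 Hz
    p.stepSimulation()
p.disconnect()
```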

Challenges

Domain randomizations are challenging to design because it is hard to explain which randomizations help and how they affect the simulation.

Most algorithms based on domain adaptation assume that the source and target feature spaces are the same, which may not hold in many situations.

Future research in these domains involves:

(i) the integration of two or more of the current methods for more efficient transfer (e.g., domain randomization combined with domain adaptation), and

(ii) the utilization of incremental complexity learning, continual learning, and reward shaping for complex or multi-step tasks.

Conclusions

Reinforcement learning algorithms often rely on simulated data to meet their need for vast amounts of labeled experience. However, there is a need to add more realism to the simulation environment to achieve a successful sim-to-real transfer of knowledge.

Domain randomization and domain adaptation are widely recognized techniques for bridging the gap between simulations and real-world environments, and each comes with its own set of challenges. Policy distillation for multi-task learning and meta-learning for diverse sets of tasks are also used for sim-to-real transfer in DRL for robotics.

The field continues to generate new literature and research on how randomization and adaptation methods influence a simulator’s accuracy and the quality of the resulting transfer.

References

Zhao, W., Peña Queralta, J., & Westerlund, T. (2020). Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: A Survey.
