Making robots learn faster

Ameya Pore
On Computer Vision and Autonomous Systems
4 min read · Apr 28, 2020

How to pick and place objects

For me, the quest for robotics started in a molecular biology lab during my undergraduate years. I wondered whether the simple process of pipetting chemicals could be automated; a large community of experimental biologists could benefit from being relieved of such a tedious and laborious job. This curiosity led me to dive deep into the fundamentals of automation, and I applied for my master’s project at the Computer Vision and Autonomous Systems group at the University of Glasgow.

In Glasgow, I learned more about robotics. I was surprised to learn that robots have transformed the manufacturing industry and have been used for scientific exploration in environments inaccessible to humans, such as distant planets and oceans. However, I found that a significant barrier to the universal adoption of robots is their fragility and inability to adapt to complex and highly diverse environments. For example, a household robot needs a vast repertoire of behaviours, such as picking up objects, cleaning utensils, mopping the floor, etc. Current robotic systems can outperform humans in specific tasks, but when it comes to the generality of their behaviours, humans are far better. For example, the following video is from the DARPA Robotics Challenge back in 2015, which aimed at developing semi-autonomous ground robots for dangerous tasks such as rescue operations. As you will notice, most robots failed at seemingly trivial tasks, for example, opening a door or walking on rough terrain.

DARPA Robotics Challenge, 2015

Recent advances in deep neural networks, combined with the long-established field of reinforcement learning, have shown remarkable success in enabling a robot to find optimal behaviours through trial-and-error interactions with its environment. Deep Reinforcement Learning (DRL) provides tools to model hard-to-engineer, ad-hoc behaviours; however, it is infeasible to train these algorithms directly on a physical system. DRL algorithms require millions of trials to learn goal-directed behaviours, and failures can lead to hardware breakdown. Hence, the standard approach is to train DRL algorithms in virtual simulators. In the following video, a human-like robotic hand is trained in a simulator and the knowledge is transferred to reality.

OpenAI: Train agents in simulation and have them solve real-world tasks with unprecedented precision
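To give a concrete feel for what trial-and-error training in a simulator looks like, here is a minimal sketch of the interaction loop using OpenAI Gym’s FetchPickAndPlace environment (the same family of environments shown later in this post). A random policy stands in for the learning agent purely for illustration; this is not a DRL algorithm, and it assumes Gym’s MuJoCo-based robotics environments are installed.

```python
# Minimal sketch of the simulator interaction loop that DRL methods rely on.
# A random policy replaces the learning agent purely for illustration.
import gym

env = gym.make('FetchPickAndPlace-v1')

for episode in range(10):
    obs = env.reset()
    done = False
    while not done:
        # A DRL agent would map obs -> action here; we sample randomly instead.
        action = env.action_space.sample()
        obs, reward, done, info = env.step(action)
        # The sparse reward (0 on success, -1 otherwise) is the only feedback
        # the agent receives, which is why millions of trials are typically needed.
env.close()
```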

Existing DRL approaches employ an end-to-end learning strategy to learn and optimise tasks. In contrast, humans tend to learn simple behaviours first and compose them into complex ones. For example, when learning tennis, we start with basic behaviours such as bouncing the ball and hitting it, whereas an end-to-end approach attempts to optimise all possible behaviours at once. To put this into context, a sophisticated DRL method requires millions of trials to complete simple tasks in simulations and games, whereas humans learn them in 50–100 attempts, i.e. really fast!

Schematic of the modular behaviour-based reinforcement learning architecture. The goal of picking an object is subdivided into simpler behaviours that are trained specifically for movements in x, y, z. These behaviours are activated and inhibited by a reactive network, also called ‘Actor-critic’.

Our recent ICRA 2020 paper builds on a simple hypothesis:

By tapping into human knowledge, a complex task can be divided manually into simple behaviours, similar to the tennis analogy above, and a reinforcement learning approach can then be used to learn a high-level task such as picking and placing a block.

In other words, we first have the robot learn basic behaviours separately from demonstrations, and then learn to coordinate the execution of those basic behaviours via reinforcement learning to choreograph a pick-and-place task.

FetchPickAndPlace simulator
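Before any coordination happens, each basic behaviour is learned separately from demonstrations. Below is a hedged sketch of how one such behaviour could be cloned with supervised learning; the network size, observation/action dimensions, and training loop are illustrative assumptions, not the paper’s exact setup.

```python
# Hypothetical sketch: cloning one basic behaviour (e.g. 'approach') from
# demonstration data via supervised learning. Shapes and sizes are assumed.
import torch
import torch.nn as nn

class BehaviourNet(nn.Module):
    """Small feed-forward policy mapping observations to gripper actions."""
    def __init__(self, obs_dim=25, act_dim=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, act_dim), nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs)

def clone_behaviour(demo_obs, demo_actions, epochs=100):
    """Fit a behaviour network to (observation, action) pairs from demonstrations."""
    policy = BehaviourNet(demo_obs.shape[1], demo_actions.shape[1])
    optimiser = torch.optim.Adam(policy.parameters(), lr=1e-3)
    for _ in range(epochs):
        pred = policy(demo_obs)
        loss = nn.functional.mse_loss(pred, demo_actions)
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()
    return policy
```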

Our motivation is inspired by Rodney Brooks’s subsumption architecture, proposed in 1991, which mimics the evolutionary path of intelligence. In Brooks’s architecture, a complex behaviour subsumes a set of simpler behaviours, and the task is accomplished by hard-coding the activation of behaviours for a given robotic task. In our work, basic behaviours such as approach, grasp and retract are modelled as simple feed-forward neural networks. These behaviours are then ordered by a choreographer that is trained using a DRL algorithm (see the diagram above). To learn more about our approach and the experiments, watch the following video!
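To make the architecture more concrete, here is a hedged sketch of how a choreographer trained in an actor-critic style could activate one pre-trained behaviour at a time. The class names, dimensions and selection rule are illustrative assumptions rather than the paper’s implementation.

```python
# Hypothetical sketch of the choreographer: an actor-critic network that
# activates one pre-trained behaviour (approach, grasp, retract) per step.
import torch
import torch.nn as nn

class Choreographer(nn.Module):
    """Outputs a distribution over behaviours plus a state-value estimate."""
    def __init__(self, obs_dim=25, n_behaviours=3):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.actor = nn.Linear(64, n_behaviours)   # which behaviour to activate
        self.critic = nn.Linear(64, 1)             # value of the current state

    def forward(self, obs):
        h = self.shared(obs)
        return torch.distributions.Categorical(logits=self.actor(h)), self.critic(h)

def act(choreographer, behaviours, obs):
    """Pick a behaviour, then let that behaviour produce the low-level action."""
    dist, value = choreographer(obs)
    idx = dist.sample()
    low_level_action = behaviours[idx.item()](obs)
    return low_level_action, dist.log_prob(idx), value
```

The key design point is that the actor-critic only has to choose among a handful of behaviours rather than explore the full continuous action space, which is what makes training so much cheaper.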

In our paper, we report a drastic reduction in the training time needed to learn the pick-and-place task. Current state-of-the-art DRL algorithms require 95,000 episodes to learn a pick-and-place task, whereas our approach requires 8,000 episodes. We also go beyond the basic environment structure used in DRL research by including an additional degree of freedom for gripper rotation and by spawning the block at a random position. We believe the repertoire of learned simple behaviours could be choreographed and rearranged differently to accomplish different tasks, demonstrating task-related generality. Generality, however, is future work, so stay tuned!

University of Glasgow, April 2019

About the author: Ameya Pore is currently enrolled in a joint doctoral programme at the Altair Robotics Lab, University of Verona, and the Universitat Politècnica de Catalunya, Barcelona, as an Early-Stage Researcher with a Marie Skłodowska-Curie fellowship under the European Union Innovative Training Network. His current work focuses on finding optimal learning methods for navigation and control in autonomous surgical robots.
