Scalable Efficient Deep-RL

A more efficient way to scale up reinforcement learning algorithms

Sherwin Chen
Nov 7 · 4 min read

Introduction

Traditional scalable reinforcement learning frameworks, such as IMPALA and R2D2, run multiple agents in parallel to collect transitions, each with its own copy of the model pulled from the parameter server (or learner). This architecture imposes high bandwidth requirements, since it demands transfers of model parameters, environment information, etc. In this article, we discuss a modern scalable RL agent called SEED (Scalable Efficient Deep-RL), proposed by Espeholt, Marinier, Stanczyk et al. at Google Brain, that utilizes modern accelerators to speed up both data collection and learning, and to lower the running cost (an up to 80% reduction against IMPALA, measured on Google Cloud).

Comparison between SEED and IMPALA. Source: SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference

Deficiency of Traditional Distributed RL

Here we compare SEED with IMPALA. The IMPALA architecture, which is also used in various forms in Ape-X, OpenAI Rapid, etc., mainly consists of two parts: a large number of actors periodically copy model parameters from the learner and interact with environments to collect trajectories, while the learner(s) asynchronously receive transitions from the actors and optimize the model. This design has two main deficiencies:

  1. Inefficient resource utilization: Actors alternate between two tasks: environment steps and inference steps. The computation requirements of the two tasks are often dissimilar, which leads to poor utilization or slow actors. E.g., some environments are inherently single-threaded, while neural network inference is easily parallelizable.
  2. Bandwidth requirements: Model parameters, recurrent states, and transitions are transferred between actors and learners, which introduces a huge burden on the network bandwidth.
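To make these two costs concrete, an IMPALA-style actor loop might look like the following sketch. All names here (`ImpalaActor`, `ParameterClient`-style methods, the `env`/`policy` interfaces) are hypothetical illustrations, not the actual IMPALA code:

```python
class ImpalaActor:
    """Hypothetical sketch of an IMPALA-style actor: it holds a local
    copy of the policy, so both model parameters and full trajectories
    must cross the network on every unroll."""

    def __init__(self, env, policy, param_client, unroll_length=20):
        self.env = env
        self.policy = policy              # local model copy -> CPU inference
        self.param_client = param_client  # connection to the learner
        self.unroll_length = unroll_length

    def run_unroll(self):
        # 1) Pull the latest parameters from the learner (large transfer).
        self.policy.set_weights(self.param_client.get_latest_weights())
        trajectory = []
        obs = self.env.reset()
        for _ in range(self.unroll_length):
            # 2) Local inference: the CPU often idles while the
            #    (single-threaded) environment steps, and vice versa.
            action = self.policy.act(obs)
            next_obs, reward, done = self.env.step(action)
            trajectory.append((obs, action, reward, done))
            obs = self.env.reset() if done else next_obs
        # 3) Push the whole trajectory back to the learner
        #    (another large transfer).
        self.param_client.send_trajectory(trajectory)
```

Steps (1) and (3) are the bandwidth burden; the alternation inside the loop is the utilization problem.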

Architecture of SEED

SEED is designed to solve the problems mentioned above. As shown in Figure 1b, inference and trajectory collection are moved to the learner, which makes it conceptually a single-machine setup with remote environments. For every single environment step, the observations are sent to the learner, which runs the inference and sends actions back to the actors.
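Under this design, the actor becomes a thin loop around the environment. A minimal sketch, assuming a hypothetical `inference_client` that wraps the streaming connection to the learner (this is an illustration, not SEED's actual actor code):

```python
class SeedActor:
    """Hypothetical sketch of a SEED-style actor: it holds no model at all.
    Each observation is streamed to the learner, which runs batched
    inference on the accelerator and streams the action back."""

    def __init__(self, env, inference_client):
        self.env = env
        self.client = inference_client  # e.g. a streaming gRPC stub

    def run_episode(self, max_steps=1000):
        obs = self.env.reset()
        total_reward = 0.0
        for _ in range(max_steps):
            # Only a single observation/action pair crosses the network
            # per step; model parameters never leave the learner.
            action = self.client.infer(obs)
            obs, reward, done = self.env.step(action)
            total_reward += reward
            if done:
                break
        return total_reward
```

Note the trade-off: parameter and trajectory transfers disappear, but every environment step now involves a network round trip, which is why low inference latency becomes critical.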

Source: SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference
  • Data prefetching: When a trajectory is fully unrolled, it is added to a FIFO queue or replay buffer, and later sampled by data-prefetching threads.
  • Training: The training thread takes the prefetched trajectories stored in device buffers and applies gradient updates on the training TPU (or GPU) host machines.

Moving inference to the learner makes per-step latency critical, which SEED addresses with a simple framework built on gRPC, a high-performance RPC library. Specifically, it employs streaming RPCs, where the connection from actor to learner is kept open and metadata is sent only once. Furthermore, the framework includes a batching module that efficiently batches multiple actor inference calls together. In cases where actors can fit on the same machine as the learner, gRPC uses Unix domain sockets, which reduces latency, CPU usage, and syscall overhead. Overall, the end-to-end latency, including network and inference, is faster than IMPALA's for a number of the models considered below.
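The batching idea can be sketched with plain Python queues. This is a simplified stand-in for SEED's batching module (the class name, `policy_fn`, and the timeout-based gathering are assumptions for illustration; the real system batches inside its gRPC server on the accelerator host):

```python
import queue


class InferenceBatcher:
    """Hypothetical sketch of SEED's batching idea: inference calls from
    many actor streams are gathered into one batch, so the accelerator
    runs a single forward pass instead of many tiny ones."""

    def __init__(self, policy_fn, batch_size=4, timeout=0.01):
        self.policy_fn = policy_fn    # batched forward pass, e.g. on TPU/GPU
        self.batch_size = batch_size
        self.timeout = timeout
        self.requests = queue.Queue()  # (observation, reply_queue) pairs

    def infer(self, obs):
        # Called from an actor's streaming RPC handler; blocks until
        # the serving thread answers this request.
        reply = queue.Queue(maxsize=1)
        self.requests.put((obs, reply))
        return reply.get()

    def serve_once(self):
        # Gather up to batch_size pending requests, waiting at most
        # `timeout` for stragglers, then answer them with one batched call.
        batch = [self.requests.get()]
        while len(batch) < self.batch_size:
            try:
                batch.append(self.requests.get(timeout=self.timeout))
            except queue.Empty:
                break
        actions = self.policy_fn([obs for obs, _ in batch])
        for (_, reply), action in zip(batch, actions):
            reply.put(action)
```

The timeout bounds how long an early request waits for a full batch, trading a little latency for much better accelerator utilization.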
End-to-end inference latency of IMPALA and SEED for different environments and models

Cost Comparison

The following figure compares the cost of training SEED and IMPALA in different environments on Google Cloud. We can see that SEED saves more as the network grows larger.

References

Espeholt, Lasse, Raphaël Marinier, Piotr Stanczyk, Ke Wang, and Marcin Michalski. 2019. “SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference,” 1–19. http://arxiv.org/abs/1910.06591.

Towards AI

Towards AI is the world’s fastest-growing AI community for learning, programming, building and implementing AI.

Written by Sherwin Chen, a learner interested in deep learning and reinforcement learning.
