A way to speed up Python RL environments

Andreas Kirsch (BlackHC) · Feb 19, 2018

OpenAI’s gym API only supports stepping one RL environment at a time. If you want to run multiple environments, you need multiple threads or multiple processes. This parallelizes well only as long as you have enough cores available: once the number of environments exceeds the core count, adding more environments does not make anything faster, and the scheduling overhead uses up compute that could be spent on training and inference.
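For reference, the conventional pattern looks roughly like this (a minimal sketch using the classic gym API; the per-environment Python calls are what threads or processes have to hide):

```python
import gym

# One Python object and one step() call per environment.
envs = [gym.make("Pong-v0") for _ in range(8)]
observations = [env.reset() for env in envs]

actions = [env.action_space.sample() for env in envs]
# Each step is a separate interpreter-level call; any parallelism has to
# come from threads or processes wrapped around this loop.
results = [env.step(action) for env, action in zip(envs, actions)]
```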

Most current RL implementations sample training data from many environments and train on batches. This is usually done with multiple workers, each running its own environments, to generate lots of training data. DeepMind’s IMPALA agent does this, for example.

Instead of stepping one environment at a time, though, we can run multiple environments in batch on a single machine. Much like CUDA’s SIMT architecture (single instruction, multiple threads), we can implement environments so that every step, and every calculation within it, is performed on many environments at the same time.
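As a toy illustration of the idea (the class and its game logic are made up for this sketch, not taken from any library), a batched environment keeps the state of all instances in arrays, so each operation in step advances every instance at once:

```python
import numpy as np

class BatchedRandomWalk:
    """Toy batched environment: every operation acts on all instances at once."""

    def __init__(self, num_envs):
        self.positions = np.zeros(num_envs)

    def step(self, actions):
        # actions: shape (num_envs,), values in {-1, 0, +1}.
        self.positions += actions                # one update for all envs
        rewards = -np.abs(self.positions)        # rewards computed in batch
        dones = np.abs(self.positions) >= 10.0   # batched termination check
        self.positions[dones] = 0.0              # reset only the finished envs
        return self.positions.copy(), rewards, dones

env = BatchedRandomWalk(1024)
obs, rewards, dones = env.step(np.random.choice([-1, 0, 1], size=1024))
```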

We can reap considerable speedups at the cost of some code complexity. To test this approach, I have implemented a pong environment that uses numpy record arrays to store the state of the game and batches computations over it. The code can be found at https://github.com/BlackHC/batch_pong_poc.
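The sketch below shows what such a record-array layout can look like; the field names and shapes here are illustrative, not the actual ones from batch_pong_poc:

```python
import numpy as np

# Illustrative state layout; the actual fields in batch_pong_poc differ.
state_dtype = np.dtype([
    ("ball_pos", np.float32, (2,)),   # (x, y) per environment
    ("ball_vel", np.float32, (2,)),
    ("paddle_y", np.float32, (2,)),   # left and right paddle
    ("score", np.int32, (2,)),
])

states = np.zeros(1024, dtype=state_dtype)  # 1024 environments in one array

# A piece of game logic is then a vectorized operation over every environment:
states["ball_pos"] += states["ball_vel"]

# Bounce the ball off the top and bottom walls in all environments at once.
hit_wall = (states["ball_pos"][:, 1] < 0.0) | (states["ball_pos"][:, 1] > 1.0)
states["ball_vel"][hit_wall, 1] *= -1.0
```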

A batched implementation allows for considerable speedups (60x) over a vanilla Python implementation, and even higher speedups compared to running an Atari environment (500x). Obviously, the Atari pong environment cannot really be compared to it, since it does more and different things, but this pong environment with similar game logic can be sufficient for debugging and iterating.

[Plot: time per step per environment vs. number of environments]

In the plot, you can see that running multiple Atari or vanilla Python environments sequentially does not benefit from any speedup, whereas a batched environment using numpy enjoys a considerable speedup as the number of environments increases. Numpy has very efficient implementations of vectorized operations and can therefore advance the batched environments much faster.
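You can see the effect in isolation with a toy micro-benchmark (this is not the benchmark behind the plot, just the loop-versus-vectorized comparison):

```python
import timeit
import numpy as np

positions = np.zeros(10_000)
velocities = np.ones(10_000)

def step_loop(pos, vel):
    # Advance each environment individually, as a vanilla Python loop would.
    for i in range(len(pos)):
        pos[i] += vel[i]

def step_batched(pos, vel):
    # Advance all environments with a single vectorized operation.
    pos += vel

print("loop:   ", timeit.timeit(lambda: step_loop(positions, velocities), number=100))
print("batched:", timeit.timeit(lambda: step_batched(positions, velocities), number=100))
```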

For simple environments, a batched variant can offer significant speedups. The saved time can be used for hyperparameter optimization or simply to sample more training data.
