A Primer on Deep Reinforcement Learning Frameworks Part 1

Shresth Verma
8 min read · Oct 23, 2018


Machine learning these days has sort of become alchemy.

It is quite easy for newcomers to connect components, play with parameters and create a beautiful model that solves their task. Had you wished to throw a deep neural network at the problem at hand, you might have gone for TensorFlow or PyTorch or Keras or PaddlePaddle or Caffe (really?) or Chainer or any other homegrown auto-differentiation library. And there you have it, a mighty function approximator. The path to building suboptimally performing models is pretty straightforward, and that's not a bad thing. It helps attract young talent, and the learning curve isn't steep at all for beginners.

But now, let’s discuss Reinforcement Learning. Is there a de facto choice of framework that provides all the tools an RL researcher (or a curious kid) might need for exploratory work? Are there any libraries for creating industrial-grade, deployable RL products? With wonderful new RL papers coming out every week, GitHub is full of repositories where students and researchers create their own implementations of the state-of-the-art algorithms. But those repositories duplicate a lot of standard utility code. Moreover, nearly every RL task requires a playground for our agent to play in. That means the algorithm needs clean APIs to interact with the playground (or environment, in technical terms), and again, this part becomes repetitive.

Big players like Google, Intel, OpenAI and Facebook, classic frameworks like Keras, and wonderful independent developers (lots of ’em) have all tried to address this problem, and so there are a lot of bundled frameworks coming up, competing (or maybe collaborating?) to become the one-stop solution for all your RL woes.

One-stop solution for all your RL woes.

This article compares these frameworks so that you, the reader, don’t have to spend hours or days tweaking one for your task only to realize later that it doesn’t work for you. The comparison focuses on four parameters:

  1. Ease of use for a new user
  2. Diversity of algorithms implemented
  3. Support for a wide variety of simulation environments
  4. Performance of implemented algorithms and support for parallel and/or distributed training (this part is slightly subjective because I haven’t run benchmarks on all of them)

Let’s begin now, shall we?

Part 1: OpenAI Baselines, RLlib, Intel’s Coach, TensorForce

Part 2: SLM-lab, keras-rl, chainer-rl, tensorflow agents, Facebook’s ELF

Part 3: Google’s Dopamine, Deepmind’s trfl, Conclusion

  1. OpenAI Baselines

This is one of the oldest attempts at creating a standardised set of deep RL algorithms. It started when OpenAI was playing around with DQN and its variants (Dueling Double Q-learning with Prioritized Replay, Double Q-learning with Prioritized Replay, Dueling Double Q-learning, Double Q-learning and so on). In their own words, the purpose is

Reinforcement learning results are tricky to reproduce: performance is very noisy, algorithms have many moving parts which allow for subtle bugs, and many papers don’t report all the required tricks. By releasing known-good implementations (and best practices for creating them), we’d like to ensure that apparent RL advances never are due to comparison with buggy or untuned versions of existing algorithms.

The implementations are clean and reliable. The documentation helps you get started with some straightforward tasks. There are even pretrained weights and known-good hyperparameters for some sets of tasks, allowing existing results to be replicated easily. The set of algorithms implemented is also quite extensive, covering:

  • Actor Critic with Experience Replay (ACER)
  • Actor Critic using Kronecker-Factored Trust Region (ACKTR)
  • Deep Deterministic Policy Gradient (DDPG)
  • Deep Q-Learning (DQN)
  • Generative Adversarial Imitation Learning (GAIL)
  • Hindsight Experience Replay (HER)
  • Proximal Policy Optimization (PPO2)
  • Trust Region Policy Optimization (TRPO)

Baselines also includes MPI utilities which you can use to implement parallel and distributed training regimes for these algorithms.

The only downside is that it supports only OpenAI Gym environments and Atari game tasks. That actually makes sense, because the library is meant to provide reliable algorithm implementations against which you can compare your own. So, if you have some other task in mind, you will have to figure out how to bind it to these algorithms yourself, as sketched below.
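Since Baselines consumes anything that speaks the Gym interface, one way to bind your own task is to wrap it in a `gym.Env` subclass and hand it to a learner. The snippet below is a minimal sketch, assuming a recent Baselines version where `ppo2.learn` accepts a vectorised environment; the environment itself (`MyTaskEnv`) is hypothetical.

```python
import numpy as np
import gym
from gym import spaces
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv
from baselines.ppo2 import ppo2

class MyTaskEnv(gym.Env):
    """Hypothetical custom task wrapped in the Gym interface."""
    def __init__(self):
        self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)
        self.state = np.zeros(4, dtype=np.float32)

    def reset(self):
        self.state = np.zeros(4, dtype=np.float32)
        return self.state

    def step(self, action):
        # Plug your own dynamics and reward signal in here.
        reward = 1.0 if action == 1 else 0.0
        done = False
        return self.state, reward, done, {}

# Baselines' learners expect a vectorised env; DummyVecEnv wraps a single instance.
env = DummyVecEnv([lambda: MyTaskEnv()])
model = ppo2.learn(network='mlp', env=env, total_timesteps=100000)
```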

2. RLlib

Developed by researchers at UC Berkeley’s RISElab, this library is built on top of Ray, a system for parallel and distributed Python that unifies the ML ecosystem. The core philosophy of RLlib is

We argue for building composable RL components by encapsulating parallelism and resource requirements within individual components, which can be achieved by building on top of a flexible task-based programming model. We demonstrate this principle by building Ray RLlib on top of Ray and show that we can implement a wide range of state-of-the-art algorithms by composing and reusing a handful of standard components.

And this philosophy definitely works. RLlib is one of the very few frameworks out there that support both multi-agent and multi-policy training (which is usually a complicated thing to set up). It also provides simple Python APIs for parallelizing training over multiple cores as well as for distributed training. (See this paper for details.)

MultiAgent Environment Support in RLlib

The algorithms covered are also quite diverse (see more details here)

Algorithms covered by RLlib

The docs also explain how to use RLlib on clusters and include example scripts, along with parameters that work well, to help you get started. RLlib is also integrated with Ray Tune, a tool for hyperparameter search. It supports multiple frameworks for defining the neural network, such as TensorFlow, Keras and PyTorch; this is possible because the policy graphs that encapsulate the core RL algorithm are isolated from the rest of the system.

Again, the downside is that it only supports OpenAI Gym environments out of the box. But it does have an example script for creating bindings with the CARLA simulator, and it exposes abstract class definitions which you can use to create your own bindings to other environments. Overall, this is a powerful framework and has been consistently cited in academic work.
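To give a feel for the API, here is a minimal sketch of launching PPO on CartPole through Ray Tune. Exact entry points have shifted between RLlib versions (e.g. `tune.run_experiments` vs `tune.run`), so treat this as an outline rather than the definitive incantation.

```python
import ray
from ray import tune

ray.init()  # start the Ray runtime on the local machine

# Ray Tune launches the RLlib trainer and handles logging and checkpointing.
tune.run_experiments({
    "ppo_cartpole": {
        "run": "PPO",                      # name of the registered RLlib algorithm
        "env": "CartPole-v0",              # any Gym environment id works here
        "stop": {"episode_reward_mean": 190},
        "config": {
            "num_workers": 4,              # parallel rollout workers, roughly one core each
        },
    },
})
```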

3. Intel’s Coach

Coach was built by Nervana Systems, which was acquired by Intel in 2016, and was open sourced a year ago. This is the framework that comes closest to the RL “one stop for all” dream. It supports a wide variety of algorithms and has bindings to every major simulation environment researchers care about. Installation is easy, it is simple to get started, the components are highly modular and reusable, and it has its own dashboard for visualising training (bye-bye, TensorBoard). The framework is built on top of Intel-optimised TensorFlow and Intel’s own DL framework, Neon.

Some of the environments supported by Intel’s Coach

Some of the supported environments

  • OpenAI Gym
  • ViZDoom
  • Roboschool
  • GymExtensions
  • PyBullet
  • CARLA
  • StarCraft II
  • DeepMind Control Suite

Algorithms Implemented by Intel’s Coach

While the framework is overall very complete, it is pretty new and there might be bugs in the implementations. (That’s just my scepticism speaking; I love Coach.) There are also some complex environments that still need bindings, such as the Unity and Unreal game engines, which would also open up the scope for multi-agent reinforcement learning. Moreover, the framework lacks support for distributed training, and more exhaustive examples are needed.
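For a flavour of how Coach structures an experiment, here is a rough sketch of its preset/graph-manager style. The module paths and class names below follow my recollection of the `rl_coach` package and may differ between Coach versions, so check the official presets before relying on them.

```python
# A minimal sketch of Coach's graph-manager style, assuming the rl_coach package layout.
from rl_coach.agents.clipped_ppo_agent import ClippedPPOAgentParameters
from rl_coach.environments.gym_environment import GymVectorEnvironment
from rl_coach.graph_managers.basic_rl_graph_manager import BasicRLGraphManager
from rl_coach.graph_managers.graph_manager import SimpleSchedule

# A preset is essentially: agent parameters + environment parameters + a training schedule.
graph_manager = BasicRLGraphManager(
    agent_params=ClippedPPOAgentParameters(),
    env_params=GymVectorEnvironment(level='CartPole-v0'),
    schedule_params=SimpleSchedule(),
)

graph_manager.improve()  # run the train/evaluate loop defined by the schedule
```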

4. TensorForce

TensorForce is one of the forerunners among RL frameworks that are not developed or backed by the giants. The development effort is primarily spearheaded by three developers: Michael Schaarschmidt, Alexander Kuhnle and Kai Fricke. The framework’s main motivation is to solve issues such as the tight coupling of RL logic with simulation handles, fixed network architectures and incompatible state/action interfaces.

It is meant to be used as a library in applications that want to utilize deep RL, and enables the user to experiment with different configurations and network architectures without caring about all the underlying bells and whistles. We fully acknowledge that current RL methods tend to be brittle and require a lot of fine-tuning, but that does not mean it is not the time yet to think about general-purpose software infrastructure for RL solutions.

This, again, is aligned with our dream of a generic RL ecosystem. The environment definitions are abstracted away, and the agents, configurations and training runner are all modular components (see the sketch after the list below). The docs provide some simple examples of how to get started, along with parameters that are known to work. The set of algorithms covered is pretty decent too:

  • A3C using distributed TensorFlow or a multithreaded runner
  • Trust Region Policy Optimization (TRPO)
  • Normalised Advantage functions (NAFs)
  • DQN
  • Double-DQN
  • N-step DQN
  • Vanilla Policy Gradients (VPG/ REINFORCE)
  • Actor-critic models
  • Deep Q-learning from Demonstration (DQFD)
  • Proximal Policy Optimisation (PPO)
  • Random and constant agents for sanity checking
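As a rough illustration of that agent/environment/runner split, here is a minimal sketch of training PPO on a Gym task. Argument names and module paths have changed across TensorForce releases, so take this as an outline of the structure rather than copy-paste code.

```python
from tensorforce.agents import PPOAgent
from tensorforce.execution import Runner
from tensorforce.contrib.openai_gym import OpenAIGym

# Environment: any simulator hidden behind TensorForce's Environment interface.
environment = OpenAIGym('CartPole-v0')

# Agent: algorithm + network architecture, decoupled from the environment internals.
agent = PPOAgent(
    states=environment.states,
    actions=environment.actions,
    network=[
        dict(type='dense', size=64),
        dict(type='dense', size=64),
    ],
)

# Runner: the training loop that shuttles observations and actions between the two.
runner = Runner(agent=agent, environment=environment)
runner.run(num_episodes=300)
```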

The bundle of environments supported is by far one of the most exhaustive sets:

With support for more environments coming soon:

  • Gazebo robotic simulation — link
  • Carla, Open-source simulator for autonomous driving research — link
  • Unity game engine — link (I’m implementing this :D)
  • Project Malmo Minecraft binding — link
  • DeepMind Starcraft 2 learning environment — link
  • DeepMind control, dm_control — link
  • OpenAI roboschool — link
  • DeepGTAV — GTA 5 self-driving car research environment — link
  • Siemens industrial control benchmark — link

Some of these might not be very popular simulators in academia, but having bindings for all these environments gives really helpful insight into the simulators and how to interact with them. They can serve as starting points for your own implementation of another task involving these simulators. The library is also in very active development and open to new contributors (like me!!). On the downside, some of the implementations are buggy and need more work. Performance is also an issue; for example, the code can’t utilise multiple cores efficiently. But these should be fixed in future releases of TensorForce.

Part 2 and Part 3 will cover more frameworks, including very recent ones like Dopamine and TRFL, and a final conclusion. Thanks for reading, and be on the lookout for more stuff!! If you wish to add more about any of these frameworks, or want me to talk about your own dear framework, don’t hesitate to tell me in the comments. :)
