PRL — a novel approach to building a reinforcement learning framework in Python

Written by Piotr Tempczyk

Acta Schola Automata Polonica
Oct 22, 2019


People’s Reinforcement Learning logo

The idea behind the library

People’s Reinforcement Learning (PRL) is a framework for researchers that lets you build your own agents and conduct RL experiments by combining simple building blocks, so that you only have to implement the data transformations and the agent logic. This blog post is an introduction to the library, so if you are already familiar with the basics you can jump right into the more detailed tutorial here.

When we were starting our research on reinforcement learning at OPIUM (at the beginning of 2018), there were already many open-source libraries with many agent implementations (e.g. Keras-RL, Tensorforce or TRFL). But those libraries had some issues. First of all, in many of them each agent was a big, standalone function, which made it hard to reuse code between agents and adapt it to individual purposes. Second, in some of them the level of abstraction was too low, so it was easy to get lost in the code. A common issue was also the lack of good tutorials and documentation.

Let us compare the situation to neural network frameworks. Imagine that you had many neural network architectures, each written separately and not meant to be easily changed. You would have ResNet and VGG-16 implemented in some deep learning library, but when you wanted to change an activation function in a residual block you would have to do it differently than when changing an activation function in the last layer of VGG. That is because there would be no common underlying structure designed to be easily modified; the framework would only contain separate functions or classes for each architecture. We wanted to build a library suitable for RL research at a level of abstraction comparable to PyTorch. This is why we started building the PRL framework. At the beginning we were using it only for our internal projects, but after some time we decided to open-source it, because we believed that the concepts of abstraction and encapsulation used in PRL are unique among Python RL frameworks. We wanted to share it with other researchers facing the same problems as we did back then.

What is also unique about PRL is that it hides many things that are unimportant from the researcher’s point of view behind the scenes. While using PRL you can focus on the structure of the agent, state transformations, neural network architectures, action transformations, and reward shaping. Time and memory profiling, logging, agent-environment interaction, agent state saving, neural network training, early stopping and training visualization happen automatically behind the scenes. You are also provided with very useful tools for handling the training history and preparing training sets for the neural networks.

Illustration by Daniel Mróz from Stanisław Lem’s “The Cyberiad”. Image not directly connected to the topic but the book is certainly worth reading.

Overall structure

One of our main goals while building the PRL framework was to make agent implementations as clear and as compact as possible. An agent is represented by the Agent class. All the state, reward and action transformations are encapsulated within the Environment class, which is a wrapper for gym-like environments (environments with the same API as OpenAI Gym environments). We decided to do it this way because one representation of the environment can be suitable for many different agents.

For example, let’s take the problem from the NIPS 2017 competition “Learning to Run”. Participants had to build an agent that makes a skeleton run while controlling only the tension of the muscles attached to its bones. One thing you can do to make the problem potentially easier for RL agents is to discretize the action space (the muscle tensions), or to redefine an action as the difference between the previous muscle tension and the current one. This can help many agents train more efficiently, so it is better to associate this kind of transformation with the environment rather than with the agent. It also makes experiments like “let’s now try this DDPG agent on our problem” less time consuming once you have set up the environment for the first agent.
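To make this concrete, here is a rough sketch of such an action discretization written as a plain gym wrapper. This is only an illustration of the concept, not PRL’s Environment class; the class name and parameters below are made up.

```python
import gym
import numpy as np

class DiscretizedActionWrapper(gym.ActionWrapper):
    """Map a discrete action index per muscle to one of a few fixed tension levels."""

    def __init__(self, env, levels=(0.0, 0.5, 1.0)):
        super().__init__(env)
        self.levels = np.array(levels)
        n_muscles = env.action_space.shape[0]
        # the agent now picks one tension level per muscle
        self.action_space = gym.spaces.MultiDiscrete([len(levels)] * n_muscles)

    def action(self, act):
        # translate the discrete choices back into the continuous tensions
        # expected by the underlying environment
        return self.levels[np.asarray(act)]
```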

On the other hand, you want to bind the function approximators (i.e. neural networks) to the agent, because every agent has its own networks and the losses it uses to train them. But there is one thing to remember: you have to make sure that the inputs and outputs of your function approximators match what you get from the environment (rewards and states) and what you have to return to it (actions).
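For example, assuming a CartPole-like environment with 4-dimensional states and 2 discrete actions, a policy network has to line up with those sizes at both ends. A minimal PyTorch sketch:

```python
import torch.nn as nn

obs_dim, n_actions = 4, 2  # assumed sizes of the (transformed) state and action spaces

policy_net = nn.Sequential(
    nn.Linear(obs_dim, 64),    # input size must match the states coming from the environment
    nn.ReLU(),
    nn.Linear(64, n_actions),  # output size must match the actions the environment expects
)
```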

The Agent and Environment classes can use Storage objects to easily manage the training history, transform states and create training sets for the neural networks. Neural networks are wrapped in the FunctionApproximator class, which provides the user with a unified API. All the networks are implemented in PyTorch, and all the data transformations outside the networks are written using NumPy and Numba.

We use PyTorch in PRL because it integrates very easily with NumPy and sits at a level of abstraction well suited to researchers. But with a little work, PRL can be adapted to work with any framework for building machine learning models (for example Keras, TensorFlow or even XGBoost).
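A toy example (not PRL code) of why this integration is convenient in an RL setting, where states arrive as NumPy arrays and actions have to go back to the environment as NumPy values:

```python
import numpy as np
import torch
import torch.nn as nn

net = nn.Linear(4, 2)                              # stand-in for a function approximator
states = np.random.rand(32, 4).astype(np.float32)  # a batch of states from the environment
with torch.no_grad():
    q_values = net(torch.from_numpy(states))       # NumPy -> PyTorch without copying
greedy_actions = q_values.argmax(dim=1).numpy()    # PyTorch -> NumPy for the environment
```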

Neural networks packed into the FunctionApproximator class can easily be transferred between agents. For example, you can pretrain a policy with one algorithm (for instance imitation learning) and then use it as a hot start for another agent trained with a different algorithm (e.g. actor-critic).
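In plain PyTorch terms, this kind of transfer comes down to moving weights between networks that share an architecture. A hypothetical sketch, not the FunctionApproximator API itself:

```python
import torch
import torch.nn as nn

def make_policy():
    # the architecture shared by both agents (made-up example)
    return nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

pretrained_policy = make_policy()                        # e.g. trained with imitation learning
torch.save(pretrained_policy.state_dict(), "policy.pt")

actor = make_policy()                                    # the actor network of a new agent
actor.load_state_dict(torch.load("policy.pt"))           # hot start from the pretrained weights
```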

Additional tools

Because PRL is all about experimenting and writing your own agents and transformations, you can easily profile the execution time of any function or method you have written. This is done with the @timeit decorator. At the end of training you can print the time profiler report and compare the execution time of your code with that of the other methods. You can easily find bottlenecks in your code this way.
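The sketch below only illustrates how a decorator of this kind typically works; it is not PRL’s implementation of @timeit.

```python
import time
from collections import defaultdict
from functools import wraps

_timings = defaultdict(float)  # accumulated execution time per function

def timeit(func):
    """Accumulate the total time spent in the decorated function."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            _timings[func.__qualname__] += time.perf_counter() - start
    return wrapper

@timeit
def transform_state(state):
    return state  # your transformation goes here
```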

There are also some other development tools included in the library. You have additional loggers at your disposal that can automatically log the memory consumption of the larger data structures, agent training statistics and neural network training statistics, plus a logger for your own purposes. In addition to these loggers, you can track all those statistics live in the terminal or in TensorBoard using TensorboardX.
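Logging a scalar through TensorboardX so that it shows up in TensorBoard looks roughly like this (the tag, log directory and reward values are made-up examples):

```python
from tensorboardX import SummaryWriter

writer = SummaryWriter("runs/cartpole_experiment")           # made-up log directory
for episode, total_reward in enumerate([10.0, 12.5, 17.0]):  # dummy training statistics
    writer.add_scalar("agent/total_reward", total_reward, episode)
writer.close()
```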

Callbacks

PRL also provides some useful agent callbacks for saving checkpoints, early stopping and the previously mentioned monitoring of agent training in TensorBoard.
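As an illustration of the pattern only (not PRL’s actual callback API), an early-stopping callback can be as simple as tracking the best mean episode reward seen so far:

```python
class EarlyStopping:
    """Stop training when the mean episode reward has not improved for `patience` checks."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("-inf")
        self.bad_checks = 0

    def should_continue(self, mean_reward):
        # hypothetical hook name; PRL's callbacks define their own interface
        if mean_reward > self.best:
            self.best, self.bad_checks = mean_reward, 0
        else:
            self.bad_checks += 1
        return self.bad_checks < self.patience  # False means: stop training
```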

Very simple example

Let’s take a look at how to run a simple reinforcement learning experiment in PRL.

Simple example of PRL: random agent interacting with cart pole environment
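Conceptually, the script boils down to a loop like the one below, written here in plain gym; the actual PRL example wraps the environment and a random agent in PRL’s own classes, so its code (and output) looks different.

```python
import gym

env = gym.make("CartPole-v0")
for episode in range(10):
    state, done, total_reward = env.reset(), False, 0.0
    while not done:
        action = env.action_space.sample()            # a random agent: uniform actions
        state, reward, done, info = env.step(action)
        total_reward += reward
    print(f"episode {episode}: total reward = {total_reward}")
```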

After executing this file you will get output similar to this:

Output from cartpole_random_agent.py

At the top there is a standard warning from the gym environment, and below that is the output from the time logger. If you want more output (e.g. the total reward after each episode), you can add it using callbacks. I am going to show you how to do this in the next blog post.

For more examples, you can check the examples/ directory in the repository.

What’s next?

Writing your own agents and modifying existing ones is the task where PRL really shines, and that’s why in the next blog post we will look into PRL’s internal modules and see how to write your own agents using PRL.

If you want to learn even more about PRL, you can check our documentation: almost every class and function in PRL is annotated with a comprehensive docstring describing it and its methods. You can also check our project wiki, which can be found here.

Final remarks

If you encounter any problems with the library, the documentation or this tutorial, or if you want to contribute to the project, please email us at piotr.tempczyk [at] opium.sh. Feel free to use PRL, to build your own framework from parts of ours, or to join us and contribute to the library yourself. If you use our code or ideas in your tools, please cite our repository as:

Tempczyk, P., Sliwowski, M., Kozakowski, P., Smuda, P., Topolski, B., Nabrdalik, F., & Malisz, T. (2020). opium-sh/prl: First release of Peoples’s Reinforcement Learning (PRL). Zenodo. https://doi.org/10.5281/ZENODO.3662113

Contributors

Many people were involved in this project. These are the most important of them:

Project Lead: Piotr Tempczyk

Developers: Piotr Tempczyk, Maciej Śliwowski, Piotr Kozakowski, Filip Nabrdalik, Piotr Smuda, Bartosz Topolski, Tomasz Malisz

References

  • Kidziński, Łukasz, et al. “Learning to Run challenge solutions: Adapting reinforcement learning methods for neuromusculoskeletal environments.” The NIPS’17 Competition: Building Intelligent Systems. Springer, Cham, 2018. 121–153.
  • Jaśkowski, Wojciech, et al. “Reinforcement Learning to Run… Fast.” The NIPS’17 Competition: Building Intelligent Systems. Springer, Cham, 2018. 155–167.

If you enjoyed this post, please hit the clap button below and follow our publication for more interesting articles about ML & AI.
