Learn Reinforcement Learning with TensorFlow and TRFL

AurelianTactics · Apr 26, 2019 · 2 min read

Disclaimer: While this is not a paid or sponsored post, I was compensated for my work on ‘Hands on Reinforcement Learning with TensorFlow & TRFL.’

Recently I worked with Packt Publishing to help create ‘Hands on Reinforcement Learning with TensorFlow & TRFL.’ This series of technical training videos with code examples explores major topics in Reinforcement Learning (RL), with an emphasis on using DeepMind’s TRFL library with TensorFlow. If you have searched for guidance, tutorials, or examples on how to use TRFL, you’ll have noticed there isn’t much out there. This series helps remedy that by explaining usage and providing code examples for almost all TRFL functions (29 of the 31 ‘Learning updates’ and 4 of the ‘Other’ functions).

Each video follows a general layout: an introduction to a major RL topic, an explanation of that topic, how to use the related TRFL function, key arguments to be aware of, code example(s), and links to further resources. For example, in the Double Q learning and Persistent Q learning video, we frame the larger issue of approximation error and overestimation bias, explain how the two methods help, show how TRFL implements trfl.double_qlearning() and trfl.persistent_qlearning(), walk through the code that calls these functions, and link to relevant resources for those wanting to know more.
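To give a taste of what that looks like in practice, here is a minimal sketch of wiring up trfl.double_qlearning in TensorFlow 1.x graph mode. The placeholder shapes, batch size, and optimizer are assumptions for illustration, not code taken from the course; in a real agent the Q-values would come from your online and target networks rather than placeholders.

```python
import tensorflow as tf
import trfl

batch_size, num_actions = 32, 4  # assumed sizes for illustration

# Q-values for the previous state from the online network, and the action taken.
q_tm1 = tf.placeholder(tf.float32, [batch_size, num_actions])
a_tm1 = tf.placeholder(tf.int32, [batch_size])
r_t = tf.placeholder(tf.float32, [batch_size])       # rewards
pcont_t = tf.placeholder(tf.float32, [batch_size])   # discount * (1 - terminal)

# Double Q-learning decouples action selection from evaluation:
# q_t_selector picks the argmax action, q_t_value evaluates it.
q_t_value = tf.placeholder(tf.float32, [batch_size, num_actions])
q_t_selector = tf.placeholder(tf.float32, [batch_size, num_actions])

loss, extra = trfl.double_qlearning(q_tm1, a_tm1, r_t, pcont_t,
                                    q_t_value, q_t_selector)
train_op = tf.train.AdamOptimizer(1e-3).minimize(tf.reduce_mean(loss))
```

The loss comes back per batch element, so it is reduced to a scalar before being handed to the optimizer; the extra namedtuple carries diagnostics such as the TD errors if you want to log them.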

While some RL knowledge is assumed, each video tries to place its topic in context and offers a summary and refresher for topics you may not be familiar with or haven’t dealt with much. For those wanting to dig deeper or needing further clarification, the linked resources can help solidify your understanding.

Each video has an accompanying code notebook highlighting TRFL usage, so there’s a lot of useful code here for learning and using TRFL. The coverage runs from the fundamentals, such as one-step TD methods (TD learning, Q learning, SARSA, and SARSE) and TD(λ) methods, to deep learning methods like policy gradients (vanilla, discrete, continuous, policy entropy loss, A2C) and DQN and DDPG techniques (double Q learning, distributional Q learning, deterministic policy gradients), to more advanced methods like DeepMind’s Retrace, V-trace, and pixel control.
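At the fundamentals end of that range, a one-step TD update is only a couple of lines with TRFL. The sketch below is an assumption-laden illustration rather than notebook code: the batch size is made up, and the value estimates would normally come from your value network instead of placeholders.

```python
import tensorflow as tf
import trfl

batch_size = 32  # assumed batch size

v_tm1 = tf.placeholder(tf.float32, [batch_size])    # V(s_{t-1}) from the value network
r_t = tf.placeholder(tf.float32, [batch_size])      # rewards
pcont_t = tf.placeholder(tf.float32, [batch_size])  # discount * (1 - terminal)
v_t = tf.placeholder(tf.float32, [batch_size])      # bootstrap value V(s_t)

# One-step TD: the loss penalizes the TD error between the bootstrapped
# target r_t + pcont_t * v_t and the current estimate v_tm1.
loss, extra = trfl.td_learning(v_tm1, r_t, pcont_t, v_t)
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(tf.reduce_mean(loss))
```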

For example, in Section 3 we implement REINFORCE, REINFORCE with baselines, A2C, and Deep Deterministic Policy Gradients in code, across four videos, covering both the discrete and continuous cases, and we throw in policy entropy loss as well. It’s a lot of useful TRFL code presented in a concise and straightforward way. All notebooks can be run locally or in Google Colab.
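As one example of how compact those implementations get, here is a minimal REINFORCE-style sketch using trfl.discrete_policy_gradient. The shapes, the use of discounted returns as the action values, and the optimizer are assumptions for illustration; the course notebooks build the full training loop around calls like this.

```python
import tensorflow as tf
import trfl

batch_size, num_actions = 16, 2  # assumed sizes for illustration

policy_logits = tf.placeholder(tf.float32, [batch_size, num_actions])  # from the policy net
actions = tf.placeholder(tf.int32, [batch_size])          # actions actually taken
action_values = tf.placeholder(tf.float32, [batch_size])  # e.g. discounted returns (REINFORCE)

# Per-element policy gradient loss: -log pi(a|s) weighted by the action value.
pg_loss = trfl.discrete_policy_gradient(policy_logits, actions, action_values)
train_op = tf.train.AdamOptimizer(1e-3).minimize(tf.reduce_mean(pg_loss))
```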

If you’re interested, Packt has a free 10-day trial where you can try out this course and many others ($9.99 per month after the free trial).
