RL 0 to 1: How to Learn RL
Prerequisites
I recommend reviewing my earlier post, which covers resources for the following topics:
- Math
- Python
- Neural network basics
- Frameworks
Math review
- Linear Algebra Review and Reference
- Probability Theory Review
- Convex Optimization Overview, Part I
- Convex Optimization Overview, Part II
- Hidden Markov Models
- The Multivariate Gaussian Distribution
- More on Gaussian Distribution
- Gaussian Processes
RL Vocabulary
- MDP
- Markov chain Monte Carlo
- Bellman equations
- Reward, state, policy, discounting factor, trajectory, state-space, transition function
- Dynamic Programming
- Value Function
- Q-Learning
- Policy Gradient
- Model Based — Model Free — partially observable
- Exploration vs Exploitation
- Inverse RL / Imitation Learning / Apprenticeship Learning / Meta-Learning / Transfer Learning
- Reward augmentation / Reward shaping
- Actor Critic
- Monte Carlo Tree Search
- Human in the loop
- Deep RL
- Zero-shot — One-shot — Few-shot
- Differentiable
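Several of these terms fit together concretely in tabular Q-learning, which applies the Bellman equation as an iterative update to a table of state-action values while balancing exploration and exploitation. Here is a minimal sketch; the tiny three-state chain MDP, its rewards, and all hyperparameters are made up purely for illustration:

```python
# Tabular Q-learning on a made-up 3-state chain MDP (illustration only).
# Bellman update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
import random

random.seed(0)
n_states, n_actions = 3, 2            # states 0..2; actions: 0 = left, 1 = right
alpha, gamma, epsilon = 0.5, 0.9, 0.1

def step(s, a):
    """Transition function: moving right from state 2 ends the episode with reward 1."""
    s_next = min(s + 1, 2) if a == 1 else max(s - 1, 0)
    done = (s == 2 and a == 1)
    return s_next, (1.0 if done else 0.0), done

Q = [[0.0] * n_actions for _ in range(n_states)]

for episode in range(200):
    s, done = 0, False
    while not done:
        # epsilon-greedy: explore with probability epsilon, otherwise exploit
        if random.random() < epsilon:
            a = random.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda x: Q[s][x])
        s_next, r, done = step(s, a)
        target = r if done else r + gamma * max(Q[s_next])
        Q[s][a] += alpha * (target - Q[s][a])
        s = s_next

print(Q)  # "go right" (Q[s][1]) should dominate in every state
```

The learned values also show discounting at work: the optimal value of each state shrinks by a factor of gamma for every step it sits away from the reward.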
Baselines / environment
OpenAI and DeepMind have built environments that allow researchers to run and test their RL models. These environments serve as benchmarks for RL. To become familiar with running RL algorithms in these environments, I recommend reading Andrej Karpathy's blog post on deep reinforcement learning and the OpenAI Gym documentation.
Here is a list of environments from OpenAI and DeepMind.
OpenAI:
- Dota2 by OpenAI https://blog.openai.com/dota-2/
- OpenAI Gym https://gym.openai.com/
There are many families of environments in OpenAI Gym, including classic control, Atari, Box2D, and MuJoCo tasks.
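Whatever the environment, Gym exposes the same reset/step interface, which is what makes it easy to swap algorithms and benchmarks. The sketch below mimics that interface with a toy stand-in environment so it runs without Gym installed; the `ToyEnv` class is made up, and with Gym you would call `gym.make(...)` instead:

```python
import random

class ToyEnv:
    """Made-up stand-in that mimics the classic Gym interface.
    With Gym installed you would use: env = gym.make("CartPole-v1")."""

    def reset(self):
        self.t = 0
        return 0.0                      # initial observation

    def step(self, action):
        self.t += 1
        obs = float(self.t)
        reward = 1.0                    # +1 per surviving step, like CartPole
        done = self.t >= 10             # episode ends after 10 steps
        return obs, reward, done, {}    # (observation, reward, done, info)

env = ToyEnv()
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    action = random.choice([0, 1])      # random agent: sample an action
    obs, reward, done, info = env.step(action)
    total_reward += reward
print(total_reward)  # 10.0
```

Any RL algorithm you write against this loop (the agent picks `action`, the environment returns observation and reward) transfers directly to the real Gym environments.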
DeepMind:
- StarCraft II
- DeepMind Lab
- DeepMind Control Suite (similar to OpenAI Gym's MuJoCo environments)
Datasets
DeepMind has released several datasets for researchers to train RL models on. Which dataset to choose really depends on the model you want to run. There are also a few useful datasets on Kaggle that I recommend considering for RL models.
Main blogs
Here is a list of some well-known blog posts. Many of them discuss why RL does not work well yet.
- Deep Reinforcement Learning Doesn’t Work Yet
- An Outsider’s Tour of Reinforcement Learning
- Greg Brockman on Resources — I recommend reading this post.
- Deep Deterministic Policy Gradients in TensorFlow
- Collection of Deep Learning resources
- Learning to Learn
Main papers
I made this list based on the recommendations of a couple of friends and former colleagues, plus my own intuition.
- DQN: Nature paper
- A2C / A3C
- PPO
- TRPO
- HER
- Rainbow
- DDPG
- Feudal Networks
- Learning to learn by gradient descent by gradient descent
- AlphaGo Nature Paper
Fast RL Exploration/Exploitation
- Variational Information Maximizing Exploration
- The Many Faces of Optimism
- Deep Exploration via Bootstrapped DQN
Q-Learning
- DQN: Nature paper (Deep Q-Learning)
- Deep Reinforcement Learning with Double Q-learning
- Prioritized Replay
- Hindsight Experience Replay
- Rainbow
Policy Gradient
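The core idea behind these methods is to increase the log-probability of sampled actions in proportion to the return that followed them. For a softmax policy over logits, the gradient of log pi(a) is onehot(a) minus the policy's probabilities, which gives the REINFORCE estimate below. A minimal sketch in pure Python; the logits, action, and return are made up for illustration:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_grad(logits, action, ret):
    """REINFORCE estimate: gradient of ret * log pi(action) w.r.t. the logits.

    For a softmax policy, d(log pi(a))/d(logits) = onehot(a) - pi,
    so the estimate is ret * (onehot(a) - pi).
    """
    probs = softmax(logits)
    return [ret * ((1.0 if i == action else 0.0) - p)
            for i, p in enumerate(probs)]

logits = [0.0, 0.0]                       # uniform policy over two actions
g = reinforce_grad(logits, action=1, ret=2.0)
print(g)  # [-1.0, 1.0]
```

A gradient step along `g` raises the logit of the sampled action and lowers the other, exactly what "push up actions that led to high return" means.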
Monte Carlo Tree Search
- Monte-Carlo tree search and rapid action value estimation in computer Go
- AlphaGo: Mastering the game of Go without human knowledge
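The selection step in MCTS typically uses the UCT rule (UCB1 applied to trees), which trades off a child's average value against how rarely it has been visited. A minimal sketch; the function name, visit counts, and values are all made up for illustration:

```python
import math

def uct_score(child_value_sum, child_visits, parent_visits, c=1.41):
    """UCT: mean value plus an exploration bonus that shrinks with visits."""
    if child_visits == 0:
        return float("inf")             # always try unvisited children first
    mean = child_value_sum / child_visits
    bonus = c * math.sqrt(math.log(parent_visits) / child_visits)
    return mean + bonus

# Made-up statistics for three children of a node visited 100 times
children = [(30.0, 50), (12.0, 20), (3.0, 4)]   # (value_sum, visits)
scores = [uct_score(v, n, 100) for v, n in children]
best = max(range(len(children)), key=lambda i: scores[i])
print(best)  # 2: the rarely visited child wins on its exploration bonus
```

Note how the least-visited child is selected here despite having the smallest value sum: the exploration bonus dominates until a child has been tried often enough for its mean to be trusted.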
Human in the loop
Imitation Learning
- Maximum Entropy Inverse Reinforcement Learning
- Apprenticeship Learning via Inverse Reinforcement Learning
Hierarchy
Model Based
Meta-RL
Courses
I personally started learning RL by watching Pieter Abbeel's bootcamp lectures. Here are the classes that I would recommend:
- Stanford CS 234
- Deep RL bootcamp by Pieter Abbeel
- Stanford CS 229 — section in RL
- Berkeley Deep RL CS 294
- Nando de Freitas’ course on machine learning
I have never taken an online course for RL, but they seem to be a good place to start. Many such courses exist; I have listed a few, though I am not sure which one is best.
Textbooks
The first of these is the best-written textbook on the subject.
- Reinforcement Learning: An Introduction, by Sutton and Barto
- Algorithms for Reinforcement Learning
- Markov Decision Processes: Discrete Stochastic Dynamic Programming
- Approximate Dynamic Programming
Main Researchers
- Pieter Abbeel — Professor @ Berkeley
- David Silver — research lead on AlphaGo @ DeepMind
- Richard Sutton — author of the standard RL textbook
- John Schulman — works at OpenAI
- Volodymyr Mnih — initial DQN paper, worked under Geoffrey Hinton
Main Research labs
- OpenAI
- DeepMind
- Google Brain
- Facebook AI Research (New York team)
Extra Math
If you would like to study some additional subjects, I recommend reviewing the following math. It is not required, but it is great to have a strong background in math.