Understanding PPO Plots in TensorBoard

OpenAI Baselines and Unity ML-Agents have TensorBoard integration for their Proximal…


Using Joint PPO with Ray

Joint PPO is a modification of Proximal Policy Optimization (PPO) that was used by the winner of OpenAI’s Retro Contest. Joint PPO in a few lines:

During meta-training, we train a single policy to play every level in the training set. Specifically, we…

Using Ray for Reinforcement Learning

I’ve been exploring Ray for Reinforcement Learning (RL) for the past couple of weeks. Ray provides…