Top stories published by Arxiv Bytes in 2018

Zac Wellmer in Arxiv Bytes

Sep 13, 2018

Summary: Proximal Policy Optimization (PPO)

Ideas from this summary are taken from the Proximal Policy Optimization paper.

PPO offers two key improvements to policy gradient methods:

A surrogate objective that includes a simple first-order trust region…
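As a rough illustration of what such a surrogate objective can look like, here is a minimal sketch of the clipped objective described in the PPO paper. It assumes the truncated item above refers to that clipped form; the function name `clipped_surrogate`, its arguments, and the default `epsilon=0.2` are choices made for this example, not taken from the summary.

```python
def clipped_surrogate(ratios, advantages, epsilon=0.2):
    """Average clipped surrogate objective over a batch of samples.

    ratios:     probability ratios pi_new(a|s) / pi_old(a|s), one per sample
    advantages: advantage estimates, one per sample
    epsilon:    clip range (hypothetical default chosen for this sketch)
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Keep the ratio inside [1 - epsilon, 1 + epsilon].
        clipped_r = max(1.0 - epsilon, min(r, 1.0 + epsilon))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


if __name__ == "__main__":
    # A large ratio with a positive advantage gets capped at (1 + epsilon) * A.
    print(clipped_surrogate([1.5], [1.0]))
    # A large ratio with a negative advantage is not capped.
    print(clipped_surrogate([1.5], [-1.0]))
```

Taking the minimum means the objective gains nothing from pushing the ratio outside the clip range in the direction that would improve it, which is what keeps each policy update close to the old policy without a second-order constraint.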

Summary: TreeQN

Ideas from this summary are taken from the TreeQN and ATreeC paper.

Summary: Value Prediction Networks (VPN)

VPN is a deep reinforcement learning architecture that mixes ideas from both model-free and model-based methods. Generally, model-based methods learn environment dynamics so as to predict real observations; VPN, however, attempts to learn a dynamics model that…