Policy Gradient Methods

Published in Neurosapiens

1 min read · Jan 5, 2019

Learn about techniques such as Generalized Advantage Estimation (GAE) for lowering the variance of policy gradient methods. Explore policy optimization methods such as Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO).
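
To make the variance-reduction idea concrete, here is a minimal sketch of Generalized Advantage Estimation over a single rollout. It is an illustration under stated assumptions, not the implementation from the referenced material: the function name `gae_advantages`, its argument layout, and the default values of `gamma` and `lam` are all hypothetical choices made for this example.

```python
import numpy as np


def gae_advantages(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Sketch of GAE for one rollout (hypothetical helper, not a library API).

    rewards[t]  : reward received after acting in state s_t
    values[t]   : value estimate V(s_t) from some critic
    last_value  : value estimate for the state after the final step
    dones[t]    : 1 if the episode ended at step t, else 0
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)

    # V(s_{t+1}) for every step; the last entry is the bootstrap value.
    next_values = np.append(values[1:], last_value)

    advantages = np.zeros_like(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the advantage of the next step.
    for t in reversed(range(len(rewards))):
        not_done = 1.0 - dones[t]
        # One-step TD error: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_values[t] * not_done - values[t]
        # Exponentially weighted sum of TD errors with weight gamma * lam.
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
    return advantages
```

With `lam=0` each advantage reduces to the one-step TD error (lower variance, more bias from the critic); with `lam=1` it becomes the discounted return minus the value baseline (less bias, higher variance). Values in between trade the two off.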

References

The Policy Gradient

So far, our policy has simply been to act greedily on some value function. What if we tried to learn the policy itself…

kvfrans.com

Written by Javier Abellán Abenza