Quick tour of major RL algorithms on PlaneStrike
Wayne Wei

Thanks for the implementations. Good work. But I don’t understand why VPG gets better performance than PPO or TRPO. I think VPG is a policy gradient algorithm, while the other two are actor-critic algorithms. I would expect PPO or TRPO to have the better convergence properties.
