Quick tour of major RL algorithms on PlaneStrike

Wayne Wei
Jul 28, 2017 · 2 min read

I was very excited to see the release of TensorForce, which seems to offer a very slick interface to reinforcement learning. I previously tried rllab but couldn’t quite make it work. TensorForce looks simpler and its built-in algorithms are more comprehensive. And judging by their blog, the overall design is more thoughtful. So I decided to run my good old PlaneStrike game through it and see how TensorForce does. It turned out to be quite easy: all I needed to do was create a simple environment and hit ‘run’. Code is here. The graph below shows smoothed reward per episode vs. iteration when I penalize repeat moves:
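To give a flavor of what "create a simple environment" means here, below is a minimal, framework-agnostic sketch of a PlaneStrike-style environment. This is my own hypothetical reconstruction, not the linked code: the class name, board size, plane shape, and reward values are all assumptions, and the `execute(action) -> (state, reward, done)` shape mirrors the kind of interface TensorForce environments expose.

```python
import random


class PlaneStrikeEnv:
    """Hypothetical sketch of a PlaneStrike-style environment.

    The agent fires at cells on a square board; hitting a hidden
    "plane" cell gives +1, a miss gives -1, and (optionally)
    re-firing at an already-tried cell costs an extra penalty.
    """

    def __init__(self, size=8, penalize_repeat=True, seed=None):
        self.size = size
        self.penalize_repeat = penalize_repeat
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Place a plus-shaped plane at a random interior position
        # (the real game's plane shape may differ).
        r = self.rng.randrange(1, self.size - 1)
        c = self.rng.randrange(1, self.size - 1)
        self.plane = {(r, c), (r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)}
        self.tried = set()
        self.hits = set()
        return self._state()

    def _state(self):
        # Per-cell observation: 0 = untried, 1 = miss, 2 = hit.
        board = [[0] * self.size for _ in range(self.size)]
        for (r, c) in self.tried:
            board[r][c] = 2 if (r, c) in self.hits else 1
        return board

    def execute(self, action):
        """Fire at a flattened cell index; return (state, reward, done)."""
        cell = divmod(action, self.size)
        if cell in self.tried:
            # Repeat move: harsher penalty when the flag is on.
            reward = -2.0 if self.penalize_repeat else -1.0
        elif cell in self.plane:
            self.tried.add(cell)
            self.hits.add(cell)
            reward = 1.0
        else:
            self.tried.add(cell)
            reward = -1.0
        done = self.hits == self.plane
        return self._state(), reward, done
```

Wrapping a class like this for TensorForce is then mostly a matter of declaring the state and action spaces and delegating `reset`/`execute` to it.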

The following graph is for when I do not penalize repeat moves:
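For reference, the smoothing in curves like these is typically an exponential moving average over the raw per-episode rewards. A minimal helper (my own, not from the linked code; `alpha` is an assumed smoothing weight):

```python
def smooth(rewards, alpha=0.05):
    """Exponential moving average of per-episode rewards.

    alpha is the weight given to the newest reward; a smaller
    alpha produces a smoother (but more lagged) curve.
    """
    smoothed = []
    avg = None
    for r in rewards:
        avg = r if avg is None else (1 - alpha) * avg + alpha * r
        smoothed.append(avg)
    return smoothed
```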

A few comments:

  1. It’s very cool to see things working :)
  2. Vanilla policy gradient (with an actor-critic baseline) did the best. The trust-region-style methods TRPO and PPO seemed less efficient here.
  3. I did not get to try A3C since I did not have time to figure out how to set up cluster_spec and the rest of the distributed configuration.
  4. Penalizing repeat moves clearly helped all the algorithms.

Next I’ll try to apply TensorForce to a somewhat practical problem in ads yield management.
