Running the PPO baseline and giving up on local evaluation
Days 16–18 of the OpenAI Retro Contest
Now that I am back in the land of decent internet, I could build my Docker image for local evaluation. It looked like it was going to take some time, but the results definitely seemed worth it. Just take a look at the agent’s first run vs. the one 700+ iterations into the future:
The only issue was that it was taking hours to run on my MacBook’s CPU. I would never be able to do much experimentation if I had to wait hours, or run overnight, for each iteration, and I was kind of worried my computer would catch fire from all the hard work. So I set out to optimize.
My previous attempt to get TensorFlow compiled optimally for my computer didn’t go well, but that was also because of unrelated confusion from not having a GPU. After searching the web a bit, I experimented with this Dockerfile from Google: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/docker/Dockerfile.devel
At first I tried adding only the parts I needed, since it didn’t seem like I needed jupyter, but I quickly reverted to seeing if I could get it working as-is. I ran into this bug, but was able to continue by adding the suggested flags.
I wasn’t clear on what was already included with the retro contest base image, so my image ended up looking like this:
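For readers following along, here is a rough sketch of what a Dockerfile like this could look like — building TensorFlow from source with extra CPU instructions enabled on top of a contest-style base image. This is not the author’s actual file; the base image tag, Bazel version, and build flags are all assumptions:

```dockerfile
# Hypothetical sketch — base image name, versions, and flags are assumptions
FROM openai/retro-agent:bare

# Build tools needed to compile TensorFlow from source
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential curl git python3-dev && \
    rm -rf /var/lib/apt/lists/*

# Bazel is required to build TensorFlow (version here is a guess)
RUN curl -fsSL -o /tmp/bazel.sh \
        https://github.com/bazelbuild/bazel/releases/download/0.11.0/bazel-0.11.0-installer-linux-x86_64.sh && \
    bash /tmp/bazel.sh && rm /tmp/bazel.sh

# Build TensorFlow with the extra CPU instruction sets the stock wheel skips.
# Note: these -copt flags only help if the host CPU actually supports them.
RUN git clone --depth 1 https://github.com/tensorflow/tensorflow /tensorflow && \
    cd /tensorflow && \
    bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-msse4.2 \
        //tensorflow/tools/pip_package:build_pip_package && \
    bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/pip && \
    pip install /tmp/pip/tensorflow-*.whl
```

The `--copt` flags are the standard way to opt in to AVX/FMA/SSE4.2 when building TensorFlow with Bazel; the warnings TensorFlow prints at import time name exactly which of these your binary is missing.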
And that seemed to work, despite being very non-optimal and redundant. At least I didn’t run into any errors that caused the build to exit, and a few warnings during compilation is normal, right?
Well, a few hours after that I realized I was getting the same warnings over and over again. Somehow I had managed to get stuck in an infinite loop during compilation. My output log is here if anyone wants to tell me what went wrong. I wasn’t able to find much help online about optimizing a dockerized Ubuntu TensorFlow build to take advantage of the couple of extra instructions my CPU had. Either this is something that is not often done, or I had no idea what the right search terms were.
Ultimately I decided the time spent on this felt like trying to crawl as efficiently as possible when I should be learning to walk. Even if I got my laptop working 3x faster, that would still be hours of hot, high-CPU time that I didn’t want. I decided my future investment should be in getting a remote host with a GPU up and running with my agents.
Team Bobcats hadn’t submitted an agent in a while, so I built the GPU version of the PPO2 baseline and sent it off for evaluation. It scored a paltry 3280, which put us in 56th place. The video showed plenty of room for improvement, as Sonic mostly ran into a wall.
Bonus! How to create split-screen videos with ffmpeg
If you liked my video at the top, here is how you can make your own. First I used the playback tooling from Day 6 to convert the .bk2 files generated by my agent into .mp4 videos. Then, with help from ffmpeg and the nice people who answer questions on the various StackExchanges, I combined the two videos into one that plays side-by-side:
ffmpeg \
-i results/SonicTheHedgehog-Genesis-GreenHillZone.Act1-0001.mp4 \
-i results/SonicTheHedgehog-Genesis-GreenHillZone.Act1-0729.mp4 \
-filter_complex '[0:v]pad=iw*2:ih[int];[int][1:v]overlay=W/2:0[vid]' \
-map '[vid]' \
-c:v libx264 \
-crf 23 \
-preset veryfast \
output.mp4
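As an aside, recent ffmpeg builds also ship an hstack filter that does the same side-by-side join with less filter-graph plumbing, as long as both inputs have the same height. Here is a self-contained sketch of that approach — it substitutes ffmpeg’s synthetic testsrc clips for the Sonic recordings so it runs anywhere, and the output filename is a placeholder:

```shell
# Two 1-second synthetic test clips stand in for the Sonic recordings.
# hstack joins them left-to-right; both inputs must share a height.
ffmpeg -y \
  -f lavfi -i testsrc=duration=1:size=320x240:rate=30 \
  -f lavfi -i testsrc=duration=1:size=320x240:rate=30 \
  -filter_complex '[0:v][1:v]hstack=inputs=2[vid]' \
  -map '[vid]' \
  -c:v libx264 -crf 23 -preset veryfast \
  side_by_side.mp4
```

The pad-then-overlay approach above is more flexible (you can place the second video anywhere on the enlarged canvas), while hstack is the shorter spelling when all you want is a straight horizontal join.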
Thanks for reading! You might be interested in the rest of this series:
- Day 1: Getting the Basics Set Up
- Day 3: Running the Jerk Agent
- Days 4 & 5: Getting TensorFlow & Docker to work on my MacBook
- Day 6: Playback Tooling for
- Days 9 & 10: Failing with the Rainbow DQN baseline code
- Days 11–14: Reading the PPO2 code
- Days 16–18: Running the PPO2 baseline code, and failing at TensorFlow & Docker optimization
- Days 22–25: A Deep Dive into the Jerk Agent
- Days 26–29: Visualizing batches of sonic runs
- Days 38–53: Discovering Q-Learning
- My final submission: the improved JERK agent