Running the PPO baseline and giving up on local evaluation
Days 16–18 of the OpenAI Retro Contest
Now that I am back in the land of decent internet, I could build my Docker image for local evaluation. It looked like it was going to take some time, but the results definitely seemed worth it. Just take a look at the agent’s first run vs. the one 700+ iterations into the future:
The only issue was that it was taking hours to run on my MacBook’s CPU. I would never be able to do much experimentation if I had to wait hours, or run overnight, for each iteration, and I was kind of worried my computer would catch fire from all the hard work. So I set out to optimize.
My previous attempt to get TensorFlow compiled optimally for my computer didn’t go well, but that was also because of unrelated confusion from not having a GPU. After searching the web a bit, I experimented with this Dockerfile from Google: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/docker/Dockerfile.devel
At first I tried adding only the parts I needed, since it didn’t seem like I needed jupyter, but I quickly reverted to seeing if I could get it working as-is. I ran into this bug, but was able to continue by adding the suggested flags.
I wasn’t clear on what was already included with the retro contest base image, so my image ended up looking like this:
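For readers following along, here is a rough sketch of what a Dockerfile like this could look like — building TensorFlow from source with extra CPU instructions enabled on top of a contest-style base image. This is not the author’s actual file; the base image tag, Bazel version, and build flags are all assumptions:

```dockerfile
# Hypothetical sketch — base image name, versions, and flags are assumptions
FROM openai/retro-agent:bare

# Build tools needed to compile TensorFlow from source
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential curl git python3-dev && \
    rm -rf /var/lib/apt/lists/*

# Bazel is required to build TensorFlow (version here is a guess)
RUN curl -fsSL -o /tmp/bazel.sh \
        https://github.com/bazelbuild/bazel/releases/download/0.11.0/bazel-0.11.0-installer-linux-x86_64.sh && \
    bash /tmp/bazel.sh && rm /tmp/bazel.sh

# Build TensorFlow with the extra CPU instruction sets the stock wheel skips.
# Note: these -copt flags only help if the host CPU actually supports them.
RUN git clone --depth 1 https://github.com/tensorflow/tensorflow /tensorflow && \
    cd /tensorflow && \
    bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-msse4.2 \
        //tensorflow/tools/pip_package:build_pip_package && \
    bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/pip && \
    pip install /tmp/pip/tensorflow-*.whl
```

The `--copt` flags are the standard way to opt in to AVX/FMA/SSE4.2 when building TensorFlow with Bazel; the warnings TensorFlow prints at import time name exactly which of these your binary is missing.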
And that seemed to work, despite being very non-optimal and redundant. At least I didn’t run into any errors that caused the build to exit, and a few warnings during compilation is normal, right?
Well, a few hours after that I realized I was getting the same warnings over and over again. Somehow I had managed to get stuck in an infinite loop during compilation. My output log is here if anyone wants to tell me what went wrong. I wasn’t able to find much help online about optimizing a dockerized Ubuntu TensorFlow build to take advantage of the couple of extra instructions my CPU had. Either this is something that is not often done, or I had no idea what the right search terms were.
Ultimately I decided the time spent on this felt like trying to crawl as efficiently as possible when I should be learning to walk. Even if I got my laptop working 3x faster, that would still be hours of hot, high-CPU time that I didn’t want. I decided my future investment should be in getting a remote host with a GPU up and running with my agents.
Team Bobcats hadn’t submitted an agent in a while, so I built the GPU version of the PPO2 baseline and sent it off for evaluation. It scored a paltry 3280, which put us in 56th place. The video showed plenty of room for improvement, as Sonic mostly ran into a wall.
Bonus! How to create split-screen videos with ffmpeg
If you liked my video at the top, here is how you can make your own. First I used the playback tooling from Day 6 to convert the .bk2 files generated by my agent into .mp4 videos. Then, with help from ffmpeg and the nice people who answer questions on the various StackExchanges, I combined the two videos into one that plays side-by-side:
ffmpeg \
-i results/SonicTheHedgehog-Genesis-GreenHillZone.Act1-0001.mp4 \
-i results/SonicTheHedgehog-Genesis-GreenHillZone.Act1-0729.mp4 \
-filter_complex '[0:v]pad=iw*2:ih[int];[int][1:v]overlay=W/2:0[vid]' \
-map '[vid]' \
-c:v libx264 \
-crf 23 \
-preset veryfast \
output.mp4
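As an aside, recent ffmpeg builds also ship an hstack filter that does the same side-by-side join with less filter-graph plumbing, as long as both inputs have the same height. Here is a self-contained sketch of that approach — it substitutes ffmpeg’s synthetic testsrc clips for the Sonic recordings so it runs anywhere, and the output filename is a placeholder:

```shell
# Two 1-second synthetic test clips stand in for the Sonic recordings.
# hstack joins them left-to-right; both inputs must share a height.
ffmpeg -y \
  -f lavfi -i testsrc=duration=1:size=320x240:rate=30 \
  -f lavfi -i testsrc=duration=1:size=320x240:rate=30 \
  -filter_complex '[0:v][1:v]hstack=inputs=2[vid]' \
  -map '[vid]' \
  -c:v libx264 -crf 23 -preset veryfast \
  side_by_side.mp4
```

The pad-then-overlay approach above is more flexible (you can place the second video anywhere on the enlarged canvas), while hstack is the shorter spelling when all you want is a straight horizontal join.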
Thanks for reading! You might be interested in the rest of this series:
- Day 1: Getting the Basics Set Up
- Day 3: Running the Jerk Agent
- Days 4 & 5: Getting TensorFlow & Docker to work on my MacBook
- Day 6: Playback Tooling for
- Days 9 & 10: Failing with the Rainbow DQN baseline code
- Days 11–14: Reading the PPO2 code
- Days 16–18: Running the PPO2 baseline code, and failing at TensorFlow & Docker optimization
- Days 22–25: A Deep Dive into the Jerk Agent
- Days 26–29: Visualizing batches of sonic runs
- Days 38–53: Discovering Q-Learning
- My final submission: the improved JERK agent