Running the PPO baseline and giving up on local evaluation

Days 16–18 of the OpenAI Retro Contest

Now that I am back in the land of decent internet, I could build my Docker image for local evaluation. It looked like it was going to take some time, but the results definitely seemed worth it. Just take a look at the agent’s first run vs the one 700+ iterations later:

The only issue was that it was taking hours to run on my MacBook’s CPU. I would never be able to do much experimentation if I had to wait hours, or leave it running overnight, for each iteration, and I was kind of worried my computer would catch fire from all the hard work. So I set out to optimize.

My previous attempt to get TensorFlow compiled optimally for my computer didn’t go well, but that was partly because of unrelated confusion from not having a GPU. After searching the web a bit, I experimented with this Dockerfile from Google:

At first I tried adding just the parts that I needed, since it didn’t seem like I needed Jupyter, but I quickly reverted to seeing if I could get it working as-is. I ran into this bug, but was able to continue by adding the suggested flags.

I wasn’t clear about what was already included with the retro contest base image, so my image ended up looking like this:

And that seemed to work, despite being far from optimal and redundant. At least I didn’t run into any errors that made the build exit, and a few warnings during compilation are normal, right?

Well, a few hours later I realized that I was getting the same warnings over and over again. Somehow I had managed to get stuck in an infinite loop during compilation. I have my output log here if anyone wants to tell me what is wrong. I wasn’t able to find much help online about optimizing a Dockerized, Ubuntu-compiled TensorFlow build to take advantage of a couple of extra instructions my CPU had. Either this is something that is not often done, or I had no idea what the right search terms were.

Ultimately I decided the time spent on this felt like trying to crawl as efficiently as possible when I should be learning to walk. Even if I got my laptop to work 3x better, that would still mean hours of hot, high-CPU time that I didn’t want. I decided that my future investment should be in getting a remote host with a GPU up and running with my agents.

Submission time!

Team Bobcats hadn’t submitted an agent in a while, so I built the GPU version of the PPO2 baseline and sent it off for evaluation. It scored a paltry 3280, which put us in 56th place. The video showed some room for improvement, as Sonic mostly ran into a wall.

Bonus! How to create split-screen videos with ffmpeg.

If you liked my video at the top, here is how you can make your own. First I used the playback tooling from Day 6 to convert the .bk2 files generated by my agent into .mp4 videos. Then, with ffmpeg and the help of the nice people who answer questions on the various StackExchange sites, I combined the two videos into a single side-by-side video:

ffmpeg \
-i results/SonicTheHedgehog-Genesis-GreenHillZone.Act1-0001.mp4 \
-i results/SonicTheHedgehog-Genesis-GreenHillZone.Act1-0729.mp4 \
-filter_complex '[0:v]pad=iw*2:ih[int];[int][1:v]overlay=W/2:0[vid]' \
-map '[vid]' \
-c:v libx264 \
-crf 23 \
-preset veryfast \
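
If you want to stitch several pairs of runs together, it can be handy to assemble the same command from Python. The sketch below only builds the argument list using the flags from the command above: the pad filter doubles the canvas width of the first video, and overlay draws the second video starting at the halfway point. The function name and the file names in the usage example are hypothetical placeholders, not paths from my setup.

```python
import shlex


def side_by_side_command(left, right, output):
    """Build an ffmpeg argument list that places two videos next to each other.

    `left`, `right`, and `output` are file paths chosen by the caller
    (hypothetical placeholders in the example below).
    """
    # [0:v]pad=iw*2:ih   -> widen the first video's canvas to twice its width
    # overlay=W/2:0      -> draw the second video at x = half the padded width
    filter_graph = "[0:v]pad=iw*2:ih[int];[int][1:v]overlay=W/2:0[vid]"
    return [
        "ffmpeg",
        "-i", left,
        "-i", right,
        "-filter_complex", filter_graph,
        "-map", "[vid]",        # no shell quoting needed in a list
        "-c:v", "libx264",
        "-crf", "23",
        "-preset", "veryfast",
        output,
    ]


if __name__ == "__main__":
    cmd = side_by_side_command("first.mp4", "later.mp4", "side_by_side.mp4")
    # Print a copy-pasteable shell version of the command.
    print(shlex.join(cmd))
```

To actually run it, pass the list to subprocess.run(cmd, check=True) on a machine where ffmpeg is installed. Using a list instead of a single string sidesteps shell quoting of the square brackets.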