Day 6 of the OpenAI Retro Contest: playback tooling
How to tell if your computer is really learning, or just slacking off.
Now that we have gotten some agents working in the OpenAI Retro Contest (the JERK agent on day 3 and a Rainbow DQN on days 4 & 5), I wanted to take a moment to build out some of the tooling we were using.
The agents are now too complex to watch in real time and still have them get anywhere in their learning, so I am going to have to check how my agent is playing Sonic a different way. Luckily, Gym Retro has a handy record feature that spits out a .bk2 file every time the game is played.
Turning .bk2 into something useful
The .bk2 format seems to be a log of instructions that the environment can use to deterministically recreate the play-through. It is great in that it is super small (each one is only ~45 KB) and it can be used to replay a session for more learning. The downside is that you can’t just double-click it and watch your agent’s sad attempt to beat the first level of Sonic. (Though I would love for someone to make a QuickLook extension!)
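Incidentally, .bk2 is BizHawk's movie format, which as far as I can tell is just a ZIP container holding plain-text metadata plus the frame-by-frame input log, hence the tiny file sizes. A quick sketch for peeking inside one with Python's standard zipfile module (the member name "Input Log.txt" follows BizHawk's convention):

```python
import zipfile

def peek_bk2(path):
    """List the members of a .bk2 and show the first few input-log lines.

    Assumes the .bk2 is a ZIP container in BizHawk's movie format,
    with the recorded button presses stored in 'Input Log.txt'.
    """
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
        log_lines = []
        if 'Input Log.txt' in names:
            text = zf.read('Input Log.txt').decode()
            log_lines = text.splitlines()[:3]  # just a taste of the log
        return names, log_lines
```

This is read-only and does not need Gym Retro at all, which makes it handy for quickly sanity-checking a batch of recordings.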
Gym Retro provides a couple of tools for this. One is playback_movie.py, which will use ffmpeg to convert your .bk2 into an .mp4 like this one:
That is kind of neat, but the transformation to video is time-consuming and also increases the size on disk 100x+. It did not really seem like a good option for watching very many replays. I did, however, take fellow contestant Lyons’s suggestion and created a local script to make that conversion easier to invoke:
Now I can run

python3 ./scripts/convertbk2-mp4.py ./results/bk2/SonicTheHedgehog-Genesis-GreenHillZone.Act1-0001.bk2

to convert any of the runs into an .mp4.
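The wrapper itself can be tiny; here is a sketch of what such a script might look like, assuming Gym Retro's playback_movie.py has been copied into ./scripts/ (that path is hypothetical, so point it at your own copy) and that ffmpeg is on your PATH:

```python
import subprocess
import sys

# Hypothetical location of Gym Retro's converter; adjust for your checkout.
PLAYBACK_SCRIPT = './scripts/playback_movie.py'

def build_command(bk2_path):
    """Assemble the conversion command for a single .bk2 file."""
    return [sys.executable, PLAYBACK_SCRIPT, bk2_path]

def convert(bk2_path):
    """Convert one .bk2 to .mp4 (playback_movie drives ffmpeg itself)."""
    subprocess.run(build_command(bk2_path), check=True)

if __name__ == '__main__' and len(sys.argv) > 1:
    convert(sys.argv[1])
```

Keeping the command assembly in its own function makes it easy to extend the wrapper later, for example to loop over a whole folder of recordings.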
A better solution
What I ended up using quite a bit more was the playback interface provided in the code for Gym Retro. You can load the file and step through it, watching the environment render. Playing one file at a time was slow for seeing the difference between runs, though, so I added some basic functionality to ingest a whole folder of .bk2 files and play them one after another:
With python3 ./scripts/render.py ./results/bk2/ I can watch all of my replays at a faster speed (depending on the framerate variable), one after another, like this:
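The folder playback boils down to globbing for .bk2 files and feeding each recorded input stream back through the environment. A rough sketch using Gym Retro's Movie API (the replay loop follows the pattern in the Gym Retro code, though get_key's signature has changed between versions; the framerate handling here, rendering only every Nth frame, is my own simplification):

```python
import glob
import os
import sys

def find_replays(folder):
    """Collect every .bk2 in a folder, in recording order."""
    return sorted(glob.glob(os.path.join(folder, '*.bk2')))

def play(path, framerate=2):
    """Replay one .bk2, rendering every `framerate`-th frame."""
    import retro  # imported here so find_replays works without Gym Retro

    movie = retro.Movie(path)
    movie.step()
    env = retro.make(game=movie.get_game(), state=None,
                     use_restricted_actions=retro.Actions.ALL)
    env.initial_state = movie.get_state()
    env.reset()
    frame = 0
    while movie.step():
        keys = [movie.get_key(i, 0) for i in range(env.num_buttons)]
        env.step(keys)
        frame += 1
        if frame % framerate == 0:  # skip renders to speed playback up
            env.render()
    env.close()

if __name__ == '__main__' and len(sys.argv) > 1:
    for bk2 in find_replays(sys.argv[1]):
        print('playing', bk2)
        play(bk2)
```

Raising the framerate value skips more renders, which is what makes batch review of many runs tolerable.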
I did run into one issue: my installed version of Gym Retro didn’t have this commit, so my script would open an additional window for each playback rendered. Luckily, one of the admins in the Discord pointed me to the fix. I tried upgrading with pip, but it seems the published binaries don’t include the fix, so in the end I just applied the patch to my local code instead.
There was a tiny bug that I spotted in the Gym Retro readme that I fixed as well with this pull request: https://github.com/openai/retro/pull/23
What’s the plan for tomorrow?
In my mind there are a couple of workstreams for Ben and me to move forward on for team Bobcats.
- I would like to spend a couple of days digging into the Rainbow agent’s code to start understanding it, hopefully starting down the path of creating some new agents.
- When running TensorFlow, a warning pops up that “Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA”, so it seems worthwhile to figure out how to compile TensorFlow in my Docker build process for the speed improvements, and hopefully learn something about Docker in the process.
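For reference, the from-source build that picks up AVX2/FMA follows TensorFlow's documented bazel flow; a sketch of a Dockerfile stage, untested in my setup (the --config=opt flag tells bazel to optimize for the build machine's CPU, and the non-interactive configure trick is an assumption about the devel image's layout):

```dockerfile
# Sketch: build TensorFlow from source so the binary can use AVX2/FMA.
FROM tensorflow/tensorflow:latest-devel
WORKDIR /tensorflow
RUN yes "" | ./configure && \
    bazel build --config=opt \
        //tensorflow/tools/pip_package:build_pip_package && \
    bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/pkg && \
    pip install /tmp/pkg/tensorflow-*.whl
```

Fair warning: this build is known to take hours, which is part of why it is a "tomorrow" task.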
Thanks for reading! I hope this helps others who are competing and as always, if you have any questions, comments, concerns, excitements feel free to drop me a line.
You might be interested in the rest of this series:
- Day 1: Getting the Basics Set Up
- Day 3: Running the Jerk Agent
- Days 4 & 5: Getting TensorFlow & Docker to work on my MacBook
- Day 6: Playback Tooling for .bk2 files
- Days 9 & 10: Failing with the Rainbow DQN baseline code.
- Days 11–14: Reading the PPO2 code
- Days 16–18: Running the PPO2 baseline code, and failing at TensorFlow & Docker optimization.
- Days 22–25: A Deep Dive into the Jerk Agent
- Days 26–29: Visualizing batches of sonic runs
- Days 38–53: Discovering Q-Learning
- My final submission: the improved JERK agent