Making fun visuals, history maps and other tool improvements
Days 26–29 of the OpenAI Retro Contest
With my new deep understanding of the jerk agent, I had a ton of ideas for new agents to try out. But making small code changes and testing them was getting tiring, with all the little steps and room for error, so I decided to spend some time focusing on my tooling.
Where does Sonic spend his time?
I was playing around with agents that mapped out the level using Sonic’s position and the pixel data from the observation, when my hopes were destroyed: I remembered that agents do not get the info variable in the contest environment. I decided that my work could still be put to good use on the masses of .bk2 files that a run generates. I started with an image of the level map, then lit up the areas Sonic was in at each timestep. The script is based on the render script I mentioned in my first tooling post and iterates through a single replay file or a whole directory of them. Every 5 replay files it churns through, it saves the image, in case you don’t want to wait for it to render every one. I ended up with something like this:
Here is the code:
You should be able to use it on a single .bk2 or a whole directory of them:
python3 ./scripts/map-paths.py ./results/jerk-agentv11/bk2/
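The core replay-and-overlay logic can be sketched like this. This is a sketch under assumptions, not the author’s exact script: it assumes gym-retro’s `retro.Movie` playback API, that local replays expose `info['x']` and `info['y']`, and the helper names (`brighten`, `positions_from_bk2`) are mine:

```python
# Sketch of a map-paths-style script (assumptions: gym-retro's
# retro.Movie playback API and info['x']/info['y'] during local replay).
import numpy as np

def brighten(level_map, positions, radius=8, amount=40):
    """Light up a small square around each (x, y) Sonic position."""
    out = level_map.astype(np.int16)  # widen so the sums don't wrap
    for x, y in positions:
        y0, x0 = max(0, y - radius), max(0, x - radius)
        out[y0:y + radius, x0:x + radius] += amount
    return np.clip(out, 0, 255).astype(np.uint8)

def positions_from_bk2(path):
    """Replay one .bk2 file and record Sonic's position each timestep."""
    import retro  # imported here so brighten() works without gym-retro
    movie = retro.Movie(path)
    movie.step()
    env = retro.make(game=movie.get_game(), state=None,
                     use_restricted_actions=retro.Actions.ALL,
                     players=movie.players)
    env.initial_state = movie.get_state()
    env.reset()
    positions = []
    while movie.step():
        # Feed the recorded button presses back into the emulator.
        keys = [movie.get_key(i, 0) for i in range(env.num_buttons)]
        _obs, _rew, _done, info = env.step(keys)
        positions.append((info['x'], info['y']))
    env.close()
    return positions
```

A full script would glob a directory of .bk2 files, call `positions_from_bk2` on each, overlay the results onto the level-map image with `brighten`, and save the image every few files.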
This script has gone through a couple of iterations, but still has pretty poor performance (likely from my limited Python knowledge), so please let me know if you have suggestions and I will update it. I also didn’t see a way to get the scenario while replaying, so for now the level map is hardcoded; ideally the script would have all of the level maps and use the appropriate one for the files it was reading.
Other Tooling Improvements
While on the topic of tooling, I also updated my
There are a couple minor improvements to note here:
- I use a variable for the tag name, so I only have to type it once: when I call the script from the command line.
- There is now a one-million-timestep limit, just like the real contest evaluation, so my run stops at the right time.
- All of my results are stored in directories per their version tag.
- The agent that the evaluation uses is copied over to the results directory and committed to git with the tag it was used on. No more running an agent and forgetting what the code looked like!
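The improvements above can be sketched as a small wrapper. This is my own illustration under stated assumptions (the `./results/<tag>/` layout, an `agent.py` entry point that accepts these hypothetical flags, and a git repo in the working directory), not the author’s actual script:

```python
# Sketch of an evaluation wrapper (assumptions: ./results/<tag>/ layout,
# agent.py entry point, git repo in cwd; flag names are hypothetical).
import shutil
import subprocess
import sys
from pathlib import Path

TIMESTEP_LIMIT = 1_000_000  # same budget as the real contest evaluation

def snapshot(tag, agent='agent.py', root='results'):
    """Copy the agent into its tagged results directory."""
    out = Path(root) / tag
    out.mkdir(parents=True, exist_ok=True)
    shutil.copy(agent, out / Path(agent).name)
    return out

def run_eval(tag):
    out = snapshot(tag)
    # Commit and tag the exact code that produced these results.
    subprocess.run(['git', 'add', str(out)], check=True)
    subprocess.run(['git', 'commit', '-m', 'run %s' % tag], check=True)
    subprocess.run(['git', 'tag', tag], check=True)
    subprocess.run([sys.executable, 'agent.py', '--results', str(out),
                    '--timestep-limit', str(TIMESTEP_LIMIT)], check=True)

if __name__ == '__main__':
    run_eval(sys.argv[1])  # e.g. python3 run_local.py jerk-agentv12
```

The git snapshot is the key piece: the code that produced a results directory can always be recovered from the matching tag.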
Reading the logs
I didn’t have a good sense from running locally of whether I was making improvements, other than watching the replays and seeing if Sonic mostly made it to the end or not. While running the contest locally, two CSVs get created as the agent runs: a log.csv and a monitor.csv.
log.csv looks like this:
1000,4.046873092651367
2000,7.72090220451355
3000,11.31653642654419
4000,14.871755599975586
5000,18.917137384414673
6000,23.270551919937134
7000,26.892014980316162
8000,30.42851948738098
9000,33.96245861053467
It only records the timesteps and how many seconds the agent has been running. That can be useful for figuring out how long your agent has been running, or how long it has to go, but not much else.
monitor.csv, on the other hand, records the reward and number of timesteps of each episode, along with the wall-clock time at which it finished. This seemed really useful for seeing whether agents were improving over time or not getting better at all. Armed with almost no Python knowledge, I made this plot script to visualize the data. Suggestions welcome!
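A minimal sketch of such a plot script, assuming monitor.csv follows the baselines Monitor format (a `#`-prefixed JSON first line, then `r,l,t` columns for episode reward, episode length, and wall-clock time); the function names are mine:

```python
# Sketch of a monitor.csv plot script (assumption: baselines Monitor
# format with a JSON header line followed by r,l,t columns).
import csv
import sys

def load_monitor(path):
    """Return per-episode rewards and lengths from a monitor.csv."""
    rewards, lengths = [], []
    with open(path) as f:
        f.readline()  # skip the '#{"t_start": ...}' header line
        for row in csv.DictReader(f):
            rewards.append(float(row['r']))
            lengths.append(int(float(row['l'])))
    return rewards, lengths

def plot(path):
    import matplotlib.pyplot as plt  # imported here; plotting is optional
    rewards, lengths = load_monitor(path)
    fig, (top, bottom) = plt.subplots(2, 1, sharex=True)
    top.plot(rewards)
    top.set_ylabel('episode reward')
    bottom.plot(lengths)
    bottom.set_ylabel('episode length')
    bottom.set_xlabel('episode')
    plt.show()

if __name__ == '__main__':
    plot(sys.argv[1])  # e.g. python3 plot.py results/jerk-agentv12/monitor.csv
```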
I found this pretty interesting. It was easy to tell when I was exploiting too much, the reward would stay pretty low, while most episodes timed out:
With a better balance, the pattern of exploiting more over time becomes more apparent: rewards generally get higher, and episode lengths get shorter:
Making those plots was super helpful for understanding how the agent was behaving, but not as useful for deciding which agents performed best and might score higher in the contest. To find the total reward for an agent, I made another script that averages all of the runs:
This one takes in the tag name I want to see the reward for and prints out how close the agent is to completion as well as the current average score:
$ python3 ./scripts/calc_reward.py jerk-agentv12
99.900000% done, reward: 6970.131268
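A sketch of what such a calc_reward.py script can look like, under assumptions of my own (monitor.csv files in the baselines `r,l,t` format under `./results/<tag>/`, and a 1,000,000-timestep budget per run); it is an illustration, not the author’s code:

```python
# Sketch of a calc_reward.py-style script (assumptions: monitor.csv
# files under ./results/<tag>/ and a 1,000,000-timestep budget per run).
import csv
import glob
import sys

TIMESTEP_LIMIT = 1_000_000

def run_stats(monitor_path):
    """Mean episode reward and total timesteps for one run."""
    rewards, steps = [], 0
    with open(monitor_path) as f:
        f.readline()  # skip the '#{"t_start": ...}' header line
        for row in csv.DictReader(f):
            rewards.append(float(row['r']))
            steps += int(float(row['l']))
    mean = sum(rewards) / len(rewards) if rewards else 0.0
    return mean, steps

def main(tag):
    means, steps = [], 0
    for path in glob.glob('./results/%s/**/monitor.csv' % tag, recursive=True):
        m, s = run_stats(path)
        means.append(m)
        steps += s
    # Percent of the total timestep budget used across all runs.
    pct = 100.0 * steps / (len(means) * TIMESTEP_LIMIT)
    print('%f%% done, reward: %f' % (pct, sum(means) / len(means)))

if __name__ == '__main__':
    main(sys.argv[1])  # e.g. python3 calc_reward.py jerk-agentv12
```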
While Sonic runs
I also changed up the logging output of my agent. Instead of only finding out when a solution is replayed or backtracking occurs, at the beginning of each episode I now see the percentage complete and my run’s score.
I do that by adding the following lines to my main while loop:
while True:
    if new_ep:
        if solutions:
            current = [np.mean(x) for x in solutions]
            print('%f%% done, reward: %f' % (env.total_steps_ever / 10000, np.mean(current)))
        ...
Thanks for reading! You might be interested in the rest of this series:
- Day 1: Getting the Basics Set Up
- Day 3: Running the Jerk Agent
- Days 4 & 5: Getting TensorFlow & Docker to work on my MacBook
- Day 6: Playback Tooling for
- Days 9 & 10: Failing with the Rainbow DQN baseline code
- Days 11–14: Reading the PPO2 code
- Days 16–18: Running the PPO2 baseline code, and failing at TensorFlow & Docker optimization
- Days 22–25: A Deep Dive into the Jerk Agent
- Days 26–29: Visualizing batches of Sonic runs
- Days 38–53: Discovering Q-Learning
- My final submission: the improved JERK agent