Making fun visuals, history maps and other tool improvements

Days 26–29 of the OpenAI Retro Contest

With my new deep understanding of the jerk agent, I had a ton of ideas for new agents to try out. But making small code changes and testing them was getting tiring, with all the little steps and room for error, so I decided to spend some time on my tooling.

How did Sonic get so high up?

Where does Sonic spend his time?

I was playing around with agents that mapped out the level using Sonic’s position and the pixel data from the observation, but my hopes were dashed when I remembered that agents do not get the info variable in the contest environment. I decided that my work could still be put to good use on the masses of .bk2 files that a run generates. I start with an image of the level map, then light up the areas that Sonic was in at each timestep. The script is based on the render script I mentioned in my first tooling post, and it iterates through a single replay file or a whole directory of them. After every 5 replay files it churns through, it saves the image, in case you don’t want to render every one. I ended up with something like this:

after 20 or so runs
after 200 runs

Here is the code:

You should be able to use it on a single .bk2 or a whole directory of them:

python3 ./scripts/map-paths.py ./results/jerk-agentv11/bk2/

This script has gone through a couple of iterations but still has pretty poor performance (blame my limited Python knowledge), so please let me know if you have suggestions and I will update it. I also didn’t see a way to get the scenario while replaying, so for now the level map is hardcoded. Ideally the script would have all of the level maps and pick the appropriate one for the files it is reading.
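The "light up where Sonic has been" step can be sketched on its own, separate from the replay handling. This is only an illustration, not the script above: it assumes the positions have already been collected as (x, y) pixel coordinates on the level map, and the names accumulate_visits, light_up, radius and dim are made up for this example. Counting visits in one array and shading the map once at the end, instead of redrawing the image every timestep, is one possible way to cut down on the work per frame.

```python
import numpy as np

def accumulate_visits(positions, map_shape, radius=8):
    """Count how often each map pixel fell inside a square around Sonic."""
    height, width = map_shape
    visits = np.zeros((height, width), dtype=np.float64)
    for x, y in positions:
        # Clamp the square to the map so positions near an edge still count.
        x0, x1 = max(x - radius, 0), min(x + radius + 1, width)
        y0, y1 = max(y - radius, 0), min(y + radius + 1, height)
        if x0 < x1 and y0 < y1:
            visits[y0:y1, x0:x1] += 1
    return visits

def light_up(level_map, visits, dim=0.3):
    """Darken the whole map, then brighten pixels by their (log) visit count."""
    level = level_map.astype(np.float64)
    peak = visits.max()
    # Log scaling keeps rarely visited areas visible next to hot spots.
    heat = np.log1p(visits) / np.log1p(peak) if peak > 0 else visits
    weight = dim + (1.0 - dim) * heat  # ranges from dim (never visited) to 1.0
    return np.rint(level * weight[..., None]).astype(np.uint8)
```

Here level_map is assumed to be an RGB array of shape (height, width, 3), and the result can be written out with whatever image library the rest of the script already uses.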

Other Tooling Improvements

local_evaluation.sh

While on the topic of tooling I also updated my local_evaluation.sh script:

There are a couple minor improvements to note here:

  • I use a variable for the tag name, so I only have to type it once: when I call the script from the command line.
  • There is now a one-million-timestep limit, just like the real contest execution, so my program will stop at the correct time.
  • All of my results are stored in directories per their version tag.
  • The agent that the evaluation uses is copied over to the results directory and committed to git with the tag it was used on. No more running an agent and forgetting what the code looked like!
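The per-tag results directory and the agent snapshot from the list above could look roughly like this in Python. The real local_evaluation.sh is a shell script, so this is only a sketch of the idea, and snapshot_agent and its parameters are hypothetical names. The git commit and tag are left out of the code on purpose.

```python
import shutil
from pathlib import Path

def snapshot_agent(agent_file, results_root, tag):
    """Copy the agent source into results_root/tag and return the new path."""
    run_dir = Path(results_root) / tag
    run_dir.mkdir(parents=True, exist_ok=True)
    # copy2 keeps the file's timestamps, which helps when comparing runs later.
    return Path(shutil.copy2(agent_file, run_dir))
```

After a call like snapshot_agent("agent.py", "results", "jerk-agentv12"), the copied file could be added, committed and tagged with the usual git commands.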

Reading the logs

When running locally, I didn’t have a good way to tell whether I was making improvements, other than watching the replays and seeing if Sonic mostly made it to the end or not. While the contest runs locally, two CSVs get created as the agent runs: a log.csv and a monitor.csv. log.csv looks like this:

1000,4.046873092651367
2000,7.72090220451355
3000,11.31653642654419
4000,14.871755599975586
5000,18.917137384414673
6000,23.270551919937134
7000,26.892014980316162
8000,30.42851948738098
9000,33.96245861053467

It only records the timestep count and how many seconds the agent has been running. It can be useful for figuring out how long your agent has been running, or how long it has to go, but not much else. monitor.csv, on the other hand, records the reward and number of timesteps that each episode achieved, along with the wall clock time at which it finished. This seemed really useful for seeing whether agents were improving over time or not getting better at all. Armed with almost no Python knowledge, I made this plot script to visualize the data. Suggestions welcome!
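A plot along these lines could be sketched as follows. This is not the script above, just a minimal version under a few assumptions: that monitor.csv has a header row with columns named r (reward), l (episode length) and t (wall clock time), that any line starting with # can be skipped, and that matplotlib is installed. The function names and the window parameter are made up for the example.

```python
import csv

def read_monitor(lines):
    """Parse monitor rows into (reward, length, wall_time) tuples.

    Assumes columns named r, l and t; lines starting with '#' are skipped.
    """
    rows = csv.DictReader(line for line in lines if not line.startswith("#"))
    return [(float(r["r"]), int(float(r["l"])), float(r["t"])) for r in rows]

def rolling_mean(values, window=10):
    """Mean of the last `window` values at each position (fewer at the start)."""
    out, total = [], 0.0
    for i, value in enumerate(values):
        total += value
        if i >= window:
            total -= values[i - window]  # drop the value that left the window
        out.append(total / min(i + 1, window))
    return out

def plot_monitor(path, window=10):
    """Plot reward (raw and smoothed) and episode length per episode."""
    import matplotlib.pyplot as plt  # imported here so parsing works without it
    with open(path, newline="") as f:
        episodes = read_monitor(f)
    rewards = [e[0] for e in episodes]
    lengths = [e[1] for e in episodes]
    fig, (top, bottom) = plt.subplots(2, 1, sharex=True)
    top.plot(rewards, alpha=0.4, label="episode reward")
    top.plot(rolling_mean(rewards, window), label="rolling mean")
    top.set_ylabel("reward")
    top.legend()
    bottom.plot(lengths)
    bottom.set_ylabel("timesteps")
    bottom.set_xlabel("episode")
    plt.show()
```

The rolling mean is there because single episodes are noisy; a smoothed line makes it easier to see whether the reward is trending up.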

I found this pretty interesting. It was easy to tell when I was exploiting too much, because the reward would stay pretty low while most episodes timed out:

With a better balance, the pattern of exploiting more over time becomes more apparent, with rewards that generally get higher and episodes that get shorter:

Making those plots was super helpful for understanding how the agent was behaving, but not as useful for deciding which agents were performing best and might score higher in the contest. To find the total reward for an agent, I made another script that averages all of the runs:

This one takes in the tag name I want to see the reward for and prints out how close the agent is to completion as well as the current average score:

$ python3 ./scripts/calc_reward.py jerk-agentv12
99.900000% done, reward: 6970.131268
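For illustration, the two numbers in that output can be computed from monitor.csv roughly like this. It is a sketch under assumptions, not the script itself: it assumes the columns are named r and l, that the l values add up to the timesteps counted against the one-million-step limit, and that the score is the plain mean of episode rewards. summarize and format_summary are hypothetical names.

```python
import csv

TOTAL_TIMESTEPS = 1_000_000  # the timestep limit used for the local evaluation

def summarize(lines, budget=TOTAL_TIMESTEPS):
    """Return (percent of the budget used, mean episode reward)."""
    rows = list(csv.DictReader(l for l in lines if not l.startswith("#")))
    if not rows:
        return 0.0, 0.0
    rewards = [float(row["r"]) for row in rows]
    steps = sum(int(float(row["l"])) for row in rows)
    return 100.0 * steps / budget, sum(rewards) / len(rewards)

def format_summary(percent_done, mean_reward):
    """Format the two numbers the same way as the output shown above."""
    return "%f%% done, reward: %f" % (percent_done, mean_reward)
```

With an open file handle f for an agent's monitor.csv, print(format_summary(*summarize(f))) prints a line in the same shape as the one above.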

While Sonic runs

I also changed up the logging output of my agent. Instead of telling me when a solution is replayed or backtracking occurs, it now prints the percentage complete and my run’s score at the beginning of each episode.

I do that by adding the following lines to my main while loop:

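    while True:
        if new_ep:
            if solutions:
                # Mean reward of each stored solution, then averaged below.
                current = [np.mean(x[0]) for x in solutions]
                # 1,000,000 total timesteps / 10,000 gives percent complete.
                print('%f%% done, reward: %f' % (env.total_steps_ever / 10000, np.mean(current)))
        ...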