Making fun visuals, history maps and other tool improvements

Days 26–29 of the OpenAI Retro Contest

Tristan Sokol
May 3, 2018 · 5 min read

With my new deep understanding of the jerk agent, I had a ton of ideas for new agents to try out. But making small code changes and testing them was getting tiring, with all the little steps and room for error, so I decided to spend some time on my tooling.

Image for post
How did Sonic get so high up?

Where does Sonic spend his time?

I was playing around with agents that mapped out the level using Sonic’s position and the pixel data from the observation, but my hopes were dashed when I remembered that agents do not get the info variable in the contest environment. I decided that my work could still be put to good use on the masses of .bk2 files that a run generates. I started with an image of the level map, then lit up the areas Sonic occupied at each timestep. The script is based on the render script I mentioned in my first tooling post, and it iterates through a single replay file or a whole directory of them. Every 5 replay files it churns through, it saves the image in case you don’t want to render each one. I ended up with something like this:

Image for post
after 20 or so runs
Image for post
after 200 runs

Here is the code:

You should be able to use it on a single .bk2 or a whole directory of them:

python3 ./scripts/ ./results/jerk-agentv11/bk2/

This script has gone through a couple of iterations but still performs pretty poorly (likely due to my limited Python knowledge), so please let me know if you have suggestions and I will update it. I also didn’t see a way to get the scenario while replaying, so for now the level map is hardcoded; ideally the script would have all of the level maps and use the appropriate one for the files it is reading.
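
For readers who want the gist of the "light up where Sonic has been" step without the full script, here is a minimal sketch. It is not the script above: it assumes each replayed timestep has already yielded an (x, y) pixel position, and the names accumulate_visits, brightness and cell_size are hypothetical. It only counts visits per map cell and scales them to a 0–255 brightness; drawing onto the level image is left out.

```python
from collections import Counter


def accumulate_visits(positions, cell_size=16):
    """Count how many timesteps were spent in each map cell.

    positions: iterable of (x, y) pixel coordinates, one per timestep.
    cell_size: hypothetical bucket width in pixels (an assumption, not
    a value from the original script).
    """
    visits = Counter()
    for x, y in positions:
        visits[(x // cell_size, y // cell_size)] += 1
    return visits


def brightness(visits, max_level=255):
    """Scale visit counts to 0..max_level so the busiest cell is fully lit."""
    if not visits:
        return {}
    peak = max(visits.values())
    return {cell: count * max_level // peak for cell, count in visits.items()}


# Made-up trace: Sonic idles near the start, then runs to the right.
trace = [(32, 700), (33, 700), (34, 701), (200, 690), (420, 650)]
heat = brightness(accumulate_visits(trace))
```

Bucketing positions into cells keeps the bookkeeping small, which may help with the slow rendering mentioned above, at the cost of a blockier map.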

Other Tooling Improvements

While on the topic of tooling I also updated my script:

There are a couple minor improvements to note here:

  • The tag name is now a variable, so I only have to specify it once, when I call the script from the command line.
  • There is now a million timestep limit, just like the real contest execution, so my program will stop at the correct time.
  • All of my results are stored in directories per their version tag.
  • The agent that the evaluation uses is copied over to the results directory and committed to git with the tag it was used on. No more running an agent and forgetting what the code looked like!
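
The last two points boil down to a little bookkeeping, sketched here in Python. This is an illustration under assumptions, not the script itself: archive_agent and git_commands are hypothetical names, the results/<tag>/ layout is guessed from the paths used in this post, and the git commands are only built, not executed.

```python
import shutil
from pathlib import Path


def archive_agent(agent_file, results_root, tag):
    """Copy the agent source into <results_root>/<tag>/ and return the new path."""
    dest_dir = Path(results_root) / tag
    dest_dir.mkdir(parents=True, exist_ok=True)
    # shutil.copy2 returns the path of the file it created
    return Path(shutil.copy2(agent_file, dest_dir))


def git_commands(tag, archived_file):
    """Build (but do not run) git commands that would record this run."""
    return [
        ["git", "add", str(archived_file)],
        ["git", "commit", "-m", f"Agent used for {tag}"],
        ["git", "tag", tag],
    ]
```

Each command list could be passed to subprocess.run once you are happy with it; keeping the building and the running separate makes it easy to print the commands first.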

Reading the logs

I didn’t have a good idea from running locally if I was making improvements, other than watching the replays and seeing if Sonic mostly made it to the end or not. While running the contest locally, two CSVs get created as the agent runs, a log.csv and a monitor.csv. log.csv looks like this:


It only records the timesteps and how many seconds the agent has been running. It can be useful for figuring out how long your agent has been running, or how long it has to go, but not much else. monitor.csv, on the other hand, records the reward and number of timesteps that each episode achieved, along with the wall clock time of finishing. This seemed really useful for seeing whether agents were improving over time or not getting better at all. Armed with almost no Python knowledge, I made this plot script to visualize the data. Suggestions welcome!
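
As a rough idea of the data handling behind such a plot, here is a minimal sketch (not the plot script itself). It assumes monitor.csv has a header row naming three columns r (reward), l (episode length) and t (wall clock time); check your own file, since those names are an assumption. read_episodes and rolling_mean are hypothetical helpers, and the sample rows are made up.

```python
import csv
import io


def read_episodes(csv_file):
    """Return (reward, length, wall_time) tuples, one per finished episode.

    Assumes columns named r, l and t; lines starting with '#' are skipped.
    """
    rows = (line for line in csv_file if not line.startswith("#"))
    return [(float(row["r"]), int(row["l"]), float(row["t"]))
            for row in csv.DictReader(rows)]


def rolling_mean(values, window):
    """Trailing mean over the last `window` values, to smooth noisy rewards."""
    out = []
    total = 0.0
    for i, value in enumerate(values):
        total += value
        if i >= window:
            total -= values[i - window]  # drop the value that left the window
        out.append(total / min(i + 1, window))
    return out


# Made-up sample standing in for a real monitor.csv
sample = io.StringIO("r,l,t\n100.0,4500,12.5\n300.0,4500,25.1\n800.0,2100,31.0\n")
episodes = read_episodes(sample)
smoothed = rolling_mean([reward for reward, _, _ in episodes], window=2)
```

The smoothed rewards and the raw episode lengths can then be handed to whatever plotting library you prefer.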

I found this pretty interesting. It was easy to tell when I was exploiting too much: the reward would stay pretty low while most episodes timed out:

Image for post

With a better balance, the pattern of exploiting more over time becomes more apparent, with rewards that generally get higher and episode lengths that get shorter:

Image for post

Making those plots was super helpful for understanding how the agent was behaving, but not as useful in helping me decide which ones were performing the best and might be scored higher in the contest. To find the total reward for an agent, I made another script that averages all of the runs:

This one takes in the tag name I want to see the reward for and prints out how close the agent is to completion as well as the current average score:

$ python3 ./scripts/ jerk-agentv12
99.900000% done, reward: 6970.131268
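
The arithmetic behind a line like that can be sketched as follows. This is a guess at the calculation, not the script itself: it assumes the per-episode rewards and lengths have already been read from monitor.csv, that "done" means the share of the million-timestep limit used so far, and that the score is the plain mean of episode rewards. summarize is a hypothetical name and the numbers are made up.

```python
def summarize(rewards, lengths, max_timesteps=1_000_000):
    """Return (percent of the timestep budget used, mean episode reward)."""
    if not rewards:
        return 0.0, 0.0
    percent_done = 100.0 * sum(lengths) / max_timesteps
    return percent_done, sum(rewards) / len(rewards)


# Made-up episodes: three rewards and their lengths in timesteps
percent, reward = summarize([100.0, 300.0, 800.0], [4500, 4500, 2100])
print('%f%% done, reward: %f' % (percent, reward))
```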

While Sonic runs

I also changed the logging output of my agent. Instead of reporting when a solution is replayed or when backtracking occurs, it now prints the percentage complete and my run’s score at the beginning of each episode.

Image for post

I do that with an addition of the following lines in my main while loop:

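while True:
    if new_ep:
        if solutions:
            # Mean of the rewards stored in each solution's first element
            current = [np.mean(x[0]) for x in solutions]
            # total_steps_ever / 10000 is the percentage of the 1,000,000-timestep limit
            print('%f%% done, reward: %f' % (env.total_steps_ever/10000, np.mean(current)))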
