OpenAI Retro Contest Day 3

Alternate title: running commands until they work.

Day 3 for the OpenAI Retro Contest started off with just buying the actual game from Steam, since all of the previous day’s work was devoted to avoiding a $5 fee. With that hurdle solved, the rest of the official instructions were pretty easy to blindly follow, though that would come back to bite us. (It was awesome to see the docs embed our credentials into the instructions, since that is a touch you don’t even see much in professional API documentation.) After installing Docker, and waiting a loooong time for Docker things to download, upload, build, or whatever Docker things Docker-do, we were able to get our random agent evaluated:

At this point 2 teams had something better than a random agent submitted

11th place isn’t bad for just following the directions! This really boosted team spirits, so we got right back into it to see if we could start understanding what we just did.

One of the main things we realized that we didn’t understand was how to play Sonic. I had played this version of Sonic as a young child, but that was some time ago, so we fired up OpenEmu with our totally legit rom and played the first level. Turns out the main mechanic I thought we should be using, spin dash, isn’t even in the game, and who knows if we will end up buying the other versions. After a few more levels, it was pretty clear that any kind of naive strategy around just going right and jumping was probably not going to cut it.

JERK it up

With our new knowledge, and break time over, we continued our learning journey with Gotta Learn Fast, which was super helpful despite its spelling mistakes, and attempted to implement some of the other baseline implementations. The jerk agent (Just Enough Retained Knowledge) seemed simple enough that we could understand how it works, since it is ultimately just a slightly more advanced version of random controls.

We downloaded the agent file and swapped out the remote environment for the one hosted locally by our retro package.
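For local runs, the swap amounts to one import and one constructor call. A minimal sketch, assuming the contest’s retro_contest.local.make API and the game/state names from the contest docs; the try/except is just so the snippet degrades gracefully on a machine without the packages or the ROM:

```python
# Sketch of the one-line swap, assuming the layout of the contest's
# retro_contest and gym_remote packages.

try:
    # Local version: talk to the emulator directly.
    from retro_contest.local import make
    env = make(game='SonicTheHedgehog-Genesis', state='GreenHillZone.Act1')
except Exception:
    # Remote version (what the baseline ships with for the evaluation
    # server) looks roughly like:
    #   import gym_remote.client as grc
    #   env = grc.RemoteEnv('tmp/sock')
    env = None  # no contest packages / ROM installed on this machine
```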

Then we started it up! Sadly our excitement was kind of cut short. The output of our terminal was just saying the same thing over and over again:

backtracking due to negative reward: 0.000000
backtracking due to negative reward: 0.000000
backtracking due to negative reward: 0.000000
...

Which didn’t really feel like success. Our next thought was to see what was actually happening, so we stuck an env.render() call into our main loop. The output wasn’t pretty: one frame for every few game-seconds of what looked like Sonic constantly going backwards. It was then that I learned that the backtracking wasn’t some kind of fancy AI term, but literally Sonic moving backwards. Back to the code.

In the jerk agent, the main loop moves a bit to the right, and then, if the reward from that movement isn’t greater than zero, it moves a little bit to the left.

In our case it seemed like the reward we were getting from move was nothing, so we took a look at that function:

The reward being returned here was actually an integral, or sum total, of the reward from each action we were taking. Since the sum over 100 actions wasn’t greater than zero, I didn’t have high hopes for what it might look like for an individual action. We didn’t really know what this reward was all about, so we took a look at env.step().

Turns out that returns the reward from self.compute_step(), which didn’t really help us at all.
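Putting those pieces together: the control flow we were staring at can be sketched against a toy environment. ToyEnv, move(), and all the numbers here are stand-ins of ours, not the baseline’s actual code; the only things carried over faithfully are the shape of the loop and the Gym step contract (observation, reward, done, info):

```python
# Toy sketch of the jerk agent's control flow. ToyEnv is a hypothetical
# stand-in for the Sonic environment; env.step(action) returns the
# standard Gym 4-tuple (observation, reward, done, info).

class ToyEnv:
    """Reward is the change in x position, one frame per step."""
    def __init__(self):
        self.x = 0

    def step(self, action):
        # Genesis pad layout: index 6 is LEFT, index 7 is RIGHT.
        dx = 1 if action[7] else (-1 if action[6] else 0)
        self.x += dx
        done = self.x >= 50  # pretend the level ends at x = 50
        return None, float(dx), done, {}

def move(env, num_steps, left=False):
    """Hold a direction for num_steps and return the summed (integrated)
    reward -- the quantity the jerk agent compares against zero."""
    total_reward, done = 0.0, False
    for _ in range(num_steps):
        action = [False] * 12
        action[6], action[7] = left, not left
        _, rew, done, _ = env.step(action)
        total_reward += rew
        if done:
            break
    return total_reward, done

env = ToyEnv()
done = False
while not done:
    rew, done = move(env, 100)              # go right for a while
    if not done and rew <= 0:
        print('backtracking due to negative reward: %f' % rew)
        _, done = move(env, 70, left=True)  # then give a little back
```

With a per-step reward of zero (like we were seeing), the summed reward never clears zero and the agent backtracks forever, which explains our terminal output.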

After another hour or so of poking at some of the other variables, we found some of the documentation about the scenario file and decided that we needed one of those. The documentation was kind of useless without an example (PR #19), so after wasting too much time, we eventually found a scenario file in the game data that seemed to be what we wanted.

{
  "done": {
    "variables": {
      "lives": {
        "op": "zero"
      }
    }
  },
  "reward": {
    "variables": {
      "score": {
        "reward": 10.0
      }
    }
  }
}

This seemed to match up with our understanding of the gameplay, so we defined a new reward function that would reward us for getting closer to finishing the zone, which is pretty easy for Sonic, since the goal is pretty much to go all the way to the right. From the data.json, we had a couple of variables that we could use: x, screen_x, and the usual score, points, etc. (As far as we could tell, the reward number acts as a multiplier on the change in each variable.) We were not really sure what the difference between x and screen_x was, or even their definitions, so we included them both.

{
  "done": {
    "variables": {
      "lives": {
        "op": "zero"
      }
    }
  },
  "reward": {
    "variables": {
      "x": {
        "reward": 75.0
      },
      "screen_x": {
        "reward": 100.0
      }
    }
  }
}

And when we gave that a try,

Always go right!

We got a Sonic who felt rewarded for making progress in the game! This was another celebratory period but we quickly fell back on the hedonic treadmill and wanted to get Sonic past that darn loop.

The contest.json made it seem like you could make even more complicated reward functions with Lua scripts, but we couldn’t get that to work and gave up quickly. We did, however, figure out that we could move our env.render() into our move function, and that made Sonic waaaaay more enjoyable to watch and the cases of backtracking more apparent:
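A sketch of where the call moved, with a hypothetical FakeEnv standing in for the real Sonic environment (whose render() actually draws the emulator frame); everything here is illustrative, not the baseline’s code:

```python
# Moving env.render() from once per pass of the outer loop to once per
# frame inside move(). FakeEnv is a stand-in with a counting render()
# so the sketch runs anywhere.

class FakeEnv:
    def __init__(self):
        self.x = 0
        self.frames_rendered = 0

    def step(self, action):
        self.x += 1 if action[7] else -1
        return None, 1.0, self.x >= 10, {}

    def render(self):
        self.frames_rendered += 1  # the real env draws a frame here

def move(env, num_steps):
    total_reward, done = 0.0, False
    for _ in range(num_steps):
        action = [False] * 12
        action[7] = True  # hold RIGHT
        _, rew, done, _ = env.step(action)
        env.render()  # one frame per step, instead of per call to move()
        total_reward += rew
        if done:
            break
    return total_reward, done

env = FakeEnv()
move(env, 100)
```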

Now that we finally kind of knew what we were doing, we started tweaking some of the parameters of our jerk agent: how often to jump, how far to backtrack, how many moves to make before backtracking, etc. After a bit more tweaking and a whole lot of watching Sonic jump around a loop, we saw our first win:
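For reference, the kind of knobs we mean, written out as constants; the names and values are illustrative, not the baseline’s actual identifiers:

```python
# Illustrative tuning knobs for a jerk-style agent (our naming, not the
# baseline's).

JUMP_PROB = 0.1        # chance of tapping jump on any given frame
JUMP_REPEAT = 4        # frames to hold the jump button once pressed
MOVE_STEPS = 100       # frames to run right before checking the reward
BACKTRACK_STEPS = 70   # frames to walk left after a non-positive reward
```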

This time we were already celebrating a bit too much, and needed to call it quits for the day so that we could put more focus on celebrating.

Beating ourselves into submission

The logical capstone to a day is submitting your agent for evaluation, so that is what we ended up spending the second half of our day doing. Our near-zero conceptual and practical knowledge of Docker was not an asset here. We started off by changing our agent back to use the remote environment, and tried running the provided commands again. Here are just some of the issues we ran into while trying to start our victory celebrations.

  • Couldn’t log in to Docker. Luckily Ben is way better at knowing when to Google an answer, so we figured out that we were running into this issue and restarted Docker.
  • Eventually we got everything packaged up and off to the evaluation server… but it errored out immediately. We decided we needed to try to run it locally first to see if it worked.
  • Local execution errored too, so I guess that is progress.
  • Went back and forth quite a bit about which env to run, since we kept getting a socket connectivity issue.
  • We didn’t realize that you had to rebuild each time, so we spent a while making changes to our code and Dockerfile, but not rebuilding, and not seeing any updated results.
  • retro-contest --help is totally worthless, but I can’t find the source to make a pull request. Turns out the undocumented --use-host-data flag is essential if you want to get Sonic to run.
  • Docker wants to use the Sonic data from the Python installed in your user directory. That is not where I have Python installed, but since I don’t know where to change that config, I just copied the data to where Docker wants it to be.
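For anyone retracing our steps, the build-test-push loop we eventually settled into looked roughly like this; the Dockerfile name and image tag are ours, and the exact flag spellings should be treated as our best recollection of the contest quickstart rather than gospel:

```shell
# Rebuild the image after EVERY code change, or you keep evaluating
# stale code. (Dockerfile name and tag are our own choices.)
docker build -f agent.docker -t $DOCKER_REGISTRY/jerk-agent:v1 .

# Local test run; --use-host-data is the undocumented-but-essential flag
# that lets the container see the Sonic game data on the host.
retro-contest run --agent $DOCKER_REGISTRY/jerk-agent:v1 \
    --results-dir results --use-host-data \
    SonicTheHedgehog-Genesis GreenHillZone.Act1

# Only once that works locally: push for remote evaluation.
docker push $DOCKER_REGISTRY/jerk-agent:v1
```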

Eventually, after a couple of hours spent on what seemed like something pretty simple, we sort of got it working locally and submitted that copy. It was totally worth it though, because an hour of evaluation later:

Looks like we were middle of the pack as far as jerks go.

🎉Second Place 🎉

Now we were totally done for the night, but still had some open questions:

  • What are the observation and info values that get returned from env.step()?
  • How, or even whether, should we make scripted reward functions?

Next up is trying to get the better-performing baseline agents submitted, oh and celebrating.