Setting up a Reward Function in Retro Gym and Other Utilities

AurelianTactics · Jul 2, 2018

I’ve been attempting to clear all the levels in the original Sonic the Hedgehog for the Genesis using OpenAI’s retro gym to train a Reinforcement Learning (RL) agent. I’ll have more on that in upcoming posts. In this post and in part two, I’ll share some of the utility scripts I’ve been using. Most of the scripts in these two posts are flexible enough to be used with multiple retro gym games (with a little editing). Here’s the GitHub repo.

Setting up a Reward Function

To use your own reward function, you have to make three modifications:

  • Modify your environment call to pass your .json file via the scenario parameter
  • Create the .json file, which tells your environment which .lua script to use
  • Create the .lua script that contains the reward function

For example, the OpenAI Retro Contest used 'contest.json' as its scenario file and 'script.lua' for its contest reward.

Passing the .json file via the scenario parameter:

env = retro.make(game, state, scenario='contest',...)
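
In full, the call looks something like this (a minimal sketch; it assumes you’ve imported the Sonic ROM into retro, and uses the game and state names from retro’s bundled Sonic integration):

import retro

# 'contest' resolves to contest.json in the game's data directory
env = retro.make(game='SonicTheHedgehog-Genesis',
                 state='GreenHillZone.Act1',
                 scenario='contest')

obs = env.reset()
done = False
while not done:
    # random actions, just to confirm the custom reward comes through
    obs, reward, done, info = env.step(env.action_space.sample())
env.close()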

Inside the .json file that points to the lua script:

{
  "done": {
    "script": "lua:contest_done"
  },
  "reward": {
    "script": "lua:contest_reward"
  },
  "scripts": [
    "script.lua"
  ]
}
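
Both files go in the game’s integration directory, next to data.json and the save states, so that scenario='contest' resolves to contest.json. With a pip install, that directory sits inside the package (the exact path varies by retro version):

retro/data/SonicTheHedgehog-Genesis/
├── data.json                  # RAM variables (x position, lives, score, ...)
├── contest.json               # scenario file shown above
├── script.lua                 # lua reward/done functions
└── GreenHillZone.Act1.state   # save state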

An example of a reward function inside the script.lua file (frame_count, prev_progress, frame_limit, calc_progress, and clip are defined elsewhere in the same file):

function contest_reward()
  frame_count = frame_count + 1
  local progress = calc_progress(data)
  local reward = (progress - prev_progress) * 9000
  prev_progress = progress
  -- bonus for beating the level quickly
  if progress >= 1 then
    reward = reward + (1 - clip(frame_count / frame_limit, 0, 1)) * 1000
  end
  return reward
end
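
For completeness, script.lua also needs the globals, the helpers, and the contest_done function that the .json wires up. Here is a minimal sketch, not the actual contest code: the real calc_progress measures Sonic’s x position against the level’s end coordinate, and end_x below is just a placeholder for that per-level value.

prev_progress = 0
frame_count = 0
frame_limit = 18000 -- placeholder cap, roughly five minutes at 60 fps
end_x = 10000       -- placeholder: the real script derives the level-end x position

function clip(v, lo, hi)
  if v < lo then
    return lo
  elseif v > hi then
    return hi
  end
  return v
end

function calc_progress(data)
  -- data.x is Sonic's x position, exposed through the game's data.json
  return clip(data.x / end_x, 0, 1)
end

function contest_done()
  -- end the episode once the level is cleared or the frame cap is reached
  return calc_progress(data) >= 1 or frame_count >= frame_limit
end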

Useful Scripts

To view your agent’s progress, retro gym can record playthroughs as .bk2 files. To watch them, you can either convert the .bk2 files to .mp4 files (see the README of the retro gym GitHub for a script) or use this handy script that renders the videos for you. I include a slightly modified version of this as render.py, which lets you specify on the command line where the script should look for videos and which video number to start with. Modify the sleep/framerate parts to speed up or slow down the playback. Another useful script from that blog writes Sonic’s trajectory onto an image of the level.
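
If you’d rather render a .bk2 directly without converting it, the core of such a playback script looks roughly like this (a sketch built on retro’s Movie API; the exact constants and return signatures have shifted between retro versions):

import time
import retro

movie = retro.Movie('SonicTheHedgehog-Genesis-GreenHillZone.Act1-0000.bk2')
movie.step()  # advance past the first frame so the initial state is available

env = retro.make(game=movie.get_game(), state=None,
                 use_restricted_actions=retro.Actions.ALL)
env.initial_state = movie.get_state()
env.reset()

while movie.step():
    # reconstruct the button presses recorded for this frame
    keys = [movie.get_key(i, 0) for i in range(env.num_buttons)]
    env.step(keys)
    env.render()
    time.sleep(1 / 60)  # adjust to speed up or slow down playback
env.close()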

If you plan on using human demonstrations to improve your agent, this repo is great. You can play through the levels with a keyboard and save files for training your agent. My next post will share the script I use to examine those human demonstrations and to combine them with a customized reward function. I’ll also share some of my .lua reward functions.
