Retro Gym with Baselines: 4 Basic Usage Tips

AurelianTactics · Dec 9, 2018

A short summary with code examples, followed by explanations.

1. Retro Wrappers

#import retro and the common wrappers from baselines
import retro
from baselines.common.retro_wrappers import *
#define the environment (game, state, and scenario are strings you set)
env = retro.make(game=game, state=state, scenario=scenario)
#wrap the environment: record a movie every k episodes,
#then add frame skip with sticky actions
env = MovieRecord(env, savedir, k=10)
env = StochasticFrameSkip(env, n=4, stickprob=skip_prob)

2. Custom Models
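(A minimal sketch of the idea, not the original gist: register a custom network-building function with Baselines so it can be passed to an algorithm by name. The network name and layer sizes here are placeholders.)

import tensorflow as tf
from baselines.common.models import register

#register a small CNN under a name the algorithms can look up
@register("my_small_cnn")
def my_small_cnn(**kwargs):
    def network_fn(X):
        h = tf.cast(X, tf.float32) / 255.
        h = tf.layers.conv2d(h, filters=16, kernel_size=8, strides=4, activation=tf.nn.relu)
        h = tf.layers.conv2d(h, filters=32, kernel_size=4, strides=2, activation=tf.nn.relu)
        return tf.layers.flatten(h)
    return network_fn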

3. Parallel Runs and Multiple Environments (env)

Parallel runs are launched from the command line using MPI:

mpirun -np 8 python -m baselines.ppo1.run_atari

Multiple envs can be set from the command line or passed directly to the algorithm. Command line:

python -m baselines.run --alg=ppo2 --env=CartPole-v1 --num_env=2

The two can be combined:

mpirun -n 2 python -m baselines.run --alg=ppo2 --env=CartPole-v1 --normalize_observations=True --num_env=2

Customizing multiple env:
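(A rough sketch, not the original gist: build several customized Retro envs and run them in parallel with SubprocVecEnv. The game, state, and wrapper settings are just examples.)

import retro
from baselines.common.retro_wrappers import *
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv

def make_customized_env():
    #each copy gets the same wrappers; customize per copy if needed
    env = retro.make(game='SonicTheHedgehog-Genesis', state='GreenHillZone.Act1')
    env = StochasticFrameSkip(env, n=4, stickprob=0.25)
    env = WarpFrame(env)
    return env

num_env = 4
venv = SubprocVecEnv([make_customized_env for _ in range(num_env)])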

4. TensorBoard

#in the python file that you call from the command line:
import os
os.environ['OPENAI_LOG_FORMAT'] = 'stdout,log,csv,tensorboard'

I also had to modify run.py in the baselines package (see this issue). Launch TensorBoard from the command line:

tensorboard --logdir=...
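(If you also set OPENAI_LOGDIR, that is the directory to point --logdir at; the path below is just an example.)

import os
os.environ['OPENAI_LOG_FORMAT'] = 'stdout,log,csv,tensorboard'
os.environ['OPENAI_LOGDIR'] = '/tmp/retro_experiment_logs'  #example path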

A messy version of a Python file running these four modifications can be found here. I will have a clean version up in a new repo with some concrete examples in a couple of weeks.

Explanation

OpenAI’s gym is a popular Reinforcement Learning (RL) package. Retro is an extension of that framework for classic video games on systems like the Sega Genesis and Super Nintendo. In some previous posts, I wrote about the OpenAI Retro Contest and my attempts to beat Sonic the Hedgehog using RL. Since that time, the RL package OpenAI Baselines has undergone some significant updates. This post has a few tips for using Retro with Baselines. Upcoming posts will have a bit more on the experiment setup, some results, and some analysis.

Retro Wrappers

In RL, an agent interacts with an environment (env). The agent observes states, takes actions, and generally tries to maximize reward. In gym, the basic environment interaction is:

observation, reward, done, info = env.step(action)

The RL agent selects the action, feeds it into env.step, and gets back a new observation, reward, done (i.e., is the episode or game over), and miscellaneous info. Wrappers customize and streamline this process using common methods. Wrap the env in a wrapper, and the env gains new functionality. For example, in the original Deep Q-Network paper (playing Atari games with RL), the observation space was the last 4 images of the game. The FrameStack wrapper makes stacking those images easier:

env = FrameStack(env,4)
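Putting the pieces together, a minimal interaction loop with random actions looks like this:

#run one episode with random actions
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)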

The Retro contest code that OpenAI provided had a number of wrappers that have since been added to Baselines. Those who followed along with my old Sonic code may have noticed that I called the algorithm in one file, called a make_env function imported from another file, and wrapped the env in two different places with wrappers from multiple packages. Since the Retro wrappers have been added to Baselines, this process has been streamlined and can all be done in one file. The make_env function in that file takes in arguments from argparse and passes them to various wrappers, allowing me to customize my runs. Some examples:

#save a video every k episodes
env = MovieRecord(env, savedir, k=10)
#frame skip (hold an action for this many frames) with sticky actions
env = StochasticFrameSkip(env, n=4, stickprob=skip_prob)
#resize the image and convert RGB to grayscale
env = WarpFrame(env, width=112, height=128, grayscale=True)
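A rough sketch of how argparse arguments can feed those wrapper settings inside make_env (the flag names and defaults are just examples, not the exact file):

import argparse
import retro
from baselines.common.retro_wrappers import *

parser = argparse.ArgumentParser()
parser.add_argument('--frame_skip', type=int, default=4)
parser.add_argument('--skip_prob', type=float, default=0.25)
args = parser.parse_args()

def make_env():
    #pass the command-line settings through to the wrappers
    env = retro.make(game='SonicTheHedgehog-Genesis', state='GreenHillZone.Act1')
    env = StochasticFrameSkip(env, n=args.frame_skip, stickprob=args.skip_prob)
    return env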

See the atari_wrappers.py and retro_wrappers.py files for more wrappers. The SonicDiscretizer is a must for any retro game where you want to customize the action space.

Custom Models

Baselines comes with a number of prebuilt models. The gist I shared above shows a simple example of making your own model, registering the model, and passing the model to the algorithm. I hope to have more on this in a later post.
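For example, once a network is registered (like the hypothetical my_small_cnn above), passing it to PPO2 looks roughly like this, where venv is a vectorized env such as the SubprocVecEnv built earlier:

from baselines.ppo2 import ppo2

#'my_small_cnn' is the hypothetical registered network name
model = ppo2.learn(network='my_small_cnn', env=venv, total_timesteps=int(1e6))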

MPI and Multiple Environments

Reinforcement learning can be slow. Running multiple environments or multiple processes with MPI (or both) can speed things up. Which is best for your algorithm and setup? Try them both and profile. For more info, one of the main contributors to Baselines has some good explanations of multiple env versus MPI here and here.

Coming soon! Level 3 is surprisingly hard.
