Custom Reinforcement Learning Environment Usage for Ray, Stable Baselines 3, and Acme

AurelianTactics
Dec 4, 2021

Let’s say you want to apply a Reinforcement Learning (RL) algorithm to your problem. There are dozens of open-source RL frameworks to choose from, such as Stable Baselines 3 (SB3), Ray, and Acme. To use the algorithms in these frameworks, your problem likely needs to be coded as a custom RL environment (env). The framework sets up an agent that interacts with your custom env: the agent sends actions to the env, and the env returns observations and rewards for the agent to train on.

The agent-environment loop is described in depth in Sutton and Barto’s Reinforcement Learning: An Introduction: http://www.incompleteideas.net/book/RLbook2020.pdf

Creating and Registering the Environment

A popular env format is OpenAI’s gym package. Creating a simple, custom gym env is straightforward: inherit from gym.Env and fill in a few methods with the specifics of your problem (gym’s documentation has a simple example). The example I’ll use is a gym implementation of the Kaggle competition Hungry Geese. Hungry Geese is a snake-like game where you move your goose around to collect food and avoid other geese. Kaggle competitions come with their own kaggle-environments package, which isn’t typically compatible with RL frameworks. Here’s the repo of my gym implementation: https://github.com/AurelianTactics/gym_hungry_geese. I followed the basic steps from gym’s guidelines:

  • Created the repo structure and necessary files
  • Created a new env file and implemented the basic methods: init, reset, close, render, and step. If you were going to do your own custom env, you’d replace these methods with logic for your custom env (like the simple example) or wrap your existing env with a gym wrapper (like the Hungry Geese example). A minimal skeleton is sketched after this list.
  • init sets up the env with any optional configuration options you may want the problem to have. In Hungry Geese these are things like the number of geese, the size of the board, and the number of pieces of food.
  • reset begins a new episode for the env. The method should return the initial observation. Depending on the problem, you may have to set parts of the environment back to an initial state or re-randomize them. In Hungry Geese each reset starts the geese and food in random positions.
  • render lets you see a visual representation of the env. It can be ignored with a pass statement, but I find render helpful for debugging the env and for seeing how the agent performs.
  • step is where the environment takes in the action and produces a new observation, reward, done (a bool indicating whether the episode is over), and info (an optional dictionary). In my case this is where most of the code lives: I wrote functions to compute the reward from the current state of the game and to turn the default Kaggle env’s dictionary of observations into a multi-dimensional array so that I could use a CNN on it.
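
To make those methods concrete, here’s a minimal sketch of a custom gym env skeleton. The spaces, board size, and step logic are placeholders, not the actual Hungry Geese implementation:

import gym
import numpy as np
from gym import spaces

class MyCustomEnv(gym.Env):
    """Minimal custom env skeleton; replace the placeholder logic with your problem."""

    def __init__(self, board_size=7):
        super().__init__()
        self.board_size = board_size  # example config option
        # placeholder spaces: 4 discrete actions, a board-shaped observation
        self.action_space = spaces.Discrete(4)
        self.observation_space = spaces.Box(
            low=0.0, high=1.0, shape=(board_size, board_size), dtype=np.float32)
        self.state = np.zeros((board_size, board_size), dtype=np.float32)

    def reset(self):
        # put the env back into an initial (possibly randomized) state
        self.state = np.zeros((self.board_size, self.board_size), dtype=np.float32)
        return self.state  # return the initial observation

    def step(self, action):
        # apply the action, then compute the new observation, reward, and done flag
        reward = 0.0
        done = False
        info = {}
        return self.state, reward, done, info

    def render(self, mode='human'):
        # optional: print or draw the current state, handy for debugging
        print(self.state)

    def close(self):
        pass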

Installing and Running the Environment

Here’s a colab example. If you’re installing it locally, I recommend creating a virtual or conda env (a different kind of environment) to manage your installation. The basic steps:

  • install dependencies
  • clone the repo, install, and register the custom env
  • run the env
# install dependencies
!pip install gym kaggle-environments

# clone the repo, then install and register the env
!git clone https://github.com/AurelianTactics/gym_hungry_geese.git
!pip install -e gym_hungry_geese

# run the simple gym env with random actions
import gym
import gym_hungry_geese

num_timesteps = 100
env = gym.make('HungryGeese-v0')
env.reset()
for _ in range(num_timesteps):
    action = env.action_space.sample()  # sample a random action
    observation, reward, done, info = env.step(action)
    if done:
        env.reset()
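
Before wiring the env into a framework, a couple of quick sanity checks right after gym.make() can save debugging time later. A short sketch (the contains() check and render() call assume the env implements those methods meaningfully):

# sanity-check the registered env
obs = env.reset()
assert env.observation_space.contains(obs), "reset() returned an observation outside observation_space"
print(env.observation_space, env.action_space)
env.render()  # visual check, handy when debugging
env.close()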

Connecting the Env to Stable Baselines 3

SB3 is compatible with any registered gym env. Once the Hungry Geese env is installed and registered with gym, you can connect it to SB3 simply by calling gym.make() and passing the env to the SB3 agent:

# install stable baselines 3
!pip install stable-baselines3[extra]
# clone repo, install and register the env
!git clone https://github.com/AurelianTactics/gym_hungry_geese.git
!pip install -e gym_hungry_geese
import gym
import gym_hungry_geese
from stable_baselines3 import PPO

# SB3 can use gym.make() to create a registered env
env = gym.make('HungryGeese-v0')

# train on the env
model = PPO('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=100)

# run the trained agent for a few steps
obs = env.reset()
for i in range(10):
    action, _state = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()

See the colab file for the full code.
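
SB3 also ships an env checker that catches common interface mistakes (wrong observation shapes or dtypes, mismatched spaces) in custom envs before you spend time training. A minimal sketch, assuming the env is installed and registered as above:

import gym
import gym_hungry_geese
from stable_baselines3.common.env_checker import check_env

env = gym.make('HungryGeese-v0')
# raises an error or prints warnings if the env doesn't follow the gym
# interface that SB3 expects
check_env(env, warn=True)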

Connecting the Env to Acme

Acme is slightly different: it’s a DeepMind RL framework that uses DeepMind’s dm_env interface rather than the OpenAI gym one. Acme provides wrappers that turn a gym env into a dm_env. Here’s some example usage:

# import dependencies (see the colab example for the full list)
import acme
...
import gym
import gym_hungry_geese
import dm_env
from acme import specs
from acme import wrappers

# wrap the gym env to convert it to a DeepMind env
def make_environment(evaluation: bool = False, task: str = 'HungryGeese-v0') -> dm_env.Environment:
    del evaluation
    # load the gym environment
    environment = gym.make(task)
    # make sure the environment obeys the dm_env.Environment interface
    environment = wrappers.GymWrapper(environment)
    # clip the action returned by the agent to the environment spec
    # environment = wrappers.CanonicalSpecWrapper(environment, clip=True)
    environment = wrappers.SinglePrecisionWrapper(environment)
    return environment

# make the env and the environment_spec
# (dm envs use observation, action, and environment specs, while gym envs
# use observation and action spaces)
hungry_goose_dm_env = make_environment()
environment_spec = specs.make_environment_spec(hungry_goose_dm_env)
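
One difference worth noting: the wrapped env returns dm_env TimeStep objects rather than gym’s (obs, reward, done, info) tuples. Here’s a short sketch of stepping it manually with random actions, assuming the action spec is discrete (as it is for Hungry Geese):

import numpy as np

# environment_spec.actions is a DiscreteArray, so a random action is an
# integer in [0, num_values); timestep.last() plays the role of gym's done
timestep = hungry_goose_dm_env.reset()
while not timestep.last():
    action = np.random.randint(environment_spec.actions.num_values)
    timestep = hungry_goose_dm_env.step(action)
    # use timestep.observation, timestep.reward, and timestep.discount here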

Then pass the env to the Acme algorithm:

agent = r2d2.R2D2(
    environment_spec,
    network=network,
    ...
)
loop = acme.EnvironmentLoop(hungry_goose_dm_env, agent)
loop.run(num_episodes=10)

See the colab file for the full code plus an example of a custom neural network.

Connecting the Env to Ray

Setup is a bit different in Ray. Ray warns that the gym registry (except for the default OpenAI envs) is not compatible, and it offers a few different ways of connecting a custom env. One method is to pass an env-creating function to Ray’s register_env function and then pass the newly registered env name to the Ray algorithm. Here’s a colab file using the custom Hungry Geese gym env as an example.

# register the env
from ray.tune.registry import register_env
# HungryGeeseEnv is the env class itself, imported directly from the installed
# gym_hungry_geese package (exact module path depends on the repo layout)

def env_creator(env_config):
    env = HungryGeeseEnv()
    return env  # return an env instance

register_env("hungry_geese_env_for_ray", env_creator)

# call the env while training with Ray
from ray.rllib.agents import ppo

# configure the algorithm
config = {
    "env": "hungry_geese_env_for_ray",
    ...
}

# train the agent
trainer = ppo.PPOTrainer(config=config)
while True:
    print(trainer.train())

See the colab file for the full code and an example of using a custom neural network. The main difference from Acme and SB3 is that once the env is registered with gym, Acme and SB3 can find it with a gym.make() call, while in Ray the creator function has to call the env class directly.
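
If your env’s init exposes configuration options (like the number of geese or the board size mentioned earlier), Ray’s env_config dict is the usual way to pass them through the creator function. A sketch where the keyword arguments (num_geese, board_rows) are assumptions for illustration, not the env’s actual signature:

from ray.tune.registry import register_env

def configurable_env_creator(env_config):
    # env_config is populated from config["env_config"]; the kwargs below are
    # hypothetical options a custom env's __init__ might accept
    return HungryGeeseEnv(
        num_geese=env_config.get("num_geese", 4),
        board_rows=env_config.get("board_rows", 7),
    )

register_env("hungry_geese_env_configurable", configurable_env_creator)

config = {
    "env": "hungry_geese_env_configurable",
    "env_config": {"num_geese": 2, "board_rows": 7},
}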
