Create a gymnasium custom environment (Part 2)

Yuki Minai
5 min read · Mar 4, 2024

The gymnasium package contains a collection of environments for testing Reinforcement Learning (RL) algorithms. For example, a previous blog used the FrozenLake environment to test a TD-learning method. While these environments are great testbeds, we often want to customize a provided environment to see how an agent behaves under different conditions. It is also of great interest to create our own custom environment and test our algorithm on it.

gymnasium provides an easy way to do both. In this series of blogs, we will learn:

  • How to edit an existing environment in gymnasium (last blog)
  • How to create a custom environment with gymnasium (this blog)

In this blog, we will create a fun environment to play the Pokemon Red game. This is motivated by this cool work by Peter Whidden and [another work](https://github.com/Baekalfen/PyBoy) by Asger Anders Lund Hansen, Mads Ynddal, and Troels Ynddal. The code is mainly adapted from Peter’s git repository but simplified to convey the key points of defining a custom environment.

4 essential functions to define a custom environment

As reviewed in the previous blog, a gymnasium environment has four key functions listed below (obtained from the official documentation).

- reset() : Resets the environment to an initial state, required before calling step. Returns the first agent observation for an episode and information, i.e. metrics, debug info.

- step() : Updates the environment with an action, returning the next agent observation, the reward for taking that action, whether the environment has terminated or truncated due to the latest action, and information from the environment about the step, i.e. metrics, debug info.

- render() : Renders the environment to help visualise what the agent sees; example modes are “human”, “rgb_array”, and “ansi” for text.

- close() : Closes the environment, important when external software is used, i.e. pygame for rendering, databases.

When designing a custom environment, we inherit from the Env class of gymnasium. Then, we redefine these four functions based on our needs. Inheriting from the Env class is crucial because it:

  • provides access to a rich set of base functionalities and utilities within the Gymnasium library, such as methods for seeding randomness.
  • ensures that the custom environment adheres to the Gymnasium framework’s standardized interface, allowing it to be used interchangeably with other Gym environments.
  • facilitates the integration with other Gymnasium tools and plugins, enhancing the environment’s capabilities and simplifying the development and testing process.

By inheriting from the Env class, we can focus on defining the unique aspects of our custom environment such as its observation space, action space, and dynamics, while leveraging the established infrastructure provided by Gymnasium for simulation control, rendering, and interaction with learning algorithms.

Pokemon Red Game environment

Let’s start defining the Pokemon environment.

To create a Pokemon Red game environment, we use a Python-based Game Boy emulator called PyBoy.

In the Pokemon Red game environment, there are 7 commands (i.e., actions) the agent can use to explore the world:

  • Press arrow up
  • Press arrow down
  • Press arrow right
  • Press arrow left
  • Press A button
  • Press B button
  • Press start button

These are the same commands we can use to play Pokemon Red! The state an agent can be in is defined by the game map. The observed state is a 144x160x3 array of pixel values (i.e., one 144x160 image for each of the R, G, and B channels). As for the reward function, we can design it however we want. For this tutorial, let’s define the reward as the sum of the levels of all Pokemon caught so far, for simplicity. (Note that we would need a more sophisticated reward function to actually train an agent to play Pokemon Red.)
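The level-sum reward just described can be sketched as a small helper. The RAM addresses below come from the community Pokemon Red RAM map, as used in Peter Whidden’s repository; treat them, and the `read_memory` callable (a stand-in for the emulator’s memory accessor), as assumptions to verify against your setup.

```python
# RAM addresses of each party member's level in Pokemon Red
# (from the community Pokemon Red RAM map, as used in Peter
# Whidden's repository; verify against your ROM version).
PARTY_LEVEL_ADDRESSES = [0xD18C, 0xD1B8, 0xD1E4, 0xD210, 0xD23C, 0xD268]

def level_reward(read_memory):
    """Sum the levels of all party Pokemon.

    `read_memory` is any callable mapping a RAM address to a byte,
    e.g. a wrapper around the emulator's memory accessor.
    """
    return sum(read_memory(addr) for addr in PARTY_LEVEL_ADDRESSES)
```

Decoupling the reward from the emulator like this also makes it easy to unit-test the reward logic with a fake memory map.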

Initialize the environment

As I mentioned above, we will create our class by inheriting from the Env class of gymnasium. Then, we will implement the four essential functions, reset, step, render, and close, for our new custom class. Before defining these functions, let’s first look at the initialization. The initialization process is invoked once, when the environment is first created, and establishes the key characteristics of the environment.

During initialization, we define several critical aspects:

  • Action space: The set of all possible actions an agent can take in the environment. It outlines which actions are available for the agent to choose from at any given step
  • Observation space: The size and shape of the observations the agent receives from the environment. Essentially, it describes the form and structure of the data the agent uses to make decisions
  • Action frequency: The number of frames that pass before a new action is taken. In the context of PyBoy, an action can be applied once every 24 frames. This setting controls the pace at which the agent can act within the game environment
  • PyBoy object: An object that interfaces with the actual game environment provided by PyBoy. It acts as the bridge between our custom gymnasium environment and the Game Boy game we aim to interact with
  • Initial state: The starting state of the agent when the environment is initialized. For the purpose of this tutorial, we will set the initial state to the moment just after choosing the first Pokemon, as demonstrated in Peter Whidden’s work
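Putting these pieces together, an __init__ along the following lines is a reasonable sketch. The PyBoy constructor arguments and the WindowEvent import location vary across PyBoy versions (this follows the PyBoy 1.x API used in Peter Whidden’s repository), so treat the emulator calls as assumptions to check against your installed version.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from pyboy import PyBoy
from pyboy.utils import WindowEvent  # import path may differ across PyBoy versions

class RedGymEnv(gym.Env):
    def __init__(self, rom_path, init_state_path):
        super().__init__()
        # Action space: the 7 Game Boy inputs listed above (press events),
        # with the matching release events kept alongside for step()
        self.valid_actions = [
            WindowEvent.PRESS_ARROW_UP,
            WindowEvent.PRESS_ARROW_DOWN,
            WindowEvent.PRESS_ARROW_RIGHT,
            WindowEvent.PRESS_ARROW_LEFT,
            WindowEvent.PRESS_BUTTON_A,
            WindowEvent.PRESS_BUTTON_B,
            WindowEvent.PRESS_BUTTON_START,
        ]
        self.release_actions = [
            WindowEvent.RELEASE_ARROW_UP,
            WindowEvent.RELEASE_ARROW_DOWN,
            WindowEvent.RELEASE_ARROW_RIGHT,
            WindowEvent.RELEASE_ARROW_LEFT,
            WindowEvent.RELEASE_BUTTON_A,
            WindowEvent.RELEASE_BUTTON_B,
            WindowEvent.RELEASE_BUTTON_START,
        ]
        self.action_space = spaces.Discrete(len(self.valid_actions))
        # Observation space: the raw 144x160 RGB screen
        self.observation_space = spaces.Box(
            low=0, high=255, shape=(144, 160, 3), dtype=np.uint8
        )
        # Action frequency: frames the emulator advances per action
        self.act_freq = 24
        # PyBoy object: the emulator running the ROM without a window
        self.pyboy = PyBoy(rom_path, window_type="headless")
        # Initial state: saved just after choosing the first Pokemon
        self.init_state_path = init_state_path
        with open(self.init_state_path, "rb") as f:
            self.pyboy.load_state(f)
```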

Render the environment

Next, we will define a render function. This function returns the pixel values of the game screen at any given moment. By default, the screen pixel size in PyBoy is set to (144, 160, 3), representing the resolution and color depth (RGB) of the Game Boy’s display.
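A sketch of this function, assuming the PyBoy 1.x screen API; newer PyBoy versions expose the same pixels as `pyboy.screen.ndarray` instead.

```python
def render(self):
    # Return the current screen as a 144x160x3 RGB numpy array.
    # botsupport_manager() is the PyBoy 1.x API; newer versions
    # expose the same pixels via pyboy.screen.ndarray.
    return self.pyboy.botsupport_manager().screen().screen_ndarray()
```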

Reset the environment

Next, we will define a reset function. When we run multiple episodes of a simulation, we call reset at the beginning of each episode to return the environment to a predefined initial state. Note that the initialization function (__init__) is called only once, when the environment is first created. After that, reset is called at the start of each new episode to initialize the environment, ensuring that every episode begins from a consistent state.

Within this function, for our specific case, we will initialize the state and the total reward value as follows:
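A sketch of such a reset, assuming the `pyboy` emulator object and `init_state_path` were set up during initialization as described earlier; `load_state` reloads the saved post-starter state.

```python
def reset(self, seed=None, options=None):
    super().reset(seed=seed)
    # Reload the saved state from just after choosing the first Pokemon
    with open(self.init_state_path, "rb") as f:
        self.pyboy.load_state(f)
    # Reset the episode's cumulative reward
    self.total_reward = 0
    # Return the first observation and an (empty) info dict
    return self.render(), {}
```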

Take a step in the environment

Next, we will define a step function. It takes an action as its argument, moves the agent based on the specified action, and returns the new state, the obtained reward, and whether the episode has terminated or truncated. For simplicity, we don’t consider termination or truncation conditions in this implementation; the episode ends only when we stop executing the code.
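A sketch of this step function, assuming the press/release `WindowEvent` lists (`valid_actions`, `release_actions`) and the frame count `act_freq` were defined during initialization. `send_input`, `tick`, and `get_memory_value` are PyBoy 1.x calls, and the release-after-a-few-frames pattern follows Peter Whidden’s repository; the RAM addresses are from the community Pokemon Red RAM map.

```python
def step(self, action):
    # Press the chosen button, run the emulator, then release the button
    self.pyboy.send_input(self.valid_actions[action])
    for frame in range(self.act_freq):
        if frame == 8:
            # Release after a few frames so the press registers exactly once
            self.pyboy.send_input(self.release_actions[action])
        self.pyboy.tick()  # advance the emulator by one frame
    obs = self.render()
    # Reward: sum of the six party members' levels, read from game RAM
    # (addresses from the Pokemon Red RAM map; verify for your ROM)
    level_addresses = [0xD18C, 0xD1B8, 0xD1E4, 0xD210, 0xD23C, 0xD268]
    reward = sum(self.pyboy.get_memory_value(a) for a in level_addresses)
    self.total_reward = reward
    # No termination/truncation conditions in this simplified version
    return obs, reward, False, False, {}
```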

Close the environment

Lastly, we will define a close function to ensure proper cleanup of any resources used during the simulation. We will inherit and use the close function from the parent class. Additionally, we will include code specifically designed to terminate the PyBoy session.
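A sketch, assuming `pyboy.stop()` as the emulator shutdown call (it exists in PyBoy 1.x):

```python
def close(self):
    # Terminate the PyBoy session before the base-class cleanup
    self.pyboy.stop()
    super().close()
```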

Integrate all functions and define a whole class

Let’s integrate all functions, define the whole RedGymEnv class, and test our implementation!

Visualize the current state

In the code below, after initializing the environment, we choose random actions for 30 steps and visualize the Pokemon game screen using the render function.

Since we need a Pokemon Red ROM file to run this environment, we cannot run it on Kaggle. Here is how you can run the code below.

1. Download this blog

2. Legally obtain a Pokemon Red ROM file (you can find this via Google)

3. Download the has_pokedex_nballs.state file from this GitHub repository

4. Update the two path variables below based on where each file is located on your machine

5. Uncomment the cell below

6. Ready to run!
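Once the files are in place, the driver cell described above can be sketched as follows; the two path values are placeholders to update in step 4, and `RedGymEnv` is the class assembled in the previous section.

```python
import matplotlib.pyplot as plt

# Placeholder paths; update them for your machine (step 4 above)
ROM_PATH = "PokemonRed.gb"
INIT_STATE_PATH = "has_pokedex_nballs.state"

env = RedGymEnv(ROM_PATH, INIT_STATE_PATH)
obs, info = env.reset()
for _ in range(30):
    action = env.action_space.sample()  # choose a random action
    obs, reward, terminated, truncated, info = env.step(action)

# Show the game screen after 30 random steps
plt.imshow(env.render())
plt.axis("off")
plt.show()
env.close()
```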

After setting up your environment, you should be able to see something like the video below.

It is working: Red is now navigating the game screen in response to the action commands. With gymnasium, we’ve successfully created a custom environment for training RL agents.

In future blogs, I plan to use this environment for training RL agents. Stay tuned for updates and progress!

Reference

  • PokemonRedExperiments GitHub repository by Peter Whidden (https://github.com/PWhiddy/PokemonRedExperiments)
  • PyBoy GitHub repository (https://github.com/Baekalfen/PyBoy)


Yuki Minai

Ph.D. student in Neural Computation and Machine Learning at Carnegie Mellon University, Personal webpage: http://yukiminai.com