Create a gymnasium custom environment (Part 1)

Introduction

Yuki Minai
4 min read · Mar 4, 2024

The gymnasium package contains a list of environments for testing our Reinforcement Learning (RL) algorithms. For example, this previous blog used the FrozenLake environment to test a TD-learning method. While these environments are great testbeds, we often want to customize a provided environment to see how an agent behaves under different conditions. It is also of great interest to create our own custom environment and test our algorithm on it.

gymnasium provides an easy way to do both. In this series of blogs, we will learn

  • How to edit an existing environment in gymnasium (this blog)
  • How to create a custom environment with gymnasium (next blog)

Basic structure of a gymnasium environment

Let’s first explore what defines a gym environment.

Each gymnasium environment contains the four main functions listed below (descriptions taken from the official documentation):

  • reset() : Resets the environment to an initial state, required before calling step. Returns the first agent observation for an episode and information, i.e. metrics, debug info.
  • step() : Updates the environment with an action, returning the next agent observation, the reward for taking that action, whether the environment has terminated or truncated due to the latest action, and information from the environment about the step, i.e. metrics, debug info.
  • render() : Renders the environment to help visualise what the agent sees; example modes are “human”, “rgb_array”, and “ansi” for text.
  • close() : Closes the environment, important when external software is used, i.e. pygame for rendering, databases.

These functions define the properties of the environment and how it responds to actions taken by the agent. When starting a new episode, reset() is called to initialize the environment. At each time step, when an agent takes an action, step() is called to apply the action and observe its consequences, such as changes in the agent’s state or any reward received. The render() function is used to visualize the current state of the environment when needed. Lastly, when the simulation is complete, close() is called to properly clean up any resources that were used.
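To make this structure concrete, here is a minimal skeleton of an environment class implementing these four functions. This is an illustrative sketch, not code from gymnasium itself: the class name, spaces, and dynamics are placeholders, and we will build a full custom environment in the next blog.

```python
import gymnasium as gym
from gymnasium import spaces


class MyEnv(gym.Env):
    """A placeholder environment illustrating the four main functions."""

    def __init__(self, render_mode=None):
        self.observation_space = spaces.Discrete(16)  # e.g. 16 grid cells
        self.action_space = spaces.Discrete(4)        # e.g. 4 move directions
        self.render_mode = render_mode
        self._state = 0

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)  # seeds self.np_random
        self._state = 0           # return to the initial state
        return self._state, {}    # (observation, info)

    def step(self, action):
        self._state = (self._state + 1) % 16  # placeholder dynamics
        reward = 0.0
        terminated = False  # e.g. the goal was reached
        truncated = False   # e.g. a time limit was exceeded
        return self._state, reward, terminated, truncated, {}

    def render(self):
        return f"state: {self._state}"  # text rendering of the current state

    def close(self):
        pass  # release external resources (e.g. a pygame window), if any
```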

Example environment: Frozen Lake

Let’s understand these functions better with an example using the FrozenLake environment.

The objective of an agent in this environment is to navigate through a grid world, starting from the initial cell and reaching the goal cell. Here, we are using a 4x4 grid map, and each cell falls into one of four different categories:

  • S (Start): This cell is where our agent begins its journey.
  • F (Frozen): These cells are safe for the agent to walk on.
  • H (Hole): These are hazardous cells, and if the agent falls into one, the episode terminates with a reward of 0.
  • G (Goal): Reaching this cell yields a reward of +1 for the agent.

From the starting cell, the agent has the option to move in four directions: up, left, down, or right. The agent’s task is to explore the grid world, making decisions at each time step to eventually reach the goal cell and collect a reward of +1.

In the code below, we can see an example of the agent randomly exploring this environment over 20 time steps.
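The following is a minimal sketch of such a random rollout; the seed and the is_slippery=False flag are illustrative choices rather than settings from the original post.

```python
import gymnasium as gym

# Create the default 4x4 FrozenLake environment.
# render_mode="ansi" returns a text rendering of the grid.
env = gym.make("FrozenLake-v1", render_mode="ansi", is_slippery=False)

observation, info = env.reset(seed=42)

for t in range(20):
    # Sample a random action: 0=left, 1=down, 2=right, 3=up.
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    print(env.render())

    # If the agent fell into a hole or reached the goal, start a new episode.
    if terminated or truncated:
        observation, info = env.reset()

env.close()
```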

With gymnasium’s predefined environments, it is very easy to run such a simulation.

As you may have observed, the default behavior is for the agent to start from the top-left cell and aim to reach the bottom-right cell.

Let’s begin by adjusting the initial state of the environment. By default, in this environment, the agent always begins in the top-left corner. Now, let’s experiment with changing both the initial and goal locations. We’ll set our starting position to the bottom-right and our goal position to the top-left. To achieve this, we can utilize the ‘desc’ argument when creating the environment, which specifies the map layout, including the start and goal positions.
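Below is a sketch of this idea. The map is passed as a list of strings, one row per line; the placement of the holes here is an illustrative choice.

```python
import gymnasium as gym

# Custom 4x4 map: goal (G) in the top-left, start (S) in the bottom-right.
custom_map = [
    "GFFF",
    "FHFH",
    "FFFH",
    "HFFS",
]

env = gym.make("FrozenLake-v1", desc=custom_map, render_mode="ansi")
observation, info = env.reset(seed=42)
print(env.render())
env.close()
```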

Reference: The original code of FrozenLake is available here.

In this scenario, we can adjust our environment without modifying the original class simply by passing the appropriate argument. gymnasium provides several options like this for modifying an environment. When beginning to use a gymnasium environment, it’s a good idea to check the documentation to explore the kinds of modifications that are already available within the package.

Edit an existing environment

Next, let’s consider a scenario where the original environment doesn’t provide the functionality we need. Here, we aim to modify the reward function. By default, an agent receives a +1 reward upon reaching the goal and no other rewards. We will introduce a negative reward of -0.1 as a penalty at each step to encourage the agent to reach the goal quickly. To do this, we inherit from the FrozenLakeEnv class in gymnasium and define a new step function.
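A minimal sketch of such a subclass follows; the class name FrozenLakeWithStepPenalty is arbitrary, and the seed and flags in the rollout are illustrative choices, so the exact total may vary.

```python
from gymnasium.envs.toy_text.frozen_lake import FrozenLakeEnv


class FrozenLakeWithStepPenalty(FrozenLakeEnv):
    """FrozenLake variant that adds a -0.1 penalty at every step."""

    def step(self, action):
        # Run the original step logic, then apply the step penalty.
        observation, reward, terminated, truncated, info = super().step(action)
        reward -= 0.1
        return observation, reward, terminated, truncated, info


env = FrozenLakeWithStepPenalty(render_mode="ansi", is_slippery=False)
observation, info = env.reset(seed=42)
total_reward = 0.0

for t in range(20):
    observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
    total_reward += reward
    if terminated or truncated:
        observation, info = env.reset()

env.close()
print(f"Total reward over 20 steps: {total_reward}")
```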

We can see that the agent received a total reward of -2.0 over 20 steps (i.e., a -0.1 penalty at each time step).

As this example shows, we can easily customize an existing environment by inheriting from the original gymnasium class. Even though we modify the step function of the parent environment class, we can keep using the original reset and render functions without changing anything.

Summary

In this blog, we learned the basic structure of a gymnasium environment and how to customize an existing one. In the next blog, we will learn how to create our own custom environment using gymnasium!


Yuki Minai

Ph.D. student in Neural Computation and Machine Learning at Carnegie Mellon University, Personal webpage: http://yukiminai.com