A brief introduction to Gymnasium

A reinforcement learning API standard with a wide range of reference environments

Vincent Le
4 min read · May 20, 2024

Keywords: reinforcement learning, environment, simulation

About Gymnasium

Gymnasium is a project that provides an API for all single-agent reinforcement learning settings. It includes implementations of common environments such as Cart Pole, Pendulum, Mountain Car, MuJoCo, Atari, and others.

The API has four core functions:

  1. make : Initializes environments.
  2. step : Updates an environment with an action, returning the next agent observation and the reward for taking that action.
  3. reset : Resets the environment to an initial state.
  4. render : Renders the environment to help visualise what the agent sees; example modes are “human”, “rgb_array”, and “ansi” for text.
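
As a quick preview of how these four functions fit together, here is a minimal sketch (using CartPole-v1 as an arbitrary example environment):

import gymnasium as gym

# Create the environment and get the first observation
env = gym.make('CartPole-v1', render_mode='rgb_array')
observation, info = env.reset()

# Apply one random action and observe the result
action = env.action_space.sample()
observation, reward, terminated, truncated, info = env.step(action)

frame = env.render()  # returns an RGB array in 'rgb_array' mode
env.close()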

Now, we will demonstrate how to perform RL using Gymnasium.

Code Implementation

For demonstration, we will use the “Frozen Lake” game. The game involves crossing a frozen lake from start to goal without falling into any holes. The player may not always move in the intended direction due to the slippery nature of the ice.

Description

The game starts with the player at location [0,0] of the frozen lake grid world, with the goal located at the far extent of the world, e.g., [3,3] for the 4x4 environment.

Holes in the ice are distributed in set locations when using a pre-determined map or in random locations when a random map is generated.

The player makes moves until they reach the goal or fall into a hole.

The lake is slippery (unless disabled), so the player may move perpendicular to the intended direction sometimes (see is_slippery).

Randomly generated worlds will always have a path to the goal.
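
To illustrate these two options, the sketch below creates a randomly generated 8x8 map with slipperiness disabled, using Gymnasium’s generate_random_map helper (the size value is just an example):

from gymnasium.envs.toy_text.frozen_lake import generate_random_map
import gymnasium as gym

# A randomly generated 8x8 map; it is guaranteed to have a path to the goal
random_desc = generate_random_map(size=8)

# is_slippery=False makes moves deterministic: the player always goes where intended
env = gym.make('FrozenLake-v1', desc=random_desc, is_slippery=False)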

Gymnasium

First, we start with installing and calling important libraries.

!pip install gymnasium pygame

import gymnasium as gym

At time t, the agent receives state Sₜ and reward Rₜ from the environment. The agent uses its policy to choose an action Aₜ. Once the action is executed, the environment advances one step, providing the next state Sₜ₊₁ as well as feedback in the form of a reward Rₜ₊₁.

First, an environment is created using make with an additional keyword "render_mode" that specifies how the environment should be visualised. See render for details on the default meaning of different render modes. In this example, we use the FrozenLake-v1 environment, where the agent controls a character that must cross the frozen lake without falling into a hole.

env = gym.make('FrozenLake-v1', render_mode='human')
observation, info = env.reset()
print(f"The environment's observation space: {env.observation_space}")
print(f"The environment's action space: {env.action_space}")

"""
The environment's observation space: Discrete(16)
The environment's action space: Discrete(4)
"""

After initializing the environment, we reset it to get the first observation. To initialize the environment with a particular random seed or options (see the environment documentation for possible values), use the seed or options parameters of reset.
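
For example, a seeded reset looks like this (the seed value 42 is arbitrary):

# The same seed always produces the same initial state, which helps reproducibility
observation, info = env.reset(seed=42)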

observation, info = env.reset()
episode = 0
actions, rewards = [], []
for _ in range(1000):
    action = env.action_space.sample()  # random policy; a real agent would use observation and info
    observation, reward, terminated, truncated, info = env.step(action)
    actions.append(action)
    rewards.append(reward)
    if terminated or truncated:
        observation, info = env.reset()
        episode += 1

env.close()

print(episode)

"""
113: indicates that for the loop of 1000 rounds, the game suffers 113 episodes.
"""

Next, the agent takes an action in the environment via step, which can be imagined as moving a robot or pressing a button on a game controller to produce a change in the environment. As a result, the agent receives a new observation from the modified environment, as well as a reward for taking the action. This reward could be positive if you destroy an enemy, or negative if you move into lava. One such action-observation exchange is referred to as a timestep.

However, the environment may eventually reach a terminal state. For example, if the robot crashes or the agent completes its task, the environment must stop because the agent cannot continue; in this case, step returns terminated as True. Similarly, we may wish the environment to stop after a predetermined number of timesteps; in this instance, the environment sends a truncated signal. If either terminated or truncated is True, call reset to restart the environment.
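
As a short sketch of truncation, gym.make accepts a max_episode_steps argument that wraps the environment in a time limit (the value 50 below is arbitrary):

import gymnasium as gym

# Episodes are truncated after 50 steps, even if no terminal state is reached
env = gym.make('FrozenLake-v1', max_episode_steps=50)
observation, info = env.reset()

for _ in range(200):
    observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        observation, info = env.reset()

env.close()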

Thank you for reading this article; I hope it added something to your knowledge bank! Just before you leave:

👉 Be sure to clap and follow me. It would be a great motivation for me.

👉 Follow me: Linkedin | Github
