Getting Started with OpenAI’s Gym for Reinforcement Learning

Amnah Ebrahim
5 min read · Feb 27, 2023

OpenAI’s Gym is one of the most popular Reinforcement Learning tools for implementing and creating the environments in which “agents” are trained. It contains a wide range of benchmark environments, ranging from “MountainCar” to “LunarLander”.

OpenAI Gym’s MountainCar environment

In this article, I will introduce the basics to reinforcement learning alongside the basic APIs of OpenAI Gym.

But first…What’s Reinforcement Learning?

Reinforcement learning is a machine-learning-based approach that aims to teach an agent how to “act” optimally given a specific state in an environment. By continuously interacting with the environment through trial and error, the agent gradually learns how to behave.

Let’s look at the following diagram to understand further:

The agent takes an action based on its current state in order to maximise its reward. This action moves the agent to a new state and yields a reward, from which the next action is taken, and so on.

The brain that enables the agent to maximise its reward, or “expected return”, through its actions is called a “policy”. Each iteration of the RL loop outputs a sequence of state, action, reward and next state.

It’s also worth mentioning that the environment can be perceived either through observations or through states. A state is a complete description of the world, while an observation is only a partial description of it.
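To make the loop concrete, here is a minimal, self-contained Python sketch of the agent–environment interaction. The toy environment and the random policy below are purely illustrative placeholders, not part of Gym:

import random

# Toy environment: the state is a single number, and the reward is higher
# the closer the state is to zero.
state = 10.0

def policy(state):
    # A stand-in for a learned policy: here it simply picks a random action.
    return random.choice([-1.0, 1.0])

for t in range(5):
    action = policy(state)   # the agent chooses an action from its current state
    state = state + action   # the environment transitions to a new state
    reward = -abs(state)     # the environment emits a reward for that transition
    print(t, action, state, reward)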

Why use OpenAI Gym?

OpenAI’s Gym, or its successor Gymnasium, is an open-source Python library for developing Reinforcement Learning (RL) algorithms. It is also used to compare RL algorithms by providing a standard API for communication between learning algorithms and environments.

The Gym library provides two things:

  • An interface that allows anyone to create RL environments.
  • A standard set of environments compliant with Gym’s API (classic control, Atari, Box2D…).

Note: Gymnasium is a fork of OpenAI’s Gym library by its maintainers (OpenAI handed maintenance over to an outside team a few years ago), and is where future maintenance will occur going forward.

For the sake of this article, I’ll be referring to Gym.

Installing OpenAI’s Gym:

One can install Gym through pip, or through conda if using Anaconda:

pip install gym
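If you are using Anaconda, the package is also available via the conda-forge channel (this assumes conda-forge is accessible from your conda setup):

conda install -c conda-forge gym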

Basics of OpenAI’s Gym:

Environments:

The fundamental building block of Gym is the Env class. An instance of this Python class, created via gym.make(), acts as the simulation of the environment you’d like to train the agent in. In this tutorial, we will import the classic control environment “Pendulum-v1”.

This environment reflects the “inverted pendulum swing up” based on the classic problem in control theory as shown below:

Gym’s Pendulum Environment
import gym

# First, let's define and create our environment called
env = gym.make("Pendulum-v1")

# Then we reset this environment
observation = env.reset()

# Observation and action space
observed_space = env.observation_space
action_space = env.action_space
print("The observation space:{}" .format(observed_space))
print("The action space: {}".format(action_space))
The observation space:Box([-1. -1. -8.], [1. 1. 8.], (3,), float32)
The action space: Box([-2.], [2.], (1,), float32)

As a first step, we usually reset the environment to its initial state, which returns an observation corresponding to that initial state. For the sake of this tutorial, let’s allow the agent to interact with the environment for 10 steps or iterations.

The Pendulum environment class contains information, or attributes, such as observation_space and action_space.

The observation_space defines the structure and permitted values of the observation of the environment’s state. The observation differs from environment to environment; one way to think of it is as a screenshot of the current game. Looking at the pendulum documentation here, we see that the observation is a vector of shape (3,) representing the x-y coordinates of the pendulum’s free end and its angular velocity.

Observation Space for Pendulum

As for the action_space attribute, it describes the type and shape of the actions that can be applied to the environment. For this problem, the action applied to the pendulum is a torque, and it ranges from -2 to 2.
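Since both spaces are Box spaces, you can also inspect their bounds and draw random valid samples from them directly. A short sketch, using the env created above (the low/high values match the output shown earlier, while sample() returns a different random value each time):

print(observed_space.low, observed_space.high)  # [-1. -1. -8.] [1. 1. 8.]
print(action_space.low, action_space.high)      # [-2.] [2.]
print(action_space.sample())                    # e.g. [0.73] -- a random valid torque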

Allowing the agent to interact with the environment:

To do so, we call the environment’s “step” function 10 times, allowing the agent to perform random actions in the environment. The step function returns four values at each step:

  1. observation: The observation of the state of the environment.
  2. reward: The reward that the agent gets from the environment after executing the randomly sampled action (which is given as input to step).
  3. done: Whether the episode has been terminated. If true, you may need to end the simulation or reset the environment to restart the episode. In the case of the code below, we restart the simulation.
  4. info: This provides additional information depending on the environment.


for _ in range(10):
    # Take a random action
    action = env.action_space.sample()
    print("Action taken:", action)

    # Do this action in the environment and get
    # next_state, reward, done and info
    observation, reward, done, info = env.step(action)

    # If the episode is done (for Pendulum this happens once the time limit is reached)
    if done:
        # Reset the environment
        print("Environment is reset")
        observation = env.reset()

Let’s observe the output:

Action taken: [0.24622191]
Action taken: [-1.4795591]
Action taken: [1.7690651]
Action taken: [-1.4082793]
Action taken: [-1.6358312]
Action taken: [-1.7722173]
Action taken: [1.2893023]
Action taken: [1.3323158]
Action taken: [1.7908044]
Action taken: [0.41281328]

All sampled torque actions fall within the range of -2 to 2, as mentioned in the documentation.
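One caveat: the snippets above follow the older Gym step API, where reset() returns only an observation and step() returns four values. If you install a recent Gym release (0.26 or later) or Gymnasium, reset() instead returns an (observation, info) tuple and step() returns five values, with done split into terminated and truncated. The loop would then look roughly like this:

observation, info = env.reset()

for _ in range(10):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)

    # An episode ends when it is either terminated or truncated (e.g. time limit reached)
    if terminated or truncated:
        observation, info = env.reset()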

Read more at:

https://www.gymlibrary.dev/environments/classic_control/pendulum/#description

https://www.velotio.com/engineering-blog/exploring-openai-gym#:~:text=According%20to%20the%20OpenAI%20Gym,has%20an%20environment%2Dagent%20arrangement.


Amnah Ebrahim

Electronics engineer passionate about electronics, machine learning, autonomous robotics, and natural language processing!