Developing Reinforcement Learning Environment Using OpenAI Gym

Thanakorn Panyapiang · Published in Geek Culture · Dec 27, 2021 · 7 min read

Introduction

Reinforcement Learning problems consist of the agent and the environment. The environment provides feedback to the agent so that it can learn which action is appropriate for a specific state. In this post, we’re going to build a reinforcement learning environment that can be used to train an agent using OpenAI Gym.

This tutorial is divided into 2 parts. In the first part, we’re going to build a simple game using Python and PyGLET. Then, we’re going to adapt the OpenAI Gym interface to standardize our environment so that when we develop a learning algorithm, we don’t have to understand how it works internally.

The Game

We’re going to build a maze game that has 2 simple rules:

  • The agent can move 1 step at a time in 4 directions: Up, Down, Left, and Right.
  • The game is over when the agent reaches the goal.

The figure below illustrates what the game looks like.

The maze game. (Image by author)

Game Development

This section briefly explains the game development part of the project. Since game development isn’t the main focus of this post, we’re not going to dig into the detail of it. However, if you’re interested, the code of this project is available in this GitHub repository.

Business Logic

First, we’re going to create a Maze class that represents a maze and encapsulates the rules stated above. The state of a maze is represented by 3 things: the agent’s position, the goal’s position, and the walls’ positions. These are going to be attributes of the class.

The only thing that can change in the maze is the agent’s position. Therefore, the class will have the move_robot method that takes a direction as a parameter. In addition, we’re going to have a few getter methods to check the state of the maze, such as the distance from the agent to the goal. These methods will come in handy when we build a reward function later on.
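To make the structure concrete, here is a minimal sketch of what the Maze class could look like. The move_robot, dist_to_goal, is_robot_reach_goal, and from_config methods are the ones used later when we build the Gym environment; the grid representation, the attribute names, and the JSON layout are illustrative assumptions, and the actual implementation lives in the repository linked above.

import json
import math

class Maze:
    # Directions, in the same order the Gym environment will use later
    UP, DOWN, LEFT, RIGHT = range(4)

    def __init__(self, robot, goal, walls):
        self.robot = robot        # (x, y) position of the agent
        self.goal = goal          # (x, y) position of the goal
        self.walls = set(walls)   # set of (x, y) wall cells

    @classmethod
    def from_config(cls, config_file):
        # Hypothetical JSON layout: {"robot": [x, y], "goal": [x, y], "walls": [[x, y], ...]}
        with open(config_file) as f:
            cfg = json.load(f)
        return cls(tuple(cfg['robot']), tuple(cfg['goal']),
                   [tuple(w) for w in cfg['walls']])

    def move_robot(self, direction):
        dx, dy = {self.UP: (0, 1), self.DOWN: (0, -1),
                  self.LEFT: (-1, 0), self.RIGHT: (1, 0)}[direction]
        new_pos = (self.robot[0] + dx, self.robot[1] + dy)
        if new_pos not in self.walls:  # moving into a wall leaves the agent in place
            self.robot = new_pos

    def dist_to_goal(self):
        # Euclidean distance, ignoring walls (used by the reward function later)
        return math.hypot(self.robot[0] - self.goal[0], self.robot[1] - self.goal[1])

    def is_robot_reach_goal(self):
        return self.robot == self.goal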

Presentation layer

Once the game logic is complete, the next step is to develop the presentation layer. For this layer, we’re going to use a Python game development library called PyGLET.

First of all, we’re going to create a MazeDrawer class responsible for producing an image that represents the current state of the maze. This image is then passed to a Renderer, which displays it in the graphical user interface. Finally, all components are assembled in the MazeGame class, which controls the main loop and maps keypresses to actions (a rough sketch follows the figure below).

The figure below shows how all components fit together.

The architecture of the game. (Image by author)
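As a rough illustration of the glue code (not the exact implementation from the repository), the snippet below shows how MazeGame might map arrow-key presses to maze moves and hand a freshly drawn frame to the Renderer. The handler follows PyGLET’s on_key_press convention; everything apart from the Maze, MazeDrawer, and Renderer names mentioned above is an assumption.

from pyglet.window import key

class MazeGame:
    def __init__(self, maze, renderer):
        self.maze = maze
        self.renderer = renderer
        # Arrow keys -> directions understood by Maze.move_robot
        self.key_to_action = {key.UP: Maze.UP, key.DOWN: Maze.DOWN,
                              key.LEFT: Maze.LEFT, key.RIGHT: Maze.RIGHT}

    def on_key_press(self, symbol, modifiers):
        if symbol in self.key_to_action:
            self.maze.move_robot(self.key_to_action[symbol])
            # Redraw the current maze state and push the image to the window
            self.renderer.render(MazeDrawer.draw_maze(self.maze, 500, 500))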

Incorporate OpenAI Gym

Although the game is ready, there is a little problem that needs to be addressed first. To develop a model, users still have to understand the mechanics of our game so they can make the learning algorithm interact with it. This is problematic because we want our users to focus on solving the problem, not on learning how the system works. Moreover, if we change our game at some point, it could break all of their work. Sounds awful, right?

To prevent this, we’re going to standardize the interface of our game so that, no matter how it works internally, it looks the same to users. The standard we’re going to apply is defined by OpenAI Gym.

What is OpenAI Gym?

OpenAI Gym is a toolkit for developing reinforcement learning algorithms. The library comes with a collection of environments for well-known reinforcement learning problems such as CartPole and MountainCar. Having these available out of the box allows developers to focus solely on learning algorithms and models.

In addition to the built-in environments, OpenAI Gym also lets you create a user-defined environment by simply extending the Env abstract class it provides.
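For example, using one of the built-in environments takes only a few lines. The snippet below runs CartPole with random actions using the classic Gym API (the same four-value step interface this post is based on):

import gym

env = gym.make('CartPole-v1')
obs = env.reset()
for _ in range(100):
    # Sample a random action and apply it to the environment
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
env.close()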

OpenAI Gym Interface

To build a custom OpenAI Gym Environment, you have to extend the Env class the library provides like this:

import gym

class ImageMazeEnv(gym.Env):
    def __init__(self):
        ...
    def step(self, action):
        ...
    def reset(self):
        ...
    def render(self):
        ...
    def close(self):
        ...

Then, you need to override 2 attributes and 4 methods, which function as follows:

  • Attributes
    - action_space: All available actions the agent can perform.
    - observation_space: Structure of the observation.
  • Methods
    - step: Apply an action to the environment, then return the new state of the environment, the reward for the action, and whether the episode is finished.
    - reset: Reset the state of the environment then return the initial state.
    - render(optional): Render the environment for visualization.
    - close(optional): Perform cleanup.

Note that all the code related to this must be in an envs folder inside your project directory.
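Assuming the package is called gym_image_maze, as in the registration code later in this post, the project layout looks roughly like this:

gym_image_maze/
├── __init__.py              # registers the environment with Gym
└── envs/
    ├── __init__.py          # imports the environment classes
    └── image_maze_env.py    # ImageMazeEnv and its versioned subclasses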

Action and Observation Space

The action space is straightforward. There are 4 available actions: Left, Right, Up, and Down. We can define it using the Discrete class, which Gym provides for discrete spaces.

The observation space defines how we want the agent to perceive the environment. Since we already implemented the MazeDrawer for generating the image of the maze, we’re going to use that image as the observation of our environment. To define this, we can use the Box class, which allows you to specify the shape of the observation and its value range.

The action and observation spaces are defined as follows:

import gym
from gym.spaces import Discrete, Box

class ImageMazeEnv(gym.Env):
    def __init__(self):
        self.maze = ...  # Create a maze object
        ...
        self.action_space = Discrete(4)
        self.observation_space = Box(low=0, high=255, shape=[500, 500])
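Both spaces can be sampled and inspected directly, which is handy for sanity checks; for instance:

from gym.spaces import Discrete, Box

action_space = Discrete(4)
print(action_space.sample())    # a random action, e.g. 2

observation_space = Box(low=0, high=255, shape=[500, 500])
print(observation_space.shape)  # (500, 500)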

The step function

After we’ve defined the action and observation space, the next step is to implement the step function. This function has 3 responsibilities: perform an action, provide an observation of the new state, and provide a reward.

The first two are trivial since we’ve already built them. The Maze class already has the move_robot method, so we just have to pass the action to it. For the observation, we’ve implemented MazeDrawer, which can make an image representing the current state of a maze object. The remaining work is to design and implement the reward function, which is a crucial part of developing the learning algorithm.

The reward mechanism should relate to the goal we want to achieve. In the maze problem, the mission is to reach the goal, whose location the agent doesn’t know. We’re going to use the reward function as a signal by providing a positive reward every time the agent moves closer to the goal and vice versa. Moreover, the agent should avoid hitting the walls because it’s useless, so we’re going to add a penalty for that as well. The reward function for the maze problem has the following conditions:

  • If the agent moves closer to the goal (compared to the previous turn), it gets a reward of +1.
  • If the agent moves away from the goal, it gets a penalty of -1.
  • If the agent hits a wall, it gets a penalty of -10.
  • If the agent reaches the goal, it gets a reward of +100.
  • The distance between the agent and the goal is computed as the Euclidean distance, without considering the walls.
  • If the agent doesn’t reach the goal within a specific number of turns, the game is over.

Below is how the step function looks:

def step(self, action):
    self.timestep += 1
    current_dist_to_goal = self.maze.dist_to_goal()
    self.maze.move_robot(action)
    new_dist_to_goal = self.maze.dist_to_goal()
    # If the distance didn't change, the move was blocked by a wall
    is_collide = current_dist_to_goal == new_dist_to_goal
    self.done = self.maze.is_robot_reach_goal() or self.timestep == self.time_limit

    reward = 0
    if self.maze.is_robot_reach_goal():
        reward = 100
    elif is_collide:
        reward = -10
    elif new_dist_to_goal < current_dist_to_goal:
        reward = 1
    else:
        reward = -1

    return MazeDrawer.draw_maze(self.maze, 500, 500), reward, self.done, {}

Reset, Render, and Close

The reset function has to re-initialize the game to the starting state and return the observation. For simplicity, we’re going to store the initial state in a JSON file and create a new Maze instance out of it. Similar to the step function, the MazeDrawer is used to generate observation.

The render function renders the environment so we can visualize it. We’ll do this using the MazeDrawer and Renderer classes. The close method is for clean-up. The only thing we have to do here is destroy the Renderer object.

The implementations of these 3 methods are as follows:

def reset(self):
    self.maze = Maze.from_config(self.config_file)
    self.done = False
    self.timestep = 0

    if self.visualize:
        self.renderer.render(MazeDrawer.draw_maze(self.maze, 500, 500))
    return MazeDrawer.draw_maze(self.maze, 500, 500)

def render(self, mode='human'):
    self.renderer.render(MazeDrawer.draw_maze(self.maze, 500, 500))

def close(self):
    if self.visualize:
        self.renderer.close()

Register and use the environment

Now that our environment is ready, the last thing to do is to register it with the OpenAI Gym environment registry. Here’s how to do it:

First, create a class representing a specific version of your environment.

class ImageMazeV0(ImageMazeEnv):
    def __init__(self):
        super().__init__(time_limit=200)

Then, register it with Gym by putting the following code into the __init__.py of your project directory.

from gym.envs.registration import register

register(
    id='ImageMaze-v0',
    entry_point='gym_image_maze.envs:ImageMazeV0'
)

Finally, add an import statement to __init__.py in your envs directory.

from gym_image_maze.envs.image_maze_env import ImageMazeV0

The new environment is ready! Let’s test it with the following code:

import gym

if __name__ == '__main__':
    env = gym.make('ImageMaze-v0')
    env.reset()
    for i in range(500):
        env.render()
        observation, reward, done, _ = env.step(env.action_space.sample())
        print('Observation : ' + str(observation.shape))
        print('Reward : ' + str(reward))
        print('Done : ' + str(done))
        print('---------------------')
    env.close()

You should see something like this on the screen and terminal.

---------------------
Observation : (500, 500)
Reward : -10
Done : False
---------------------
Observation : (500, 500)
Reward : 1
Done : False
---------------------
Observation : (500, 500)
Reward : 1
Done : False
---------------------

That’s it. We’re done. Now our environment is ready to use.
If you want to see the code of this project, it is available here.

Conclusion

In this article, we’ve learned how to build a custom reinforcement learning environment using OpenAI Gym. This allows you to create reinforcement learning problems tailored to your specific use cases that are not covered by the standard environments OpenAI Gym provides.

Thanks for reading. If you like this article, you can follow me on Medium to check out for more. See you in the next post.
