How to create a custom OpenAI Gym environment (with code)

Creating a game environment in OpenAI-gym from scratch

Mehul Gupta
Data Science in your pocket
5 min read · Jul 11, 2023


In my previous posts on reinforcement learning, I have used OpenAI Gym quite extensively for training in different gaming environments. But for real-world problems, you will need a new environment and not the pre-existing OpenAI Gym environments.


So, the question is

How to create a custom environment in OpenAI Gym?

But a bigger question is,

Why should you create an environment in OpenAI Gym?

In some of my previous tutorials, I designed the whole environment from scratch without the OpenAI Gym framework, and it worked quite well. So why use a framework at all? The answer is simple:

Standardized interface: OpenAI Gym provides a standardized interface for interacting with environments, which makes it easier to compare and reproduce results across different algorithms and research papers. So you can train and test different environments with different algorithms easily, as long as everything follows the same structure.

Reproducibility and sharing: By creating an environment in OpenAI Gym, you can share it with the research community, enabling others to reproduce your results and build upon your work.

Some RL libraries like stable-baselines, RLlib, tf-agents, etc. integrate easily with OpenAI Gym environments, so basic to advanced RL algorithms can be used to train agents with ease (without coding them from scratch).
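To illustrate that integration, here is a minimal sketch using stable-baselines3 (an assumption: it is installed via pip install stable-baselines3; also note it expects a Box/Discrete-style observation space, so a built-in env is used for illustration):

import gym
from stable_baselines3 import PPO

# Train a PPO agent on a Gym-registered environment without
# implementing the algorithm from scratch
env = gym.make('CartPole-v1')
model = PPO('MlpPolicy', env, verbose=0)
model.learn(total_timesteps=10_000)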

Now that we’re clear on why we need this, let’s understand how to do it.

MazeGame-v0

We will register a grid-based Maze game environment in OpenAI Gym with the following features

Start and End point (green and red)

Agent (Blue)

Obstacles (black)

  • The goal is to reach the end point from the start point while avoiding obstacles. To keep things easy, the reward system is naive: 1 if the endpoint is reached, else 0.
  • The action space includes 4 actions: Up, Down, Right, and Left, while the observation space is simply a grid of size rows x columns.
  • We will add all these features to our environment and render it using pygame.

To create a custom environment, we just need to subclass gym.Env and override a handful of its methods with our environment’s definition. The methods we necessarily need to override are listed below, followed by a bare skeleton:

  • __init__(): This function initializes your environment with default values
  • reset(): This function resets the environment to its default settings
  • step(): This function defines how the environment changes once the agent takes an action. Usually, the reward function is also incorporated/called within step()
  • render(): For rendering the environment. We will use pygame for rendering, but you can simply print the environment as well.
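Here is that bare skeleton, a minimal hypothetical example (placeholder spaces and return values only; CustomEnv is not the maze env we build below):

import gym
from gym import spaces

class CustomEnv(gym.Env):
    def __init__(self):
        super().__init__()
        # Define the action and observation spaces here
        self.action_space = spaces.Discrete(2)
        self.observation_space = spaces.Discrete(2)

    def reset(self):
        # Reset internal state and return the initial observation
        return 0

    def step(self, action):
        # Apply the action, then return (observation, reward, done, info)
        return 0, 0.0, True, {}

    def render(self):
        # Draw or print the current state
        pass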

Let’s get started now

  1. Import required libraries
import gym
from gym import spaces
import numpy as np
import pygame

2. Define the game class (read comments for better understanding)

class MazeGameEnv(gym.Env):
    def __init__(self, maze):
        super(MazeGameEnv, self).__init__()
        self.maze = np.array(maze)  # Maze represented as a 2D numpy array
        self.start_pos = np.where(self.maze == 'S')  # Starting position
        self.goal_pos = np.where(self.maze == 'G')   # Goal position
        self.current_pos = self.start_pos  # Starting position is the agent's current position
        self.num_rows, self.num_cols = self.maze.shape

        # 4 possible actions: 0=up, 1=down, 2=left, 3=right
        self.action_space = spaces.Discrete(4)

        # Observation space is a grid of size: rows x columns
        self.observation_space = spaces.Tuple((spaces.Discrete(self.num_rows), spaces.Discrete(self.num_cols)))

        # Initialize pygame
        pygame.init()
        self.cell_size = 125

        # Set the display size
        self.screen = pygame.display.set_mode((self.num_cols * self.cell_size, self.num_rows * self.cell_size))

    def reset(self):
        self.current_pos = self.start_pos
        return self.current_pos

    def step(self, action):
        # Move the agent based on the selected action
        new_pos = np.array(self.current_pos)
        if action == 0:    # Up
            new_pos[0] -= 1
        elif action == 1:  # Down
            new_pos[0] += 1
        elif action == 2:  # Left
            new_pos[1] -= 1
        elif action == 3:  # Right
            new_pos[1] += 1

        # Update the position only if the move is valid
        if self._is_valid_position(new_pos):
            self.current_pos = new_pos

        # Reward function: 1 on reaching the goal, 0 otherwise
        if np.array_equal(self.current_pos, self.goal_pos):
            reward = 1.0
            done = True
        else:
            reward = 0.0
            done = False

        return self.current_pos, reward, done, {}

    def _is_valid_position(self, pos):
        row, col = pos

        # If the agent goes out of the grid
        if row < 0 or col < 0 or row >= self.num_rows or col >= self.num_cols:
            return False

        # If the agent hits an obstacle
        if self.maze[row, col] == '#':
            return False
        return True

    def render(self):
        # Clear the screen
        self.screen.fill((255, 255, 255))

        # Draw env elements one cell at a time
        for row in range(self.num_rows):
            for col in range(self.num_cols):
                cell_left = col * self.cell_size
                cell_top = row * self.cell_size

                if self.maze[row, col] == '#':    # Obstacle
                    pygame.draw.rect(self.screen, (0, 0, 0), (cell_left, cell_top, self.cell_size, self.cell_size))
                elif self.maze[row, col] == 'S':  # Starting position
                    pygame.draw.rect(self.screen, (0, 255, 0), (cell_left, cell_top, self.cell_size, self.cell_size))
                elif self.maze[row, col] == 'G':  # Goal position
                    pygame.draw.rect(self.screen, (255, 0, 0), (cell_left, cell_top, self.cell_size, self.cell_size))

                if np.array_equal(np.array(self.current_pos), np.array([row, col]).reshape(-1, 1)):  # Agent position
                    pygame.draw.rect(self.screen, (0, 0, 255), (cell_left, cell_top, self.cell_size, self.cell_size))

        pygame.display.update()  # Update the display

The above is easy to understand:

__init__(): Initializes the required variables and the game environment. You need to pass a 2D array with the maze config to initialize it (demonstrated below).

reset(): Resets the agent’s position to the start position

step(): Updates the agent’s position according to the action taken and provides the reward

_is_valid_position(): Checks whether the action taken by the agent is valid

render(): Renders the game environment using pygame, drawing the elements cell by cell with nested loops. You can simply print the maze grid as well; pygame isn’t strictly required (a text-only sketch follows).
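Here is a sketch of that text-only alternative: a hypothetical render_text() method (not part of the original class) you could add to MazeGameEnv:

    def render_text(self):
        # Print the maze row by row, marking the agent's cell with 'A'
        agent_row, agent_col = np.array(self.current_pos).flatten()
        for row in range(self.num_rows):
            line = ''
            for col in range(self.num_cols):
                if row == agent_row and col == agent_col:
                    line += 'A'  # Agent
                else:
                    line += self.maze[row, col]  # 'S', 'G', '.', or '#'
            print(line)
        print()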

3. Save the above class in a Python script, say mazegame.py

4. In a new script, import this class and register it as a gym env with the name ‘MazeGame-v0’ (this can be any other name as well).

import gym
import pygame
from mazegame import MazeGameEnv

# Register the environment
gym.register(
    id='MazeGame-v0',
    entry_point='mazegame:MazeGameEnv',
    kwargs={'maze': None}
)

5. Time to load the environment

# Maze config
maze = [
    ['S', '.', '.', '.'],
    ['.', '#', '.', '#'],
    ['.', '.', '.', '.'],
    ['#', '.', '#', 'G'],
]

# Test the environment with random actions until the goal is reached
env = gym.make('MazeGame-v0', maze=maze)
obs = env.reset()
env.render()

done = False
while not done:
    pygame.event.get()
    action = env.action_space.sample()  # Random action selection
    obs, reward, done, _ = env.step(action)
    env.render()
    print('Reward:', reward)
    print('Done:', done)

    pygame.time.wait(200)

It is very similar to loading any other pre-existing environment in OpenAI Gym. Want to see the env you designed? Run the script above and the maze should appear in a pygame window.

Do remember that this registers your environment on your local system only; it isn’t globally available. For global availability, you would need to create a pull request to the gym repository.
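That said, registration is only required for the gym.make(...) interface; for quick local experiments you can instantiate the class directly:

from mazegame import MazeGameEnv

# Same environment, no registration needed (maze defined as above)
env = MazeGameEnv(maze=maze)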

That’s all for today, see you soon!!
