
🏋️ Custom OpenAI Gym Environment

Kieran Fraser
Oct 7, 2019


Quick example of how I developed a custom OpenAI Gym environment to help train and evaluate intelligent agents managing push-notifications 🔔

Content:

  1. Setting up the custom environment
  2. Packaging and testing
  3. Adding data
  4. Simulating pushed notifications
  5. Random agent
  6. Next steps

1. Setting up the custom environment

This is documented in the OpenAI Gym documentation. I’m just including this section for the sake of completeness. Also, one of my requirements for the custom gym environment was that others would be able to install and run the training, simulation and evaluation methods with minimum effort. So to keep it clean and simple, I created a fresh Anaconda virtual environment.

conda create -n push python=3.6
conda activate push
conda list
# Name              Version         Build           Channel
certifi             2019.9.11       py36_0          conda-forge
cloudpickle         1.2.2           <pip>
future              0.17.1          <pip>
numpy               1.17.2          <pip>
pip                 19.2.3          py36_0          conda-forge
pyglet              1.3.2           <pip>
python              3.6.7           he025d50_1005   conda-forge
scipy               1.3.1           <pip>
setuptools          41.2.0          py36_0          conda-forge
six                 1.12.0          <pip>
vc                  14.1            h0510ff6_4
vs2015_runtime      14.16.27012     hf0eaf9b_0
wheel               0.33.6          py36_0          conda-forge
wincertstore        0.2             py36_1002       conda-forge

When I have all the necessary packages installed (including my OpenAI gym environment), I can simply share this virtual environment by creating an environment.yml file, ensuring there will be no package versioning issues when others go to play with the custom gym on their own machines.

Next, I installed the gym package:

pip install gym

The version installed was 0.14.0. Once complete, I used the OpenAI docs to create a skeleton custom gym environment, with minimal functionality. I wanted to ensure that my distribution method was sound first, so adding functionality came later. The directory structure was as follows:

project_dir
-> gym-push
   -> gym_push
      -> __init__.py
      -> envs
         -> __init__.py
         -> basic_env.py
   -> README.md
   -> setup.py

Gym-push is the name of my custom OpenAI Gym environment. The gym_push package inside it contains an envs directory, which holds the code for each individual environment (yes, there can be more than one!), and an __init__.py file, which registers each environment of the gym: essentially mapping an id to the entry point of the environment, as shown below. NB: the id must be in the format name-v#.

__init__.py

from gym.envs.registration import register

register(
    id='basic-v0',
    entry_point='gym_push.envs:Basic',
)
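To make the id-to-entry-point mapping concrete, here is a simplified, dependency-free sketch of what a registry like this does. It is not gym's actual implementation: the id regex is an assumption modelled on the name-v# rule, and register, load, make and _registry are illustrative names. It also shows the idea behind gym.make('gym_push:basic-v0'): the part before the colon is imported first, which runs the register() call in gym_push/__init__.py.

```python
import importlib
import re

# Assumed id rule, modelled on gym's "name-v#" convention (not gym's exact regex)
ENV_ID_PATTERN = re.compile(r"^[\w.-]+-v\d+$")

# Maps environment ids to "module.path:ClassName" entry points
_registry = {}


def register(env_id, entry_point):
    """Record which class should be built for env_id."""
    if not ENV_ID_PATTERN.match(env_id):
        raise ValueError(f"Malformed environment id {env_id!r}; expected name-v#")
    _registry[env_id] = entry_point


def load(entry_point):
    """Resolve a 'module.path:ClassName' string into the class object."""
    module_name, _, attr_name = entry_point.partition(":")
    return getattr(importlib.import_module(module_name), attr_name)


def make(env_id, **kwargs):
    """Instantiate a registered environment, optionally given as 'package:name-v#'."""
    if ":" in env_id:
        # Importing the package runs its register() calls
        module_name, _, env_id = env_id.partition(":")
        importlib.import_module(module_name)
    return load(_registry[env_id])(**kwargs)


if __name__ == "__main__":
    # A standard-library class stands in for gym_push.envs:Basic
    register("counter-v0", "collections:Counter")
    print(make("collections:counter-v0"))  # Counter()
```

gym's own registry does more (for example max_episode_steps and spec objects), but the resolution idea is the same.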

The setup.py file contains the information for distributing the gym-push environment. I stipulate which packages the gym depends on using the install_requires argument, so pip installs those packages along with the custom gym package. NB: I also stipulate the package version. Gym-push will also have data files included in the package (e.g. notifications saved as a csv file), so I include the package_data argument, together with include_package_data=True, to allow for this.

setup.py

from setuptools import setup, find_packages

setup(
    name='gym-push',
    packages=find_packages(),
    include_package_data=True,
    package_data={
        '': ['*.csv', '*.npy'],
    },
    version='0.0.1',
    install_requires=['gym', 'numpy', 'pandas', 'joblib'],
)

The data files are also referenced in the MANIFEST.in file, which controls what goes into the source distribution (the web folder is created later when implementing a UI with eel).

include gym_push/envs/data/*.csv
recursive-include gym_push/envs/web *

Within the envs directory there is another __init__.py file, which imports each environment from its individual class file so that entry points such as gym_push.envs:Basic can be resolved.

__init__.py

from gym_push.envs.basic_env import Basic

I added a basic_env.py file containing a skeleton environment made up of just the required methods, each of which simply prints its name to the screen when called.

basic_env.py

import gym
from gym import error, spaces, utils
from gym.utils import seeding


class Basic(gym.Env):
    metadata = {'render.modes': ['human']}

    def __init__(self):
        print('init basic')

    def step(self, action):
        print('step')

    def reset(self):
        print('reset')

    def render(self, mode='human'):
        print('render')

    def close(self):
        print('close')

Now with the basic skeleton structure of the gym in place, I could install it as a package in my virtual environment by executing the following command from the project_dir:

pip install -e gym-push

Once installed, I could test locally:

>>> import gym
>>> custom_gym = gym.make('gym_push:basic-v0')
init basic
>>> custom_gym.step('some action')
step
>>> custom_gym.reset()
reset
>>> custom_gym.render()
render
>>> custom_gym.close()
close
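Printing method names proves the wiring works, but not that the methods honour the contract that gym 0.14 agents rely on: reset() returns the first observation, and step(action) returns an (observation, reward, done, info) tuple. Once the skeleton returns real values, a small helper like the hypothetical run_episode below can check that contract without any agent logic. It works on any object with reset/step, so a toy CountdownEnv (also hypothetical) stands in for gym-push here.

```python
def run_episode(env, policy, max_steps=10_000):
    """Roll out one episode, checking the classic Gym reset/step contract.

    Returns the total reward collected over the episode.
    """
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        result = env.step(policy(observation))
        if not (isinstance(result, tuple) and len(result) == 4):
            raise TypeError("step() must return (observation, reward, done, info)")
        observation, reward, done, info = result
        if not isinstance(info, dict):
            raise TypeError("info should be a dict")
        total_reward += reward
        if done:
            return total_reward
    raise RuntimeError(f"episode did not finish within {max_steps} steps")


class CountdownEnv:
    """Toy stand-in environment: reward of 1 per step for `length` steps."""

    def __init__(self, length=3):
        self.length = length
        self.remaining = length

    def reset(self):
        self.remaining = self.length
        return self.remaining

    def step(self, action):
        self.remaining -= 1
        return self.remaining, 1.0, self.remaining == 0, {}


if __name__ == "__main__":
    print(run_episode(CountdownEnv(length=3), policy=lambda obs: 0))  # 3.0
```

Run against the print-only skeleton, run_episode raises TypeError because step returns None; that is expected until the environment logic is filled in.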

2. Packaging and testing

With the skeleton gym set up and working locally, the next step was to make it distributable, i.e. upload the package to PyPI. I used Twine to accomplish this. First, I created the distribution files by executing:

python setup.py sdist bdist_wheel
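Since a missing package_data or MANIFEST.in entry fails silently, it can be worth checking that the csv actually ended up in the built wheel before uploading. A wheel is an ordinary zip archive, so the standard library's zipfile module is enough; wheel_data_files is a hypothetical helper name, and dist/ is assumed to be the default output directory of the command above.

```python
import glob
import os
import zipfile


def wheel_data_files(wheel, suffixes=(".csv", ".npy")):
    """List data files inside a wheel (a wheel is an ordinary zip archive).

    `wheel` may be a path or an open binary file object.
    """
    with zipfile.ZipFile(wheel) as archive:
        return [name for name in archive.namelist() if name.endswith(suffixes)]


if __name__ == "__main__":
    # dist/ is where `python setup.py sdist bdist_wheel` writes its output
    for path in sorted(glob.glob(os.path.join("dist", "*.whl"))):
        found = wheel_data_files(path)
        print(os.path.basename(path), "->", found or "no data files found!")
```

Running it from the directory containing setup.py prints each wheel alongside the data files it contains.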

Then I uploaded the files (first to TestPyPI, then to PyPI):

twine upload --repository-url https://test.pypi.org/legacy/ dist/*
twine upload dist/*

Finally, to test that gym-push was correctly distributed, I created a new Anaconda virtual environment, installed the gym from PyPI and ran it (essentially recreating the scenario of someone wanting to test out the gym for the first time with nothing set up or installed).

conda create -n test python=3.6
conda activate test
pip install gym-push

The results were identical to testing locally. Success. Next steps include building up the custom functionality of the gym.

3. Adding data

Gym-push, as part of its custom functionality, requires data. Specifically, notification data. I have a notifications.csv file containing notification and context features ready for inclusion in the gym.

The first step is to place the notifications.csv file in a directory accessible to the gym. As basic_env.py uses the data, I created a data directory in the envs folder.

project_dir
-> gym-push
   -> gym_push
      -> __init__.py
      -> envs
         -> __init__.py
         -> basic_env.py
         -> data
            -> notifications.csv
   -> README.md
   -> setup.py

I then locate the data directory relative to basic_env.py and use pandas to load the notifications from the csv file.

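basic_env.py

import os
import pandas as pd

class Basic(gym.Env):
    ...
    def __init__(self):
        print('init basic')
        # Locate the data folder relative to this file, not the working directory
        dir_path = os.path.dirname(os.path.realpath(__file__))
        # ------------ load data ------------------#
        self.notifications = pd.read_csv(
            os.path.join(dir_path, 'data', 'notifications.csv'))
        print('number of notifications loaded: ', len(self.notifications))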

Testing this yields the following:

>>> import gym
>>> custom_gym = gym.make('gym_push:basic-v0')
init basic
number of notifications loaded: 324
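The code above loads everything into a single table, while the step and render code later on works with separate notification and context tables (self.notifications, self.contexts) alongside the person's recorded action. The post doesn't show that split, so here is a dependency-free sketch using the standard library's csv module instead of pandas. The column names are hypothetical, loosely based on the features described in the next section; substitute the real headers of notifications.csv.

```python
import csv
import io

# Hypothetical column groups; replace with the real notifications.csv headers
NOTIFICATION_COLUMNS = ["app", "ticker_text", "led_color", "vibrate"]
CONTEXT_COLUMNS = ["time_of_day", "day_of_week", "location", "battery"]
ACTION_COLUMN = "action"  # e.g. "opened" or "dismissed"


def split_rows(csv_file):
    """Split each csv row into its notification, context and action parts."""
    notifications, contexts, actions = [], [], []
    for row in csv.DictReader(csv_file):
        notifications.append({col: row[col] for col in NOTIFICATION_COLUMNS})
        contexts.append({col: row[col] for col in CONTEXT_COLUMNS})
        actions.append(row[ACTION_COLUMN])
    return notifications, contexts, actions


if __name__ == "__main__":
    sample = io.StringIO(
        "app,ticker_text,led_color,vibrate,time_of_day,day_of_week,location,battery,action\n"
        "WhatsApp,New message,green,short,morning,Mon,home,80,opened\n"
    )
    notifications, contexts, actions = split_rows(sample)
    print(notifications[0]["app"], contexts[0]["location"], actions[0])
    # WhatsApp home opened
```

With pandas, the same split is just column selection, e.g. data[NOTIFICATION_COLUMNS].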

4. Simulating pushed notifications

The rest of this post outlines how I implemented the custom functionality of gym-push. The purpose of gym-push is to facilitate the training and evaluation of intelligent agents attempting to manage push-notifications on behalf of a user. Consider:

  • In a given moment, a person receives a push-notification made up of features such as message content (ticker text), the app that posted the message (e.g. Facebook, WhatsApp), the color the LED flashed, the vibration pattern that alerted the user etc.
  • Associated with this moment, there is some context. For example, the time of the day and the day of the week it was pushed to the device, the location it was received, noise levels, device battery status etc.
  • Additionally, there is also an action associated with the notification-context pair. In the simplest case, the action is whether the person receiving the notification opened it or dismissed it.

So, given that an agent knows the context of a person and the details of a notification being pushed at them, can it correctly identify whether or not to deliver a notification in a given context?

Correctly is a deliberately vague term here as it is dependent on a lot of factors. Again, taking the simplest case, assume correct means that an agent can accurately predict the engagement a person would have taken on a notification had they received it.

So how can gym-push help? It can simulate notifications being pushed at a person and also simulate how that person engages with them, in effect reconstructing the real-world problem (which we will encode into our basic_env.py). Understanding the problem is the first step toward a solution.

More importantly though, gym-push can also provide an interface allowing an intelligent agent to intercept the notifications pushed at a person (thus relieving them from distraction) and make a decision about whether/when to send them on, based on their context, previous engagements, cognitive health etc. Ideally, the result of this would be a higher overall Click-Through-Rate (CTR) and a happier person. I’ll encode this functionality into additional gym-push environments (detailed in a future post!).

So, as the basic environment already has the notification, context and action data loaded (notifications.csv contains the notification-context pairs as well as the action taken on the notification by the person), all that is left to do is to write the logic for moving through contexts and simulating the pushes and subsequent actions.

The docs already outline the method for doing this: the step method.

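basic_env.py

def step(self, action):
    # Compare the agent's action with the action the person actually took
    reward = self.calculate_reward(action)
    info = {}
    self.epoch = self.epoch + 1
    # The episode ends once every notification has been pushed
    done = self.epoch >= len(self.data)
    if done:
        # No notification left to observe; avoid indexing past the last row
        observation = {}
    else:
        observation = {**self.notifications.iloc[self.epoch].to_dict(),
                       **self.contexts.iloc[self.epoch].to_dict()}
    return observation, reward, done, info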

The step method takes an action as an argument and returns an observation, a reward, a done flag and an info dictionary. The action in this case is an agent's decision to open or dismiss the current notification at epoch x. The reward is calculated by comparing the action taken by the agent with the action actually taken by the person: 1 if they match, -1 if they don't. The epoch is then updated and the next notification and context are returned as the observation. The initial observation is returned by the reset method, which is called before step. The info dictionary can contain additional details, but shouldn't be used by an agent for making a decision.
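Putting that reset/step description together, here is a minimal, dependency-free sketch of the simulation loop. It is not the gym-push code: it doesn't subclass gym.Env (so it runs without gym installed), the class name NotificationSimulator and the row format are hypothetical, and it assumes each row holds the observation features plus the person's recorded action under an "action" key. calculate_reward follows the +1/-1 rule described above, and the sketch returns an empty observation after the last notification so it never indexes past the final row.

```python
class NotificationSimulator:
    """Replays recorded notification-context pairs with the classic Gym API."""

    def __init__(self, rows):
        # Each row: feature dict plus the person's recorded "action"
        self.rows = rows
        self.epoch = 0

    def _observation(self):
        """Features of the current row, with the person's action hidden."""
        return {k: v for k, v in self.rows[self.epoch].items() if k != "action"}

    def calculate_reward(self, action):
        """+1 if the agent matched the person's recorded action, -1 otherwise."""
        return 1 if action == self.rows[self.epoch]["action"] else -1

    def reset(self):
        self.epoch = 0
        return self._observation()

    def step(self, action):
        reward = self.calculate_reward(action)
        self.epoch += 1
        done = self.epoch >= len(self.rows)
        # After the last notification there is nothing left to observe
        observation = {} if done else self._observation()
        return observation, reward, done, {}


if __name__ == "__main__":
    rows = [
        {"app": "WhatsApp", "time_of_day": "morning", "action": "opened"},
        {"app": "Facebook", "time_of_day": "evening", "action": "dismissed"},
    ]
    env = NotificationSimulator(rows)
    print(env.reset())          # {'app': 'WhatsApp', 'time_of_day': 'morning'}
    print(env.step("opened"))   # ({'app': 'Facebook', 'time_of_day': 'evening'}, 1, False, {})
    print(env.step("opened"))   # ({}, -1, True, {})
```

Stripping the action from the observation keeps the answer out of the agent's view, in the same spirit as keeping decision-relevant data out of info.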

To visualise the simulation, I used eel. I wanted something quick, and with my web design experience it felt like the simplest way to get off the ground. I added eel to the package requirements in setup.py, along with json-tricks, because the notification-context pairs had to be converted from Python dictionaries to JSON before my JavaScript code could receive them.
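That conversion step is needed because the standard json module can't serialise some values a pandas row can hold, notably numpy integers such as numpy.int64 (depending on the pandas version, Series.to_dict() may return these rather than plain ints) and timestamps. json-tricks handles this for you; if you'd rather avoid the extra dependency, a default hook for json.dumps is one alternative. The names to_builtin and dumps_state are my own, and duck-typing on .item() is an assumption meant to cover numpy scalars.

```python
import json
from datetime import date, datetime


def to_builtin(obj):
    """json.dumps fallback for values the json module can't encode itself."""
    # datetimes (pandas Timestamp is a datetime subclass) -> ISO-8601 strings
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    # numpy scalars such as numpy.int64 expose .item() -> plain Python value
    if hasattr(obj, "item"):
        return obj.item()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def dumps_state(notification, context, size):
    """Serialise the initial UI state as a JSON string."""
    return json.dumps(
        {"notification": notification, "context": context, "size": size},
        default=to_builtin,
        # NaN would be emitted as the bare token NaN, which JSON.parse rejects
        allow_nan=False,
    )
```

With allow_nan=False, a missing value in the csv surfaces as a ValueError in Python instead of a parse error in the browser.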

To get eel up and running I added a new web directory which contained main.html and some other css/javascript files.

project_dir
-> gym-push
   -> gym_push
      -> __init__.py
      -> envs
         -> __init__.py
         -> basic_env.py
         -> data
            -> notifications.csv
         -> web
            -> main.html
            -> css
               -> main.css
            -> js
               -> main.js
   -> README.md
   -> setup.py

I then initialise eel in the __init__ method of basic_env.py…

__init__ method of basic_env.py

import eel
from json_tricks import dumps

eel.init(dir_path + '/web')
eel.start('main.html', mode='chrome', block=False)  # don't block the gym
eel.sleep(3)  # give the browser time to load the page
eel.initial_state(dumps({
    'notification': self.notifications.iloc[self.epoch].to_dict(),
    'context': self.contexts.iloc[self.epoch].to_dict(),
    'size': len(self.data)}))

...and simply send updated epoch, notification and context information to the UI every time the render method is called.

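render method of basic_env.py

def render(self, mode='human'):
    # Push the current notification-context pair to the browser UI
    eel.render(dumps({
        'notification': self.notifications.iloc[self.epoch].to_dict(),
        'context': self.contexts.iloc[self.epoch].to_dict(),
        'epoch': self.epoch,
    }))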

eel.render here calls the render function defined in main.js in the web directory, which exposes it to Python with eel.expose.

main.js

eel.expose(render);
function render(jsonObj) {
    jsonObj = JSON.parse(jsonObj);
    var currentNotification = jsonObj.notification;
    // ...
}

5. Random agent

With the environment set up to simulate the push-notification problem and a UI to visualise it, the final step was to create an agent that could interact with the gym. For the sake of brevity, I will demonstrate a random agent with no intelligence interacting with the environment:

(Screenshot: Random Agent in the gym_push:basic-v0 environment)

The performance metric measures how often the agent correctly predicted whether the person would open or dismiss a notification. The CTR is the ratio of opened notifications to total notifications sent. As illustrated in the screenshot, the random agent performed as expected, with performance of roughly 50%.
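To make the two metrics concrete, here is a small sketch of how they could be computed for a random agent. The exact definitions are my assumption: performance is the fraction of notifications where the agent's open/dismiss prediction matched the person's recorded action, and CTR treats an "opened" prediction as the agent choosing to send the notification, then measures how many of those the person actually opened. The names evaluate and random_agent are hypothetical, and the recorded actions in the usage example are synthetic.

```python
import random

ACTIONS = ("opened", "dismissed")


def evaluate(predictions, person_actions):
    """Return (performance, ctr) for an agent's predictions.

    performance: fraction of predictions matching the person's recorded action.
    ctr: of the notifications the agent chose to send (predicted "opened"),
         the fraction the person actually opened (0.0 if none were sent).
    """
    pairs = list(zip(predictions, person_actions))
    performance = sum(p == a for p, a in pairs) / len(pairs)
    sent = [a for p, a in pairs if p == "opened"]
    ctr = sent.count("opened") / len(sent) if sent else 0.0
    return performance, ctr


def random_agent(person_actions, seed=None):
    """An agent with no intelligence: picks open/dismiss uniformly at random."""
    rng = random.Random(seed)
    return [rng.choice(ACTIONS) for _ in person_actions]


if __name__ == "__main__":
    # Synthetic recorded actions, purely for illustration
    person_actions = ["opened", "dismissed"] * 5000
    performance, ctr = evaluate(random_agent(person_actions, seed=1), person_actions)
    print(f"performance={performance:.3f} ctr={ctr:.3f}")  # both close to 0.5
```

Under this definition, a random agent's CTR simply tracks the person's underlying open rate, which is why a smarter agent is needed to push it higher.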

6. Next steps

My next post will address creating a more advanced agent to interact with and manage the notifications, improving performance and CTR! I also hope to include more advanced environments with more realistic notifications and contexts, e.g. with text content (an opportunity for some NLP!).

To install gym-push yourself:

pip install gym-push

I will be adding a leaderboard for agents based on the performance they achieve on this and other data sets, so if you do create/train/evaluate an intelligent agent for managing notifications using gym-push, let me know :)

As a fellow lifelong learner I would love to get back any feedback, criticisms, references or tips you may have. If you are interested in this work and would like to learn more about this space, check out my website and feel free to reach out!
