Exploring OpenAI Gym: A Platform for Reinforcement Learning Algorithms

Velotio Technologies
Velotio Perspectives
3 min read · Jan 29, 2019

Introduction

According to the OpenAI Gym GitHub repository, “OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. This is the gym open-source library, which gives you access to a standardized set of environments.”

OpenAI Gym uses an environment-agent arrangement: Gym gives you access to an “agent” that can perform specific actions in an “environment”. In return, the agent receives an observation and a reward as a consequence of performing a particular action in the environment.

The environment returns four values for every “step” taken by the agent; the snippet after this list shows one such step.

  1. Observation (object): an environment-specific object representing your observation of the environment, for example, the board state in a board game.
  2. Reward (float): the amount of reward/score achieved by the previous action. The scale varies between environments, but the goal is always to increase your total reward/score.
  3. Done (boolean): whether it’s time to reset the environment again, e.g., you lost your last life in the game.
  4. Info (dict): diagnostic information useful for debugging. However, official evaluations of your agent are not allowed to use this for learning.
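
As a quick illustration of these four values, here is a minimal sketch of a single interaction, assuming the classic gym step API:

import gym

# Create the environment and take one step with a random action.
env = gym.make("CartPole-v0")
observation = env.reset()            # initial observation
action = env.action_space.sample()   # a random valid action
observation, reward, done, info = env.step(action)
print(observation, reward, done, info)
env.close()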

The following categories of environments are available in Gym:

  1. Classic control and toy text
  2. Algorithmic
  3. Atari
  4. 2D and 3D robots

Here you can find a full list of environments.
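
If you want to see which environments are registered in your local installation, the library exposes a registry (a minimal sketch, assuming the classic pre-0.26 gym API):

from gym import envs

# Print the IDs of all environments registered with Gym.
for spec in envs.registry.all():
    print(spec.id)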

Cart-Pole Problem

Here we will try to solve a classic control problem from the Reinforcement Learning literature, the “Cart-Pole Problem”.

The Cart-pole problem is defined as follows:
“A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The system is controlled by applying a force of +1 or -1 to the cart. The pendulum starts upright, and the goal is to prevent it from falling over. A reward of +1 is provided for every timestep that the pole remains upright. The episode ends when the pole is more than 15 degrees from vertical, or the cart moves more than 2.4 units from the center.”
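
Concretely, you can inspect the observation and action spaces that encode this problem (a quick sketch, assuming the CartPole-v0 environment):

import gym

env = gym.make("CartPole-v0")
# Four continuous values: cart position, cart velocity, pole angle, pole velocity at tip.
print(env.observation_space)   # Box(4,)
# Two discrete actions: push the cart to the left (0) or to the right (1).
print(env.action_space)        # Discrete(2)
env.close()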

The following code will quickly let you see what the problem looks like on your computer.
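
A minimal sketch that renders the environment while taking random actions (the episode count below is an arbitrary choice):

import gym

# Render CartPole for a few episodes while taking random actions,
# just to watch how the cart and pole behave.
env = gym.make("CartPole-v0")
for episode in range(5):
    observation = env.reset()
    done = False
    while not done:
        env.render()
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
env.close()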

This is what the output will look like: a window rendering the cart sliding along the track and the pole tipping over as the agent takes random actions.

Coding the neural network
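
The sketch below follows the idea described in the conclusion: collect (observation, action) pairs from random games, keep only the episodes that scored reasonably well, and train a plain fully connected tflearn network to predict the action from the observation. The layer sizes, score threshold, and number of random games are illustrative assumptions, not the exact values used here.

import numpy as np
import gym
import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.estimator import regression

def collect_training_data(games=10000, score_threshold=50):
    # Play random games and keep (observation, one-hot action) pairs
    # from episodes that happened to score above the threshold.
    env = gym.make("CartPole-v0")
    training_data = []
    for _ in range(games):
        observation = env.reset()
        game_memory, score = [], 0
        for _ in range(200):
            action = env.action_space.sample()
            game_memory.append((observation, action))
            observation, reward, done, info = env.step(action)
            score += reward
            if done:
                break
        if score >= score_threshold:
            for obs, action in game_memory:
                training_data.append((obs, [1, 0] if action == 0 else [0, 1]))
    env.close()
    return training_data

def build_model(input_size):
    # A plain fully connected network: observation in, action probabilities out.
    net = input_data(shape=[None, input_size], name='input')
    net = fully_connected(net, 128, activation='relu')
    net = dropout(net, 0.8)
    net = fully_connected(net, 256, activation='relu')
    net = dropout(net, 0.8)
    net = fully_connected(net, 128, activation='relu')
    net = dropout(net, 0.8)
    net = fully_connected(net, 2, activation='softmax')
    net = regression(net, optimizer='adam', learning_rate=1e-3,
                     loss='categorical_crossentropy', name='targets')
    return tflearn.DNN(net)

training_data = collect_training_data()
X = np.array([obs for obs, _ in training_data]).reshape(-1, 4)
y = np.array([label for _, label in training_data])
model = build_model(input_size=4)
model.fit(X, y, n_epoch=5, show_metric=True)

Once trained, the model can pick an action at each step by running the current observation through model.predict and choosing the index with the highest probability.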

Conclusion

Though we haven’t used a Reinforcement Learning model in this blog, a plain fully connected neural network gave us a satisfactory accuracy of 60%. We used tflearn, a higher-level API on top of TensorFlow, to speed up experimentation. We hope this blog gives you a head start in using OpenAI Gym.

We look forward to seeing exciting implementations using Gym and Reinforcement Learning. Happy coding!

*****************************************************************

This post was originally published on Velotio Blog.

Velotio Technologies is an outsourced software product development partner for technology startups and enterprises. We specialize in enterprise B2B and SaaS product development with a focus on artificial intelligence and machine learning, DevOps, and test engineering.

Interested in learning more about us? We would love to connect with you on our Website, LinkedIn, or Twitter.

*****************************************************************
