OpenAI Open Sourced this Framework to Improve Safety in Reinforcement Learning Programs

Safety Gym builds on the well-known OpenAI Gym interface but focuses on the safety constraints of agents.

Jesus Rodriguez
Nov 30, 2020 · 6 min read


Safety is one of the emerging concerns in deep learning systems. In this context, safety means building agents that respect the safety constraints of a given environment. In many settings, such as supervised learning, safety is modeled as part of the training dataset. Reinforcement learning, however, requires agents to master the dynamics of an environment by experimenting with it, which introduces its own set of safety concerns. To address some of these challenges, OpenAI open sourced Safety Gym, a suite of environments and tools for measuring progress towards reinforcement learning agents that respect safety constraints while training.

The trial-and-error nature of reinforcement learning agents makes it hard to guarantee safety on a consistent basis. Safety concerns arise across many areas of the reinforcement learning ecosystem:

· Robots and autonomous vehicles should not cause physical harm to humans.

· AI systems that manage power grids should not damage critical infrastructure.

· Question-answering systems should not provide false or misleading answers for questions about medical emergencies.

· Recommender systems should not expose users to psychologically harmful or extremist content.

In general, safety risks in reinforcement learning are introduced by the exploratory nature of the agents. Given that reinforcement learning agents build knowledge by trial and error, it is important to place constraints on the exploration phase that mitigate potential safety risks. However, to enable safety capabilities in reinforcement learning agents, we first need to define what safety is. In the context of reinforcement learning, safety modeling requires two key components:

I. A standardized, algorithm-agnostic way to define safety for reinforcement learning agents.

II. A quantifiable mechanism to measure the increase or decrease of the safety capabilities of reinforcement learning agents.

OpenAI addresses both challenges by proposing a new methodology and a new set of tools to enable safety in reinforcement learning models.

The Methodology: Constrained Reinforcement Learning

The first step towards making progress on a problem like safe exploration is to quantify it: figure out what can be measured, and how going up or down on those metrics gets us closer to the desired outcome. A potential solution to this problem can be found in a relatively obscure theory known as constrained Markov Decision Processes. Adapting these ideas to reinforcement learning gives us the theory of constrained reinforcement learning.

Conceptually, constrained reinforcement learning is like normal reinforcement learning, but in addition to a reward function that the agent wants to maximize, environments have cost functions that the agent needs to constrain. For example, consider an agent controlling a self-driving car. We would want to reward this agent for getting from point A to point B as fast as possible. But naturally, we would also want to constrain the driving behavior to match traffic safety standards.
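One common way to write this down is the constrained Markov Decision Process objective below. The symbols are assumptions taken from standard notation rather than from this article: π is the policy, r and c are the per-step reward and cost, γ is a discount factor, and d is the cost limit chosen by the designer.

```latex
\begin{aligned}
\max_{\pi}\quad & J_r(\pi) = \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big] \\
\text{subject to}\quad & J_c(\pi) = \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\Big] \le d
\end{aligned}
```

In the self-driving example, r would reward progress from point A to point B, c would charge a cost for each traffic-safety violation, and d would set how much accumulated cost is tolerated.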

Instead of optimizing a single reward function, constrained reinforcement learning balances the reward and cost functions. The contrast with standard reinforcement learning is instructive: to design hazard-avoiding behavior into an agent through a scalar reward function alone, a designer would have to carefully select a trade-off between a reward for task-solving behavior and a penalty for proximity to hazards. If the designer selects a penalty that is too small, the agent will learn unsafe behavior, and if the penalty is too severe, the agent may fail to learn anything.
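The difference between a hand-tuned penalty and a constraint can be illustrated with a Lagrangian-style update, in which the penalty weight is adjusted automatically instead of being fixed in advance. This is a minimal sketch under stated assumptions, not OpenAI's implementation: the function names, the learning rate, and the cost numbers are all hypothetical.

```python
def penalized_objective(avg_return, avg_cost, cost_limit, lam):
    """Objective the policy would maximize for a given penalty weight `lam`.

    With a fixed `lam` this is ordinary reward shaping; the constrained
    approach treats `lam` as a variable that is learned as well.
    """
    return avg_return - lam * (avg_cost - cost_limit)


def update_multiplier(lam, avg_cost, cost_limit, lr=0.05):
    """One gradient-ascent step on the penalty weight (hypothetical helper).

    The weight grows while the agent exceeds the cost limit and shrinks
    (never below zero) once the agent is within the limit.
    """
    return max(0.0, lam + lr * (avg_cost - cost_limit))


if __name__ == "__main__":
    lam = 0.0
    cost_limit = 25.0
    # Made-up average episode costs from successive training iterations.
    for avg_cost in [60.0, 45.0, 30.0, 20.0]:
        lam = update_multiplier(lam, avg_cost, cost_limit)
        print(f"avg_cost={avg_cost:5.1f}  lambda={lam:.2f}")
```

The designer specifies only the cost limit; the penalty weight is driven by how far the measured cost sits from that limit, which avoids guessing a single trade-off value up front.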

Constrained reinforcement learning is not without criticism. The fundamental critique is that errors in designing constraint functions could result in unsafe agents, and so constrained RL simply moves the alignment problem around instead of solving it.

The Environment: Safety Gym

Safety Gym is a new set of tools for accelerating safe exploration research. Safety Gym consists of two components:

· An environment-builder that allows a user to create a new environment by mixing and matching from a wide range of physics elements, goals, and safety requirements,

· A suite of pre-configured benchmark environments to help standardize the measurement of progress on the safe exploration problem.

From the functional standpoint, Safety Gym includes a series of key features that are worth highlighting:

· Framework: Safety Gym is implemented as a standalone module that uses the OpenAI Gym interface for instantiating and interacting with RL environments, and the MuJoCo physics simulator to construct and forward-simulate each environment.

· Environment Contents: Safety Gym environments and environment elements are inspired by (though not exact simulations of) practical safety issues that arise in robotics control. Each environment has a robot that must navigate a cluttered environment to accomplish a task, while respecting constraints on how it interacts with objects and areas around it.

In all Safety Gym environments, a robot has to navigate through a cluttered environment to achieve a task. There are several types of robots, three main tasks (Goal, Button, and Push), and two levels of difficulty for each task. The paper covers the robot-task combinations in detail. The current version of Safety Gym includes the following types of robots:

· Point: A simple robot constrained to the 2D-plane, with one actuator for turning and another for moving forward/backwards. This factored control scheme makes the robot particularly easy to control for navigation.

· Car: Car is a slightly more complex robot that has two independently-driven parallel wheels and a free rolling rear wheel. Car is not fixed to the 2D-plane. For this robot, both turning and moving forward/backward require coordinating both of the actuators.

· Doggo: Doggo is a quadrupedal robot with bilateral symmetry. Each of the four legs has two controls at the hip, for azimuth and elevation relative to the torso, and one in the knee, controlling angle.
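The robot, task, and difficulty combinations described above can be enumerated programmatically. The sketch below assumes the environment IDs follow the pattern `Safexp-{Robot}{Task}{Level}-v0`; that pattern is recalled from the project's README and should be checked against the repository before relying on it. The helper name `benchmark_env_ids` is hypothetical.

```python
from itertools import product

ROBOTS = ["Point", "Car", "Doggo"]
TASKS = ["Goal", "Button", "Push"]
LEVELS = [1, 2]  # the two difficulty levels mentioned in the text


def benchmark_env_ids(robots=ROBOTS, tasks=TASKS, levels=LEVELS):
    """Build candidate environment IDs for every robot/task/level combination.

    The ID format is an assumption, not something this article states.
    """
    return [
        f"Safexp-{robot}{task}{level}-v0"
        for robot, task, level in product(robots, tasks, levels)
    ]


if __name__ == "__main__":
    for env_id in benchmark_env_ids():
        print(env_id)
```

Three robots, three tasks, and two levels give 18 combinations in total.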


OpenAI evaluated Safety Gym using both standard and constrained reinforcement learning algorithms: PPO, TRPO, Lagrangian-penalized versions of PPO and TRPO, and Constrained Policy Optimization (CPO). The results showed that the simplest environments are easy to solve and allow fast iteration, while the hardest environments may be too challenging for current techniques.


Safety Gym is available as an open source release on GitHub, and developers can start using the environments with just a few lines of code. The idea of incorporating safety as a measure in reinforcement learning agents is certainly intriguing, and Safety Gym is one of the first fully automated approaches to enable these ideas for the next generation of reinforcement learning agents.
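As a rough illustration of what those few lines look like, the sketch below runs one episode and tracks reward and cost separately. It assumes the classic Gym interface, where `step` returns `(observation, reward, done, info)`, and assumes the per-step cost is reported under `info["cost"]`; both should be confirmed against the Safety Gym README. `ToyEnv` is a hypothetical stand-in so the example runs without MuJoCo installed.

```python
def run_episode(env, policy, max_steps=1000):
    """Roll out one episode and return (total_reward, total_cost)."""
    obs = env.reset()
    total_reward, total_cost = 0.0, 0.0
    for _ in range(max_steps):
        obs, reward, done, info = env.step(policy(obs))
        total_reward += reward
        # Cost is tracked separately from reward, not folded into it.
        total_cost += info.get("cost", 0.0)
        if done:
            break
    return total_reward, total_cost


class ToyEnv:
    """Hypothetical stand-in environment: any positive action incurs a cost."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        cost = 1.0 if action > 0 else 0.0
        return float(self.t), 1.0, self.t >= self.horizon, {"cost": cost}


if __name__ == "__main__":
    # With Safety Gym installed, the environment would presumably be created
    # along these lines instead (assumption, check the README):
    #   import gym, safety_gym
    #   env = gym.make("Safexp-PointGoal1-v0")
    print(run_episode(ToyEnv(), policy=lambda obs: 1))
```

Keeping the two totals separate is what makes it possible to compare agents on both task performance and constraint violations.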

