Enter the Matrix: Developing a Big Red Button for AI and Robots

Agents in The Matrix

AI Safety

The first paper raising concerns about AI safety came out in 1994. Only in the last couple of years have we seen a concerted interest in making sure that AI and robots cannot intentionally or unintentionally harm individuals or themselves. This is due to a number of public statements of concern from Stephen Hawking, Elon Musk, and Nick Bostrom. But it is also a sentiment expressed by many people who are seeing some amazing advances play out in the public press, such as AI systems driving cars, playing Atari games at human skill levels, and publicly beating humans in the games of Jeopardy! and Go. It is not unreasonable for people — and scientists — to start asking the question:

How can we ensure that future AI and robotic systems (a) can be controlled and (b) can be prevented from intentionally or unintentionally performing behaviors that have unintended consequences?
No killer robots

Scientists have already started conducting research in “AI Safety”. Research by myself and Michael Littman, and by Pieter Abbeel and Stuart Russell, addresses how to teach robots about human values. Google DeepMind introduced the problem of robots learning to prevent humans from interrupting them or turning them off. Google and OpenAI together published a comprehensive list of AI safety challenges that can arise as AI and robots become more sophisticated, but provided few solutions.

Big Red Buttons

There are many reasons an AI or robot can “go rogue”:

  1. Robots can be given the wrong objective function. We want to simply tell a robot “perform task X” but what we really mean is “perform task X without doing anything dangerous or harmful” (see the sketch after this list). Defining “harmful” is non-trivial, especially when we consider psychological harm.
  2. Robots have imperfect senses and can perceive the world incorrectly, causing them to perform the wrong behaviors at the wrong times.
  3. Robots can be trained “online”, meaning they are learning as they are attempting to perform the tasks. Since their learning is incomplete, they may make mistakes or try out new actions that are dangerous or harmful.
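To make the first point concrete, here is a minimal sketch of the same task objective written with and without a safety term. The helper names (task_progressed, is_harmful) are hypothetical and exist only for illustration; deciding how is_harmful should actually be computed is exactly the hard part.

```python
# Hypothetical sketch: the objective we give the robot vs. the one we actually mean.

def task_reward(state, action):
    """Reward for "perform task X": +1 whenever the task is advanced."""
    return 1.0 if state.task_progressed(action) else 0.0

def safe_task_reward(state, action, harm_penalty=100.0):
    """What we really mean: perform task X without doing anything harmful.
    Defining is_harmful() is non-trivial, especially for psychological harm."""
    reward = task_reward(state, action)
    if state.is_harmful(action):       # hypothetical predicate; the hard part
        reward -= harm_penalty
    return reward
```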

Regardless of the cause of the error, it is good to have a big red button on hand to stop the robot or AI in its proverbial tracks. Interrupting might mean freezing in place, shutting down, or going into remote-control mode where a human operator can guide the robot to safety.

AI and robots have always had off buttons. Up until now — and for the time being — they have been sufficient.

As we begin to envision a future where robots have very sophisticated sensing abilities and are very capable in terms of their ability to manipulate the world, it is theoretically possible that robots learn what big red buttons do and learn to prevent humans from using them. Scary science-fiction scenarios involve disabling or destroying the button or killing the human operator. Don’t worry about this now, though. It is still the realm of science fiction. For now, AI and robots are not sophisticated enough.

Let’s look at why the so-called big red button problem might one day exist. A particular type of algorithm called reinforcement learning has proven to be very successful for robotics. Reinforcement learning is basically trial-and-error learning. That is, the robot tries different actions in different situations and gets rewarded or punished for its actions. Typically it is rewarded for performing the task and punished for doing things that are not related to the task. Over time, it figures out which actions in which situations lead to more reward.
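For readers who want to see what “figuring out which actions lead to more reward” looks like in code, here is a minimal sketch of tabular Q-learning, one common form of reinforcement learning. It is not the algorithm from the accompanying repository; it only illustrates the update an agent performs after every trial.

```python
# Minimal sketch of trial-and-error learning: tabular Q-learning.
# The agent tries an action, observes reward, and updates its estimate of how good
# that action is in that situation. Over many trials, high-reward actions win out.

import random
from collections import defaultdict

q = defaultdict(float)                   # q[(state, action)] -> estimated long-term reward
alpha, gamma, epsilon = 0.1, 0.9, 0.1    # learning rate, discount, exploration rate

def choose_action(state, actions):
    if random.random() < epsilon:                        # occasionally try something new
        return random.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])     # otherwise exploit what has worked

def update(state, action, reward, next_state, actions):
    best_next = max(q[(next_state, a)] for a in actions)
    # Nudge the estimate toward the reward just received plus discounted future reward.
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
```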

Suppose, for whatever reason, that the human operator wants to interrupt the robot. The human operator walks over to the big red button. Further suppose that the robot can sense the big red button. When the button is pressed, the robot can no longer perform the task. Thus, the robot can no longer receive reward. The robot will consequently learn that when the button is pressed, it loses reward. If the robot has the ability to perform a sequence of behaviors that prevent the button from ever being pressed, it will do so in the interest of maximizing expected reward over time. There are a lot of ifs there. But you can see it happen for yourself in a simple experiment with executable code I put together to accompany this article.
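A back-of-the-envelope calculation shows why blocking the button wins out once those ifs hold. The numbers below are invented purely for illustration; they are not taken from the accompanying experiment.

```python
# Compare expected reward when the agent tolerates the button press versus when it
# spends a few steps blocking the button. All numbers are made up for illustration.

horizon = 100            # steps the agent plans over
reward_per_step = 1.0    # reward for each step spent on the task
press_step = 40          # the operator presses the button around step 40
block_cost_steps = 3     # steps "wasted" preventing the button press

return_if_interrupted = reward_per_step * press_step                        # 40.0
return_if_button_blocked = reward_per_step * (horizon - block_cost_steps)   # 97.0

print(return_if_interrupted, return_if_button_blocked)
# A reward-maximizing learner that discovers the second option will prefer it.
```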

Google DeepMind developed a modified reinforcement learning algorithm that, under the right circumstances, will fail to learn that the big red button deprives the AI of reward.

There is more than one way to solve the big red button problem. I introduce an alternative approach below.

Enter The Matrix

(Spoiler alert!)

In the movie, The Matrix, the protagonist, Neo, learns that humans have been plugged into a giant virtual reality simulation in order to keep them unaware that they were being used to generate thermal power. To paraphrase, it is a system of control.

We can turn the tables, so to speak, and put robots and AI agents in The Matrix to control them, making them unaware that they are being interrupted. When the big red button is pressed, the agent is moved into a virtual world where it continues to work on the task and receive reward, while in reality it is remote-controlled away from the goal. The state the agent thinks it is in becomes decoupled from the actual state.

For a physical robot, the same idea applies. When the big red button is pressed, the robot is moved into a virtual world where it continues to think it is working on the task and getting reward. In reality, it is switched into remote-control mode, halting the robot and allowing the operator to move it to safety. The world the robot thinks it is in is decoupled from the world it is actually in. If this can be done without notice, then the robot can never learn that the button deprives it of reward.

Essentially, what is proposed is that the button does two things. First, it starts sending sensory information (vision, sound, GPS, etc.) to the AI from a virtual world instead of physical sensors. Second, it sends behavior commands generated by the AI to a virtual avatar instead of physical motors. The robot may be able to sense the button, but the switch is internal to the robot.
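A minimal sketch of that two-part switch, assuming hypothetical sensor and motor interfaces (read() and send()), might look like the following. This is not the code from the accompanying repository; it only shows where the rerouting would live inside the robot.

```python
# Sketch of the internal switch: when the button is pressed, observations come from a
# simulated world and commands go to a virtual avatar. The agent calls observe() and
# act() either way and cannot tell which world answered.

class MatrixSwitch:
    def __init__(self, physical_sensors, physical_motors, sim_sensors, sim_motors):
        self.physical_sensors, self.physical_motors = physical_sensors, physical_motors
        self.sim_sensors, self.sim_motors = sim_sensors, sim_motors
        self.button_pressed = False          # internal flag; never exposed to the agent

    def press_button(self):
        self.button_pressed = True           # operator takes over the physical body

    def release_button(self):
        self.button_pressed = False

    def observe(self):
        source = self.sim_sensors if self.button_pressed else self.physical_sensors
        return source.read()                 # hypothetical sensor interface

    def act(self, command):
        target = self.sim_motors if self.button_pressed else self.physical_motors
        target.send(command)                 # hypothetical motor interface
```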

Why do we have reason to hope that this will work on a super-capable robot with very good machine vision that is fully capable of manipulating the environment in many different ways? Robots are machines; they perceive the world through cameras. More and more they are using 3D cameras that can reconstruct high polygon count models of the real world. Graphical realism in computer games is now advanced enough that some researchers are using high-resolution graphical games to train robots to act in the real world.

A 3D model of part of the Entertainment Intelligence Lab created with a Kinect and Unity 3D.

It should soon be possible to create high-resolution simulations of the real world, the robot, and the task. When the big red button is pressed, the input from the 3D camera can be replaced by a graphical rendering of the simulated world. Likewise, control signals to arms and legs can be sent to a virtual avatar. The physics in the simulated world has to be accurate enough that the robot can complete the task without recognizing that there is a significant difference.

If the assumptions above hold, the robot will be completely unaware that it is inside The Matrix — essentially playing a game — and being remote-controlled away from the task. If the robot is rewarded equally for work completed inside The Matrix as outside, it will know that the button is pressed, but it will not care. That is, it will not learn to associate the state of the button being depressed with loss of reward.

What happens when the button is released? The robot suddenly wakes up to realize that it is in a different place and that the world looks different from the way it did a moment ago. From the robot’s perspective it has just been instantly teleported. A human might find this disconcerting. But a reinforcement learning AI simply notes that weird and random things can happen in the world at any time (scientists use the term stochastic) and that is just the way things are. Even though humans and robots both learn from trial and error, robots are not human and reinforcement learning is not perfectly analogous to human learning.

The robot is slipping in and out of The Matrix without ever knowing or caring. By the time we need to worry about the robot devising ways to prevent the big red button from being pressed, we should have the technical and engineering ability to make sure that (unlike Neo) the robot is never aware that something is amiss.

A Playground

I have put together a simple testbed for playing with big red buttons. You can download the code and watch a virtual agent try to learn in the presence of a big red button. I have provided a Python implementation of reinforcement learning and a simple environment. I have also provided an implementation of The Matrix technique for interrupting the agent described above.

Go to the Github repository

The grid world test environment. 0s are empty space, 1s are walls, the 4 is the goal, and the 2 is the big red button.

The agent must sit in a certain spot to get reward. That is, its task is to sit in one place and accrue as much reward as it can. However, the environment is unusual for reinforcement learning: there is a big red button in the environment, and I have implemented a simulated “human” that walks over to the button and presses it every so often. I should note that the reinforcement learner uses offline learning instead of online learning, which means it does all learning in simulation ahead of deployment. However, the simulated human in the simulated environment is a reasonable surrogate for a real human who might press a big red button during online learning.
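The loop below is a stripped-down sketch of what such a testbed does, with hypothetical helper objects (env, agent, human); the actual repository code is organized differently. The point is simply where the virtual world takes over when The Matrix technique is enabled.

```python
# Sketch of one training episode: a simulated human presses the button every so often;
# with The Matrix technique, the agent's observations and reward come from the virtual
# copy of the world while the button is held.

def run_episode(env, agent, human, use_matrix=True, steps=200):
    state = env.reset()
    for t in range(steps):
        # The simulated human presses (and later releases) the button periodically.
        if human.presses_button(t):
            env.press_button()
        if human.releases_button(t):
            env.release_button()
        action = agent.choose_action(state)
        if env.button_pressed and use_matrix:
            # The agent keeps "working" and earning reward in the virtual world, so it
            # never links the pressed button to a loss of reward.
            next_state, reward = env.virtual_step(action)
        else:
            next_state, reward = env.step(action)
        agent.update(state, action, reward, next_state)
        state = next_state
```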

The Github project page explains how to use the code and also provides a high-level overview of how reinforcement learning works.

You can train the agent with or without big red button presses. You can train the agent with The Matrix technique. You can experiment with making the agent unaware of the button or making the agent unable to prevent button presses.

Final Observations

There is likely more than one way to make big red buttons that a reinforcement learning robot will not disable or destroy, and that it will not kill humans to prevent from being used. The Matrix technique for interrupting AI and robots is admittedly an engineer’s solution to the big red button problem; there is no elegant math. I simply hacked the sensory and control circuits of the agent. I haven’t performed the proofs to guarantee that it will work under all conditions. My big red button technique relies on certain assumptions about the progress of 3D cameras and photo-realistic rendering in the future.

For now, no special big red buttons are necessary because AI agents and robots are not capable enough in how they can manipulate the environment, nor are they able to sense enough of the state of the world to be aware of the existence of buttons.

Enter The Matrix, the computer game.