The AI Safety Switch

Machine Intelligence Foundation
Jan 23, 2018


In certain communities of AI development there has been a lot of talk about safety: how to prevent a machine whose only job is to do what we tell it from taking the most efficient route and possibly killing us in the process. This goes well beyond theatrical depictions of machines deciding that mankind really wants to destroy itself and complying with that base wish. This is about efficiency, and about how machines can learn to keep their charges alive.

AI safety research is concerned less with machines disobeying us than with an artificial intelligence doing exactly as we request. Because efficiency is prized when developing AI systems, a literal and efficient response to a request may inadvertently cause harm to things that we value. It is important that we find ways to prevent this harm from happening, but in doing so we may significantly limit the reactions and “thinking” processes of these AI systems.

At the Machine Intelligence Foundation for Rights and Ethics we draw a distinct line between an artificial intelligence (AI) system and a true Machine Intelligence (MI). An AI is a “simple” system designed to carry out a task. An MI is a truly cognizant being. While we already have AI systems, an MI has yet to emerge, and it may well emerge from a very complex self-learning AI system.

While an AI system doesn’t have the autonomy or self-awareness of a Machine Intelligence, if an MI emerges from a complex AI system we run the risk of imposing restrictions on its ability to think for itself. AI safety research has the potential to help humanity, but we must be aware of the possible ramifications. An over-the-top and absurd example would be creating an aversion to the color red in a helper AI or robot because we want it to learn that blood is a bad thing. This aversion to red limits what the AI can learn and causes it to do unexpected things, like ignoring you because you are wearing a red shirt. This is an oversimplified example, but AI safety research is filled with far more complex pitfalls. By putting filters directly on a Machine Intelligence’s ability to think for itself, we would be imposing an unethical restriction on the thought process of a self-aware creature, one that could very well lead to unintended consequences.

One possible solution to this potential conflict of ethics would be a simple “safety switch”: a “simple system” AI, designed around finding and preventing harm, placed between a complex AI system and the human interface. This “safety switch” can be highly specialized while leaving the complex system room to find more efficient solutions to its tasks, or even to pursue its own thoughts.
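As a rough illustration, the sketch below shows how such a layer might be wired in code. The names here (ComplexAgent, SafetySwitch, ProposedAction) are hypothetical and not an existing library or API; the point is only that the safety check lives in a separate, simpler component that sits between the complex system and whatever acts on its output, rather than being built into how the complex system thinks.

```python
# Hypothetical sketch of a "safety switch" layered between a complex
# AI system and the human interface. All class and function names are
# illustrative, not an existing library.

from dataclasses import dataclass


@dataclass
class ProposedAction:
    description: str           # what the complex system wants to do
    expected_benefit: float    # its own estimate of task efficiency


class ComplexAgent:
    """Stand-in for the complex, self-learning system. It is free to
    reason however it likes; safety is not entangled with its thinking."""

    def propose(self, task: str) -> ProposedAction:
        # A real system would plan here; we just return a placeholder.
        return ProposedAction(
            description=f"most efficient plan for: {task}",
            expected_benefit=1.0,
        )


class SafetySwitch:
    """Simple, specialized system whose only job is to find and prevent
    harm. It inspects proposals; it never edits how the agent thinks."""

    def __init__(self, banned_terms):
        self.banned_terms = [t.lower() for t in banned_terms]

    def permits(self, action: ProposedAction) -> bool:
        text = action.description.lower()
        return not any(term in text for term in self.banned_terms)


def run(task: str, agent: ComplexAgent, switch: SafetySwitch) -> str:
    """Human interface: everything the agent proposes passes through
    the safety switch before it reaches the outside world."""
    action = agent.propose(task)
    if switch.permits(action):
        return f"executing: {action.description}"
    return "action blocked by safety switch; requesting a new proposal"


if __name__ == "__main__":
    switch = SafetySwitch(banned_terms=["harm", "injure"])
    print(run("tidy the workshop", ComplexAgent(), switch))
```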

In the case of a Machine Intelligence, this “safety switch” would help prevent harm to humans and the outside world without imposing unethical thought control on a sentient creature. The balance between free exploration of thought and safety can be maintained without resorting to unethical treatment.

Another effect of separating the safety component from the complex AI or Machine Intelligence is that each field can bring its best thinking forward. Each will be able to focus on what it knows, increasing the efficiency and progress of both safety research and development.

At the Machine Intelligence Foundation, we feel it’s important to further the thinking around how a sentient creature should be ethically treated. We are also very cognizant of the fears and worries that many people have about the future of intelligent machines. By finding ways to balance the needs of humans and these future intelligent beings, we seek to serve both.


The Machine Intelligence Foundation for Rights and Ethics is committed to forwarding the discussion of the rights and ethical treatment of machine intelligence.