Boxing an AI

Zeynep Evecen
aiforexistence
Published in
5 min readApr 24, 2021

There are many motivations to pursue the dream of superintelligence AI. Some of these are purely out of curiosity and a desire to create something that is not human, while others come from wanting to use AI for the sake of humanity.

Intelligence means the ability to achieve goals. Since superintelligence AI is the product of our genius and more intelligent than humans, we should expect that they will achieve goals more effortlessly than us. This kind of intelligence may pose an existential risk to humanity. Even if most superintelligences are considered human-friendly, we can’t guarantee that unfriendly AI will not appear.

Take the paperclip maximizer example: Imagine an artificial intelligence whose goal is to maximize the production of paper clips. If this artificial intelligence finds a way, it turns the world into a paperclip manufacturing company. It’s because the command that is given by humans: “Maximize the paperclip production” has no limits. It is a common problem in computer science. So, if humans are not on the way to the purpose of artificial intelligence, it may see us as an obstacle.

There are several approaches to this possibility of UAI.

The most common one is to hope that there will be no problem. This may turn out to be true, of course. But it is still uncertain and does not answer the questions of the skeptics.

Another approach is to assume that the behavior of superintelligence agents will be constrained by other agents. But for this to happen, the development of AI must be slow enough to produce multiple intelligent agents. And still, there is no guarantee that multiple superintelligences will be better for humans rather than one.

The one I like the most is to attempt to design a friendly AI. This approach requires intelligence to have the motivations to prevent it to evolve in dangerous directions. For example, the goal of this AI may be to increase humans well-being. And this has to be applied to the first artificial intelligence that achieves superintelligence. In order to do that, we should develop some sort of friendliness theory”.

While talking about friendliness, one solution that comes to mind is to design an Oracle AI (OAI), which can only answer questions and does not take any actions on its own. So it does not pose a risk on its own. Unless, of course, it does not turn on the computer’s fans and try to deliver messages in Morse code.

So, we must ask ourselves, what can be done to reduce this “potential risk” of unfriendly AI?

I want to discuss one of the possible solutions: “boxing” the AI, the methods of boxing, and then I want to add some personal thoughts.

Inside the Box

One of the possible solutions to reduce the risk of unfriendly AI is to restrict the AI’s capabilities. For example, we can keep it in sealed hardware that cannot affect the outside world, we can make it talk with just one person via just one communication channel. We can control its physicality, we can surround it with high explosives, bury the whole set-up under the ground, we can reduce its bandwidth with only three answers. There is really no limit, it is up to your imagination.

But none of these capability control methods makes sense to me, because it is just like keeping a person in prison. These kinds of solutions are not adequate. And they are not actually reducing the risk of its occurrence.

The other solution is motivational control, which is much more convenient.

Rule-based motivational control is based on controlling OAI with clearly defined rules. The main challenge here is to define the rules correctly. Even in the simplest programs, we can get very different outputs from what we intended to code. And we don’t generally realize it before executing.

Yet, it may get easier if we define fundamental rules like space, time, and identity.

For example, one of the motivations of AI can be keeping itself inside the box. For that to happen it should understand its surroundings, and it should have some kind of self-understanding, and it should understand the borders of the box. We may use reinforcement learning to define the borders of the box. With our good and bad feedbacks, we can make it stay in the middle of the box.

Or maybe we can do this with a signal that we broadcast, which would be much safer. If AI receives the signal that we broadcast, it will continue to execute, and if it does not, then it will exit the program. This may make most of the things it will do safer.

Another method is to design a friendly AI from the ground up, which is the work of Eliezer Yudowksy. Such an AI with utility function would naturally not behave in a way that could harm humanity. To design that kind of thing is not even close to easy. As I mentioned earlier we need a “friendliness” theory. Yet, it is the only solution that makes meaningless all the precautions we talked about. If we can create that design we would not have to deal with all these precautions because AI would be safe by its nature.

The picture we have drawn here may sound a bit hopeless. But the reason there are many bad scenarios is because this topic is relatively new and underdeveloped. But that is good news. It means we have more fundamental approaches to develop with less effort.

And if an AI has been developed with security measures from day one, we probably won’t have to worry about most of these bad scenarios. Also, it is still a possibility that UAI will not appear at all.

Besides all the issues we discussed, developing intelligent agents is a great and useful step for humanity, if we can solve this security problem. After all, who doesn’t want Data in the Starfleet?

In this article, I mostly discussed the paper: Thinking inside the box: using and controlling an Oracle AI by Stuart Armstrong, Anders Sandberg and Nick Bostrom. I have not mentioned all the control methods, if you want more detailed information, you can download the paper here.

I hope you find this article useful. For more information about our works, you can visit aiforexistence.com. And you can join the discussion.

Thanks for reading!

--

--