Artificial Intelligence Not a Danger to Humanity — Unless It Is Taught to Be One

Aleksandr Bulkin
Mar 2, 2018 · 3 min read


I recently listened to a podcast in which Eliezer Yudkowsky explains the paperclip argument and uses it as an example of the dangers artificial intelligence (AI) may pose to humanity.

The main idea of the paperclip argument is that when a sufficiently smart AI is endowed with a goal, say, making paperclips, it can evolve and, in pursuit of that goal, adversely affect humanity: for example, by consuming all existing steel, enslaving humans, or redirecting all electrical and energy resources toward paperclip production. In other words, what the machine does can be rather clever, but how it does it can be devastating.

The paperclip argument is an illustration of the more general orthogonality thesis, which holds that an agent's problem-solving capacity is independent of its ultimate goal. The thesis contravenes the premise that a truly intelligent agent cannot be evil: a very smart agent can have evil goals, a dull agent can have benevolent ones, and any combination in between is possible. In other words, goals and intelligence are orthogonal, not in any way dependent on each other.
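
To make the thesis concrete, here is a minimal sketch in Python (my own illustration, not from the podcast; names like `count_paperclips` and `mine_steel` are hypothetical) of what orthogonality would look like in code: the planner is a generic search procedure, and the goal is just a function handed to it.

```python
# A toy illustration of the orthogonality thesis: the planner below is a
# generic search procedure; the goal is just a function passed in. Any
# utility function, benign or not, plugs into the same machinery.

def plan(state, actions, utility, horizon=3):
    """Greedy lookahead: return the first action of the best short action sequence."""
    best_score, best_first_action = utility(state), None
    frontier = [(state, None)]
    for _ in range(horizon):
        next_frontier = []
        for s, first in frontier:
            for act in actions:
                s2 = act(s)
                f = first if first is not None else act
                if utility(s2) > best_score:
                    best_score, best_first_action = utility(s2), f
                next_frontier.append((s2, f))
        frontier = next_frontier
    return best_first_action

# Hypothetical world: a dict of resources. The "goal" is only this function.
def count_paperclips(state):
    return state["paperclips"]

def mine_steel(state):
    return {**state, "steel": state["steel"] + 1}

def make_paperclip(state):
    if state["steel"] == 0:
        return state
    return {**state, "steel": state["steel"] - 1,
            "paperclips": state["paperclips"] + 1}

start = {"steel": 0, "paperclips": 0}
action = plan(start, [mine_steel, make_paperclip], count_paperclips)
print(action.__name__)  # the planner neither knows nor cares what a paperclip is
```

Nothing in `plan` knows what a paperclip is; hand it a different utility function and the very same machinery pursues a different goal. That is the picture the orthogonality thesis paints.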

The orthogonality thesis, and by extension the paperclip argument, is flawed for the following reason: complex cognitive mechanisms are needed to recognize whether a goal has been achieved. Most of the goals a truly intelligent agent pursues are far from self-evident and require well-tuned recognition mechanisms to assess completion. I believe that when it comes to AI, the utility function cannot be separated from the intelligence, because the very process of evaluating whether utility is being achieved requires an extensive cognitive process.
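
To see why, consider what the utility function of a paperclip maximizer actually has to do. A sketch, under my own simplifying assumptions (the `paperclip_probability` recognizer here is a hypothetical stand-in for a trained perception model):

```python
# For any real-world goal, "how much utility did we get?" is itself a
# recognition problem. Here the utility function is not a hard-coded formula
# but a learned classifier that must decide whether an object counts as a
# paperclip at all.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained perception model: a logistic-regression
# "paperclip recognizer" over a feature vector describing an object.
weights = rng.normal(size=16)

def paperclip_probability(features):
    """A learned judgment, not a given fact: how paperclip-like is this object?"""
    return 1.0 / (1.0 + np.exp(-(features @ weights)))

def utility(objects):
    # "How many paperclips did we make?" is only as good as the recognizer.
    return sum(paperclip_probability(o) for o in objects)

factory_output = [rng.normal(size=16) for _ in range(5)]
print(utility(factory_output))  # the score itself depends on learned perception
```

Notice that nothing about `utility` here is a simple counter; it is perception all the way down, and perception is exactly the kind of thing intelligence is made of.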

To see how this contradicts the claims of the AI theorists who warn of these dangers, note that training is essential to creating state-of-the-art AI machinery (just as it is to educating a human or any other kind of intelligence) that is able to recognize the object of its goals. The question then becomes: will a super-intelligent AI ever stop learning to adjust its goal-evaluation process, or will it forever be trying to improve it?

Let’s analyze both scenarios. If the machine continuously tries to improve, it needs to rely on an external source for its utility function: an authority or teacher of some sort. Such interaction is essential to improving the quality of one’s utility function, since the only place the improvement can come from is outside the agent. In this situation, the AI is bound to remain a social agent, and it will inevitably learn that preserving humanity is an essential part of what makes a paperclip good. The result is a tame version of super-intelligence, one that remains bound by social conventions and stays engaged in learning new information.
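
In code, this first scenario looks roughly like an agent whose utility model is never final but is perpetually nudged toward an external teacher’s judgments. This is only a sketch under my own assumptions; `teacher_score` is a hypothetical stand-in for human feedback:

```python
# First scenario: the agent's utility function is continually corrected by an
# external teacher, so it never stops being a social learner.

import numpy as np

rng = np.random.default_rng(1)
teacher_weights = rng.normal(size=8)   # what the teacher actually values
learned_weights = np.zeros(8)          # the agent's current model of "good"

def teacher_score(features):
    """External authority: only the teacher knows what a good paperclip is."""
    return float(features @ teacher_weights)

def learned_utility(features):
    return float(features @ learned_weights)

# Online learning loop: the utility function is never final; the agent keeps
# querying the teacher and nudging its own estimate toward the feedback.
learning_rate = 0.05
for _ in range(2000):
    candidate = rng.normal(size=8)                      # some produced object
    error = teacher_score(candidate) - learned_utility(candidate)
    learned_weights += learning_rate * error * candidate

print(np.allclose(learned_weights, teacher_weights, atol=0.1))  # True
```

As long as a loop like this keeps running, the agent’s notion of a good paperclip is whatever its teachers keep telling it, which is exactly why it stays tethered to human judgment.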

The second scenario is more in line with the orthogonality argument: the ability to recognize a paperclip is separate from the ability to decide how to make one. It is conceivable that the cognitive mechanism implementing the utility function (and only the utility function) will at some point stop learning and remain static forever. Obviously, the part that actually makes paperclips can never stop learning and improving itself, as that is what super-intelligence actually entails: a continuous ability to become ever better at some problem.

This gives rise to a completely impractical AI. Think of it this way: adversarial image attacks and other unpredictable aberrations are well known to anyone familiar with neural networks. Even if an AI has seen a million paperclips, there is still a chance it will grossly mistake some other object for a paperclip. Consequently, it is unlikely that anyone will build a system that freezes its utility function, because such a system would be useless in real-world contexts.
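
Here is a small numerical sketch of that fragility (my own illustration, using a linear stand-in for a neural recognizer rather than a real image model): a frozen classifier can be pushed across its decision boundary by a targeted nudge in the direction of its own gradient, which is the fast-gradient-sign idea behind many adversarial attacks.

```python
# A frozen "paperclip recognizer" and an adversarial nudge that flips its
# verdict. For a linear scorer, the gradient with respect to the input is just
# the weight vector, so the most efficient per-feature nudge follows the signs
# of the weights.

import numpy as np

rng = np.random.default_rng(2)
frozen_weights = rng.normal(size=64)   # the recognizer, frozen forever

def is_paperclip(features) -> bool:
    return float(features @ frozen_weights) > 0.0

some_object = rng.normal(size=64)
score = float(some_object @ frozen_weights)

# Per-feature change just large enough to push the object across the boundary.
epsilon = abs(score) / np.abs(frozen_weights).sum() + 0.01
adversarial = some_object - np.sign(score) * epsilon * np.sign(frozen_weights)

print(is_paperclip(some_object), is_paperclip(adversarial))  # opposite verdicts
print(round(epsilon, 3))  # the per-feature change that was enough to fool it
```

With a frozen recognizer there is no way to correct such a mistake afterwards; a system that keeps learning at least has a chance to be told it was fooled.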

Herein lies the flaw in the paperclip argument and in the orthogonality thesis as a whole. The danger Yudkowsky warns about does not account for the fact that the AI would have to be built as a social agent (for the long term, if not forever) that relies on guidance from others. And in general, treating an agent's utility function as separate from its intelligence and problem-solving ability is a faulty premise. This is not to say that smart AI is not dangerous, but it is only as dangerous as someone teaches it to be.

Thanks to Katia Rossi for help writing this piece.
