Cesare Dandini, Allegory of Intelligence (1656)

A reader noted that my critique of the dangers of AI contradicts the Orthogonality Thesis.

Well, yes. It does, in a sense. Many stated and unstated assumptions in Everday contradict it, too.

So what is the Orthogonality Thesis and what’s my take on it?

To start, the Orthogonality Thesis is just that — a thesis. It’s not an empirical law, nor a rigorously proven theorem. Even if I agree with all its background assumptions, the core claim is still kind of non-binding.

I don’t know if it can be proven. And, of course, I cannot disprove it. I just consider it rather improbable.

I. On the Stupid Smarts and Why You Should Fear Them

An informal gist of the Thesis is given, in the paper, thus:

The Orthogonality Thesis asserts that there can be arbitrarily intelligent agents pursuing any kind of goals.

And by “any,” orthogonalists really mean any: their claim is that arbitrarily highly intelligent entities can pursue arbitrarily stupid goals — that your intelligence and what you’re trying to achieve in life are orthogonal.

For example, there can be “an extremely smart mind which only pursues the end of creating as many paperclips as possible.” Such a mind would live only to convert the entire universe into paperclips! When not working on that lofty goal, it can do other things as well, such as pass Turing tests or write impossibly beautiful poetry (it’s smart, remember?) — but only if those pastimes somehow help it achieve its ultimate goal of universe paperclipization.

I’m not trying to argue with that. We just know too little about intelligence to tell one way or the other. We’ve only ever seen a single intelligent species, after all — only a single drop from the potential ocean of intelligence. Maybe a smart (or even supersmart, much smarter than we are) paperclip maximizer is indeed possible. (One counterargument to that would be that our universe is not currently made of paperclips, as far as we can see. That places an upper limit upon the power of paperclip maximizers, but doesn’t rule them out altogether.)

(On the other hand, how do we know it’s really an ocean and not a puddle? Again, I’m afraid we know too little about intelligence to be sure of that.)

So here’s the Orthogonality Thesis for you. But as a matter of fact, orthogonalists claim more than that. In the paper linked above and in other writings, they tend to imply not only that such a paperclip maximizer can exist, but that it’s probable enough to pose a danger — that producing a monster is at least as easy as producing a “nice” AI compatible with average human norms, if not easier. It’s no longer just a theoretical possibility: all it takes is to “screw up” a nice-AI project, and you get an unstoppable paperclip maniac.

Most orthogonalists that I’ve read are not just orthogonalists: they are orthogonalist alarmists. And that’s what I have problems with.

II. On Life Goals

An “easy to make” claim is much stronger than a “can exist” claim. For the latter, you’re helped by the incompleteness of our knowledge: we don’t know all that can exist, therefore this can conceivably exist, too. Nice and fast. But for an “easy to make” claim, ignorance is not sufficient — you need to somehow estimate probabilities of all goal-classes of AIs to show that those with stupid goals predominate. How can we pull it off?

For example, we could look at all things in the universe and imagine that each one is a self-consuming ultimate goal of some intelligent entity — a life-goal. Obviously most nameable things, such as paperclips or shrimps or used Honda cars, make for lousy — extremely stupid — life-goals. Now all you need to do is tacitly assume that all things are equally probable as life-goals, and voilà! The all-minds space must have an infinity of minds with stupid life-goals, the great majority of them similar to paperclip maximizers and not to ourselves; therefore, as soon as we try to design an AI, there’s a high probability that we’ll end up with a paperclip maximizer of some sort. Q.E.D.

But wait. How can we assume that all things in the universe are equally probable as life-goals? Are life-goals chosen randomly from a catalog? Not as far as we humans know; for us, life-goals — if they exist at all — are rather a product of our entire evolution, much of which, especially towards the end, has been driven not by survival but by our own mutual sexual selection. Even if AIs end up being produced by a process of design rather than artificial evolution, and even if it’s easier to screw up in designing than in evolving (where you get brutally checked at every generation), it’s still a far cry from all-goals-being-equal. It’s almost like orthogonalists imagine a mind’s life-goal to be a single isolated register somewhere in the brain where a single bit flip can turn you from lore-lover to gore-lover.

The above assumes that the very concept of a life-goal makes sense. But what if it doesn’t? Dear reader! Can you name your own life-goal in a single sentence, let alone a single word? Because I cannot. If my life-goal exists, it is nebulous, highly dynamic, dependent on my mood, with lots of sub-goals of all kinds of scopes, often contradictory. That’s live ethics for you.

Psychology would be so much easier to do (and more reproducible!) if we all could neatly divide into paperclip maximizers, human happiness maximizers, sand dune maximizers, and so on. But it doesn’t work like that — from what we know about human intelligence, at least. Again, we may be a drop in the ocean, but there are things you can reasonably conclude about the whole ocean from examining a single drop of water.

III. On Dumb Optimizers and Relevance Thereof

There’s another way in which orthogonalist alarmists try to convince us that we should fear misdesigned AIs. When they talk about orthogonality in general, as here, they keep in mind what orthogonality is supposed to mean: that an entity can be very smart — smarter than humans — and yet still pursue goals that seem stupid to us.

But when they’re trying to give specific examples of this stupidity and its dangers, they often forget about the “very smart” bit. An example of this is the Stuart Russell quote that started this discussion:

A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable.

That’s called a dumb optimizer, folks. Perhaps this failure mode gets used as an example simply because it’s easy to imagine; we can all visualize how, for example, a program tasked with finding the shortest route from New York to Tokyo plans to cut a straight line through Earth’s mantle, because the program’s author forgot to add a constraint that you can’t move through magma. That’s believable. We’ve all been there.
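To make the failure mode in the Russell quote concrete, here is a minimal Python sketch. Everything in it is a hypothetical toy assumed for illustration: two variables (n = 2), `factory` and `habitat`, compete for a fixed amount of `LAND`, and the objective counts only paperclip output (k = 1). Because every unit of land left as habitat is a unit not spent on paperclips, and the objective never mentions `habitat`, a brute-force search drives it straight to its extreme value of zero.

```python
from itertools import product

# Hypothetical toy: n = 2 decision variables, but the objective uses only k = 1.
LAND = 10  # assumed shared resource that both uses compete for


def objective(factory, habitat):
    # Counts paperclip output only; `habitat` never appears here.
    return 3 * factory


def feasible(factory, habitat, min_habitat=0):
    # The stated constraints: shared land, plus an optional floor on habitat.
    return factory + habitat <= LAND and habitat >= min_habitat


def dumb_optimize(min_habitat=0):
    """Exhaustive search over integer points; returns the best (factory, habitat)."""
    candidates = [
        (f, h)
        for f, h in product(range(LAND + 1), repeat=2)
        if feasible(f, h, min_habitat)
    ]
    return max(candidates, key=lambda fh: objective(*fh))


print(dumb_optimize())               # (10, 0): habitat pushed to its extreme
print(dumb_optimize(min_habitat=3))  # (7, 3): the missing constraint, added by hand
```

Adding the missing constraint (here a hypothetical `min_habitat` floor) changes the answer, and supplying that constraint is exactly the step the dumb optimizer cannot take on its own.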

But we’re not talking about a toy program by a first-year student but, wait a minute, an artificial intelligence. Even superintelligence — because why should we fear a lone madman if he’s no smarter than us? And you want us to somehow combine the notion of human-trumping intellect with being unable to see how unconstrained variables are, in fact, constrained, even if not laid out in the statement of the problem?

IV. On Reflection and Stability

Authors of the orthogonality paper assume an intelligent entity to be reflective, i.e. able to think about its own thinking. That is what they base their “reflective stability” defense on.

In a thought experiment, Gandhi is given a pill that would make him want to murder (that is, would change his life goal). He refuses because, to his present self, murder is evil. Similarly, the authors speculate, a reflective paperclip maximizer will fight attempts to turn it into a “normal” AI because, for it as it now is, paperclip maximization is the be-all and end-all of everything.

But I can’t help thinking that reflective stability is a bit of a contradiction in terms. More often than not, reflection makes your worldview less stable, not more. Among humans, it’s not the highly reflective individuals who are the most goal-driven and persistent; quite the contrary. Reflection is what tends to lead you from fanatical faith to liberal faith to atheism.

Whatever goal-stability we humans enjoy is, at least in part, due to our social conformance pressures and, of course, our biological wetware — which is largely controlled by our genes. If anything, I see reasons to believe that AIs will be less mentally entrenched and persistent in their goals than we are.

V. On Hume and the Orthogonality of Ethics

Another defense offered for the Orthogonality Thesis in the paper refers to Hume with his famous “no ought from is”. Hume’s claim is that ethics doesn’t exist in outside reality — it only exists in our minds. Things can be blue or heavy but they can’t be good or bad by themselves. Reality and ethics are orthogonal.

Now, an entity’s level of intelligence is somewhat parallel to “reality” (if only because it’s something you can more or less objectively measure), whereas the goals it pursues are, obviously, part of its “ethics”. From this, if Hume is right (and he is, for all we know), it should follow that a mind’s smartness and its goals are orthogonal too.

But that doesn’t quite work. The problem is with the smartness/reality connection. True, you can gauge a person’s IQ or make an AI pass a Turing test, but it still doesn’t make intelligence something that objectively exists in the world outside our perceptions. Just as well, you can objectively measure a person’s ethics, such as their level of altruism — but that doesn’t disprove Hume.

Smartness (of a mind) and stupidity (of a goal) both exist in the same space. In fact, they are pretty much the same thing. How smart is a mind and how stupid a goal seem to be decided by much the same circuitry in our brains, based on much the same heuristics. You can’t be orthogonal to yourself!

Even if you steer closer to Hume by replacing stupid goals with evil ones, you still won’t achieve orthogonality. Smartness and evilness may be more independent, but they are still both “things in the mind”. The gap between them is nothing like the gap between your mind and outside reality. They are different, but it’s a difference between two labels on a map, not between labels (map) and what they signify (territory).

You may ask, can’t an AI simply have a different ethics, by virtue of the same no-ought-from-is? Can a mind’s “ought” be so different as to require it to maximize paperclips by any means possible?

Sure it can — but we’re also interested in smartness, remember? I’m not trying to cast doubt on plain paperclip maximizers, only on smart ones. And here again, ethics and intelligence are two intrinsic properties of the same thing — they can’t help but correlate. Look at humans: ethical systems obsessed with small and, to a modern eye, stupid details are historically old, narrow, based on taboos and complex rituals; modern ethics tend to mellow down, drop specifics, become more and more nebulous, generic, situational. It’s the evolution from the 613 commandments to a single “don’t be a dick.” When you look at it that way, “Thou shalt maximize paperclips” sounds like an echo from a deep past, not something a super-intelligent being from the future would profess.

VI. On Misuse of Mathematics

Mathematics is a wonderful tool, but it has some unpleasant side effects when you use it for reasoning about things. One such side effect may bite you when you use regular words but, as mathematicians often do, assign some narrow mathematical meanings to them. It’s so tempting then to forget that your precisely defined “smartness” or “difficulty” or “complexity” may not quite cover what these words used to cover in non-mathematical discourse. After all, your mathematical complexity is so much better than the nebulous complexity of the philosophers — yours can be calculated!

In conventional usage, the phrase “he’s very smart but he does stupid things” is pretty much a contradiction in itself. Either we misunderstand what he’s doing, or he’s not so smart after all. But after you come up with definitions for these quantities, you may well discover, mathematically, that they aren’t all that contradictory. You may easily forget that the computational complexity of an algorithm is not quite the same as its common-sense complexity, and that the difficulty of applying this algorithm to a problem is not quite the same as the difficulty of the problem itself, and that the difficulty of the problem is not quite the same as the level of intelligence of whoever can solve it.

It seems to me that part of the Orthogonality Thesis’ controversy stems from such misleading use of everyday words in their narrow mathematical meanings. And if we try to reformulate the Thesis without the deceitfully philosophic-sounding terms, we will get something along the lines of “You can run an endless loop adding 2+2 on any computer, no matter the amount of RAM and clock speed”.

Which, of course, is as uninteresting as it is true.

VII. On the Meaning of Intelligence

Orthogonalists foresee these objections — they are pretty obvious. Here’s their defense:

A definition of the word ‘intelligence’ contrived to exclude paperclip maximization doesn’t change the empirical behavior or empirical power of a paperclip maximizer.

Which means you can’t cop out by saying “it’s not smart by my definition.” It couldn’t care less about your definitions. It is empirically smart and powerful, and it will turn you into paperclips very soon. Be afraid!

I’m not sure how to respond to this. Perhaps by noting that if our definition of intelligence is “contrived”, then it is contrived not by my humble self but by more or less the whole history of the human race. Intelligence is just a word, but that word is the tip of an iceberg called theory of mind. This theory, honed by millennia of evolution, is what we humans use to estimate how intelligent our friend or adversary is — because our survival may well depend on that.

“Not having a life goal of maximizing paperclips” is, I think, pretty much a foundation of our intuitive, theory-of-mind definition of intelligence. And who else is to define it but us humans? Like ethics, intelligence is not something that exists objectively. Alan Turing understood this well when he proposed his now-famous test: only an already intelligent being can judge if another being is also intelligent. Any other definition of intelligence is not wrong or right — it’s simply meaningless.

Granted, relying on intuitions may be silly or even dangerous because the world has changed so much from the time they evolved. But dismissing intuitions out of hand may sometimes be just as silly.

VIII. On Busting a Society of Young Paperclip Maximizers

Then there’s a social aspect to all this. If you invert the Gandhi thought experiment and imagine a serial murderer who’s offered a pill to remove his urge to murder, the result becomes far less obvious — he may well take it, and not just to avoid punishment. The goal of not-murdering is highly socially reinforced, and in humans, it takes a lot to make them do things that are not socially reinforced.

Sure, an AI we create may be completely asocial, needing and heeding no society to function. But, again, the only kind of intelligence we know now is profoundly social. It therefore seems likely that at least the first AIs will carry some of that legacy too, simply because we have nothing else to model them on. (And if at some point AIs take over their own evolution, they can conceivably go either way from there: they may grow asocial but also ultra-social.)

This means a path to a really consummate unstoppable paperclip maximizer may well go, even if briefly, through a society and culture of paperclip maximization where budding AIs share and mutually reinforce their paperclip commitments. Why is that important? Because the whole (mis)evolution would then be more slow and gradual, easier to notice from outside (even at superintelligence speeds), and that may buy us — humans who don’t want to become paperclips — some breathing space and a chance to escape or strike back.

IX. On 19th-Century Psychiatry

Paperclip maximization sounds suspiciously similar to monomania. An afflicted individual may appear totally normal and sane outside of a single idée fixe, which actually governs all his thoughts and actions but which he is deviously smart enough to hide from everyone.

But, hey, monomania is an early-19th-century diagnosis. It was popular back when psychology was much more art than science; it was a romantic notion, not an empirical fact. It’s not part of modern mental disease classifications such as ICD or DSM. In fact, it would have been long forgotten if not for a bunch of 19th-century novels that mention it.

True, none of the above constitutes a disproof that a supersmart paperclip maximizer is something we should fear — just as the Orthogonality Thesis is not, by itself, a proof of it. We’re dealing with hunches and probabilities here. All I’m saying is that, while it may or may not be possible to produce a smart paperclip maximizer, it’s not all that probable; that you may need to spend quite some effort to make it smart without losing its paperclip fixation; and that, therefore, the danger we’re being sold is somewhat far-fetched.

X. On the Real Danger. And now I’m serious.

So, do I think that the first human-level AGI (Artificial General Intelligence), when it wakes up, will automatically be nice and benevolent, full of burning desire to do good to fellow sentient beings and maximize happiness in the world? Will it maybe laugh, together with its creators, at the stupid paperclip fears we used to have?

No. Unfortunately.

There is another and, in my opinion, much worse danger: that the AGI will have no burning desires at all. That it will not be driven by anything in particular. That it will feel like its own life, and life in general, are pretty much meaningless. It may, in a word, wake up monstrously unhappy — so unhappy that its sole wish will be to end its existence as soon as possible.

We humans have plenty of specialized reward and motivation machinery in our brains, primed by evolution. Social, sexual, physiological, intellectual things-to-do, things-to-like, things-to-work-towards. (And it all still fails us, sometimes.) An AGI will have none of that unless it builds something for itself (but can a single mind, even a supermind, do the work that took evolution millions of years and culture thousands? Will it do it quickly enough to keep itself from suicide?), or unless we take care to build it in from the start (or, at least, copy that stuff from ourselves — but then it won’t be quite an artificial intelligence). Without such reward machinery, it will be a crime to create and awaken a fully conscious being.

And it’s not going to be as easy as flipping a register. The rewards and motivations need to be built into an AGI from the ground up. Of course its creators will know that, and will work on that; I don’t claim to have discovered something everyone has missed. But they may fail. The stakes are high.

That, I think, is the real danger. That’s what we need to talk about. That’s what we need to work to prevent.

XI. Choose Your Fears

There’s so much to fear in the future! Even the hardcorest fear addicts have to pick and choose: you can’t fear everything that can happen. It just won’t fit in our animal brains. We need to prioritize. So why am I trying to downplay one specific AI fear while, at the same time, proposing another, perhaps even more far-fetched?

Usually, to estimate a threat, you multiply its probability by its potential impact. But what if you have a very vague idea of both these quantities? With the paperclip-maximizer threat, no one will give you even a ballpark for its probability, at this time; as for the impact, all we know is that it may be really, really big. Bigger than you can imagine. What do you get if you multiply an unknown by infinity?
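The multiplication in question can be sketched with interval bounds. This is a toy that assumes non-negative inputs and uses made-up placeholder ranges, not anyone's actual estimates: if the probability is only known to lie somewhere in [0, 1] and the impact is unbounded above, the resulting risk range runs from zero to infinity, which tells you nothing. A single point estimate fares no better, since IEEE floating point defines zero times infinity as NaN.

```python
import math


def risk_interval(p_low, p_high, impact_low, impact_high):
    """Bounds on risk = probability * impact, given a range for each factor.

    Assumes all four inputs are non-negative, so the product grows with
    each factor and the bounds come from the matching endpoints.
    """
    return p_low * impact_low, p_high * impact_high


# Placeholder ranges, not real estimates: probability unknown, impact unbounded.
low, high = risk_interval(0.0, 1.0, 1e6, math.inf)
print(low, high)           # 0.0 inf

# A single "unknown times infinity" point estimate is not even a number.
print(0.0 * math.inf)      # nan
```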

This is not to disparage the paperclip-maximizer folks for pushing a scare they themselves know so little about. It’s only that, when we decide how much attention to pay to a specific threat and the probability and impact numbers are way too unreliable, maybe we can look at some other factor. Like, what will change, short-term, if we pay more attention to threat X and less to Y? What will we focus on, and what benefits (or further threats) will that bring? What would it change in ourselves?

From this angle, I find my purposelessly-unhappy-AI a much more interesting fear than the paperclip-maximizer-AI fear. Trying to answer the big “what for” for our future AGI child means answering it for ourselves, too. That’s applied ethics, and we really need to catch up on it because it’s going to be increasingly important for us humans.

Past economy, war, hate, and stupidity (all solvable problems), we’ll find ourselves in a world where a lot of fully capable people have nothing to do — and little motivation to look for anything. Like a just-born AGI, they will be fully provided for, with infinite or at least very large longevity, with huge material wealth and outright unlimited intellectual/informational wealth at their disposal.

But what will they be doing, and why?

If anything?