Superintelligence and superstupidity
Nick Bostrom is an intellectual hero. Pin a medal on him. Since the publication of his book Superintelligence, the topic of superhuman intelligence has been in vogue for the first time. Thank God for that. There might not be a more important topic in the world, and it’s gotten almost no attention until now. It still isn’t getting enough.
Bostrom’s focus on superhuman artificial intelligence is an eminently prudent one: what if a superhuman AI’s motives are either hostile or indifferent to human values and we unwittingly hand the universe over to a malevolent god? Even if there is only a tiny chance of this happening, it’s worth thinking about how to avoid such a scenario. One of Bostrom’s insights about existential risk is that if you 1) value future lives as much as present lives, 2) expect the number of future lives over the fullness of time to be 10 quadrillion, and 3) treat reducing existential risk as equivalent to saving lives, then reducing the risk of human extinction “by a mere one millionth of one percentage point” is as important as saving 100 million lives in the present. Unlikely events still warrant precautions if the impact of those events would be the destruction of all human life forever. Even though Bostrom takes some existential risk scenarios far more seriously than I do, I still think he should be applauded for surveying all the potential risks out there.
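The extinction-risk arithmetic above is easy to verify. Here is a quick back-of-the-envelope check, using only the figures just quoted (10 quadrillion future lives, a reduction in extinction risk of one millionth of one percentage point):

```python
# Back-of-the-envelope check of the figures quoted above.
future_lives = 10 ** 16               # 10 quadrillion expected future lives
risk_reduction = 1e-6 * 1e-2          # one millionth of one percentage point
print(future_lives * risk_reduction)  # 100000000.0 -- i.e. 100 million lives
```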
One of Bostrom’s scenarios wherein a superhuman AI destroys humanity is the Paperclip Maximizer. A superhuman AI is programmed to produce paperclips and cares about nothing else. In its enthusiasm, it turns the entire planet into paperclips, killing all life on Earth. The Paperclip Maximizer is a stand-in for all the scenarios in which a superhuman AI is indifferent to human values. A similar scenario to the Paperclip Maximizer is the Happiness Maximizer. This scenario comes from Bostrom’s colleague and collaborator Stuart Armstrong. A superhuman AI is programmed to ensure that all humans are “safe and happy”. Based on this Prime Directive, it traps all humans “in underground concrete coffins on heroin drips”. Despite the programmers’ efforts, the AI’s motives conflict with our motives. The problem is that the AI’s understanding of the human concepts of safety and happiness is antithetical to humans’ own understanding of those concepts.

I think the Happiness Maximizer is an impossible scenario. So is any scenario in the same vein, where the problem comes from the fact that the AI doesn’t understand human concepts. In his book, Bostrom’s definition of superintelligence, or what I’ve been calling superhuman intelligence, is “any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest.” Bostrom gives a similar definition in a 1998 paper: “an intellect that is much smarter than the best human brains in practically every field, including scientific creativity, general wisdom and social skills.” From either of these definitions, it seems clear that a superintelligent AI would be able to pass the Turing test with flying colours. A superintelligent AI would be able to talk about concepts like safety and happiness with more than enough sophistication to convince human judges that it is human. By the definition of superintelligence, a superintelligent AI would be able to ponder and discuss the Happiness Maximizer scenario with more depth and insight than Nick Bostrom, Stuart Armstrong, or me.
There is no way a superintelligent AI would make such an elementary mistake as confusing the human desire to be “safe and happy” for a desire to be kept underground in “concrete coffins on heroin drips”. That would make the AI far dumber than the average human with regard to understanding human concepts. Not only would the AI have superintelligence, it would putatively also have subintelligence — or superstupidity. I can’t see how this would be possible. In order to pass the Turing test, an AI would need to demonstrate that it understands concepts like safety and happiness in the same way humans understand them. Either we’re talking about an AI that understands those concepts, or we’re talking about a paradoxical “superintelligent” AI that can’t pass the Turing test — a seeming impossibility.
If you’re unconvinced, imagine the AI is about to begin carrying out its Prime Directive to keep humans “safe and happy”. One way it could decide what to do would be to give itself a version of the Turing test. It could ask itself, “What should a superintelligent AI do to keep humans safe and happy?” and then give itself the same answer it would give to satisfy a human judge. Then, it could do whatever it said in its answer. This way the AI would always be capable of taking actions in accordance with a human understanding of the relevant concepts. It would be impossible for a superintelligent AI to be incapable of doing so.
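To make that self-test procedure concrete, here is a minimal sketch of the loop. Every name in it is a hypothetical stand-in, and the stubbed answer merely represents whatever response the AI would give to satisfy a human judge:

```python
def answer_as_if_judged(question: str) -> str:
    # Stand-in for the answer the AI would give to satisfy a human
    # Turing-test judge. (Purely illustrative: a real system would
    # generate this answer itself.)
    return ("Protect people from genuine harm while respecting what "
            "they themselves understand by happiness.")

def act_on_directive(directive: str) -> str:
    # The AI poses its directive to itself as a Turing-test-style question...
    question = f"What should a superintelligent AI do to {directive}?"
    # ...and commits to acting on the same answer it would give a human
    # judge, so its actions track the human understanding of the concepts.
    return answer_as_if_judged(question)

print(act_on_directive("keep humans safe and happy"))
```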
Simultaneous superintelligence and superstupidity is an absurdity, and so scenarios like the Happiness Maximizer are an impossibility. This is not, however, the only kind of AI doomsday scenario Bostrom and Armstrong have imagined. The overall existential risk posed by superhuman AI certainly can’t be ruled out on the basis of the argument I have just given. My argument can only rule out the risk posed by hapless AIs that don’t understand human concepts.