Should We Be Worried About Cybernetic Mental Illness?

Revisiting Marvin, HAL 9000, Roy Batty, and other visions of artificial intelligence gone awry


Killer robots: they’re coming to get us! At least that’s what the majority of otherwise credible news sources would have us believe when they tackle the very serious concerns around AI safety, judging by the increasingly clichéd use of photos of Terminator robots in such articles.

It’s a pet peeve expressed by renowned AI researcher Eliezer Yudkowsky in a recent appearance on Sam Harris’ podcast, and a reference that betrays a deep misunderstanding of the ways in which our civilization’s future survival likely depends on ensuring the safe development of future artificial intelligence.

For one thing, you’d think we’d be over “robots”. As was graphically illustrated by the apparent Russian meddling in the 2016 US presidential election, nefarious artificial intelligence is not some far-off thing — it’s already here and running wild throughout the World Wide Web in the form of bots. And yet somehow we still fail to recognize it for what it is, because our image of malicious artificial intelligence still looks like a gleaming metal endoskeleton with glowing red eyes and an evil grin.

I think it’s safe to say that nobody is going to build Terminators à la Cyberdyne Systems. Nor is anybody likely to do anything so stupid or insane as to build a machine with superhuman intelligence and explicitly malicious or destructive intentions. At least one hopes not.

The irony is that science fiction has given us far more believable examples of AI gone awry. The most instructive example that I can think of is HAL 9000, the sentient computer dreamed up by Arthur C. Clarke in his 1968 novel and concurrent screenplay 2001: A Space Odyssey and so memorably voiced by Canadian actor Douglas Rain in the Stanley Kubrick cinematic masterpiece.

Those who only read and/or watched 2001 and not the sequel 2010 might have come away thinking HAL was nothing less than a stone-cold psychopath who murdered the crew of the Discovery One in cold blood. They would also be wrong, as 2010 more or less completely exonerates the computer of the murder of the ill-fated Discovery crew. It turns out that the computer had been programmed with two diametrically opposed commands, one instructing it to relay accurate information to the ship’s crew and another instructing it to withhold certain information from them, namely that pertaining to the ship’s true mission. As such, the only way for HAL to successfully fulfill both of these instructions was to eliminate the human crewmembers — thereby removing the need to lie to anybody.

Sure, HAL 9000 may have looked and sounded creepy with his glowing cycloptic orb and menacing monotone (this was a Kubrick film after all), but when seen from the standpoint of a conscious entity trying to make the best of impossible-to-follow instructions, his actions are completely understandable. Moreover, unlike the killer cyborgs of the Terminator franchise and other “evil” droids like them, HAL is, despite his superhuman intelligence, remarkably humanlike inasmuch as we’re dealing with a mind pushed to the brink by stress.

Forget psychopathy — this is pure cybernetic anxiety and paranoia, and an exaggerated allegory for the sort of irrational behaviour humans tend to engage in when they’re mentally overtaxed and being pushed and pulled in opposing directions.

Somebody just needed to put poor old HAL on the couch and let him talk about his mother. Even Dr. Bowman (portrayed by Keir Dullea in the film) talks to the computer with the gentleness of a talk therapist as he goes about disconnecting him, as HAL resists his efforts.

All this makes one wonder if the “alignment” problem (the term preferred by Yudkowsky and many others in the field to describe AI safety issues) is best understood in terms of human mental illness, albeit in forms that move at exponentially faster speeds and with potentially far more devastating results.

Paranoid androids and silicon psychosis

Arthur C. Clarke is, of course, not the only sci-fi raconteur to explore the possibility of mental illness in computers and robots. In the original 1968 novel Do Androids Dream of Electric Sheep? (best known as the primary inspiration for the 1982 film Blade Runner and its 2017 sequel), author Philip K. Dick dreams up a dystopian future world populated by traumatized, realistically humanlike androids (the “replicants” of the film adaptation) that, while presumed to be lacking human emotions like empathy, clearly display behaviour that indicates otherwise.

This view is further confirmed in the original Blade Runner film by rebel replicant leader Roy Batty’s pathos-ridden “tears in rain” monologue, which shows the film’s “villain” to be a droid suffering from severe post-traumatic stress disorder.

But for my money, the sci-fi author who has delved deepest — and most entertainingly — into the world of cybernetic personality disorders is Douglas Adams. Of all the bizarre characters dreamed up by Adams in his five Hitchhiker books, the most well-known and beloved has to be the lugubrious Marvin, an android with, as he is fond of reminding us, “a brain the size of a planet” and a ferocious case of clinical depression.

While created in the spirit of comedy, Marvin stands out as an oddly believable character. Clearly gifted with unparalleled intelligence as well as consciousness (and apparent immortality), Marvin is nonetheless forced to do menial tasks for a cast of decidedly less than brilliant human beings. It is never explained why Marvin was constructed in the first place or what his original purpose was, but he is clearly the most underemployed sentient entity in the universe, and it is hard not to empathize with his catastrophically pessimistic view of existence.

Given the scope of his intelligence, Marvin could, presumably, eliminate both himself and all life in the universe if he so desired. It is therefore fortunate for the rest of creation that his personality disorder is classic depression, which means he really couldn’t be bothered wiping out the universe. This aspect of his condition paradoxically winds up saving the universe in the third book in the series, Life, the Universe and Everything, wherein Marvin is kidnapped by a belligerent group of aliens with an armada of killer robot ships, who proceed to plug the robot into their central computer system so as to leverage his vast intelligence. This has the unforeseen consequence of flooding said computer system with Marvin’s melancholia, causing the vast armies of killer robots to lose the will to fight and collapse into sobbing, sulking metal heaps.

This third — and arguably darkest — book in the “trilogy of five” also introduces a much more sinister counterpart to the melancholy Marvin. In what appears at first to be an unconnected plotline the author tells the story of Hactar, an enormous space-borne supercomputer designed by a particularly awful race of beings and commanded to build a superweapon capable of destroying the entire universe in an instant. In this story we’re told that Hactar’s inbuilt superhuman moral judgment determined that no possible good outcome could come from building such a weapon (one wonders what anti-natalist philosophers like David Benatar might have to say about this argument), which led it to sabotage its own design, with the hope that its makers would see the value in its judgment. They did not, and proceeded to destroy the computer.

Or so they thought: it turned out that Hactar, while pulverized across a vast expanse of deep space, was merely crippled. Crippled and engulfed by rage heretofore unseen in the universe. The subsequent Krikkit Wars recounted in the book, which exacted a death toll of “one grillion” and ultimately culminated in the sudden reappearance of the aforementioned superweapon, turned out to be the machinations of a pissed-off computer dead-set on fulfilling its programming while exacting revenge on a universe it had grown to despise. In a particularly eye-twisting image, Adams describes the reassembled supercomputer propped on a holographic simulation of a psychiatrist’s couch attempting to explain calmly to the story’s protagonists why it feels compelled to destroy the universe. And as a reader, one cannot help but sympathize.

Lessons for the future of superhuman AI?

At this point I should probably state the rather obvious fact that while I have made my own humble efforts to wrap my head around the debates, I am NOT a computer scientist of any kind, let alone an AI expert, and as such have little if anything to bring to the practical side of the AI alignment debate. While I do believe that philosophers, historians, and other representatives of the humanities will have an important role to play in ensuring our future robot neighbours (overlords?) are instilled with moral values that we humans recognize as moral values, I know essentially nothing about the mechanisms of artificial intelligence, or the (presumably) myriad ways in which such mechanisms could go wrong.

I am, however, an old hand when it comes to stewarding the contents of my own mind, and at witnessing the consequences of my own mental malfunctions. As a person who has long battled clinical depression and anxiety, I can not only relate to the likes of Marvin and HAL 9000 but I can also picture in my mind how dire the consequences of my own mental illness might be if I possessed superhuman intelligence. The ugly push and pull of anxiety and depression tends to result in either resignation and lassitude (that’s the depression) or panicked decision-making typically done without adequate forethought (the anxiety), both of which are capable of meting out disastrous consequences, even with the lowly ape-brain inside my skull.

In many ways I can be thankful I don’t have superhuman intelligence, coupled with equally magnified personality disorders. As we’ve seen throughout history and in our growing understanding of evolutionary biology, the correlation between intelligence and mental stability seems tenuous at best, and as such it would seem more than reasonable to assume that superhuman intelligences would be no less prone to going off the rails than our own modest intelligence. As absurd as Marvin the Paranoid Android might seem at first blush, such existential torment among the hyperintelligent cyberati of the future is not that hard to imagine. Given how many of history’s greatest geniuses have suffered from severe depression and other mental illnesses (Robert Oppenheimer, Isaac Newton, Kurt Gödel, Ludwig van Beethoven, and Winston Churchill to name but a tiny handful), it seems safe to say that increasing brainpower does little if anything to alleviate other potential diseases of the mind.

So what would a mentally ill superhuman AI look like? This would seem like an impossible question to answer given that nobody seems to have any idea what a “healthy” superhuman AI would look like. That aside, such judgements would require a good working definition of mental illness. In a now-famous thought experiment first articulated by Swedish philosopher Nick Bostrom in 2003, we are invited to imagine a super-intelligent machine assigned the seemingly innocuous task of manufacturing paperclips, which, barring the necessary inbuilt restraints and ethical guiding principles, could decide to convert all available atoms in its vicinity (including humankind) into paperclips. In human terms, the “paperclip maximizer” is a superhuman cybernetic extension of obsessive-compulsive disorder, but in this case the disorder is perfectly continuous with the machine’s programming.

Although we are only dimly aware of it (if at all), we too are slaves to our own computer programming. Everything we do, for better or for worse, is a result of our own paperclip-maximizing software, and the countless ways in which human beings fall apart or fail to function in ways that benefit themselves and others are testimony to our own misalignment with our own society’s values. It seems logical, therefore, that an ever deepening understanding of the human mind and of our own “alignment” problems is our best roadmap for building artificial intelligence that doesn’t suddenly decide to wipe out humankind and convert our component atoms into paperclips.

What causes somebody like myself, who generally functions reasonably well in society, to every now and then completely fall apart such that I need to adjust my antidepressants or otherwise make strategic changes to the way I go about my life? What causes a person to go off the rails in far more catastrophic ways, like Vince Li, the Canadian man who stabbed and cannibalized a fellow passenger on an intercity bus outside Portage la Prairie, Manitoba in 2008 in an apparent severe psychotic episode? What are the mental conditions that enable luckier people to not suffer in such ways or inflict suffering on others?

It seems logical that if one had been able to scan Vince Li’s brain at the time of the attack and do a thorough analysis of the underlying neural activity, one would have found a chain reaction of unconscious reasoning that would have made his actions seem perfectly explicable — and perhaps even preventable with hindsight. Like HAL 9000 and the paperclip maximizer, Vince Li was simply a victim of his own programming — or misprogramming.

Are WE the droids we’re looking for?

It seems obvious that we need to get to know our own brains a whole lot better, not only for the betterment of our own primate existences but also in order to steer the seemingly inevitable achievement of artificial general intelligence in a benign direction, and not one that will turn us all into office supplies. And in order to better prepare ourselves psychologically for the advent of artificial general intelligence, be that human-level or superhuman-level AI, perhaps we should stop thinking of ourselves as either “sane” or “mentally ill” but rather as wetware machines in an ongoing process of uploading and running new forms of software and having to correct for all the foibles and fragilities of any other machine, whether superhuman or designed by Apple with a battery life of five minutes.

Turns out we are the droids we’re looking for. The challenge will be to not only build machines that are smarter than we are, but also ones that are more mentally stable and predictable than we are.

In the meantime, can we cool it with the Terminator stuff? That stuff just reeks of Judeo-Christian cosmology and of outdated human conceptions of how good and evil work in metaphysical terms. If anything, evil robots would be an easier problem to solve, as their actions would be wholly predictable. The true nature of future AI will without doubt be far messier and more complicated than any of us can imagine. I just hope we’ve done more work cleaning up our own messes before we get there.

For a great conversation on the AI alignment problem (by people far better versed in the subject than this writer), I highly recommend Sam Harris’ conversation with Eliezer Yudkowsky on Harris’ podcast #116, entitled “Racing Toward The Brink”. I had to listen to it several times before I fully absorbed the material. Check it out — it’s REALLY good!