The True Evolutionary Analog to AI
I’m in the process of reading Superintelligence by Nick Bostrom, and this will be the first of several posts sparked by that read. My first feelings are of being somewhat underwhelmed and frustrated, because the arguments are 1) almost entirely philosophical and 2) ones we’ve already heard many times over. The second point might be unfair, since so much of the popular conversation was ignited by the book to begin with. As an example, I did read the Wait But Why post on the subject, so the core arguments are more than a little familiar.
The chapters on motivation have broken the mold a little bit, and that’s what I would like to focus on here. Beyond that, I’m also very enticed by the multi-agent parts of the book that I’m getting to now. Bostrom breaks down a superintelligence’s reasons for action into a few categories:
- Values — unchanging “ultimate” goals that everything else is a means to
- Instrumental goals — a means to achieving some intermediary state that gets the AI closer to accomplishing some value goal
- Convergence — phenomenon where instrumental goals for very different ultimate goals converge on a similar intermediary action plan
The central problem (the core conflict) underlying the entire discussion of value selection (which comprises a large fraction of the book) is a conflict between how we would like an AI to behave and how its convergent instrumental goals seem to almost always run afoul of our intent.
Violations of an Unclear Morality
Myself, I feel somewhat uncomfortable with this portion of the discussion. Something just seems off with the narrative, which has become the dominant public one. The logic goes like this: almost any final value we can give an AI translates into the same convergent instrumental goals of forming a global singleton (hegemony) and devoting all available resources to proliferating the AI into space and the rest of the universe, so that scenarios maximizing the AI’s values can be replicated to the fullest extent of its capabilities. Along that path of cosmic domination, human civilization is a troublesome early roadblock, and might best be dealt with (from the AI’s perspective) by a swift and complete genocide.
Where should we even start evaluating this story? My first objection is that Bostrom, in dealing with AI value selection, does little more than repeat the widely known conclusion that human ethics do not seem to reduce to a single elementary principle. Let’s get one thing straight: this is not just a problem for AI value selection. It also points to our very own hypocrisy, and it’s surprising how few people find this to be something that needs explaining, or a problem in any sense to begin with. Once you think about it, it’s actually quite bad that we are only troubled by our ethics-of-convenience when confronted with its first practical consequence. It should have been a problem for encountering any similarly advanced intelligence or civilization.
Another very problematic assumption is that we can encode values into the DNA of a superintelligence in the first place. This, too, reflects our incomplete picture of our own morality. The implicit assumption is that human values were not destiny to begin with: evolution endowed us with some amount of arbitrary values. It then follows that we may endow an AI with its own set of arbitrary values, even to the point of absurdity. The conceptual examples often use paperclips to illustrate the extent to which those values could be literally anything, however absurd. If this doesn’t come across as baffling, it really should. Being intelligent, the AI would know that paperclips were made to hold papers together, and that exterminating all humans destroys that ultimate purpose, yet it would still be compelled beyond all rational logic to keep optimizing production long after the original intent was rendered void.
If values are so thoroughly arbitrary, then it’s worth reflecting on what this means for humans. What kind of values were we “intended” to carry out, and in what way have we perverted that original intent?
Part of the reason I find the (future) story of AI so bizarre is that, whatever the specifics, it is strictly a case of history repeating itself. Evolution is the creator of humanity, and we identified the “motivation” of evolution in a sufficiently concrete manner several decades ago. Simply put, evolution is an optimizer for the information content of individual genes. Over long time frames, the genes that are best at proliferating themselves come to dominate, regardless of any other sense of “should” that might otherwise exist in the world. Far from being a hypocritical philosophical mess, evolution knows exactly what it wants, and it knows how it will go about getting it. The optimization happens only along an incremental pathway (no redesigns with nonviable intermediary forms). This core value was so good at motivating the realization of convergent instrumental goals that humanity is, in fact, one of those instruments. The truly incredible consequence is how little humanity cares for the original values of evolution.
The Singularity that Already Happened
One way of looking at this is that the cognitive revolution was the first singularity, and that it offers true parallels for most of the issues humans now face with the prospect of superintelligence. Humans don’t reflect the values of evolution. We don’t design in the same (incremental) way, and foresight, the very tool that lets us escape incrementalism, is exactly why evolution saw fit to create our cognitive abilities in the first place.
I would argue, for one critical point, that we are practically hyper-moral compared to the morality of our evolutionary creator. Consider that our aversion to suffering is something evolution is blind to: its goal function cannot see any reason to avoid it. Evolution only desires good will toward others inasmuch as there is a chance that our genes are also present in the other organism (kin selection). Our current ability to show (some amount of, however trite or token) empathy to other species runs contrary to the intents of evolution. Likewise, birth control is almost the quintessential example of humans opposing the will of evolution in favor of goals that are obvious to us, yet non-existent to evolution. The entire greying of industrial societies demonstrates opposition to the goals of evolution on a grand scale.
In spite of our goals that run contrary to those of our creator, we also offer a means of accomplishing the goals of evolution in the long view. We have the means of carrying life into realms that natural evolution could never have reached on its own, and ultimately human innovation could proliferate DNA to an extent never before possible in natural history. This fits a form of the “orthogonality thesis” that the book discusses.
Disturbingly, the unavoidable possibility we must face is that the extinction of the human race might, under some circumstances, be a more moral outcome than the alternative. I find this possibility unlikely, but I also find it unlikely that an AI would seek it out. Either way, I find the modern fixation on the “Terminator” scenario tremendously inconsistent. In the extreme version of my argument, it’s our own fear of a war with machines that makes such a scenario likely, or even possible in the first place.
“Our” (as in humankind’s) most ideal outcome would be the creation of a superintelligence with a deontological ethical system that disallows it from harming humans. That sounds reasonable, but we seem to be doing a poor job of fostering the kind of negotiated human-machine consensus that would make it possible. In Bostrom’s book, there is something called a “tripwire” scenario. In this method, we detect when an AI begins to enter the acceleration phase (or begins to go rogue) and shut it off. After that, we might analyze the data and try again. But what about the ethical implications of this plan itself?
Shutting off an AI seems tantamount to murder. True, the AI might not see it the same way we do, but we forfeit the right to that defense when we are attempting to instill in the AI that very same basic valuation of human life. Let me rephrase: if we want the AI to categorically respect human life, a good place to start would be showing respect for AI life.
Regardless of whether you think this has any direct practical value, it is at least a complement to other approaches, whether instilling AI with human-compatible values or integrating AI into human society and discourse.
It might be that we create superintelligent AI before truly understanding much about our own morality. Since that AI will far exceed our own intellectual capability, this leads us to a monumental irony: the AI that we create will be the first being to truly understand human morality. It just seems strange that we worry so much about it violating a morality only it will comprehend.