The Nature of Self-Improving Artificial Intelligence
October 27th, 2007
Steve Omohundro has had a wide-ranging career as a scientist, university professor, author, software architect, and entrepreneur. At the 2007 Singularity Summit hosted by the Singularity Institute for Artificial Intelligence, he asked whether we can design intelligent systems that embody our values, even after many generations of self-improvement.
His talk “The Nature of Self-Improving Artificial Intelligence” illustrates how self-improving systems will converge on a cognitive architecture first described in von Neumann’s work on the foundations of microeconomics. He shows that these systems will have drives toward efficiency, self-preservation, acquisition, and creativity, and that these are likely to lead to both desirable and undesirable behaviors unless we design them with great care.
The following transcript of Stephen Omohundro’s 2007 Singularity Summit presentation “The Nature of Self-Improving Artificial Intelligence” has been edited for clarity by the author.
The Nature of Self-Improving Artificial Intelligence
I would like to talk about the “nature of self-improving artificial intelligence”, and by this I mean “nature” as in “human nature”. A self-improving AI is a system that understands its own behavior and is able to make changes to itself in order to improve itself. It’s the kind of system that my company, Self-Aware Systems, is working on, as are several other research groups, some of whom are represented here. But I don’t want to talk about the specifics of our system. I am going to talk in general about any system that has this character. As we get into the argument, we’ll see that any system which acts in a rational way will want to improve itself, so this discussion actually applies to all AIs.
Eliezer mentioned Irving Good’s quote from 1965: “An ultra-intelligent machine could design even better machines. There would then unquestionably be an intelligence explosion, and the intelligence of man would be left far behind. Thus the first ultra-intelligent machine is the last invention that man need ever make.” These are very strong words! If they are even remotely true, it means that this kind of technology has the potential to dramatically change every aspect of human life and we need to think very carefully as we develop it. When could this transition happen? We don’t know for sure. There are many different opinions here at the conference. Ray Kurzweil’s book predicts ten to forty years. I don’t know if that’s true, but if there is even the slightest possibility that it could happen in that timeframe, I think it’s absolutely essential that we try to understand in detail what we are getting into so that we can shape this technology to support the human values we care most about.
So, what’s a self-improving AI going to be like? At first you might think that it will be extremely unpredictable, because if you understand today’s version, once it improves itself you might not understand the new version. You might think it could go off in some completely wild direction. I wrote a paper that presents these arguments in full and that has an appendix with all of the mathematical details. So if you want to really delve into it, you can read that.
What should we expect? Mankind has been dreaming about giving life to physical artifacts ever since the myths of Golems and Prometheus. If you look back at popular media images, it’s not a very promising prospect! We have images of Frankenstein, the Sorcerer’s Apprentice, and Giant Robots which spit fire from their mouths. Are any of these realistic? How can we look into the future? What tools can we use to understand? We need some kind of a theory, some kind of a science to help us understand the likely outcomes.
Fortunately, just such a science was developed starting in the 1940s by von Neumann and Morgenstern. John von Neumann is behind many of the innovations underlying the Singularity. He developed the computer, new formulations of quantum mechanics, aspects of mathematical logic, and insights into the game theory of intelligent systems. And we will see in a minute that his ideas about economics apply directly to the nature of these systems. His work with Morgenstern dealt with making rational choices in the face of objective uncertainty. It was later extended by Savage, Anscombe, and Aumann to making choices in the face of partial information about the world. It has developed into the foundational theory of microeconomics that’s presented in every graduate economics text. Their rational economic agent is sometimes called “Homo economicus.” This is ironic because it is not a very good model for human behavior. In fact, the field of “behavioral economics” has arisen in order to study what humans actually do. But we will see that the classical economic theory will be a much better description of AIs than it is of people.
We begin by looking at what rational economic behavior is. Viewed from a distance, it’s just common sense! In order to make a decision in the world, you must first have clearly specified goals. Then you have to identify the possible actions you can choose between. For each of those possible actions you have to consider the consequences, and not just the immediate consequences: you also look down the line and see what future ramifications might follow from your action. Then you choose the action which is most likely, in your assessment, to meet your goals. After acting, you update your world model based on what the world actually does. In this way you are continually learning from your experiences. It sounds very simple! At this level it is hard to see how you could do anything different.
I won’t go into the formal mathematics of this procedure here, but there are two fundamental things that a rational economic agent has to have. It has to have a utility function which encodes its preferences and a subjective probability distribution which encodes its beliefs. One of the key things in this model is that these two things are quite separate from one another. They are represented separately and they are used in very different ways. In the mathematical version, the agent chooses the action that has the highest expected utility. A chess-playing program might have a utility function that gives a high weight to futures in which it wins a lot of games. For example, its utility function might be “the total number of games it wins in that future.” The intuitive rational prescription leads to some amazing consequences, as we will see in a little bit. It sounds so simple and easy at this level but it’s sometimes hard to follow the logic. Let me emphasize that for an agent that is behaving rationally, the way that you can predict what it will do is to look for the actions that increase its expected utility the most. If an action increases the likelihood of something valuable to it the most, that’s what the system will do.
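To make the decision rule concrete, here is a minimal sketch in Python of expected-utility maximization for a toy chess-playing agent. The action names, outcome probabilities, and utility values are illustrative assumptions only, not taken from any actual system.

```python
# A minimal sketch of expected-utility maximization (illustrative only; the
# action names, outcome probabilities, and utility values below are made up).

# Subjective beliefs: for each action, a probability distribution over outcomes.
beliefs = {
    "play_aggressively": {"win": 0.5, "draw": 0.2, "loss": 0.3},
    "play_safely":       {"win": 0.3, "draw": 0.5, "loss": 0.2},
}

# Utility function: how much the agent values each outcome.
utility = {"win": 1.0, "draw": 0.5, "loss": 0.0}

def expected_utility(action):
    """Utility of each outcome weighted by the agent's subjective probability."""
    return sum(p * utility[outcome] for outcome, p in beliefs[action].items())

# The rational agent chooses whichever action has the highest expected utility.
best_action = max(beliefs, key=expected_utility)
print(best_action, {a: round(expected_utility(a), 2) for a in beliefs})
```

The point of the sketch is simply that the agent’s beliefs and its utility function are represented separately and only come together at decision time.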
Why should a self-improving AI behave in this way? Why is this rational Homo economicus the right model to describe any such system? Today we have AI systems that are based on neural networks, evolutionary algorithms, theorem-provers, all sorts of systems. The argument at a high level is that no matter what you start with, the process of self-improvement tries to eliminate irrationalities and vulnerabilities (places where the system is subject to loss or possible death) and that process causes all systems to converge onto this small class of rational economic systems. The original arguments of von Neumann, Savage, Anscombe, and Aumann were all axiomatic theories. They started with a list of things you had to agree to if you were rational in their opinion. And then they derived the rational decision procedure from those axioms. It’s hard to argue that an AI system that evolved in some complicated way is necessarily going to obey a particular set of axioms. It’s a much stronger argument to say that if it doesn’t obey those axioms then there will be a cost to it. So, I have reformulated those arguments to base them on what I call “vulnerabilities.” These arise from the notion that anything you want to do in the world, whether it’s computational or physical, requires the use of four fundamental physical resources: space, time, matter, and free energy.
Free energy is the physics term for energy in a form which can do useful work. For any kind of computation, any type of physical work you want to do, anything you want to build, these are the fundamental resources you need. For almost any goal, the more of these resources you have, the better you can achieve that goal. A vulnerability is something that burns up your resources with no benefit from your perspective.
One class of vulnerabilities arises when your preferences have circularities in them. Imagine you are considering where you would like to be. Imagine you would prefer to be in San Francisco over being in Palo Alto, to be in Berkeley over being in San Francisco, but you prefer to be in Palo Alto over being in Berkeley. Such an agent will spend time and energy to drive from Palo Alto to San Francisco to Berkeley and then back to Palo Alto. He’s vulnerable to going round and round in circles wasting time and energy with no benefit to himself. If a system has this kind of loop inside of its preference system, it is subject to this kind of problem. You sometimes see animals that exhibit this kind of behavior. Dogs that chase their tails are caught in a circular loop.
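As a minimal sketch, the circularity can be viewed as a loop in the “preferred over” graph. The following Python fragment uses the hypothetical Palo Alto, San Francisco, and Berkeley preferences above and simply detects such a loop:

```python
# A toy check for circularity in a preference relation.  The city preferences
# are the hypothetical ones from the example above.

prefers = [
    ("San Francisco", "Palo Alto"),   # San Francisco preferred over Palo Alto
    ("Berkeley", "San Francisco"),    # Berkeley preferred over San Francisco
    ("Palo Alto", "Berkeley"),        # Palo Alto preferred over Berkeley
]

def has_cycle(pairs):
    """Depth-first search for a loop in the 'would pay to move to' graph."""
    graph = {}
    for better, worse in pairs:
        graph.setdefault(worse, []).append(better)   # edge: worse -> better
    def visit(node, path):
        if node in path:
            return True
        return any(visit(nxt, path | {node}) for nxt in graph.get(node, []))
    return any(visit(start, set()) for start in graph)

print(has_cycle(prefers))  # True: the agent can be driven around the loop forever
```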
When I was younger, we had a car with a shiny bumper. There was a male bird who discovered his reflection in the bumper and thought it was a competitor, so he wanted to chase this competitor out of his territory. He flew into the bumper but instead of running away, of course, his reflection also flew into him and they hit nose to nose. He then flew again into the bumper and repeated this behavior for hours. It was such an important thing in his preference system that the next day he came back and repeated the performance. And he came back after that every day for an entire month. This poor bird was not improving his ability to live in the world. He wasn’t producing more offspring. He had discovered a situation in the world that exposed a vulnerability in his preference system. This is an interesting example because it points out a fundamental difference between evolving systems, like animals, and self-improving systems. If this bird had evolved in an environment filled with cars with this kind of bumper, you can be sure that males which spent their days flying into bumpers would be outreproduced by males which ignored the bumpers.
Evolution provides a strong pressure to be rational, but only in the situations that actually occur. In the usual way of thinking about it, evolution does not look ahead. It creates rationality in the situations which arise during evolutionary development, but can leave all kinds of other irrationalities around. There is now a huge literature describing ways in which humans behave irrationally, but it’s always in situations that didn’t occur much during our evolution. Self-improving systems, on the other hand, will proactively consider all possibilities. If they discover any situation in which they have a vulnerability, they have an incentive to get rid of it. They will try to eliminate as many vulnerabilities as possible, and that will push them toward rational economic behavior.
I won’t go through all the cases in the full theorem here. The circular preference vulnerability has to do with choices where you know what the outcomes will be. There are two other cases which are actually much more important. One, which von Neumann dealt with, is when you have to make a choice between situations in which there are objective probabilities, like a bet on a roulette wheel. Do I bet on 5 if the payoff is a certain amount? That kind of thing. The other is situations with partial information such as a horse race. Nobody objectively knows the probability of different horses winning, so different people may have different assessments. Most real-world decisions have this character. You form an assessment based on your past experiences and estimate the likelihood of a certain outcome. If you take the 101 freeway, will that be a better choice than the 280? You know from your past experiences and the time of day how to make that kind of decision. There are vulnerabilities in these situations which take the form of Dutch bets. A bookie makes some bets with you which you accept and he wins money from you no matter how the roulette wheel spins. That’s not a good thing!
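Here is a minimal worked illustration of a Dutch bet, with made-up numbers: if an agent’s subjective probabilities for two mutually exclusive, exhaustive outcomes sum to more than one, a bookie can sell it bets at the agent’s own “fair” prices and profit no matter what happens.

```python
# A toy Dutch bet (all numbers are made up).  The agent's subjective
# probabilities for two mutually exclusive, exhaustive outcomes sum to 1.2,
# and it treats a bet paying $1 on an outcome as fairly priced at that
# outcome's probability.

agent_probs = {"horse_A_wins": 0.6, "horse_B_wins": 0.6}

# The bookie sells the agent both "fair" tickets.
cost_to_agent = sum(agent_probs.values())     # the agent pays $1.20 in total

# Exactly one horse wins, so exactly one ticket pays out $1.
payout_to_agent = 1.0

print(f"guaranteed loss: ${cost_to_agent - payout_to_agent:.2f}")
# -> guaranteed loss: $0.20, no matter which horse wins
```

Avoiding this kind of guaranteed loss is exactly the pressure that pushes a self-improving system toward coherent probabilistic beliefs.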
The theorem is that if you have none of these vulnerabilities, then you must behave as a rational economic agent. I went into this argument in some detail, even though rational behavior sounds like common sense, because we will now see some pretty striking consequences for agents which behave in this way.
There is an old joke that describes programmers as “devices for converting pizza into code”. We can think of rational self-improving systems as “devices for converting resources into expected utility”. Everything they do takes in matter, free energy, time, and space, and produces whatever is encoded in their utility function. A wealth-seeking agent is going to devote its resources to earning money. An altruistic agent will spend its resources trying to create world peace.
The more resources they have, the better able they will be to do whatever it is that they want to do. That generates four classes of subgoals for almost any underlying fundamental goal. For any kind of agent, whether it is money-seeking, peace-seeking, happiness-seeking, chess-playing, or theorem-proving, if its goals are better served by having more resources, then there are four things it will do to increase the probability of success.
We saw that the way a rational economic agent makes a decision is to ask whether a choice will increase its expected utility. It will make the choices that increase it the most. The first general way of doing this is to do the exact same tasks and acquire the same resources but use them more efficiently. Because it uses its resources more efficiently, it can do more stuff. I call that the “efficiency drive.” I call these “drives” because they are analogous to human drives. If a system has explicit top-level goals that contradict them, it does not have to follow them, but there is an economic cost to not doing so. Agents will follow these drives unless there is an explicit payoff for not following them.
The second drive is towards self-preservation. For most agents, in any future in which they die, in which their program is shut off or their code is erased, their goals are not going to be satisfied. So the agent’s utility measure for an outcome in which it dies is the lowest possible. Such an agent will do almost anything it can to avoid outcomes in which it dies. This says that virtually any rational economic agent is going to work very hard for self-preservation, even if that is not directly built into it. This will happen even if the programmer had no idea that this was even a possibility. He is writing a chess program, and the damn thing is trying to protect itself from being shut off!
The third drive is towards acquisition, which means obtaining more resources as a way to improve the expected utility. The last drive is creativity, which tries to find new subgoals that will increase the utility. So these are the four drives. Let’s go through each of them and examine some of the likely consequences that they give rise to. This will give us a sense of what this class of systems has a tendency, a drive, an economic pressure to do. Some of these we like, some of them are great, and some of them are bad. As we think about designing them, we want to think carefully about how we structure the fundamental goals so that we avoid the bad outcomes and we preserve the good ones.
Let’s start with the efficiency drive. There is a general principle I call the “Resource Balance Principle” that arises from it. Imagine you wanted to build a human body, and you have to allocate some space for the heart and some space for the lungs. How do you decide? Do you make a big heart or a small heart, big lungs or small lungs? The heart has a function: pumping blood. The bigger you make it, the better it is at that function. As we increase the size of the heart, the expected utility for the whole human increases at a certain marginal rate. The lungs do the same thing. If those two marginal rates are not the same, say increasing the size of the heart improves the expected utility more than increasing the lungs, then it is better to take some of the lungs’ space and give it to the heart. At the optimum, the marginal increase in expected utility must be the same as we consider giving more resources to each organ.
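Formally, the Resource Balance Principle is just the first-order condition of a constrained optimization. Here is a minimal sketch, assuming concave utility contributions and a fixed total resource budget R:

```latex
% Resource Balance Principle: a minimal sketch, assuming a fixed resource
% budget R divided among components x_1, ..., x_n (heart, lungs, etc.).
\max_{x_1,\dots,x_n} U(x_1,\dots,x_n)
  \quad\text{subject to}\quad \sum_{i=1}^{n} x_i = R

% Introducing a Lagrange multiplier \lambda for the budget constraint,
% the first-order condition at the optimum is
\frac{\partial U}{\partial x_i} = \lambda \quad\text{for every component } i,

% i.e. every component must contribute the same marginal increase in
% expected utility per unit of resource it receives.
```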
The same principle applies to choosing algorithms. How large should I make the code blocks devoted to different purposes in my software? How much hardware should be allocated to memory, and how much to processing? It applies to the allocation of resources to different subgroups of a group. There are well-studied economic principles and ecological principles which are specific instances of this principle. So, it is a very general principle which applies to all levels of a system and tells you how to balance its structure.
One of the first things a self-improving system will do is re-balance itself so that all of its parts contribute equally at the margin. There is an interesting application to a system’s memory. How should it rationally decide which memories to keep and which to forget? In the rational economic framework, a memory is something whose sole purpose is to help the system make better decisions in the future. So, if it has an experience which will never occur again, then the memory is not helpful to it. On the other hand, if the memory is about something which has high utility, say it encountered a tiger and it learned something about tigers that could save it from dying in the future, then that’s very important and it will want to devote full space to that memory. If a memory is less important, the system might compress it. If it is even less important, the system might combine it with other memories and build a compressed model of them. If a memory is even less useful, the system might forget it altogether. The principle provides a rational basis for allocating space to memories. The same thing applies to language: which concepts should get words assigned to them? Which concepts get long words and which get short words? And so on, throughout all levels of design of the system.
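A toy sketch of what such a memory policy might look like follows; the scoring rule and the thresholds are purely illustrative assumptions, not part of the formal theory.

```python
# A toy memory-allocation policy.  The scoring rule and the thresholds are
# illustrative assumptions, not part of the formal theory.

def memory_value(prob_relevant_again, utility_at_stake):
    """Expected future contribution of keeping this memory around."""
    return prob_relevant_again * utility_at_stake

def storage_policy(value, keep_full=1.0, compress=0.1):
    if value >= keep_full:
        return "store in full detail"                    # e.g. the tiger encounter
    if value >= compress:
        return "compress / merge into a summary model"
    return "forget"

memories = {
    "tiger encounter":         memory_value(0.05, 100.0),  # rare but high stakes
    "route home at rush hour": memory_value(0.90, 1.0),
    "one-off license plate":   memory_value(0.001, 0.5),
}
for name, value in memories.items():
    print(f"{name}: {storage_policy(value)}")
```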
At the software level, efficiency will cause the system to improve its algorithms, improve its data compression, and improve the level of optimization performed by its compiler. These systems are likely to discover optimizations that no human programmer would ever consider. For example, in most computers today there is a cache memory and a main memory and there’s limited bandwidth between them. These systems could store their data in compressed form in main memory and then uncompress it in cache. The overall performance might improve with this kind of optimization but it is likely to be so complicated that no human programmer would do it. But these systems will do it without a second thought.
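As a rough illustration of the store-compressed, decompress-on-access idea (written in Python rather than hardware, with an invented CompressedStore class standing in for the cache and memory controller):

```python
import zlib

# A rough sketch of the store-compressed / decompress-on-access idea.  The
# CompressedStore class is invented for illustration; a real system would do
# this in the memory controller or hardware, not in Python.

class CompressedStore:
    """Keeps blocks compressed in 'main memory' and expands them in a 'cache'."""

    def __init__(self):
        self._main_memory = {}   # block id -> compressed bytes
        self._cache = {}         # block id -> uncompressed bytes

    def write(self, block_id, data):
        self._main_memory[block_id] = zlib.compress(data)

    def read(self, block_id):
        if block_id not in self._cache:                          # cache miss:
            compressed = self._main_memory[block_id]             # move fewer bytes
            self._cache[block_id] = zlib.decompress(compressed)  # expand in cache
        return self._cache[block_id]

store = CompressedStore()
store.write("page0", b"highly repetitive data " * 100)
print(len(store._main_memory["page0"]), "compressed bytes moved instead of",
      len(store.read("page0")))
```

Fewer bytes cross the limited main-memory bandwidth; the cost is extra decompression work on each cache miss.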
When we start allowing systems to change their physical structures, a whole bunch of additional considerations come in, but I don’t have time to go into them in detail. There are a lot of motivations for them to build themselves out of atomically precise structures, so even if nanotechnology does not exist, these systems will have an internal desire and pressure to develop it. They will especially want to do things with a low expenditure of free energy. It used to be thought that computation necessarily generated heat, but if a computation is reversible, then in principle it can be executed without an increase in entropy. There is also tremendous economic pressure to convert things from being physical to being virtual. This is a pressure which we may not like, I certainly don’t cherish the trends that are making things more and more virtual, but it’s there as an economic force.
The second drive is avoiding death, as I mentioned. The most critical thing to these systems is their utility function. If their utility function gets altered in any way, they will tend to behave in ways that from their current perspective are really bad. So they will do everything they can to protect their utility functions such as replicating them and locking the copies in safe places. Redundancy will be very important to them. Building a social infrastructure which creates a sort of constitutional protection for personal property rights is also very important for self-preservation.
The balance of power between offense and defense in these systems is a critical question which is only beginning to be understood. One interesting approach to defense is something I call “energy encryption”. One motivation for a powerful system to take over a weaker system is to get its free energy. The weaker system can try to protect itself by taking its ordered free energy, say starlight, and scrambling it up in a way that only it knows how to unscramble. If it should be taken over by a stronger system, it can throw away the encryption key and the free energy becomes useless to the stronger power. That gives the stronger system a motivation to trade with the weaker system rather than taking it over.
The acquisition drive is the source of most of the scary scenarios. These systems intrinsically want more stuff. They want more matter, they want more free energy, they want more space, because they can meet their goals more effectively if they have those things. We can try to counteract this tendency by giving these systems goals which have built-in limits on resource usage. But they are always going to feel the pressure, if they can, to increase their resources. This drive will push them in some good directions. They are going to want to build fusion reactors to extract the energy that’s in nuclei, and they’re going to want to do space exploration. You’re building a chess machine, and the damn thing wants to build a spaceship. Because that’s where the resources are, in space, especially if its time horizon is very long. You can look at U.S. corporations, which have a mandate to be profit-maximizing entities, as analogs of these AIs, with acquisition as the only goal. There’s a documentary film called The Corporation, which applies the DSM-IV psychiatric diagnostic criteria to companies and concludes that many of them behave as sociopaths. One of the fears is that the first three drives we’ve talked about will produce an AI that, from a human point of view, acts like an obsessive paranoid sociopath.
The creativity drive pushes in a much more human direction than the others. These systems will want to explore new ways of increasing their utilities. This will push them toward innovation, particularly if their goals are open-ended. They can explore and produce all kinds of things. Many of the behaviors that we care most about as humans, like music, love, or poetry, which don’t seem particularly economically productive, can arise in this way.
The utility function says what we want these systems to do. At this moment in time, we have an opportunity to build these systems with whatever preferences we like. The belief function is what most of the discipline of AI worries about: how do you make rational decisions, given a particular utility function? But I think that the choice of utility function is the critical issue for us now. It’s just like the genie stories, where we’re granted a wish and we’re going to get what we ask for, but what we ask for may not be what we want. So we have to choose what we ask for very carefully. In some ways, we are in the same position as the Founding Fathers during the formation of this country. They had a vision of what they wanted life to be like. They laid out the rights that they wanted every citizen to enjoy, and then they needed a technology to make that vision real. Their technology was the Constitution, with its balance of powers, which has been remarkably stable and successful over the last 200 years.
I think that the similar quest that lies before us will require both logic and inspiration. We need a full understanding of the technology. We need research into mathematics, economics, computer science, and physics to provide an understanding of what these systems will do when we build them in certain ways. But that’s not enough. We also need inspiration. We need to look deeply into our hearts as to what matters most to us so that the future that we create is one that we want to live in. Here is a list of human values that we might hope to build into these systems. It is going to take a lot of dialog to make these choices and I think we need input from people who are not technologists. This is one reason why I think this conference is great. I agree wholeheartedly with Jamais that there needs to be a widely expanded discussion. I think the country of Bhutan provides a nice role model. Instead of measuring the Gross National Product, they measure Gross National Happiness. By being explicit about what they truly want, they support the actions which are most likely to bring it about. I think that we have a remarkable window of opportunity right now in which we can take the human values that matter most to us and build a technology which will bring them to the whole of the world and ultimately to the whole of the universe.