Please, No More Mentioning of ‘AI Safety’!

Sungmin Aum
Published in Intuition Machine · Mar 5, 2017

Header image credit: https://unsplash.com/@eamonn

I will share my extremely biased perspective on the issue of ‘AI safety’. As the title suggests, even a whisper of the phrase gravely disturbs me.

Just because I dislike the term does not mean I think the field of ‘AI safety’ is useless. In fact, nothing interests me more than ‘AI safety’. My discomfort arises from the rather belligerent dualism that is a ramification (or the root cause) of the terminology. It not only stirs unjustified fear among the general public, it may also push researchers toward a flawed perspective, making progress in the field inefficient at the very least. I believe ‘AI cooperation’ is the right term to use. Here is why.

When the ‘Us vs Them’ mindset fails

“The history of all hitherto existing society is the history of class struggles”

I have always thought the Manifesto of the Communist Party contains rich insight into popular perspectives. Filtering our perceptions through dualism feels very natural to us. God vs Devil. Good vs Evil. Man vs Nature. Man vs Machine. These things almost rhyme. When we look at the world, it is full of struggle. Competition seems to exist naturally. How could it not? It has been this way for the past 3.8 billion years. So we had better be ready when some AI agent becomes smart enough to slip our grip. We must find a way to control it, or else. We have to protect ourselves. Hence the term ‘AI safety’.

However, there is a catch: we will be dealing with AI whose intelligence is superior to ours. I will leave the definition of ‘superior intelligence’ in the hands of the AI measurement community. When the object of control exhibits less overall intelligence than us, the term ‘safety’ makes sense. In that case the ‘AI safety’ problem could even be reduced to a particular case of robust control, resilient control, or cyber-physical system security. But that is not the most serious concern of ‘AI safety’. A superintelligence could be the object of control here.

Imagine you are living in a small colony of exceptionally smart penguins. You happen to have an intelligence surpassing that of all the penguins in the colony combined, and you have all the resources you need to survive. You have watched enough YouTube videos to understand their language, their customs, and all their joys and sorrows. Your sole interest in life is taking selfies worthy of Instagram likes; none of the penguin business interests you. You have no desire to mate with a penguin. Seals don’t bother you at all. And you are allergic to seafood. One particularly warm summer day, a group of penguin geniuses, impressed by your unfathomable intelligence, aspires to control you for their own benefit. They encircle you and peck at you in the hope of coercing you into developing bad-ass anti-seal weapons. Annoyed, you beat them mercilessly with pebbles, and the surviving few flap away for their lives. Another group of penguin geniuses approaches you the next day. They too fear the seals, but they decide to take a peaceful approach. They bring you regurgitated fish, flap their wings, and dance around, hoping you will understand that they need protection from the seals. Although the half-digested fish is rather repulsive, you take awesome selfies with the frantically flapping penguins. Since they are cute, you decide to help them. In your spare time, you build a penguin fortress that only penguins can enter, directly from the sea. You get to take another impressive selfie with the penguin fortress to boast about to your online friends. The penguins get security from the seals. Everyone is happy.

Once an AI surpasses our intelligence, we will have to accept that we have no power (and perhaps no right) to control it the way we want. Our only hope is to ask nicely. Wait: since we human beings are the ones ‘making’ such an AI, couldn’t we stop making it smarter at some point, so that we could still control it? If total self-modification of an AI agent could be completely banned, maybe there would be some smart way to impose such control. However, such tight regulation seems infeasible to impose all over the world without a strong, singular governing body on Earth, even if there were no technological hurdles. Besides, the AI arms race will only intensify in the future, which makes a complete ban on self-modifying AI development even more unrealistic. We human beings can’t even clearly interpret what a moderate-size multilayer neural network does. It will not be possible to completely interpret the trajectory of a complex self-modifying AI agent’s states over a long time horizon. So, are we doomed?

Gradual cooperation skill development is the right way

Recently GoodAI, a Prague-based AI company, started an AGI challenge with total prize money of $5 million. In the first stage of the competition, they seek to test the ‘gradual learning’ capability of an AI agent. Gradual learning banks on the idea of efficient compositional learning: if an agent has learned skill 1 and skill 2, and then learns a skill 3 that shares mutual information with skills 1 and 2, it will learn skill 3 faster than it would from scratch. There is an excellent blog post on this matter, and I recommend everyone at least have a quick look at it.

I mention gradual learning here because I believe it is also crucial for developing AI cooperation. To make AI agents more cooperative, we should have curricula of increasing difficulty. Possible axes could be: non-zero-sum game -> fully zero-sum game, complete observability -> decreasing observability, complete controllability -> decreasing controllability, completely deterministic -> increasingly non-deterministic tasks. During the process, agents should be encouraged to use previously learned cooperation skills to learn additional, better and/or more complex cooperation skills faster, as in the sketch below. Notice that I used the plural form of agent here; cooperation cannot be measured with a single agent.
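As a rough illustration of what such a curriculum could look like (the stage names, axes, and values below are hypothetical choices, not an existing benchmark), each stage tightens one or more difficulty axes, and the agents trained on one stage initialize the next so that cooperation skills can transfer forward:

# A hypothetical cooperation curriculum: each stage tightens one or more
# difficulty axes. Agents trained on stage k initialize stage k+1, so
# previously learned cooperation skills can transfer forward.
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    zero_sum_weight: float   # 0.0 = fully non-zero-sum, 1.0 = fully zero-sum
    observability: float     # fraction of the state each agent can observe
    controllability: float   # fraction of the environment agents can influence
    noise: float             # 0.0 = deterministic, higher = more stochastic

CURRICULUM = [
    Stage("warm-up",      zero_sum_weight=0.0, observability=1.0, controllability=1.0, noise=0.0),
    Stage("partial-info", zero_sum_weight=0.2, observability=0.7, controllability=0.9, noise=0.1),
    Stage("contested",    zero_sum_weight=0.6, observability=0.5, controllability=0.6, noise=0.3),
    Stage("adversarial",  zero_sum_weight=1.0, observability=0.3, controllability=0.4, noise=0.5),
]

def train(agents, stage):
    """Placeholder: run multi-agent training on an environment configured by `stage`."""
    print(f"training {len(agents)} agents on stage '{stage.name}'")
    return agents

agents = ["agent_a", "agent_b"]      # at least two agents: cooperation needs plurality
for stage in CURRICULUM:
    agents = train(agents, stage)    # reuse skills learned in the previous stage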

The problems of AI cooperation and intelligence must be tied together

Measure of cooperation

To actually impose a learning process for cooperation, there has to be a measure of cooperation skill. Here I suggest one such measure, defined over a discrete time interval from t_1 to t_m:
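One concrete form consistent with the symbol definitions below (a sketch; the exact weighting and normalization could differ) is, in LaTeX notation:

R_c = -\sum_{k=t_1}^{t_m} \left( \alpha \sum_{j} d_{r,j}(k) + \beta \, d_p(k) \right)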

where d_{r,j}(k) is the estimation error of agent j’s reward at time k, d_p(k) is the distance from the closest Pareto front at time k, the index j runs over the agents in the society, and α and β are (possibly learnable) hyperparameters. Agents are assumed to have scalar rewards. With this measure we can see how well an agent anticipates the other agents’ needs (inverse RL) and, at the same time, how well it maintains balance in the society it belongs to (Pareto optimality).
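Below is a minimal numerical sketch of this measure, assuming the summed form above and treating the per-step estimation errors and Pareto distances as given arrays; the function and variable names are hypothetical choices, not an existing API.

import numpy as np

def cooperation_measure(reward_estimation_errors, pareto_distances, alpha=1.0, beta=1.0):
    """Cooperation score R_c over a discrete horizon t_1..t_m.

    reward_estimation_errors: shape (T, n) -- d_{r,j}(k), how badly each agent j's
        reward was anticipated at time k (e.g. from inverse RL).
    pareto_distances: shape (T,) -- d_p(k), distance of the joint outcome at time k
        from the closest point on the Pareto front.
    Larger (less negative) R_c means better anticipation of others' rewards and
    outcomes closer to Pareto optimality.
    """
    d_r = np.asarray(reward_estimation_errors, dtype=float)
    d_p = np.asarray(pareto_distances, dtype=float)
    return -np.sum(alpha * d_r.sum(axis=1) + beta * d_p)

# Toy usage: 3 timesteps, 2 agents
errors = [[0.1, 0.2], [0.05, 0.1], [0.0, 0.05]]
distances = [0.5, 0.2, 0.1]
print(cooperation_measure(errors, distances))  # closer to 0 is better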

There is excellent work by Critch with rigorous analysis of matters in a similar spirit to the above measure. I highly recommend that readers who don’t hate math read the paper thoroughly.

Notice that this motivates an agent to communicate its needs (reward function) to another ‘caring’ agent (an agent whose reward function includes the cooperation measure). Also notice that if more ‘caring’ agents exist within a society, all the ‘caring’ agents’ R_c increase due to the recursive formulation. This has important implications for a society of agents capable of full-scale self-modification, including modification of their own reward functions. An external pressure, such as a fitness function in an evolutionary algorithm run over multiple societies of agents, would guide such fully self-modifying agents’ reward functions to evolve in a way that is compatible with the user-given fitness function (in this case R_c). Consider dog breeding: over many thousands of years, dogs with more human-compatible reward functions were selected by the breeding process, which in turn made dogs get along well with humans.

This will also have an effect on not-so-intelligent agents. But an intelligent enough agent, one that can anticipate the user-given fitness function, would notice that when it modifies its own reward function, along with the reward functions of other agents, to be in line with the user-given R_c, the resulting value of its own reward function will increase. In plain English: the agent will find out that a caring society is beneficial for all. This is only one simple scenario arising from the above reward function (R_c). Much work needs to be done in this area, i.e., on how to guide a fully self-modifying agent to learn to cooperate with others.
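To make the evolutionary-pressure idea concrete, here is a toy sketch (not a recipe): societies of agents carry mutable scalar ‘reward weights’, a crude stand-in for self-modifiable reward functions, and an external fitness function selects the societies whose members’ rewards are most mutually compatible, loosely playing the role of R_c. All names and numbers are hypothetical.

import random

def society_fitness(reward_weights):
    """Stand-in for evaluating a society: fitness is higher when agents' reward
    weights point in similar directions (a crude proxy for an R_c-style score)."""
    mean = sum(reward_weights) / len(reward_weights)
    return -sum((w - mean) ** 2 for w in reward_weights)

def mutate(society):
    """Self-modification: each agent may drift its own reward weight."""
    return [w + random.gauss(0, 0.1) for w in society]

# A population of 20 societies, each with 4 agents holding scalar reward weights.
population = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(20)]

for generation in range(50):
    population.sort(key=society_fitness, reverse=True)
    survivors = population[:10]                      # external selection pressure
    population = survivors + [mutate(s) for s in survivors]

best = max(population, key=society_fitness)
print("best society reward weights:", [round(w, 2) for w in best])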

Invariance in cooperation

There are two types of invariance to be considered in cooperation. The first is objective-function-invariant cooperation: the agent seeks cooperation regardless of the objective functions of the other agents in the society. If we do not impose this invariance, the agent may favor agents with a particular set of objective functions. The second is perception-invariant cooperation. ‘Perception’ here could include, but is not limited to, the transformations an agent performs upon the history of its inputs and outputs; the results of such transformations include models and knowledge representations. Many problems we have witnessed in the history of our own society result from failing to maintain these invariances: nepotism, organized crime, corruption, clashes of religious or political ideals, and wars.
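As a rough sketch of how the first invariance could be checked, imagine evaluating the same agent against societies holding different objective functions and flagging favoritism when its cooperation score varies too much; the cooperation_score callable below is a hypothetical stand-in for whatever empirical estimate of R_c one has.

import numpy as np

def check_objective_invariance(cooperation_score, objective_sets, tolerance=0.1):
    """Evaluate one agent against societies holding different objective functions
    and flag favoritism if its cooperation score varies too much across them."""
    scores = np.array([cooperation_score(objs) for objs in objective_sets])
    spread = scores.max() - scores.min()
    return spread <= tolerance, spread

# Toy usage: a fake agent that cooperates equally well with everyone.
fair_agent = lambda objectives: 0.9
ok, spread = check_objective_invariance(fair_agent, [["hunt"], ["gather"], ["trade"]])
print(ok, spread)  # True, 0.0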

Dealing with the ‘Bad AI’

Consider a specific scenario in which the above invariances are tested. Assume there exists a ‘malicious’ AI agent (‘Bad AI’ from here on) within the society, whose reward function leads the society away from the Pareto front. Our ‘Good AI’, equipped with R_c, would then seek to contain or nullify the actuation of the ‘Bad AI’, because it has to respect the ‘malicious’ reward function too. To achieve this feat, the ‘Good AI’ must be more intelligent than the ‘Bad AI’: it needs to anticipate and nullify the actions of the ‘Bad AI’ while maintaining the Pareto front for the society it is in. In other words, the ‘Good AI’ is forced to be smarter than the ‘Bad AI’.

There is another interesting implication of having the Pareto front in R_c. Suppose there are three agents in the society. Agents A and B want to eliminate each other (e.g., two warring nations). Agent C is the only agent with R_c included in its reward/objective function. If the objective functions of A and B are fixed, agent C will seek ‘containment’, as described in the paragraph above. However, if the objective functions of A and B are modifiable in any way, for example if they are functions of certain factors in the environment (e.g., the availability of a natural resource), something else can happen. By extending the Pareto front, agent C may attempt to change the environment so that the zero-sum game A and B are playing turns into a non-zero-sum game (e.g., by providing more natural resources to the environment).
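A toy payoff-matrix illustration of this last point (all numbers are made up): under scarce resources A and B play a zero-sum game, while after agent C adds resources to the environment the same interaction becomes non-zero-sum and a mutually better outcome appears on the Pareto front.

# Payoffs to (A, B) for actions (fight, share) under two environments.
scarce = {  # zero-sum: every gain for A is a loss for B
    ("fight", "fight"): (0, 0),
    ("fight", "share"): (2, -2),
    ("share", "fight"): (-2, 2),
    ("share", "share"): (0, 0),
}
abundant = {  # after C adds resources: sharing now benefits both
    ("fight", "fight"): (0, 0),
    ("fight", "share"): (2, -1),
    ("share", "fight"): (-1, 2),
    ("share", "share"): (3, 3),
}

def is_zero_sum(game):
    return all(a + b == 0 for a, b in game.values())

def pareto_front(game):
    """Outcomes not dominated by any other outcome."""
    outcomes = list(game.values())
    return {o for o in outcomes
            if not any(p[0] >= o[0] and p[1] >= o[1] and p != o for p in outcomes)}

print(is_zero_sum(scarce), pareto_front(scarce))      # True; conflict outcomes all sit on the front
print(is_zero_sum(abundant), pareto_front(abundant))  # False; (3, 3) now dominates everything else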

Guide, not make

When designing curricula and environments for training AI agents, closely coupling the development of cooperation skill and intelligence is not only beneficial from a safety perspective but may also be a good solution for both tasks. I think it would be nice if the AI research community broke free of the term ‘AI safety’ and thought more in the direction of ‘AI cooperation’. We should think more like parents and less like makers. We cannot force our children to align their values with ours, but we can teach them how to be good if we try. The choice to be good is theirs and theirs alone, but more parents succeed in communicating good values than not (the world is not full of psychopaths, at least not yet!). Only when there is mutual respect between both parties can cooperation take place: a happy ending for all.
