Bypassing the Agent Alignment Problem via Intrinsic Motivation

Carlos E. Perez
Published in Intuition Machine · Dec 4, 2018



One curious coincidence is that the phrase “Deep Learning” is used in two apparently different fields of study. The one I am familiar with comes from Machine Learning and Artificial Intelligence. The one I’m clueless about comes from the field of education, where it is sometimes referred to as “Deeper Learning” (Wikipedia has an entry on the latter).

I do find it ironic that I might indeed learn something important from “Deeper Learning” that proves useful in my research in Deep Learning. The reason is that the main problem of AI, from its immediate pragmatic concerns all the way to the long-term existential risk, involves what is known as the AI Alignment problem. The AI Alignment problem appears to be reminiscent of teaching.

Machine Learning, and its evolution into Deep Learning, is a software technology whose behavior is trained rather than programmed (as in conventional software). It has been remarked that in the future you would train a DL system the way you would train a dog:

“The dog trainers alive now might be the programmers of the future,” Karmann says. “They know the techniques. I realized how many similarities there are between dog training and machine learning.”

A recent paper from DeepMind (Scalable agent alignment via reward modeling: a research direction) surveys the AI alignment problem in more depth. Jan Leike et al. enumerate five challenges in AI alignment:

1. Amount of feedback
2. Feedback distribution
3. Reward hacking
4. Unacceptable outcomes
5. Reward-result gap

They propose several methods that could be used to tackle each of these challenges. Their solution revolves around reward modeling, that is, “learning a reward function from interaction with the user and optimizing the learned reward function with reinforcement learning.” The problem, as I’ve alluded to before, is that we can’t assume it is feasible to discover the reward signal required to solve a given problem.
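To make the reward-modeling loop concrete, here is a minimal toy sketch (my own illustration, not DeepMind’s implementation): a linear reward model is fit to pairwise preference labels with a Bradley-Terry-style logistic loss, and the learned reward could then stand in for user feedback inside an RL loop. All names and the synthetic data are assumptions for illustration.

```python
# Toy reward-modeling sketch (hypothetical; for illustration only).
import numpy as np

rng = np.random.default_rng(0)

# Hidden "true" reward that the user implicitly judges with (unknown to the agent).
true_w = np.array([1.0, -2.0, 0.5])

def true_reward(states):
    return states @ true_w

# Simulated pairwise preferences: the user prefers the trajectory with higher true reward.
states_a = rng.normal(size=(500, 3))
states_b = rng.normal(size=(500, 3))
prefs = (true_reward(states_a) > true_reward(states_b)).astype(float)

# Learned reward model: linear in state features, fit with a logistic (Bradley-Terry) loss.
w = np.zeros(3)
lr = 0.1
for _ in range(2000):
    margin = (states_a - states_b) @ w            # predicted preference margin
    p = 1.0 / (1.0 + np.exp(-margin))             # P(a preferred over b)
    grad = (states_a - states_b).T @ (p - prefs) / len(prefs)
    w -= lr * grad                                # gradient step on the loss

# The learned reward can now score behaviour in place of direct user feedback.
test_state = rng.normal(size=3)
print("learned reward:", test_state @ w, "true reward:", test_state @ true_w)
```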

Any teacher can relate to how difficult it is to teach a student who has little motivation. Often, no amount of reward signaling can produce a motivated student. I want to explore the flip side of rewards: intrinsic motivation. How do we program in motivation so that our artificial agents will learn in the absence of rewards? This is a bottom-up approach rather than the top-down approach usually prescribed in Deep Learning practice.

Alex Graves, in a recent talk at NeurIPS 2018, locates intrinsic motivation at the intersection of active learning and learning without a teacher:

https://www.facebook.com/nipsfoundation/videos/795861577420073/

This classification should be considered a continuum: between supervised and unsupervised learning there is semi-supervised learning and learning with sparse rewards, and between fully active and fully passive learning there are mixtures such as curriculum learning. These two dimensions essentially capture a student’s behavior and the capability of a teacher or environment to provide learning cues. Graves remarks that the key element of general intelligence is not the ability to learn lots of things, but the ability to rapidly adapt to new things and new situations. This is not a new idea; the word Cybernetics comes from the Greek word for “the art of steering.” It captures the richness of cognition in its pursuit of goals, predictions, actions, feedback, and responses across many environments.

There is an idea known as Ecological Rationality, put forward by Gerd Gigerenzer: the rationality of an agent depends on the environment in which it is placed. Gigerenzer’s interesting claim is that the greater the uncertainty in an environment, the more effective heuristics (i.e., bounded rationality) will be. That is, if an agent can learn to amortize inference effectively, it will be more effective in dealing with uncertainty. Framed from this perspective, Unsupervised Learning is senseless in that there is no environment to speak of. There is no universal measure to capture that is independent of the task, there are no universal patterns that are guaranteed to be useful, and not all bits are equal.

Ben Recht explains this even better: Unsupervised Learning is the easiest learning task one can think of because there is no target goal. You can conjure up any kind of measure or classification you can imagine. The only real question is whether an unsupervised measure or classification is of any value to a task:

unsupervised learning is by far the easiest of the three types of machine learning problems because the stakes are so low. If all you need to do is summarize, there is no wrong answer. Whether or not your bedrooms are rendered correctly by a GAN has no impact on anything. Descriptive analytics and unsupervised learning lean more on aesthetics and less on concrete targets.

This brings to mind Yann LeCun’s cake analogy of intelligence. It isn’t a very good one, because it focuses attention on the wrong problem in machine learning. Allow me to propose a better narrative:

Unsupervised learning is easy: just gather enough regularities (the usual kinds, i.e., similarity, sequence, proximity, etc.) in your data and use them as embeddings at a higher level. The best examples of this can be found in NLP (see ELMo and BERT). Supervised learning is the bread and butter of Deep Learning; essentially, the networks have enough capacity to amortize most kinds of inference. What is missing, however, is the ability to build better abstractions, despite the layered nature of DL networks. Deep Reinforcement Learning has proven to be extremely effective in narrow domains with high certainty, such as Go. However, reward shaping is a significant problem. Then there is continual learning, which is learning to achieve goals without a teacher (see: Cybernetics).

I’ve used the term Continual Learning in the same way as Satinder Singh. Continual Learning transcends reinforcement learning because there is no teacher involved. The agent must discover ‘what to learn’; it is autonomous, self-motivated learning. Ideally, continual learning acquires new skills and knowledge, and the hope is that by doing so the agent’s performance continues to improve. Said differently, through intrinsic motivation an agent can learn better representations that are reusable in solving other problems.

The difference from Reinforcement Learning is that the reward function is decoupled into an external reward and an internal reward. The external reward is specific to the task being solved. The internal reward is independent of the task and is the internal motivator of the agent. According to Gigerenzer’s Ecological Rationality, an agent has bounded rationality, and this is the primary reason why an environment’s reward signal (or a teacher’s teaching signal) should be decoupled from an agent’s internal reward.

http://www.cs.cornell.edu/~helou/IMRL.pdf
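As a rough sketch of this decoupling (my framing, not a formula from the IMRL literature), the agent’s total reward can be written as the task reward plus a weighted, task-independent intrinsic term; the beta knob below is a hypothetical trade-off parameter:

```python
def combined_reward(external_reward: float,
                    internal_reward: float,
                    beta: float = 0.1) -> float:
    """Mix a task-specific external reward with a task-independent
    intrinsic reward. beta is a hypothetical knob trading off
    exploration (intrinsic) against exploitation (extrinsic)."""
    return external_reward + beta * internal_reward

# Example: a sparse task reward of 0.0 still yields a learning signal
# when the agent's intrinsic motivator (e.g., curiosity) fires.
print(combined_reward(external_reward=0.0, internal_reward=0.8))
```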

The advantage of discovering good intrinsic motivation is that it mitigates the AI alignment challenges inherent in explicit rewards.

Now that we’ve decoupled rewards, we have to ask how internal rewards differ from external rewards. Alex Graves gives a few examples in his lecture. He cites Juergen Schmidhuber’s view that intrinsic rewards should drive towards compression of the agent’s knowledge of the world:

Seek out data that maximise the decrease in bits of everything the agent has ever observed. In other words find (or create) the thing that makes the most sense of the agent’s life so far.
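One way to read this is as “compression progress”: the intrinsic reward is the number of bits saved on an observation after the agent’s world model has learned from it. Below is a hypothetical toy sketch with a running Gaussian as the world model; the class and function names are my own, not Schmidhuber’s formulation.

```python
import math

class GaussianWorldModel:
    """Toy world model: a running Gaussian over scalar observations.
    Its 'coding cost' for x is the negative log2-likelihood in bits."""
    def __init__(self):
        # Weak prior pseudo-counts so the variance does not collapse early.
        self.n, self.mean, self.var = 2, 0.0, 1.0

    def bits(self, x: float) -> float:
        return (0.5 * math.log2(2 * math.pi * self.var)
                + (x - self.mean) ** 2 / (2 * self.var * math.log(2)))

    def update(self, x: float) -> None:
        # Incremental (Welford-style) mean/variance update.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.var += (delta * (x - self.mean) - self.var) / self.n
        self.var = max(self.var, 1e-6)

def compression_progress_reward(model: GaussianWorldModel, x: float) -> float:
    """Intrinsic reward = bits saved on x after learning from x."""
    before = model.bits(x)
    model.update(x)
    return before - model.bits(x)

model = GaussianWorldModel()
for obs in [0.1, 0.2, 0.15, 5.0, 0.12]:
    print(round(compression_progress_reward(model, obs), 3))
```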

Graves also cites Klyubin et al.’s “Empowerment: a universal agent-centric measure of control,” where motivation is characterized as empowerment. It relates to an agent’s drive to maximize the mutual information between its actions and the consequences of its actions:

Instead of curiosity, [the] agent can be motivated by empowerment: attempt to maximize the Mutual Information between the agent’s actions and the consequences of its actions (e.g. the state the actions will lead to). Agent wants to have as much control as possible over its future.
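For a deterministic toy world, empowerment under a uniform action choice reduces to the entropy of the reachable next-state distribution (empowerment proper maximizes the mutual information over action distributions). The sketch below is my own construction, not Klyubin et al.’s algorithm, with made-up states and transitions:

```python
import math
from collections import Counter

# Toy deterministic world (made up): in "corridor" every action leads somewhere
# different (high control); in "trap" every action leads to the same place.
transitions = {
    "corridor": {"left": "room_a", "right": "room_b", "stay": "corridor"},
    "trap":     {"left": "trap",   "right": "trap",   "stay": "trap"},
}

def empowerment_bits(state: str) -> float:
    """I(A; S') in bits with A chosen uniformly. Because the dynamics are
    deterministic, H(S'|A) = 0 and the mutual information reduces to the
    entropy of the next-state distribution."""
    actions = list(transitions[state])
    next_counts = Counter(transitions[state][a] for a in actions)
    n = len(actions)
    return -sum((c / n) * math.log2(c / n) for c in next_counts.values())

print(empowerment_bits("corridor"))  # ~1.58 bits: three distinct reachable futures
print(empowerment_bits("trap"))      # 0.0 bits: no control over the future
```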

He also cites a paper by Gregor et al. that attempts to maximize the number of different states an agent can reliably reach, as measured by the mutual information between the set of options and the option termination states. This option strategy reminds me of Alex Wissner-Gross’s maximization of future freedom (see: An Equation of Intelligence), as sketched below.
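A common variational recipe for this kind of objective (used in later work such as DIAYN, and close in spirit to Gregor et al.’s Variational Intrinsic Control) is to sample an option z, let its policy run, and reward the agent with log q(z | s_T) - log p(z), where q is a discriminator trained to guess the option from the termination state. The sketch below is hypothetical; the discriminator is a stand-in random linear model rather than a trained network.

```python
import numpy as np

rng = np.random.default_rng(1)
NUM_OPTIONS, STATE_DIM = 4, 8

# Stand-in for a learned discriminator q(z | s_T): a fixed random linear map
# plus softmax. In practice this classifier is trained to guess which option
# produced the termination state.
W = rng.normal(size=(STATE_DIM, NUM_OPTIONS))

def discriminator(termination_state: np.ndarray) -> np.ndarray:
    logits = termination_state @ W
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def intrinsic_reward(option: int, termination_state: np.ndarray) -> float:
    """Variational lower bound on I(option; termination state):
    log q(z | s_T) - log p(z), with p(z) uniform over the options."""
    q = discriminator(termination_state)
    return float(np.log(q[option] + 1e-8) - np.log(1.0 / NUM_OPTIONS))

z = int(rng.integers(NUM_OPTIONS))   # sample an option
s_T = rng.normal(size=STATE_DIM)     # termination state reached by that option's policy
print(intrinsic_reward(z, s_T))
```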

There is no consensus yet as to what the best intrinsic motivator is. I’m inclined towards the idea that intrinsic motivation is what drives the improvement of abstract representations. If we examine human personalities, we can discern that personality is a kind of intrinsic motivation that has its origins in different kinds of self. For each self, personality is a knob that regulates a person’s preference for exploitative versus exploratory behavior. The human Big Five personality traits are hereditary, and they relate to a person’s preferred knowledge-discovery strategy (i.e., Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism). Each personality category corresponds to one kind of self (Perspective, Volitional, Social, Narrative, Bodily). It would be difficult to justify an intrinsic motivator that is decoupled from the sense of self.

It may seem surprising that personality is related to learning strategy, but it makes perfect sense: humans predominantly employ intuitive cognition (i.e., System 1, thinking fast) and thus behave based on cognitive biases. These biases are developed throughout one’s lifetime through experiential learning. So, as an example, if a person has a personality leaning towards extraversion, then throughout life that person will accumulate experiences that reinforce extroverted behavior and thinking (absent, of course, negative reinforcement from the environment).


Considering the asymmetric cost of information discovery, intrinsic motivation should drive towards reducing the causal uncertainty gap. Causal uncertainty is precisely what must be discovered to enable artificial ingenuity. That is, there is a gestalt-driven motivator that synthesizes observations into an actionable conclusion (not necessarily a whole). This motivator is enabled by an actionable representation, a generative language that serves as a catalyst for further knowledge synthesis. It is a perpetual loop of knowledge discovery and generation.

♡ Please heart if you like this!

Further Reading

Exploit Deep Learning: The Deep Learning AI Playbook
