The initial excitement over the promise of solving intelligence via brute-force gradient descent (i.e., Deep Learning) has hit a plateau. Researchers are beginning to realize that it is insufficient to rely solely on mathematical methods. Instead, there is renewed motivation to focus on the only known proof of general intelligence that exists: researchers are once again studying how the human brain works to gain inspiration for how an Artificial General Intelligence (AGI) might be constructed.
The traditional fields that have explored thought (philosophy, cognitive science, and neuroscience) share a common weakness: they have rarely produced demonstrable models of cognition. The literature consists predominantly of conjectures about how the brain might work, with little evidence (in the form of a working simulation) to validate those conjectures. Compounding this lack of knowledge, research, particularly in neuroscience, tends to be fragmented rather than holistic. What I mean here is that neuroscience research reveals the functioning of a sub-circuit of the brain but rarely offers a model of the whole brain. A researcher is left with a smorgasbord of ideas and very little signal as to which of them are nourishing and which lead only to indigestion.
So, in its present incarnation, the research program for ‘solving intelligence’ consists of cherry-picking a few interesting pieces of empirical evidence discovered in the human brain, composing them together in a simulation (via code and artificial neural networks), and demonstrating an improvement in task performance. This is the current state of the art.
The majority of researchers, and the bulk of paper submissions, focus solely on incremental improvements to existing architectures. They tweak those architectures by adding layers, designing new kinds of layers, testing different search algorithms, and exploring different kinds of loss functions and regularizations. This approach lacks any fundamental principle and borders on alchemy. These researchers are seeking the magical incantation (in the form of a mathematical expression) that will yield SOTA results. Unfortunately, most gains are minuscule, and perhaps suspect, in that one can never tell whether the results were cherry-picked from brute-force hyper-parameter optimization (neural architecture search, in today’s lingo).
I misspoke: there actually is a foundational principle behind a neural architecture search that explores the massive combinatorial space of neural architecture design. The principle may simply be that there is no systematic algorithm for discovering the cognitive components. Blind luck (just as in evolution) may realistically be the only tool we have in our possession. If you think about it, evolution itself has not exhaustively explored the entire space of what is possible; there simply isn’t enough time in the universe to do so. Evolution ultimately relies on chance. I explore this in greater detail elsewhere.
The folks at DeepMind always seem to be several steps ahead of everybody else. Let me draw your attention to three papers they recently released on the arXiv preprint server.
The first is “Optimizing Agent Behavior over Long Time Scales by Transporting Value,” in which the authors explore the human capability of using memories from the distant past as input to solving a current problem. They develop a new kind of reinforcement learning that credits actions from the past with the solution of problems in the present.
They call this Temporal Value Transport (TVT), a “heuristic algorithm” that the authors believe is universal. Specifically, past events are encoded, stored, retrieved, and re-evaluated. TVT synthesizes memory and reinforcement learning, with memories influencing the reward credited to past events.
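To make the idea concrete, here is a minimal sketch of the core mechanism as I understand it, not the paper's actual implementation. All names (`temporal_value_transport`, `read_weights`, `alpha`, `threshold`) are illustrative assumptions: when a later step strongly reads a memory written at an earlier step, a fraction of the value estimated at the later step is transported back as bonus reward to the earlier step.

```python
# Hypothetical sketch of Temporal Value Transport (illustrative names only):
# value estimated at a later timestep is credited back to the earlier
# timesteps whose memories that later timestep attended to.

def temporal_value_transport(rewards, values, read_weights,
                             alpha=0.9, threshold=0.5):
    """rewards[t]: reward at step t; values[t]: value estimate at step t;
    read_weights[t][t_past]: how strongly step t reads the memory of t_past."""
    augmented = list(rewards)
    for t, weights in enumerate(read_weights):
        for t_past, w in enumerate(weights):
            if t_past < t and w > threshold:
                # transport a fraction of the later value back to the
                # earlier step that produced the retrieved memory
                augmented[t_past] += alpha * w * values[t]
    return augmented
```

The effect is that an action whose memory later proved useful receives credit even if its immediate reward was zero, which is the long-time-scale credit assignment the paper is after.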
The second paper from DeepMind is “One-shot High-fidelity Imitation.” The gist of this paper is that they have created the largest-ever reinforcement learning system to achieve one-shot imitation of novel skills. (See: Demos)
MetaMimic employs the largest neural network trained via RL and works from vision, without the need for expert actions. The one-shot imitation policy can generalize to unseen trajectories and mimic them closely. By bootstrapping on imitation experiences, the task policy can quickly outperform the demonstrator and is competitive with methods that receive privileged information.
The argument being made here is that, through the accumulation of massive experience (stored in memory), a system becomes quicker (i.e., one-shot) at imitating new behavior and, additionally, can perform it better than the original demonstration. This complements the TVT approach: rather than exploring rewards from experience, the system develops a kind of learning strategy that is enhanced by many past experiences.
This approach makes intuitive sense in that one might expect learning many related skills to be detrimental to performance (due to skill interference), yet somehow it improves performance across all of them. A musician who plays the piano, a wind instrument, and the violin may learn skills beyond those of a musician who plays only the piano. Something in human intuition allows the synergistic combination of skills to improve performance in a single skill.
A third paper, titled “Episodic Curiosity through Reachability,” explores the intriguing connection between topologically connected regions and curiosity. The authors propose a new curiosity method that leverages episodic memory to calculate a novelty reward: the current observation is compared against observations in memory to measure novelty. In the real world, rewards are sparse; animals seeking food may travel great distances without any reward. A curiosity method that only attempts to maximize novelty (or surprise) is therefore incomplete.
The new method gives a reward based on the amount of effort required to reach a novel setting. That is, intuitively, ‘no pain, no gain’ applied to curiosity search.
To illustrate, the system provides a greater reward for moves that are ‘far from memory’. In the paper’s illustration, nodes in blue are in memory; nodes in green are only a few steps away from memory and thus marked as not novel, while nodes far enough from memory are marked as novel. This curiosity method avoids the ‘couch potato’ effect, in which surprise and novelty can be achieved with little effort. A key mechanism in this approach is a similarity function that can recognize what is in memory and what is not. You can find demonstrations of this paper here:
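The loop around that similarity function can be sketched in a few lines. This is an assumption-laden simplification, not the paper's implementation: `reachable` stands in for the trained comparator network that judges whether one observation is within a few steps of another, and the names `curiosity_bonus` and `bonus` are mine.

```python
# Hypothetical sketch of the episodic-curiosity bonus (illustrative names):
# an observation earns a novelty reward only if the reachability comparator
# judges it to be far (more than a few steps) from everything in memory.

def curiosity_bonus(obs, memory, reachable, bonus=1.0):
    """reachable(a, b) -> True if b is within a few steps of a.
    In the paper this is a trained network; here it is any callable."""
    if any(reachable(m, obs) for m in memory):
        return 0.0          # near memory: no reward ('couch potato' guard)
    memory.append(obs)      # far from memory: store it and pay the bonus
    return bonus
```

Because easily reached states never enter memory with a reward attached, the agent is pushed toward states that take real effort to reach, which is exactly the ‘no pain, no gain’ behavior described above.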
ICLR 2019 submission
One of the ironic conclusions to draw from these investigations is that the results are being proposed to inspire models in neuroscience, psychology, and behavioral economics. This is an emerging trend that we should all pay close attention to. Although ANNs are not identical to biological brains, we are using these simulations (argued to operate at a more abstract, algorithmic level; see Marr) as computational evidence for other cognitive fields. We have essentially come full circle: cognitive mechanisms inspire ANN architectures, which generate computational proof that in turn supports cognitive theories.