Nano-Empathic Emergence: A Different Kind of Meta-Learning

Carlos E. Perez
Published in Intuition Machine
Dec 30, 2019 · 8 min read



“Empathy is a kind of inference that cannot be captured by classical logic” — Carlos E. Perez

Deep Learning researchers like to use the moniker ‘Meta-Learning’ as a catch-all expression for “learning how to learn”. I’ve written earlier that what has been described as Meta-Learning comes in many different forms. One form is architecture search, where parallel experiments are performed to find the best-performing learning instance; this form is related to genetic or evolutionary algorithms. A second form involves optimization methods that apply across many tasks, with the intent of learning an algorithm that generalizes across those tasks. A third form involves methods that learn to optimize learning within a single task (i.e. “Learning to Learn by Gradient Descent by Gradient Descent”).

The conspicuous commonality among all these methods is the presence of a second-order exploration algorithm.

This reminds me of the conceptualization of second-order cybernetics:

https://en.wikipedia.org/wiki/Second-order_cybernetics

Two cybernetic systems: one feedback loop nested inside another. Meta-learning in the field of deep learning looks like a regurgitation of this older cybernetic idea.

The primary departure from the cybernetic idea is that deep learning systems do not continuously learn about their environment. Cybernetics assumes a continually learning system; Deep Learning systems don’t have that level of adaptability. It’s insightful to notice that Deep Reinforcement Learning (DRL), which is Deep Learning inside a Reinforcement Learning loop, also fits within this double-loop idea. DRL isn’t usually considered Meta-Learning, but one can make the argument that it should be.

The major obstacle for Meta-Learning is that it implies the need for even more training data. There has been plenty of success with this simplistic prescription of crafting algorithms that benefit from more data; many advances in Deep Learning are a consequence of brute-force computation. Architecture search has discovered simpler and more powerful Deep Learning nodes. Learning across many tasks (i.e. Multi-Task Learning) has been shown to be effective in few-shot learning (MAML) and in natural language processing (GPT-2). Population-based methods have been shown to be effective in more complex domains like StarCraft II. The loop within a loop affords an architecture that funnels in more data.

Facebook recently released a PyTorch-based project that makes it even easier to formulate architectures that have a loop within a loop:
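To make the pattern concrete, here is a minimal sketch of such a loop within a loop, a simplified MAML-style update written in plain PyTorch. This is my own illustration, not the released project’s API; the toy task family and all hyperparameters are assumptions:

```python
import torch
import torch.nn.functional as F

# Assumed toy task family: regress y = x @ w for a randomly drawn w per task.
def sample_task():
    w = torch.randn(4, 1)
    x_s, x_q = torch.randn(10, 4), torch.randn(10, 4)
    return x_s, x_s @ w, x_q, x_q @ w

torch.manual_seed(0)
weight = torch.randn(1, 4, requires_grad=True)  # meta-learned initialization
bias = torch.zeros(1, requires_grad=True)
meta_opt = torch.optim.Adam([weight, bias], lr=1e-2)
inner_lr = 0.05

for step in range(1000):                 # outer loop: iterate across tasks
    x_s, y_s, x_q, y_q = sample_task()
    w, b = weight, bias
    for _ in range(3):                   # inner loop: adapt within one task
        loss = F.mse_loss(F.linear(x_s, w, b), y_s)
        gw, gb = torch.autograd.grad(loss, (w, b), create_graph=True)
        w, b = w - inner_lr * gw, b - inner_lr * gb
    # Outer update: differentiate the query loss through the inner adaptation.
    meta_loss = F.mse_loss(F.linear(x_q, w, b), y_q)
    meta_opt.zero_grad()
    meta_loss.backward()
    meta_opt.step()
```

Notice that every outer step consumes a fresh task: the hunger for data is built into the architecture itself.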

However, this loop-within-a-loop idea appears to me to be uninformative and overly simplistic. The asymmetry of information discovery should be hint enough that these methods will lead to diminishing returns.

I will explore here a description of information discovery that is inspired by the methods of human technological innovation. You see, there are meta-level principles that drive human innovation. I conjecture that a model of technological innovation can be leveraged toward more efficient learning. This approach is also a meta-level approach, but it does not simplistically add an outer loop on top of an existing inner-loop learning algorithm. Instead, it formalizes a model of the information discovery (or learning) process and uses that model as a framework for developing new algorithms.

Let’s explore two meta-level approaches to information discovery. The first is David Deutsch’s explanation of the nature of a good explanation. The second is described by Judea Pearl in his causal calculus.

David Deutsch describes the following:

From David Deutsch: Apart from Universes

We can think of learning as the ability to create explanations about the world. The above model generalizes to the activity of doing science. Essentially, it is an interactivist model in which knowledge discovery is achieved through experimentation and verification: it flows from the expression of a formalism, to the testing of that formalism’s predictions, to the interpretation of the formalism. The benefit of formalizing an explanation is that it gives other minds sufficient detail to properly analyze and criticize the model.

What remains non-obvious is how minds generate formalisms. This generative process I would call ingenuity.

In “The Seven Tools of Causal Inference with Reflections on Machine Learning”, Judea Pearl describes another model that I find has similarities to Deutsch’s:

This is a simple input-output model that shows how one would evaluate new formalisms (described here as inputs in the form of a graphical model) by comparing them against the data. The query inputs of this model correspond to the testing in Deutsch’s model.
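As a toy illustration of Pearl’s engine (my own sketch, not code from the paper): assume a causal graph in which Z confounds both X and Y, pose the query P(y | do(x)), and let the backdoor adjustment turn the query plus the graph into an estimand computable from observational data:

```python
import numpy as np

# Assumed toy causal graph: Z -> X, Z -> Y, X -> Y.
rng = np.random.default_rng(0)
n = 100_000
z = rng.integers(0, 2, n)                        # confounder
x = (rng.random(n) < 0.2 + 0.6 * z).astype(int)  # treatment, influenced by z
y = (rng.random(n) < 0.1 + 0.3 * x + 0.4 * z).astype(int)  # true effect of x: 0.3

def p_y_do_x(x_val):
    # Backdoor adjustment: P(y=1 | do(X=x)) = sum_z P(y=1 | x, z) * P(z)
    return sum(y[(x == x_val) & (z == z_val)].mean() * (z == z_val).mean()
               for z_val in (0, 1))

naive = y[x == 1].mean() - y[x == 0].mean()  # confounded estimate, biased upward
causal = p_y_do_x(1) - p_y_do_x(0)           # recovers roughly 0.3
print(f"naive: {naive:.3f}, backdoor-adjusted: {causal:.3f}")
```

The graph is the formalism: without it, no amount of data licenses the causal conclusion, which is precisely the model-based character both thinkers share.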

The commonality between Deutsch’s model and Pearl’s model is that both are model-based, not model-free. Both methods require intelligence to formulate models and to formulate tests of those models. The iterative loop in these discovery methods is that a variety of explanatory models are proposed, and these undergo selection by testing their fitness against reality.
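A crude sketch of this generate-and-test loop, under toy assumptions of my own (reality is a hidden linear rule; the candidate explanations are slope-intercept pairs):

```python
import random

# Conjecture: vary the best surviving explanations.
# Criticism: score every candidate against observed data.
random.seed(0)
data = [(x, 3 * x + 1 + random.gauss(0, 0.1)) for x in range(20)]

def fitness(model):
    a, b = model
    return -sum((a * x + b - y) ** 2 for x, y in data)  # lower error = better

population = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(50)]
for _ in range(200):
    survivors = sorted(population, key=fitness, reverse=True)[:10]
    population = survivors + [
        (a + random.gauss(0, 0.05), b + random.gauss(0, 0.05))
        for a, b in survivors for _ in range(4)
    ]

print(max(population, key=fitness))  # drifts toward the hidden rule (3, 1)
```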

But where do these models come from? Proponents of model-free methods will argue that models originate from a shapeless primordial soup, and that through interaction with the data (i.e. the inputs) something emerges out of that soup. To be fair, model-free methods do have initial conditions (some call them priors, but I dislike that word) that rig (or constrain) the learning process to increase the likelihood of a favorable outcome.

I shall take a path divergent from this model-free approach and instead seek one based on ideas of recombination (see: Recombinant Programming). Rather than pursue the kind of meta-learning that Jeff Clune describes in a recent NeurIPS 2019 lecture on Meta-Learning (which I recommend), I will describe a variation inspired more by the models of Deutsch and Pearl.

Let me make a proposal that generalizes the meta-learning idea: meta-learning is a learning strategy of a collective, and value-memes are likewise a learning strategy of a collective. I discuss value-memes here:

A learning strategy of a collective acknowledges the diversity of the cognitive capabilities of the members of the collective. Each member may have its own learning strategy: zero-shot, Bayesian, Hebbian, gradient descent, etc. The “outer loop” in this framing is a collective coordination strategy. John Hagel’s learning platforms are an example categorization of the kinds of collective learning; to make them easier to refer to later, let’s call these “Learning Platforms”. Here’s a Venn diagram that compares an outer-loop meta-learning strategy with one based on a Learning Platform:

An agent may be a participant in multiple v-memes, coordinating with each one in the manner constrained by that v-meme. This leads to a richer formulation of a collective learning platform.

A v-meme is a shared algorithm that spans many agents. But how does a v-meme emerge from a collection of interacting agents? For this emergence to happen, each interacting agent must exhibit a kind of nano-intentional capability that I will call “nano-empathy”. These nano-empathic agents, through interaction with their community, are able to recognize regularities in the interactions. Over time, these regularities become implicitly codified as an instance of a v-meme. The outer loop of this meta-learning algorithm seeks to discover and codify these v-memes. The outer loop is a collective learning algorithm shared across all agents, yet it is an emergent property of each individual agent’s empathy.
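As a toy analogy for this emergence (my illustration, not a formal model from this article): give each agent a private convention, let pairs of agents interact, and have each party nudge toward the other. No global coordinator exists, yet a shared convention, a stand-in for a v-meme, emerges:

```python
import random

# Each agent holds a private "convention" (a number). In every interaction,
# both parties adapt toward each other: a nano-scale act of empathy.
random.seed(0)
agents = [random.random() for _ in range(100)]

for _ in range(20_000):
    i, j = random.sample(range(len(agents)), 2)
    mid = (agents[i] + agents[j]) / 2
    agents[i] += 0.5 * (mid - agents[i])
    agents[j] += 0.5 * (mid - agents[j])

print(f"spread of conventions: {max(agents) - min(agents):.6f}")  # near 0: consensus
```

The codified regularity lives in no single agent; it exists only in the population.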

What then shall I call this alternative kind of meta-learning? I call it Nano-Empathic Emergence. It is a paradigm shift to think of neurons and cliques of neurons as nano-empathic. Saying that every neuron is Turing complete (i.e. nano-intentionality) doesn’t go far enough to help us uncover general intelligence. However, saying that every neuron, or every clique of neurons, is a learning machine sheds some needed light on how to achieve general intelligence.

Not all learning platforms are structured in a manner that encourages innovation. Some platforms are aligned toward information discovery, while others are aligned toward efficient execution. Furthermore, this should not imply that an innovation platform cannot leverage the capabilities of an efficiency-oriented platform. This also reminds me of the evolution dimension of a Wardley map: in order of increasing maturity, a technology can be in the genesis, custom-built (bespoke), product, or commodity stage. The kinds of organizations involved in development depend on the stage a product is in. Here is an example of a Wardley map:

https://blog.gardeviance.org/2015/02/an-introduction-to-wardley-value-chain.html

Different types of organizations (therefore with different v-memes) are involved in the development of a complex product.

So as it is for human technological evolution, I propose it is the same for the internals of the human brain. We know that human behavior can be partitioned into several kinds of selves (i.e. bodily, volitional, perspectival, narrative, and social). We know that the primary purpose of biology is homeostasis: specifically, the continued maintenance of the integrity of the self. So perhaps each brain component is a specialization of a self; that is, it has organized itself in the manner that best supports the requirements of the self it maintains. For example, we might have the following:

Bodily Self Homeostasis — Hypothalamus

Volitional Self Homeostasis — Basal Ganglia

Perspectival Self Homeostasis — Thalamus/Cortex

Narrative Self Homeostasis — Hippocampus

Social Self Homeostasis — Cerebellum

The above categorization is speculative and can change as we refine these ideas.

Any advanced technology has an innovation laboratory. Biology (“Any sufficiently advanced technology is indistinguishable from biology”) has bacteria and viruses as its innovation laboratory. Organizations tasked with innovation have very different value-memes. Innovation may require the dissolution of the self. The creativity center of the brain, its innovation laboratory, may perhaps not reside in any one part of the brain, but rather demand an integration of the whole brain.
