GPT-3 and the Nature vs Nurture Debate

Carlos E. Perez
Published in Intuition Machine
Aug 23, 2020


OpenAI trained an artificial neural network (more specifically, a transformer) on a massive corpus of human language text. This network, dubbed GPT-3 (I won’t bother spelling out the acronym because it would likely just add noise), is 175 billion parameters in size. The surprising emergent effect of this network is that it is able to perform new predictions with just a few examples provided by the user. OpenAI, the creator of GPT-3, describes this as ‘few-shot’ learning. This is sometimes referred to as meta-learning. However, this muddies the waters as to what is meant by learning. Allow me to explain.

When you use GPT-3, you can give it several examples and it can generate text that follows a pattern implicit in those examples. But is this really learning, or is it just a kind of constraint satisfaction? A kind of fuzzy deduction? That is, given the examples, what is the most likely output that matches them? So you can give examples of translations between a pair of languages, and it can then execute a translation task between that pair.

For example, if I give GPT-3 a set of English to Latin examples (typically around four), it is able to translate an English sentence to Latin with surprising accuracy. This is quite a general method that has reach. It can be applied to many different tasks such as finding analogies, paraphrasing, finding parts of a whole, identifying affordances, and connecting disparate concepts. In fact, some of the text of this book is written by a system powered by GPT-3.
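The few-shot setup described above amounts to plain prompt construction: labeled example pairs followed by an unanswered query, which the model then completes. Here is a minimal sketch in Python; the example pairs and the English/Latin label format are my own illustrative assumptions, not OpenAI's actual prompt conventions.

```python
# A minimal sketch of assembling a few-shot translation prompt.
# The example pairs and labels below are illustrative assumptions.

def build_few_shot_prompt(examples, query, src="English", dst="Latin"):
    """Concatenate labeled example pairs, then the unanswered query."""
    lines = []
    for source_text, target_text in examples:
        lines.append(f"{src}: {source_text}")
        lines.append(f"{dst}: {target_text}")
    lines.append(f"{src}: {query}")
    lines.append(f"{dst}:")  # the model is asked to complete from here
    return "\n".join(lines)

examples = [
    ("I love you.", "Te amo."),
    ("The dog runs.", "Canis currit."),
    ("Life is short.", "Vita brevis est."),
    ("Seize the day.", "Carpe diem."),
]
prompt = build_few_shot_prompt(examples, "Fortune favors the bold.")
print(prompt)
```

The point of the sketch is how little machinery is involved: the "learning" lives entirely in the pattern implicit in the prompt, which the model completes by inference.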

One can make the argument that learning is a function of inference, that learning and inference are not separate things. How are humans able to chunk concepts if not through a kind of inference? It should be clear that humans would have difficulty learning without the capability of chunking concepts.

This also muddies the waters of the nature/nurture debate. I will argue that humans have innate learning capabilities, and this reveals itself in several tendencies. The first and most obvious one is that humans spend a disproportionate amount of their time avoiding becoming adults. The defining characteristic of youth is play. We know from AlphaGo how play can self-generate data for learning. Generating data contributes to richer learning experiences, rich enough that the AlphaZero self-play system is able to beat the best human players without ever using recorded human play. That is, it learns to play Go or chess from scratch.
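The idea that play self-generates training data can be sketched with a toy game. Nothing below is AlphaZero's actual algorithm (no search, no neural network, no real game); two random agents play a made-up "first to 5 points" game, and every intermediate state gets labeled with the eventual winner, which is exactly the kind of labeled data self-play yields for free.

```python
# Toy sketch: self-play as a data generator. Two random agents play a
# trivial "first to 5 points" game; every state seen during play is
# labeled with the eventual winner, producing training data from nothing.
import random

def self_play_episode(rng):
    scores = [0, 0]
    states = []
    player = 0
    while max(scores) < 5:
        states.append((tuple(scores), player))  # record state before the move
        scores[player] += rng.choice([0, 1])    # a "move" may score a point
        player = 1 - player                     # alternate turns
    winner = scores.index(max(scores))
    return [(state, winner) for state in states]  # label states with outcome

rng = random.Random(0)
dataset = []
for _ in range(100):
    dataset.extend(self_play_episode(rng))
print(len(dataset), "labeled positions generated from 100 games")
```

A learner trained on such (state, outcome) pairs gets better at judging positions, which in turn produces better games, which produces better data: the bootstrapping loop that made recorded human play unnecessary.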

There is also the human preference for shared-intentional behavior. The whites of our eyes allow us to signal our current attention to others. We are naturally able to predict the intentions of other humans and, as a consequence, have a talent for cooperative behavior.

The list of human innate capabilities is very long. It includes the acuity of our fovea, the dexterity of our hands, the ability to mentally extend our bodies to use tools, our ability to mentally time travel, our capacity for vocalization, our ability to follow sequences, our ability to tell stories and learn navigation through those stories, and our ability to walk in a way that is driven by the neocortex. This combined milieu of skills surely drives faster learning. When we speak of innate capabilities, of talents coming from nature, we should focus on these skills.

Then there are skills that exist as part of cultural evolution. This is where we bootstrap our thinking using the languages found in our culture. These come from nurture and are handed down to us by our societies. Learning these skills is clearly only possible with the innate cognitive skills we already have; it isn't bootstrapped from nothing.

The innate cognitive skills we have are only possible because they are built on the foundation of innate skills that we see across the animal kingdom. After all, evolution has been creating these for nearly 4 billion years. At every scale in biology, organisms must solve a variety of complex problems. However, the umwelt of a single-celled animal, or the umwelt of our immune cells, is very different from the umwelt of our selves as a whole.

What counts as useful energy for a cognition is very different for a cell in our body than for our conscious selves. A cell sees ATP as potential energy. In our everyday lives, we see a cup of coffee as potential energy, a job as potential energy, or even a sales call as potential energy.

But it is a common process, known as homeostasis, that regulates action at every scale in our bodies and at every scale in biology. Homeostasis requires agency, and the nature of that agency differs at different scales. One can thus say that evolution has fine-tuned homeostasis and agency for billions of years. Biology develops through differentiation; we see this in how a single cell becomes a complete human. The potential exists in a single cell that evolved over billions of years of tinkering.
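Homeostatic regulation is, at its core, a negative feedback loop: a regulator senses the deviation from a set point and acts to reduce it. A minimal sketch follows; the "body temperature" framing and the gain value are my own illustrative choices, not a biological model.

```python
# A minimal negative-feedback regulator in the homeostasis pattern:
# each step corrects a fraction of the current deviation from the set point.

def regulate(value, set_point, gain=0.5):
    """One homeostatic step: move a fraction of the error back toward the set point."""
    error = set_point - value
    return value + gain * error

temperature = 40.0  # a disturbed state, e.g. after exertion
for _ in range(20):
    temperature = regulate(temperature, set_point=37.0)
print(round(temperature, 3))  # prints 37.0
```

The same loop shape recurs at every scale, from a cell maintaining ion gradients to a body maintaining temperature; what differs is the agency deciding what counts as the set point and what counts as a corrective action.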

Therefore, all that we see as innate cognitive capability in humans builds on an innate capability that already exists in single cells. It just took billions of years for that innateness to become expressible. And because biology innately has this richness of expressibility, it becomes critical that we employ the right interpretive tools to uncover the mysteries of this expressibility.

Minds enable biological organisms to navigate the uncertainty of reality. Ever since humans developed reasoning, we have been studying the nature of our cognition. The problem of how minds can evolve has been much discussed by philosophers and scientists since antiquity. For those interested in the history, John Vervaeke’s video lectures on ‘The Meaning Crisis’ are an excellent synthesis of the evolution of our understanding of cognition.

There are many approaches to the study of cognition. In this book we focus on constructive models to explore the nature of cognition. This means that to understand cognition, we try to construct synthetic intelligence. Under this approach, cognitive capabilities are construed to be implementation artifacts. In other words, I aim for a generative model of intelligence and not a descriptive one. I seek to create an implementable model of intelligence, not a conceptualized but non-implementable one. This stands in stark contrast to current mainstream approaches to cognition, which conceptualize descriptive models.

Alternative approaches, such as cognitive psychology and neuroscience, aim to give a descriptive model of the human mind at different levels of abstraction. Psychology seeks to describe human behavior independent of the underlying biological mechanisms. Neuroscience performs experimental studies on biological brains in the hope of discovering their mechanisms. One could further observe that the Turing test is a discriminative model of intelligence.

A common approach is to study a cognitive phenomenon in isolation from its context. However, a more fundamental approach is to seek a comprehensive view of the phenomenon with reference to its environment. In this approach, the whole is greater than the sum of the parts, because the whole influences the parts and the parts influence the whole. Therefore, the study of cognition is less an inquiry into parts and more a study of wholes.

Sequence-to-sequence prediction is a strange way to arrive at general intelligence, but it is no stranger than the idea that all intelligence is a kind of compression.
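The prediction/compression connection can be made concrete with an off-the-shelf compressor: text with predictable structure compresses well precisely because a compressor is, implicitly, a predictor of what comes next, while unpredictable bytes cannot be compressed at all. A small sketch using Python's standard zlib module (the sample text is my own illustrative choice):

```python
# Predictability and compressibility are two faces of the same thing:
# a compressor shrinks input exactly to the extent it can predict it.
import os
import zlib

structured = b"the cat sat on the mat. " * 100  # highly predictable text
random_bytes = os.urandom(len(structured))      # no structure to predict

ratio_structured = len(zlib.compress(structured)) / len(structured)
ratio_random = len(zlib.compress(random_bytes)) / len(random_bytes)

print(f"structured: {ratio_structured:.3f}, random: {ratio_random:.3f}")
```

The structured text compresses to a small fraction of its size while the random bytes do not shrink at all, which is the sense in which a sequence predictor like GPT-3 and a compressor are doing the same underlying job.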

Humans have a fondness for unifying abstractions; however, too much focus on the abstraction leads to overlooking the devil in the details. The problem with general intelligence is that it is an emergent phenomenon that occurs at many different levels.

The difficulty in analyzing emergence is that it is opaque to many analytic methods and instead requires the actual execution of a massively interactive collection of agents. The winners of games cannot be determined by analysis alone; the games must be played to see who wins.

Biological evolution has revealed to us one path toward general intelligence. It is from this path that we can formulate models of how capabilities emerge. This bottom-up approach, inspired by evolutionary development, narrows the search space of possible implementations.

I hypothesize that there are many ways to get to general intelligence, but the specific kind that works in our world is the kind that evolved from this world. This is because general intelligence is ultimately coupled to fitness with the environment.

But there is indeed still a problem with this approach. Not everything that evolution creates is necessary. Evolution creates Rube Goldberg machines, and this is one reason why understanding biology may be hopelessly impenetrable.

At best we can learn the general principles of biology and employ these principles to grow artificial general intelligence. That is, to focus on mining principles from an emergence perspective rather than a reductionist one.

One reason that Semiotics, Cybernetics, and Enactivism have not gained much traction is that the lack of computational capability confined these fields to abstract theorizing without any means of verification.

It is no different from the world of neural networks decades ago. Theoretical soundness takes a back seat to demonstrable experiments. But now we do have the computational resources, and thus we must revisit these older ideas.

It is a false assumption that older ideas have all been assimilated into present-day thinking. Lack of innovation is not solely due to not having the knowledge; it is also due to being unaware of knowledge that already exists.

If you think about it, what better way to get a return on investment than mining old knowledge instead of having to create new knowledge? There are many great thinkers of the past who spent countless days (free from internet distraction) contemplating the nature of cognition.

Today we benefit from knowledge of their ideas, ideas they may have been unable to validate due to technological limitations. This knowledge is very powerful in that it lets us prune their thought experiments to discover the useful ones.

I read the story of Norbert Wiener, the founder of Cybernetics. He was way ahead of his time in his understanding of automation. The concerns we have today about automation and jobs, he wrote about in 1950. He was, however, uninvited from the famous Dartmouth conference on AI.

This is because he had not embraced the emerging notion of digital computation. He understood cognition from the perspective of analog computation. The AI field believed that intelligence must come from logic, and thus digital computation.

Half a century later, with Deep Learning, we discovered the value of feedback loops and analog computation. Models of the brain have swung back to analog models, the same kind of models Cybernetics studied.

I hypothesize this preference for continuous models has its limits. In fact, we have to go back further into the past before Cybernetics to re-discover Semiotics.

This is because cognition is more than feedback: cognition is about conversation, and conversation is only possible through signs and thus language. The weakness of other models of general intelligence is the absence of a semiotic explanation.

gum.co/empathy
