GPT-3: And in the Beginning Was the Word (Part 1/2)

Daniel Leivas
Published in The Startup · 13 min read · Sep 6, 2020

30-Second Summary

  • GPT-3, released in May 2020, has stirred both fear and excitement in the community of developers and digital workers. Many are expressing their astonishment, and the first wave of GPT-3-powered applications, such as producing human-like texts, is emerging.
  • It seems to me that simply explaining the principles of Machine Learning, a field inspired by the brain whose extraordinary capacities seduce a little more every day, makes it possible to keep a critical eye on this astonishing technology.
  • GPT-3 is trained on a large share of the text available on the Internet. It is an unsupervised learning algorithm built on the Transformer architecture.
  • The brain has an incredible architecture for comprehending the world. Biological neurons inspired the parameters of Machine Learning models. The brain and Artificial Neural Networks (ANNs) are similar but not identical: the brain is orders of magnitude more complex than an ANN.
  • GPT-3 passes a specific type of Turing test, but it is not yet human-like intelligence. One essential thing is missing: emotions.

And In The Beginning Was… GPT-3

May 28, 2020. The Generative Pre-trained Transformer 3 (GPT-3) was officially released in the form of a scientific publication and has been in beta testing since July 2020. It is a natural language processing (NLP) neural network created by OpenAI.

OpenAI is an artificial intelligence (AI) research lab co-founded by, among others, SpaceX and Tesla CEO Elon Musk. GPT-3 makes developers, geeks, and techno-skeptics worldwide fantasize and shudder with its ability to imitate human beings.

Why has this technology captivated people's imagination? Why did some worry it could prove dangerous? Over the past few weeks, many samples generated by GPT-3 have started to circulate on social networks: generating website layouts, translating English into LaTeX equations, answering questions about Excel functions, writing SQL queries, generating SVG charts, or even creating a React app that describes itself.

One of the most discussed applications is the generation of the article "GPT-3 may be the biggest thing since bitcoin", which explains how GPT-3 could become the next big disruption. GPT-3 itself produced this text! How can a machine do these things? Will it replace us? Is this the beginning of a new era for machine intelligence? Fear and excitement are settling into people's minds.

But fear and excitement have always sold better than dreams, because our brain, out of survival instinct, records more readily the facts and ideas that could threaten our species. So when something is scary, perhaps the best response is to demystify it by trying to understand its inner workings.

Machine Learning Principles

GPT-3 uses deep learning, a family of machine learning (ML) methods, to produce human-like text for tasks such as translation, spelling correction, and auto-completion of sentences. It can also answer open-ended questions, generate text imitating the style of famous authors, code web pages, and even solve arithmetic problems with surprising ability, all without human supervision.

Machine learning "uses data to answer questions", as defined by Yufeng Guo, developer advocate for Google Cloud Platform.

Data is the fuel

When we go into detail, Machine Learning can be broken down into seven steps, with an upfront step added for the hypothesis or business problem to solve:

(Hypothesis →) Data collection → Data preparation → Choosing ML Model → Model training → Evaluation → Hyper-parameter tuning → Prediction

If we simplify it:

Training data → Model → Prediction (or inference)

How much data do we need to train such a model? Millions of examples! As much as we can afford in servers and computing power to train the model.

For example, if I had a dataset with two variables, age (input) and height (output), I could implement a supervised learning model to predict a person's height based on their age.
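This age-to-height example can be sketched in a few lines. The numbers below are invented for illustration, and the model is a simple least-squares line, one minimal instance of Training data → Model → Prediction:

```python
# A toy supervised-learning sketch with hypothetical data: fit a line
# that predicts height (cm) from age (years) by ordinary least squares.
import numpy as np

# Labeled training pairs (age, height), invented for the demo.
ages = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
heights = np.array([86.0, 102.0, 115.0, 128.0, 138.0, 149.0])

# Design matrix with a bias column; solve for slope and intercept.
X = np.column_stack([ages, np.ones_like(ages)])
slope, intercept = np.linalg.lstsq(X, heights, rcond=None)[0]

def predict_height(age):
    """Inference step: Training data -> Model -> Prediction."""
    return slope * age + intercept

print(round(predict_height(7.0), 1))  # a plausible mid-childhood height
```

Real systems use far richer models and far more data, but the shape of the workflow, collect labeled pairs, fit parameters, then predict on new inputs, is the same.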

What happens when the data or inputs are not of good quality? The model may generate invalid predictions.

Training data → Model → Invalid prediction

In general, we then seek to quantify the invalid predictions made by the model and to minimize them. When the error falls within a margin considered acceptable, the learning phase is deemed over and the values of the model parameters are considered optimal.
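The loop just described, quantify the error, then nudge the parameters to reduce it until an acceptable margin is reached, can be sketched with gradient descent on a mean squared error loss. The data and thresholds here are invented for illustration:

```python
# A minimal training loop: measure the loss (mean squared error) and
# reduce it by gradient descent until it drops below an "acceptable"
# threshold, at which point learning is deemed over.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 3.0 * x + 1.0 + rng.normal(0, 0.1, 50)  # noisy line, toy data

w, b = 0.0, 0.0   # model parameters, initialized at zero
lr = 0.01         # learning rate (a hyper-parameter)

for step in range(2000):
    pred = w * x + b
    error = pred - y
    loss = np.mean(error ** 2)        # quantify the invalid predictions
    if loss < 0.02:                   # acceptable margin reached
        break
    w -= lr * np.mean(2 * error * x)  # gradient step on each parameter
    b -= lr * np.mean(2 * error)

print(round(w, 2), round(b, 2))  # near the true slope 3.0 and intercept 1.0
```

The stopping threshold and learning rate are exactly the kind of hyper-parameters that the tuning step of the seven-step pipeline adjusts.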

GPT-3 is trained with all Internet data. All, or almost: in fact, a large part of the Internet's text, saved each month by Common Crawl (an open repository of web crawl data that anyone can access), has been used to train this algorithm, built on the "Transformer" principle invented barely three years ago by engineers at Google.

This artificial neural network (ANN) is simply trained to predict the next word in a gigantic linguistic corpus of several billion sentences, in which the biggest encyclopedia in the world, Wikipedia, represents only 3% of the text. The rest comes from digitized books and various web links. That means the GPT-3 training data includes not just news articles, recipes, and poetry, but also coding manuals, fiction, religious prophecies, and whatever else we can imagine. Any type of text uploaded to the Internet has likely become raw material for the patterns GPT-3 learns. And it also includes less reliable sources: pseudoscientific books, conspiracy theories…
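The training objective itself, predict the next word, is simple enough to caricature in a few lines. A vastly simplified stand-in for GPT-3's objective counts word bigrams in a tiny invented corpus and predicts the most frequent continuation:

```python
# Next-word prediction in miniature: count which word follows which in
# a toy corpus (GPT-3 does this, at vastly greater scale and depth,
# over Common Crawl) and predict the most likely continuation.
from collections import Counter, defaultdict

corpus = ("the cat sat on the mat . the cat ate the fish . "
          "the dog sat on the rug .").split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    """Return the continuation seen most often in training."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # -> 'cat', seen twice, more than any rival
```

GPT-3 replaces these raw counts with 175 billion learned parameters and conditions on whole passages rather than a single word, but the task it is optimized for is the same: guess what comes next.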

The Transformer is a method that contextualizes the meaning of each word much more deeply, by taking into account the word's position in the sentence. Its attention mechanism lets the algorithm relate distant linguistic units: it can link the subject, verb, and direct object in a long sentence, or take into account the semantic context connecting the different sentences of the same paragraph.
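The attention mechanism can be shown in miniature. In this sketch the token vectors are random stand-ins (a real model learns them): each position is compared with every other position via dot products, and the resulting weights decide how much each word, near or far, contributes to the new contextualized representation.

```python
# Scaled dot-product attention, the core of the Transformer, on random
# stand-in vectors: every position attends to every other position.
import numpy as np

rng = np.random.default_rng(0)
seq_len, dim = 5, 8                  # 5 tokens, 8-dimensional vectors
Q = rng.normal(size=(seq_len, dim))  # queries
K = rng.normal(size=(seq_len, dim))  # keys
V = rng.normal(size=(seq_len, dim))  # values

scores = Q @ K.T / np.sqrt(dim)      # relevance of each token to each other
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)  # softmax over positions

contextualized = weights @ V         # weighted blend of all positions
print(weights.shape, contextualized.shape)
```

Because every position attends to every other in one step, a verb can draw directly on a subject many words away, which is exactly the long-range linking described above.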

Unsupervised learning

In machine learning, the tasks can be classified into two categories: supervised or unsupervised learning problems.

For the first, the principle is very simple. You start by feeding the algorithm data annotated with explicit criteria, so-called "labeled" data. Then you train the algorithm and correct it when it gives a wrong answer. You act as a teacher, a "supervisor": this is called "supervised learning". After a certain time, the algorithm will have developed categorization criteria by itself. This method consists of teaching a function to map inputs to outputs based on known examples (input-output pairs).

In contrast, unsupervised learning is a machine learning technique where you do not supervise the model. There are only input data and no corresponding output variables. The algorithm learns from "chaotic", unlabeled data. The goal is to model the underlying structure and distribution of the data in order to learn more about it, and it helps to find all kinds of unknown patterns. This technique allows you to perform more complex tasks, but it can be more unpredictable than other methods.
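A classic illustration of learning without a supervisor is clustering. The sketch below uses k-means on invented, unlabeled data: the algorithm is never told which group any point belongs to, yet it discovers the two blobs on its own.

```python
# Unsupervised learning in miniature: k-means receives inputs only, no
# labels or "supervisor", and uncovers the hidden two-cluster structure.
import numpy as np

rng = np.random.default_rng(1)
# Unlabeled data: two well-separated blobs of 30 points each.
data = np.vstack([rng.normal(0, 0.5, (30, 2)),
                  rng.normal(5, 0.5, (30, 2))])

# Initialize one center on the first point and one on the last,
# deterministically, so this toy run is reproducible.
centers = np.stack([data[0], data[-1]])
for _ in range(10):  # alternate assignment and update steps
    dists = np.linalg.norm(data[:, None] - centers[None], axis=2)
    labels = dists.argmin(axis=1)
    centers = np.array([data[labels == k].mean(axis=0) for k in (0, 1)])

print(np.round(centers))  # one center near (0, 0), the other near (5, 5)
```

The algorithm was given only inputs, yet it modeled the underlying structure of the data, which is the defining move of unsupervised learning.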

The Generative Pre-trained Transformer is an unsupervised learning algorithm. It is often discussed alongside another celebrated generative approach, the generative adversarial network (GAN), although GPT-3 itself is a Transformer rather than a GAN. The father of the GAN concept, Ian Goodfellow, who moved to Apple in a director role in March 2019, was one of the top minds in AI at Google and was named one of MIT's Innovators Under 35 in 2017.

A GAN is a framework in which two neural networks compete to become more accurate in their predictions. The generative model is pitted against an adversary: a generator learns to produce samples, while a discriminator learns to distinguish the generated samples from real ones (just a basic true/false as output). The generator learns to produce more and more realistic samples to trick the discriminator, while the discriminator becomes better and better at telling generated samples from real ones.
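The adversarial game can be made concrete with toy numbers (no real networks here, just the two objectives). The discriminator D outputs the probability that a sample is real; it wants log D(real) + log(1 − D(fake)) to be high, while the generator wants D(fake) pushed toward 1, i.e. it wants to fool D:

```python
# The GAN objectives on toy probabilities: as the generator's fakes get
# more convincing, the discriminator's score drops and the generator's
# score rises.
import math

def d_loss(d_real, d_fake):
    """Discriminator's objective (to maximize): spot real vs. fake."""
    return math.log(d_real) + math.log(1 - d_fake)

def g_loss(d_fake):
    """Generator's objective (to maximize): make fakes look real."""
    return math.log(d_fake)

# A confident, accurate discriminator scores well...
print(d_loss(d_real=0.9, d_fake=0.1))  # close to 0, good for D
# ...until the generator improves and its fakes fool D half the time.
print(d_loss(d_real=0.9, d_fake=0.5))  # drops, bad for D
print(g_loss(d_fake=0.5))              # higher than log(0.1), good for G
```

Training alternates gradient steps on these two objectives, which is the tug-of-war that drives both networks to improve.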

We are experiencing an important bifurcation in AI rather than a revolution. The power of our supercomputers allows such networks, through trial and error, to explore solution spaces of almost infinite size. OK, but is this really how the brain works?

The Marvelous Human Cognitive Architecture

The human counterpart of AI

Artificial neural networks (ANNs) are inspired by the functioning of the human brain. They are mathematical and algorithmic models that simulate, as closely as current knowledge allows, the computational units each of us carries by the billions.

The human brain contains billions of neurons, interconnected via synapses, that transmit action potentials along their axons.

Neurons (or nerve cells) are specialized cells that transmit and receive electrical signals in the body. They are composed of three main parts: dendrites, a cell body, and an axon. Signals are received through the dendrites, travel to the cell body, and continue down the axon until they reach a synapse, the region of interaction between two neurons that allows a signal to pass. Seen abstractly, a neuron is a computational unit with one or more inputs and a calculated output.

From a computational point of view, an artificial neuron is just a mathematical representation of a biological neuron. On its own, a neuron is very limited because it is "binary": it can only separate two sets of inputs/outputs. It is great for basic things, such as binary calculus, comparison, memory, and "linear" decisions, but not for more complex problems. That is why it is combined with other neurons to create an ANN, organized in one or more layers of neurons. These layers are connected to each other following various topologies, each layer comprising several neurons.
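An artificial neuron of the kind just described fits in a few lines: weighted inputs, a bias, and a squashing activation. The weights below are hand-picked for the demo; training would normally find them.

```python
# A single artificial neuron: a weighted sum of inputs plus a bias,
# squashed to (0, 1) by a sigmoid. Alone it can only draw a "linear"
# boundary, which is why neurons are stacked into layers and networks.
import math

def neuron(inputs, weights, bias):
    """One computational unit: several inputs, one calculated output."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# One neuron suffices for logical AND, a linearly separable task.
AND = lambda a, b: neuron([a, b], weights=[10, 10], bias=-15)
print([round(AND(a, b)) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# -> [0, 0, 0, 1]
```

A task like XOR, which no single line can separate, already requires a second layer, which is the basic motivation for networks rather than lone neurons.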

In ML, a neuron is called a parameter. But a model parameter is not the same as a biological neuron. A biological neuron is not as simple as an artificial one: it takes many anatomical forms, with varied shapes and structures such as basket cells, pyramidal cells, and Betz cells. Based on the functions they perform, neurons are divided into three basic types: afferent (sensory) neurons, efferent (motor) neurons, and interneurons.

A parameter is a configuration variable. It is internal to the ML model, and its value can be estimated from data. Parameters are required by the model when making predictions.

ANNs use a lot of parameters. A parameter is just a value and a weight of relevance. A biological neuron, however, is not a switch, true or false, 0 or 1: neurons operate with multiple neurotransmitters (dopamine, serotonin…) and receptors. A neuron is orders of magnitude more complex than a parameter.

Our brain has the most sophisticated cerebral architecture. This marvelous architecture, and the mental chains it underlies, forms the foundation of human intelligence. It allows us to abstract, to comprehend the world, and to adapt it to our needs. The number of synapses in the brain is known much less precisely than the number of neurons, but it is probably about 100 trillion. What sets GPT-3 apart from previous versions is its size: GPT-3 has 175 billion parameters, up from the 1.5 billion of its predecessor, GPT-2.
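A back-of-the-envelope comparison of the figures above makes the scale gap concrete. Even under the (false) assumption that one model parameter corresponded to one synapse, GPT-3 would still fall well short of the brain:

```python
# Rough scale comparison using the figures quoted in the text.
gpt3_parameters = 175e9   # 175 billion
gpt2_parameters = 1.5e9   # 1.5 billion
brain_synapses = 100e12   # ~100 trillion, a rough estimate

print(round(gpt3_parameters / gpt2_parameters))  # ~117x jump from GPT-2
print(round(brain_synapses / gpt3_parameters))   # ~571x short of the brain
```

And since a biological neuron is far more complex than a parameter, the real gap is larger still than this naive count suggests.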

Learning speed

When we study the brain even a little, we realize that it is a much more complex mechanism. We cannot equate machine intelligence with the human brain, even though parts of the two systems are similar.

The brain-like qualities of neural networks sometimes make them, on specific tasks, more powerful and more efficient than a human. When developing a neural network, many global settings are taken into account to improve accuracy: the optimization function (the gradient and its learning rate) and the batch size, that is, the number of data points used in each training step.

Most humans learn over decades, with teachers, courses, books, friends, and mentors... GPT-3 was trained on the whole Common Crawl dataset in less than a year!

How close is AI mimicking the human brain?

This helps us gauge a little better how close machine intelligence comes to human intelligence. One of GPT-3's most spectacular feats is passing the famous Turing test, or rather a particular version of it, since many variations exist today. Can you build an AI that can convincingly pass itself off as a person? OpenAI researchers invited more than 600 participants to assess whether short journalistic texts were generated by an artificial intelligence or by a human being.

The result is clear-cut. The texts generated by the most evolved version of the algorithm are indistinguishable from those written by real journalists. The results can be technically impressive, and also fun or thought-provoking, as the poems, code, and other experiments attest.

The real question behind this test is not "is there a difference between humans and machines?" but rather "how far can a simulating artifact (designed, invented, and programmed by human intelligence) deceive us?" Because it is indeed an illusion, a representation or a simulation, not reality.

In fact, one of the major open problems in AI is training a model with less data and fewer steps. Few-shot learning may not seem like a big deal, but it is one of the main open issues in AI. Human beings can learn a new task after being shown it only a few times. Luckily for us, kids don't need to see a million car photos before they can reliably recognize cars on their own. This ability to learn complex tasks from just a few examples has so far escaped machines, despite researchers' efforts. Deep neural networks' thirst for data is a major drawback, because for many tasks little data is available, and creating new labeled training sets is expensive. Few-shot learning, if it worked well, would democratize the use of AI in many more areas than is currently the case.

GPT-3 does not solve few-shot learning, but it opens up an amazing direction of development. If increasing the size of the model so drastically improves few-shot performance, then maybe scaling up another hundredfold (the jump from GPT-2 to GPT-3) would bring few-shot performance close to, or even beyond, the human level. If scale really is the path to human-level intelligence, then GPT-3 is still hundreds of times too small. This assumes that synaptic connections map roughly one-to-one onto the parameters of the neural network, which of course is not the case, since, as we have seen, human neurons are more complex than their artificial counterparts.

Intelligences of the heart

It is from this complexity, this marvelous cerebral architecture, that what we call "intelligence" derives. The notion of human intelligence rests on a shared intuition according to which it is easy to distinguish the individuals everyone calls intelligent from those who are much less so. But we could just as well speak of the intelligence of plants, if we consider that intelligence is an emergent property of evolutionary biology, just as human intelligence is an emergent property of the chemistry of our neurons.

Human intelligence is simply an emergent property, resulting from a cascade of cerebral, cognitive, genetic, and contextual factors, that enables a mental representation of reality, an abstraction. Intelligence comes from the Latin "intellegentia" (the faculty of understanding), derived from "intellegere", to understand, whose prefix "inter" means between and whose radical "legere" means to choose, to pick (or "ligare", to bind). Intelligence thus suggests the ability to connect elements that without it would remain separate. Intelligence, in the richest and deepest sense, is the art of relating.

Capacities and talents can be multiple: one person who excels at handling the subtleties of language may be weaker at abstract reasoning, while another, brilliant at mathematics, is incapable of managing his daily life. There are many dimensions of intelligence, and we must beware of simplistic divisions. These intelligences still need to be cultivated, and there are myriad modes of expressing them: emotional, practical, spiritual… That is what makes intelligence so tricky to define.

These multiple dimensions influence each other. It is perfectly impossible to separate a person's reasoning faculties from their affective faculties. All reasoning is always linked to astonishment, joy, frustration, or spite. In short, to an emotion.

Antonio Damasio's book Descartes' Error brings the original vision that the evolutionary process has built neocortical systems of regulation on top of older ones, and that emotions manifest themselves in close interrelationships between body and brain in the perception of objects. In this view, decision-making, reasoning, and acute affective responses serve the same purpose, survival, as the ancient limbic and endocrine systems.

As we go about our lives, frontal mechanisms create associations between images in the primary sensory cortex and physiological states of the body. The body, then, is an essential part of the system of thought and social judgment. When we have to decide among competing courses of action, images of potential outcomes evoke those physiological states. Damasio specifies how the body provides fundamental content to mental representations: it constitutes the frame of reference for our representation of the world and of our relationship to it. In turn, physiological states are themselves subliminally perceived, with the result that alternative actions with negative somatic markers are rapidly rejected while those with positive markers receive extra attention. So the fact of existing would precede that of thinking, contrary to what Cartesian thought indicates.

Here, you can read Part 2, where I have explored how GPT-3 is significant enough to open up real questions.

Learn more about AI on Continuous.lu!


Curious man in a curious world | Entrepreneur | Lifelong Learner | Lecturer | Coach | Trainer | Adviser | Web lover and consultant