Language could be an important part of artificial general intelligence and synthetic consciousness
In talking with friends about machine intelligence, I realized that I’ve been growing a web of beliefs over the past few years that are not necessarily all common knowledge. In this article, I’m going to try and map those out, as succinctly and clearly as possible.
History has shown that making predictions is laughably ill-fated, marring experts and fools alike. In that spirit, I’m also writing this for the fun of looking back on it in years to come.
First of all, a few definitions. These terms tend to be defined broadly and vaguely, so I’ve narrowed them down for this article:
- “General intelligence”, short for artificial general intelligence (AGI), is defined as a system that can manipulate information (e.g. sensory data and stored knowledge) in ways as sophisticated and diverse as a human can. It is a “human-like” intelligence. Current machine learning systems display “narrow intelligence”: they do well at a particular task, but do not generalize from it. They struggle with the information diversity of the real world and with performing multiple different tasks.
- “Consciousness” is being aware of your own mind: knowing you exist and being (partially) aware of your own thinking.
- “Dynamic computation” is defined as a neural analogue of logical algorithms: the steps taken, and the number of steps, vary with the input data. This is in contrast to a typical neural network, where the architecture and computation steps are fixed (e.g. a feed-forward network). Recurrent neural networks start to break this constraint, although in practice they are often run for a fixed number of steps and do not compute anything akin to an algorithm. (Neural Turing Machines learn algorithms, but the difficulty of training them has prevented their adoption in industry.)
- “Instructions” and “instruction sets” are terms borrowed from CPUs: an instruction is the next small step in a program, telling the CPU what to perform (e.g. add register A to register B and store the answer in register C).
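To make the contrast between fixed and dynamic computation concrete, here is a minimal Python sketch. Both functions are illustrative stand-ins chosen for this example, not real network code: `fixed_computation` mimics a feed-forward pass with a toy scalar "layer", and `dynamic_computation` uses Euclid's greatest-common-divisor algorithm as an example of a procedure whose step count depends on its input.

```python
def fixed_computation(x, weights):
    """Feed-forward style: every layer is applied, whatever the input is."""
    steps = 0
    for w in weights:            # the number of layers is fixed in advance
        x = max(0.0, w * x)      # toy "layer": scale, then ReLU
        steps += 1
    return x, steps


def dynamic_computation(a, b):
    """Algorithm style (Euclid's GCD): the loop length depends on the data."""
    steps = 0
    while b != 0:                # the input decides when to halt
        a, b = b, a % b
        steps += 1
    return a, steps


weights = [0.5, 2.0, 1.5]        # assumed toy weights, one per layer
print(fixed_computation(1.0, weights))     # always len(weights) steps
print(fixed_computation(100.0, weights))   # same step count, different input
print(dynamic_computation(48, 18))         # halts quickly
print(dynamic_computation(987, 610))       # same code, many more steps
```

The fixed version always reports three steps here, while the dynamic version takes as many steps as the particular input demands.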
What follows is a series of beliefs I hold. Many of them are driven by observing human cognition and comparing it with the capabilities of our state-of-the-art neural network architectures.
Regrettably, this is not a work of scholarly research; I would very much like to research this more thoroughly at a later point.
Language is the primary means of dynamic computation in humans
The older structures of the mammalian brain are capable of learning reactive responses, moods and memory, but not thinking and language. Those came together with the development of the neocortex.
When humans need to develop their thoughts, they turn to language: talking it through with a colleague, journalling, debating, talking to themselves, reading, writing articles. These many different avenues for language (internal monologue, speaking, listening, reading, writing) all facilitate the same process: transforming inner information into linguistic form, refining it through some process, and re-ingesting it.
This highlights an interesting limitation: humans typically make more intellectual progress when putting down thoughts into an external medium than when talking to themselves (audibly or inaudibly). I believe this illustrates the limitations of our working memory. When we write out our ideas on a piece of paper and read them back, we can analyze a greater volume of information for internal consistency and synthesize more new ideas from it.
Language is a convenient mechanism for dynamic computation
Just as instruction sets on CPUs freed computers up to do vastly more than pocket calculators could, I believe that language can open up machine learning systems to compute more complex functions, with more generality and flexibility.
Language has a number of helpful properties for this:
- A small number of tokens (e.g. the 26 letters or roughly 44 phonemes of English) combined in a linear stream supports a practically infinite range of words, ideas and statements.
- Complex thoughts are transformed into a fixed-size input-output channel (e.g. the writing of a character, the pronouncement of a phoneme) with little loss of expression. Since this I/O is performed over time, the mechanism for processing the language is a fixed-size next-step function (like an RNN).
- Thanks to the open-endedness of combining tokens into words and thoughts, one mechanism can process a vast diversity of statements.
- Self-talk / conversations / writing then reading all harness external memory to perform iterative and recursive operations. This offloads a key piece of computational architecture (e.g. the call stack in a CPU) to the external environment. This also allows us to augment our computational abilities in new ways (e.g. humans can leverage spreadsheets to do computation they otherwise could not easily do).
- By generating language in response to language (e.g. reading then writing, conversation, self-talk) an algorithm is being executed, where the data and instructions are intertwined into the same flow of information.
- Token-based language (e.g. combining letters into words) provides an open-ended instruction set. As language evolves in a population, the new words and phrases we create are rapidly transmitted between individuals. These new words/phrases give us means to express and transform information in new ways. In this sense “The limits of my language mean the limits of my world”, but thankfully, we endlessly move the limits of our language.
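Several of the properties above can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not a model of how brains or neural networks process language: the token names (`add`, `mul`, `dup`), the `vocabulary` dictionary and the stack-shaped `memory` are all invented for the example. A single fixed `step` function consumes one token at a time, data and instructions arrive in the same stream, state lives in an external memory, and new "words" can be defined in terms of existing ones.

```python
def step(token, memory, vocabulary):
    """Fixed-size next-step function: consume one token, update external memory."""
    if token in vocabulary:                  # a learned word expands to a phrase
        for sub_token in vocabulary[token].split():
            step(sub_token, memory, vocabulary)
    elif token == "add":
        b, a = memory.pop(), memory.pop()
        memory.append(a + b)
    elif token == "mul":
        b, a = memory.pop(), memory.pop()
        memory.append(a * b)
    elif token == "dup":
        memory.append(memory[-1])
    else:
        memory.append(int(token))            # any other token is treated as data


def run(stream, vocabulary=None):
    """Feed a linear stream of tokens through the same step function."""
    vocabulary = vocabulary or {}
    memory = []                              # the "piece of paper"
    for token in stream.split():
        step(token, memory, vocabulary)
    return memory


print(run("2 3 add dup mul"))                        # (2 + 3) squared
print(run("2 3 add square", {"square": "dup mul"}))  # same result via a new word
```

Adding `square` to the vocabulary changes nothing about the step function itself, which is the sense in which the instruction set is open-ended.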
Language supports machine reasoning
Within the AI community, machine reasoning is roughly defined as a system learning to transform information in a sufficiently complex manner. Whilst I’ve not found a good definition to pin down “sufficiently complex”, it’s generally used to refer to systems that incorporate multiple pieces of information in a context-dependent manner (e.g. in response to a question, extracting facts from vision and combining them).
Human-style reasoning often involves recursively breaking down a problem, applying learned sub-routines (e.g. to drive a car from X to Y, first turn the corner, then drive along the one way street…), being able to handle a wide diversity of problems, being able to handle problems of varying complexity (e.g. asking a student to write down their name, then to write an essay about Free Will), and being able to apply newly learned rules (e.g. instruct a student they cannot use their left foot, then ask them to play football).
These operations fit the prior definition of dynamic computation. As argued in the previous section, language is a good medium for this due to its flexibility: it can readily adapt to new challenges (e.g. writing can be used to iteratively draft an essay, to write out the lines of a mathematical proof, to send news to others, or to make an accounting ledger).
The core of language processing is association
For language to be valuable, linguistic statements must be exchangeable for concepts and memories in our minds (somewhat similar to fiat currencies being as valuable as what they can be traded for).
I argue that the foundation of language processing is the association between words, statements and concepts. Humans’ comfort with metaphors directly reflects how we process language. This is why we have so many words that mean similar things: for us, every piece of language is defined by its great many connections across our mind.
The associative nature of language is convenient, as it gives a flexible and direct connection to our other components (e.g. memory, feelings).
In this sense, I believe that Transformer architectures bear a useful similarity to our language centers. Transformers are incredibly capable at learning vast and nuanced connections between pieces of language. It is an open question whether with enough parameters they will display a handling of concepts and coherency in their language generation (however, they are being put to work generating language in a fashion that would trip up a human writer; few of us can continuously write coherent essays without re-reading and iteratively editing).
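As a rough illustration of what association means mechanically, the sketch below implements scaled dot-product attention, the core operation of Transformer architectures, in plain Python. The two-dimensional embeddings are hand-picked assumptions for this example (a real model learns them, in far more dimensions), and the function names are my own.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Hypothetical, hand-picked embeddings (not learned from data).
embeddings = {
    "river": [1.0, 0.0],
    "bank": [0.7, 0.7],
    "money": [0.0, 1.0],
}
words = list(embeddings)
keys = values = [embeddings[w] for w in words]

weights, blended = attend(embeddings["money"], keys, values)
for word, weight in zip(words, weights):
    print(f"{word}: {weight:.2f}")
```

With these toy vectors, "money" associates most strongly with itself, then with "bank", then with "river": the strength of each connection comes purely from how similar the representations are.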
Language’s associative nature naturally gives rise to consciousness
Pieces of language associate to other pieces of language. This gives rise to reference, and makes language self-referential. Further than that, the broad and vast associations, combined with the open-endedness of language whereby we can invent new words and phrases to express and relate to what we wish, means that our language can refer to our own process of language generation, ourselves, our minds, and our beliefs.
In practice, we are sufficiently motivated to often refer to ourselves, our minds and our processes. It should be noted that this mechanism provides no assurance of the validity of statements we make about ourselves, simply that we’re capable of making them.
Consciousness has a long history of resisting definition. Despite that difficulty, many philosophers believe that there is a broadly shared underlying intuition about what consciousness is. I’m going to throw a simplistic definition into these troubled waters: Language’s ability to self-refer gives rise to the experience and phenomenon of consciousness.
We should include language in our attempts to create artificial general intelligence (AGI)
This is the conclusion I draw from the previous statements. Language is both very powerful (being used for everything from arts to accountancy) and computationally desirable (a fixed architecture can handle a broad diversity of problems, its I/O is standardized, and it can leverage different types of external memory).
Whilst our current deep networks are not capable enough to provide AGI today, the latest language models show increasing promise. Harnessed such that language can provide control flow for the intelligent system, and using self-talk for reasoning and recursion, these approaches may start to open doors to systems with vastly more generality and capability than today’s narrow intelligence systems.
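One way to picture language providing control flow is a self-talk loop, sketched below in Python. Everything here is an assumption made for illustration: `toy_model` is a hand-written rule standing in for a language model, and the line format ("sum ...", "total ... remaining ...", "answer ...") is invented. The point is only the shape of the loop: the system reads its own transcript, writes the next line, and the language itself signals when to stop.

```python
def toy_model(transcript):
    """Hypothetical stand-in for a language model: read the last line of the
    transcript and write the next one. A hand-written rule, not a learned one."""
    words = transcript[-1].split()
    if words[0] == "sum":                    # "sum 4 7 9": start a running total
        return "total 0 remaining " + " ".join(words[1:])
    total, remaining = int(words[1]), words[3:]
    if not remaining:                        # nothing left to add: state the answer
        return f"answer {total}"
    new_total = total + int(remaining[0])
    return f"total {new_total} remaining " + " ".join(remaining[1:])


def think(task, model, max_steps=50):
    """Self-talk loop: the transcript is both working memory and control flow."""
    transcript = [task]
    for _ in range(max_steps):               # safety cap on the number of steps
        line = model(transcript).strip()
        transcript.append(line)
        if line.startswith("answer"):        # the language signals when to halt
            break
    return transcript


for line in think("sum 4 7 9", toy_model):
    print(line)
```

The number of loop iterations is not fixed by `think`; it depends on how long the task statement is, which ties this back to the earlier definition of dynamic computation.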
Finally, a caveat: just as birds and planes achieve flight in very different ways, it’s likely that there will be routes to general intelligence very different from those nature has taken.