Let’s Set the Record Straight on LLM Capabilities
In the rush to adopt AI, engineering achievements such as Large Language Models (LLMs) are too often mistaken for human linguistic competence.
By Irving Wladawsky-Berger
“Mistaking the impressive engineering achievements of LLMs for the mastering of human language, language understanding, and linguistic acts has dire implications for various forms of social participation, human agency, justice and policies surrounding them,” according to Ireland-based cognitive scientists Abeba Birhane and Marek McGann in a recent paper, “Large models of what? Mistaking engineering achievements for human linguistic agency.”
“Hyperbolic claims surrounding LLMs often (mis)use terms that are naturally applied to the experiences, capabilities, and characteristics of human beings.”
LLMs have impressive abilities to respond to and generate natural language based on their training. They use massive language datasets, generally sourced from the World Wide Web, breaking text or speech down into tokens, typically a few characters in length, to develop a statistical model of language. Powerful statistical techniques and large amounts of computational power are then used to analyze the relationships among billions of tokens in order to generate grammatically valid sequences of tokens in response to a question or prompt.
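As a rough illustration of that pipeline (a minimal sketch, not any particular LLM's implementation), the Python snippet below splits a toy corpus into tokens, counts which token tends to follow which, and extends a prompt by sampling from those counts. Real systems use subword tokenizers and neural networks trained on vastly larger data; the corpus and function names here are hypothetical.

```python
# Minimal illustration of the statistical idea behind LLM text generation:
# tokenize text, model which token tends to follow which, and extend a prompt
# by sampling from that model. Purely a sketch; the corpus is made up.
import random
from collections import Counter, defaultdict

corpus = (
    "the model predicts the next token . "
    "the model extends the prompt one token at a time . "
    "the prompt is broken into tokens ."
)

def tokenize(text: str) -> list[str]:
    """Toy whitespace tokenizer; production systems use subword schemes such as BPE."""
    return text.split()

# Count how often each token follows each preceding token (a simple bigram model).
follows: dict[str, Counter] = defaultdict(Counter)
tokens = tokenize(corpus)
for prev, nxt in zip(tokens, tokens[1:]):
    follows[prev][nxt] += 1

def generate(prompt: str, length: int = 8) -> str:
    """Extend the prompt by repeatedly sampling a statistically likely next token."""
    out = tokenize(prompt)
    for _ in range(length):
        candidates = follows.get(out[-1])
        if not candidates:
            break
        next_token = random.choices(
            list(candidates), weights=candidates.values()
        )[0]
        out.append(next_token)
    return " ".join(out)

print(generate("the model"))
```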
Separating Facts From Fiction
Birhane and McGann note that “the processing of datasets and the generation of output are engineering problems, word prediction or sequence extension grounded in the underlying distribution of previously processed text. The generated text need not necessarily adhere to ‘facts’ in the real world,” which is why LLMs are prone to hallucinations: responses that, while statistically plausible and grammatically well-formed, present false or misleading information about the real world.
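A toy example makes the point concrete. In the sketch below (an illustration of the general idea, not the authors' analysis), a sentence is scored only by how familiar its word sequences are relative to a tiny made-up corpus; a fluent but false claim can score as high as or higher than a true one, because the score tracks the distribution of prior text rather than facts about the world. The corpus and scoring function are hypothetical.

```python
# Illustrative sketch: a model that scores sentences purely by how familiar
# their word sequences are will prefer whatever its training text made
# statistically common, true or not.
from collections import Counter

training_text = (
    "sydney is the largest city in australia . "
    "sydney is the home of the opera house . "
    "canberra is the capital of australia ."
)

tokens = training_text.split()
bigrams = Counter(zip(tokens, tokens[1:]))

def familiarity(sentence: str) -> float:
    """Average bigram frequency: a crude proxy for 'statistically plausible'."""
    words = sentence.lower().split()
    pairs = list(zip(words, words[1:]))
    return sum(bigrams[p] for p in pairs) / max(len(pairs), 1)

true_claim = "canberra is the capital of australia"
false_claim = "sydney is the capital of australia"

# Both read as grammatical English, but the false claim scores higher here
# because "sydney is" appears more often in the training text. The score
# reflects the distribution of prior text, not facts about the world.
print(familiarity(true_claim), familiarity(false_claim))
```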
The researchers add that claims regarding the linguistic capabilities of LLMs are based on two unfounded assumptions:
· Language completeness, which assumes that there exists a distinct and complete thing called a natural language, whose behavior can be effectively modeled by a sophisticated engineering system like an LLM, as is the case with real-world physical objects and systems; and
· Data completeness, which assumes that the essential characteristics of natural language can be identified and quantified in the datasets used to train the LLM algorithms, as is the case with engineering models of real-world systems.
Both of these assumptions about human language are based on a computational theory of mind, which views the human mind as an information processing system. In this view, the mind builds internal representations of a corresponding external reality, and these representations provide the foundation for cognition (e.g., thinking, learning, problem solving) as a kind of computation.
3 Ways They Differ
Instead, the authors approach cognition from an enactive perspective. In the enactive view of cognition, “Cognition is action-related and action-oriented, with the capacity to generate environmental structure by action.” Birhane and McGann identify three characteristics of enacted language that are absent in LLMs but fundamental to human language: embodiment, participation, and precariousness:
- Embodiment motivates human language and engages us with the world in concrete ways that greatly influence our actions. Tone of voice, gesture, eye contact, emotional context, facial expressions, touch, location, and setting are among the factors that influence what is said or written. Our very personhood is intertwined with our interactions with others.
- Participation involves the social, active, and collaborative aspects of language that cannot possibly be captured in a static representation of training data. It involves the casual chit-chat of everyday interaction, the fleeting gestures, body language, tones, pauses, and hesitations that cannot be entirely captured in text, are often unpredictable, and follow no clear formal rules.
- Precariousness is the idea that the coordination between two or more people engaged in a shared activity is often full of ambiguities, tensions, and frictions, and that is not necessarily a bad thing. These frictions help us achieve a common understanding and resolve disagreements. They’re at the very heart of being human.
It’s Meaning that Matters
The authors emphasize that meaning is what truly matters in the enactive view of language, referencing a 2020 paper by linguistics professors Emily Bender and Alexander Koller that explained the difference between form, meaning, and understanding:
· Form is any observable expression of language, whether written, spoken, or signed;
· Meaning is the relation between the form in which language is expressed and the communicative intent it is being used to convey to the listener or reader; and
· Understanding is the listener’s ability to capture the meaning that the speaker intends to convey.
Bender and Koller wrote that while the success of LLMs on many natural language tasks is very exciting, “these successes sometimes lead to hype in which these models are being described as understanding language or capturing meaning.”
However, “a system trained only on form has a priori no way to learn meaning.” Research on language acquisition has found that “human language learning is not only grounded in the physical world around us, but also in interaction with other people in that world… Human children do not learn meaning from form alone and we should not expect machines to do so either.”
Enactive cognition is also discussed in a 2023 research paper, “Dissociating Language and Thought in Large Language Models: A Cognitive Perspective.” The paper points out that there is a tight relationship between language and thought in humans. We generally view other people’s statements not just as a reflection of their linguistic skills, but as a window into their mind. When we hear or read a sentence, we typically assume that it was produced by a rational person based on their real-world knowledge, critical thinking, and reasoning abilities.
A Lack of Understanding
The paper adds that LLMs can generate language that rivals human output, which has led to claims that LLMs represent a major step towards the development of human-like AI. Given that until very recently our language interactions have only been with other humans, it’s not surprising that we’re now ascribing human-like properties to these novel machine interactions.
LLMs are actually not so good at thinking because they lack the world knowledge, common sense, and reasoning abilities of humans. LLMs don’t really understand what they’re generating or saying the way a human would, and understanding is, in the end, the purpose of language.
The paper defines two kinds of linguistic competences:
Formal linguistic competence — “a set of core, specific capacities required to produce and comprehend a given language.” These include knowing a language’s vocabulary, the grammatical rules for forming correct sentences, and the many exceptions to those rules and idiosyncratic constructions.
Functional linguistic competence — “non-language-specific cognitive functions that are required when we use language in real-world circumstances.” These include problem solving, quantitative thinking, and logical analysis; common knowledge about how things generally work, including facts, concepts, and ideas; assumptions about human behavior that we generally share with other people; and understanding the social context of conversations. “Real-life language use requires integrating language into a broader cognitive framework.”
The distinction between formal and functional linguistic competence comes from our understanding of the functional architecture of the human brain. Research in cognitive science and neuroscience has established that in the human brain “the machinery dedicated to processing language is separate from the machinery responsible for memory, reasoning, and social skills.”
Language Processing
“The many failures of LLMs on non-linguistic tasks do not undermine them as good models of language processing. After all, the set of areas that support language processing in the human brain also cannot do math, solve logical problems, or even track the meaning of a story across multiple paragraphs.”
“An enactive cognitive science perspective makes salient the extent to which language is not just verbal or textual but depends on the mutual engagement of those involved in the interaction,” wrote Birhane and McGann in conclusion. “The dynamism and agency of human languaging means that language itself is always partial and incomplete. … The data on which the engineering of LLMs depends can never be complete, partly because some of it doesn’t leave traces in text or utterances, and partly because language itself is never complete.”
“Large language models signify an extraordinary engineering achievement and a technological revolution like we have not seen before. However, they are tools — developed, used, and controlled by humans — that aid human linguistic interaction. …
Like all socially consequential technologies, LLMs need to be rigorously evaluated prior to deployment, particularly to assess and mitigate their tendency to simplify language, encode societal stereotypes and the systems of power and privilege underlying them, and the disproportionate benefit and harm their development and deployment brings.”
This blog first appeared on October 10.