Are GPT-3 and ChatGPT Truly Intelligent?
Debates rage about the capabilities of these new forms of AI, but there’s no doubt they are a sign of things to come
By Irving Wladawsky-Berger
Large language models (LLMs) — e.g., BERT, GPT-3 — and chatbots like ChatGPT are quite impressive. In the coming years, they will likely lead to a number of transformative tools and applications. They may well herald a coming AI revolution.
But how much closer are they getting us to the kind of general intelligence that, for the foreseeable future, only humans have?
How can we best sort out their true significance and value from the accompanying hype? Let me discuss how two different articles addressed these questions.
Linguistic Mix-ups?
“The success of the large neural language models on many NLP [Natural Language Processing] tasks is exciting,” wrote linguistics professors Emily Bender and Alexander Koller in their 2020 paper “Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data.” “This has led to claims, in both academic and popular publications, that such models understand or comprehend natural language or learn its meaning. From our perspective, these are overclaims caused by a misunderstanding of the relationship between linguistic form and meaning.”
In their paper, Bender and Koller explain why LLMs like GPT-3 are likely to become innovative language tools — kind of like highly advanced spell checkers or word processors — while dismissing claims that they have the ability to reason and understand the meaning of the language they’re generating. Their explanations are couched in linguistic concepts which they carefully define: form, communicative intent, meaning, and understanding.
· Form refers to how language is actually expressed: as text in a physical or digital document, or as sounds in a conversation or lecture;
· Communicative intent is the purpose a speaker intends to achieve through language, such as to convey some information to another person, ask them to do something, or just to socialize;
· Meaning is the relation between the form in which the language is expressed and the communicative intent it’s being used to evoke in the listener or reader; and
· Understanding is the listener’s ability to capture the meaning that the speaker intended to convey.
“Communicative intents are about something that is outside of language. When we say Open the window! or When was Malala Yousafzai born?, the communicative intent is grounded in the real world the speaker and listener inhabit together.” Conveying meaning through language and evoking communicative intent in the reader require knowledge of the physical and social world around us. “In contrast to some current hype, meaning cannot be learned from form alone,” said the authors.
“This means that even large language models such as BERT do not learn meaning; they learn some reflection of meaning into the linguistic form which is very useful in applications.”
The text generated by an LLM or chatbot cannot possibly carry any communicative intent, model of the world, or model of the reader’s state of mind, because that’s not what these systems were trained to do. “This can seem counterintuitive given the increasingly fluent qualities of automatically generated text, but we have to account for the fact that our perception of natural language text, regardless of how it was generated, is mediated by our own linguistic competence and our predisposition to interpret communicative acts as conveying coherent meaning and intent, whether or not they do.”
In “A.I. Is Mastering Language. Should We Trust What It Says?,” a NY Times Magazine article published in April 2022, science writer Steven Johnson wrote about GPT-3 — both its impressive performance and its potential pitfalls. “The first few times I fed GPT-3 prompts … I felt a genuine shiver run down my spine,” he wrote. “It seemed almost impossible that a machine could generate text so lucid and responsive based entirely on the elemental training of next-word-prediction.”
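To make that next-word-prediction training concrete, here is a minimal sketch. It assumes the Hugging Face transformers library and uses the openly released GPT-2 model as a small-scale stand-in, since GPT-3’s weights are not public; the prompt and the top-5 display are illustrative choices of mine, not anything from Johnson’s article. All the model ever computes is a probability distribution over which word is likely to come next.

```python
# Illustrative sketch: ask a causal language model for its probability
# distribution over the next word. GPT-2 stands in for GPT-3 here.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The Eiffel Tower is located in the city of"  # hypothetical example prompt
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# The last position holds the model's distribution over the next token.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)

for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode([idx.item()]).strip():>10s}  p = {p.item():.3f}")
```

Everything else, from the fluency of the prose to the apparent reasoning, emerges from repeatedly sampling from distributions like this one, token after token.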
“Since GPT-3’s release, the internet has been awash with examples of the software’s eerie facility with language — along with its blind spots and foibles and other more sinister tendencies,” said Johnson. “So far, the experiments with large language models have been mostly that: experiments probing the model for signs of true intelligence, exploring its creative uses, exposing its biases. But the ultimate commercial potential is enormous. If the existing trajectory continues, software like GPT-3 could revolutionize how we search for information in the next few years. … Customer service could be utterly transformed: Any company with a product that currently requires a human tech-support team might be able to train an L.L.M. to replace them.”
Critics Cite the Hype
At the same time, GPT-3 and other LLMs are attracting significant criticism from those who argue that these models can imitate the syntactic patterns of human language but are incapable of generating their own ideas, making complex decisions, or ever maturing into anything resembling human intelligence. “For these critics, GPT-3 is just the latest shiny object in a long history of A.I. hype, channeling research dollars and attention into what will ultimately prove to be a dead end, keeping other promising approaches from maturing. Other critics believe that software like GPT-3 will forever remain compromised by the biases and propaganda and misinformation in the data it has been trained on, meaning that using it for anything more than parlor tricks will always be irresponsible.”
Johnson’s article raises a number of cautionary points about LLMs. Let me briefly discuss some of his warnings.
LLMs are just stochastic parrots. The term stochastic parrot was coined in a March 2021 paper, “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?,” by Emily Bender, Timnit Gebru and two co-authors. Their paper argued that LLMs are merely remixing the enormous number of human-authored sentences used in their training. Their impressive ability to generate cogent, articulate sentences gives us the illusion that we’re dealing with a well-educated, intelligent human rather than with a stochastic parrot that has no human-like understanding of the ideas underlying the sentences it’s putting together.
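The “stochastic” part of the metaphor is easy to observe directly. Below is a small sketch, again assuming the Hugging Face transformers library and the openly available GPT-2 model; the prompt and sampling settings are my own illustrative choices, not anything used by the paper’s authors. Because each next word is sampled from a probability distribution, the same prompt yields a different, training-data-flavored continuation on every run.

```python
# Illustrative sketch: sampled generation makes the "stochastic" in
# "stochastic parrot" visible -- repeated runs give different remixes.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("The meaning of a sentence is", return_tensors="pt")

for run in range(3):
    output = model.generate(
        **inputs,
        do_sample=True,                       # sample from the next-word distribution
        temperature=0.9,                      # keeps the sampling genuinely stochastic
        max_new_tokens=25,
        pad_token_id=tokenizer.eos_token_id,  # silences the missing-pad-token warning
    )
    print(f"run {run + 1}:", tokenizer.decode(output[0], skip_special_tokens=True))
```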
Lack of common-sense knowledge. LLMs lack the common-sense knowledge about the physical and social world that human intelligence relies upon, including the intuitive reasoning abilities that our biological brains take for granted. We’re born with the ability to quickly learn from and adapt to the social environment around us. “One of the secrets of children’s learning is that they construct models or theories of the world,” wrote UC Berkeley psychologist Alison Gopnik in “The Ultimate Learning Machines,” a 2019 WSJ essay. “The future of artificial intelligence depends on designing computers that can think and explore as resourcefully as babies do.”
Biases in LLMs’ training data. A major finding of the 2022 AI Index Report was that while LLMs like GPT-3 are setting new records on technical benchmarks, they are also more prone to reflect the biases present in their training data, including racist, sexist, extremist, and other harmful language, as well as overtly abusive language patterns and harmful ideologies.
LLMs are black boxes. It’s quite difficult to explain in human terms why an LLM chose one output over others. LLMs have huge numbers of parameters within their complex neural networks, making it very hard to assess the contributions of individual nodes to a decision in terms that a human will understand.
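To give a sense of the scale behind that opacity, here is a small sketch, again assuming the Hugging Face transformers library. It counts the trainable parameters of the openly available GPT-2 model, which is itself orders of magnitude smaller than GPT-3 and its roughly 175 billion parameters.

```python
# Illustrative sketch of the scale behind the "black box" problem: count the
# trainable parameters of the openly available GPT-2 model (GPT-3 is roughly
# a thousand times larger, at about 175 billion parameters).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
n_params = sum(p.numel() for p in model.parameters())
print(f"GPT-2 (smallest variant): {n_params:,} parameters")

# Every generated word depends on all of these weights interacting at once,
# which is why attributing a single output to individual nodes is so hard.
```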
Can LLMs be trusted? LLMs are very good at generating text from a few prompts, but they have no mechanism for checking the truth of what they generate. They are prone to hallucinations: sentences that sound plausible but, on closer reading, make no sense and would not fool an attentive human reader. “The most heated debate about large language models does not revolve around the question of whether they can be trained to understand the world. Instead, it revolves around whether they can be trusted at all.”
“However the training problem is addressed in the years to come, GPT-3 and its peers have made one astonishing thing clear: The machines have acquired language,” wrote Johnson in conclusion. “If you spend enough time with GPT-3, conjuring new prompts to explore its capabilities and its failings, you end up feeling as if you are interacting with a kind of child prodigy whose brilliance is shadowed by some obvious limitations: capable of astonishing leaps of inference; possessing deep domain expertise in a vast range of fields, but shockingly clueless about many basic facts; prone to strange, senseless digressions; unencumbered by etiquette and social norms.”
This blog first appeared February 23 here.