Brief Note on “The Great Chatbot Debate: Do LLMs Really Understand?”
Something struck me during the debate that I’d like to share below. It had to do with what we really mean by the motion, “Do LLMs Really Understand?”. The full recording of “The Great Chatbot Debate” is available on the Computer History Museum’s YouTube channel.
It began when Sébastien Bubeck rightly called out the weirdness of the question at the start of his opening statement, and sort of crystallised a moment later when he pointed out that understanding was “kind of in the eye of the beholder”.
I think this insight is important to latch on to when grappling with the question of LLMs actually understanding the world or some part of it, and the concerns (and opportunities) surrounding anthropomorphising computing systems such as these.
Sébastien concluded his first point by saying:
“I have had this experience myself many times in my research field where I feel like some other researchers maybe don’t really understand what’s going on. But of course they understand! Come on, I mean they are a researcher in this field; of course they understand! …but what I’m trying to say is that there are different levels at which we might decide that somebody or something understands…”
Insight
To state it plainly, “the ability to understand is a property we confer on objects that are apart from us”, usually as part of the process of building a relationship with that thing. It falls within the same realm of affairs as conferring personhood on a thing that is not us. It is a pro-social action that invites the thing that is not us to participate as a peer in our lives. It is awarding that thing the weight of responsibility and dignity that we give ourselves.
I appreciated his pragmatic approach to the question posed, focusing on what these systems are capable of — specifically, what they can do for us as individuals, and what real-world problems they can (help us) solve.
Following his insight and method, a more “proper” question to debate would have been, “Do LLMs convincingly demonstrate understanding?”, because the decision to confer “understanding” — along with all that it implies — on an object is quite distinct from its capacity to demonstrate the ability. More to the point, the decision rests with “the beholder”, and it often comes down to how compelling a case the object in question makes, and the response it elicits in us.
Emily’s Arguments
In contrast to Sébastien’s take, Emily Bender’s opening argument hinged on two things: firstly, handicapping LLMs by focusing not on the emergent behaviours they exhibit but on the mechanics behind how they function, while (reasonably) doing the opposite with respect to humans; and secondly, insisting on the current limitations of LLMs with respect to the information they have available to compute on.
I will pick both of these apart.
LLMs are software systems built from the ground up. The underlying math is well understood; they can be constructed by hand from a blank text file and progressively grown from just a few parameters to the behemoths under consideration. However, the “magic” of LLMs occurs at scale. It is their emergent properties that inspired Sébastien’s paper, “Sparks of Artificial General Intelligence”, which earned him the invitation to this debate.
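As an illustration of just how un-mysterious those mechanics are, here is a minimal sketch of next-token prediction built “by hand” with nothing but the Python standard library. The toy corpus, function names, and bigram counting scheme are mine for illustration only; a real LLM replaces the count table with a transformer holding billions of trained parameters, and it is only at that scale that the interesting emergent behaviour shows up.

```python
# Toy character-level bigram language model: the mundane core of
# next-token prediction, with counts standing in for learned weights.
import random
from collections import defaultdict

def train(text: str) -> dict:
    """Count how often each character follows another."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, following in zip(text, text[1:]):
        counts[current][following] += 1
    return counts

def generate(counts: dict, seed: str, length: int = 40) -> str:
    """Sample one character at a time, weighted by the observed counts."""
    out = seed
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break  # dead end: this character never preceded anything
        chars, weights = zip(*followers.items())
        out += random.choices(chars, weights=weights, k=1)[0]
    return out

corpus = "the cat sat on the mat and the dog sat on the log"  # stand-in training data
model = train(corpus)
print(generate(model, seed="th"))
```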
Choosing to focus almost entirely on how un-mysterious the mundane mechanics of language modelling are rather misses the point.
Secondly, as capable as the current generation of LLMs is, and in spite of the vast volumes of digital information they contain, they lack the sort of embodiment that makes our human perspective unique. But to bank on that is to conflate, to a fault, wealth of experience with the ability to exhibit understanding, and to ignore both the potential of current neural network architectures and the big lesson AI researchers have learned about scale.
To pour more cold water on her opening statement: LLMs seem to have internalised conceptual representations that are mediated by — but not confined to — the form of language.
In spite of all this, the central theme of Emily’s argument still rings true: so far, LLMs exhibit behaviour that mimics human understanding without being human. They are an intelligence of a different sort: built by hand, cut off from our natural environment, with a model of the world based entirely on textual data (and no other sensory experience) and “frozen” in the weights on which they run inference.
Emily ultimately agrees with the insight Sébastien led with, that understanding is on us; that it is an ability we project onto the thing we are willing to engage with at that level.
Conclusion
Artificial General Intelligence rests within the domain of Human-Computer Interaction.
It is even clearer to me that AGI will not be some future benchmark researchers will achieve. It will be people either admitting that they’ve made something compelling enough to be accepted as such, or dismissing it all as “just fancy computation”.