ChatGPT for Education: Some Cautionary Advice

Casey Kennington
7 min read · Feb 9, 2023


[Image: a wooden chair on a hard floor]

I was mindlessly checking my LinkedIn feed one day when I first noticed that ChatGPT was everywhere. People were posting their queries and ChatGPT’s responses, and some of the responses were very impressive. As someone who does research in the field that produced ChatGPT (natural language processing and dialogue systems), I want to explain at a high level how ChatGPT learns and why it is a big deal, but also to give some cautionary advice to educators.

The technology behind ChatGPT and other “Large Language Models” has been around for a few years, but a few things make this moment different. OpenAI, the company behind ChatGPT, put it into an ecosystem where it is easily accessible to the public and many users can use it at once. This was a brilliant move, because they likely didn’t know how to monetize it until they saw how people would use it: for generating blog posts, generating usable computer code, answering questions, and writing term papers and essays. Now Microsoft has invested billions in OpenAI, and sites like BuzzFeed are going to use the technology to generate new content. Behind the scenes, there is also a bit of a Wizard-of-Oz operation (similar to Siri and Alexa) in which many humans look at what people ask ChatGPT and curate some of the answers, sometimes for ethical reasons, sometimes for legal reasons, but probably mostly for optics. How should ChatGPT respond, for example, if you ask it how to hide a dead body, how to swindle money out of people, or what it thinks of Nazism?

How it is trained is worth knowing, at least at a high level. Sometimes when people read, they come across a word they have never seen before, but from the context they can get an idea of what it means without looking up its definition. ChatGPT takes this idea and massively scales it up. We train large language models by taking a sentence, randomly covering one or more of its words, passing the sentence through the model, and asking the model what it thinks the missing word should be: a kind of fill-in-the-blank game. We can systematically tell whether the model guessed right or wrong, and it can learn from that feedback. We can also pass a sentence into the model and ask it what the next word, or even the next full sentence, should be. Repeat that hundreds of millions of times using all of the easy-to-find text on the Internet, including Wikipedia (so we know the training data is at least factual?) and other sources. Through this training process, the model learns where words should go as it develops what is called a “distributional” approximation of the meaning of words, based on how they are used in textual context. This is a very powerful method that changed the field of natural language processing and AI. We can train a large language model once and then tune it for more specific tasks like automatic translation and question answering without expensive re-training.
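To make the fill-in-the-blank idea concrete, here is a toy sketch in Python. This is not how ChatGPT is actually built (real large language models are neural networks with billions of parameters trained on enormous amounts of text); the corpus and the two-word context are my own made-up illustration of the training signal, nothing more. It simply counts which word tends to follow a given context and then “predicts” the blank from those counts.

    from collections import Counter, defaultdict

    # Toy illustration of the "guess the next word" signal described above.
    # The corpus is made up; real models see vastly more text and learn a
    # neural network instead of a count table.
    corpus = (
        "the cat sat on the mat . "
        "the dog sat on the rug . "
        "the cat slept on the mat ."
    ).split()

    # Given the previous two words, which word came next, and how often?
    next_word_counts = defaultdict(Counter)
    for i in range(len(corpus) - 2):
        context = (corpus[i], corpus[i + 1])
        next_word_counts[context][corpus[i + 2]] += 1

    def predict_next(w1, w2):
        """Fill in the blank: most frequent continuation of 'w1 w2 ___'."""
        counts = next_word_counts.get((w1, w2))
        return counts.most_common(1)[0][0] if counts else None

    print(predict_next("sat", "on"))  # -> 'the'
    print(predict_next("on", "the"))  # -> 'mat' (seen more often than 'rug')

During training, the model’s guess is compared with the word that actually appeared, and that right-or-wrong feedback is what drives the learning; the count table above stands in for that whole process.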

But what does ChatGPT actually know about language? The core of my research is understanding what it means for a computer to understand language. I read a lot of literature from cognitive science, child development, neuroscience, psychology, sociology, human-robot interaction, and of course my own field of natural language processing and spoken dialogue systems. Like many, I tried out ChatGPT, and it is much more impressive than the chatbots of yesteryear. Among other things, I asked it: “have you ever seen an apple?” Its answer told me a lot, and I think it is really important for the world to understand: it not only admitted that it had never seen an apple, it admitted that it has never seen or experienced anything at a physical level, which is what I expected. It is impressive that it admitted that, but it also shows that how it learns is vastly different from how human children (including you) learned their first language. First language learning begins with rich, embodied, physical interaction with objects and people in the world; children first learn how words are used to denote physical things. Only later do they learn about abstract ideas (for example, “democracy”), and even a 12-year-old’s vocabulary consists of only about 40% abstract words (Ponari, Norbury, and Vigliocco 2018). ChatGPT effectively assumes that all words are abstract, because it learns only an approximation of the meaning of words as they are used in text. It can learn facts from text about objects, like that chairs can be sat in, but not what a chair looks like, what it can be used for, the muscle memory of bending to sit in a chair, or the relief of sitting after standing for a long time, all of which make up what the word “chair” means to people. When ChatGPT writes words, we, the humans who read them, insert the meanings, because we understand those words and that is what we do when people talk to us (see also Harnad 1990; Searle 1980). What ChatGPT can do really well is learn English grammar: it knows where to put words in a sequence. What it does not understand is that much of the meaningfulness of language lies in the fact that it is about the world, including the physical world (Dahlgren 1976; Bender and Koller 2020). Only recently have we seen how ChatGPT performs on standard evaluation benchmarks, and the results are mixed (Bang et al. 2023). Think of it as a child who can decode text fluently but has little comprehension of what they are reading.
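To see what a “distributional” representation captures, and what it leaves out, here is another toy sketch. The corpus, window size, and similarity measure are my own illustrative choices, not anything taken from ChatGPT: a word is represented only by the words that appear near it in text, so “chair” and “sofa” come out similar because they occur in similar textual contexts, while nothing in the representation says what a chair looks or feels like.

    import math
    from collections import Counter

    # Made-up corpus for illustration only.
    corpus = (
        "you can sit on a chair . you can sit on a sofa . "
        "a chair has four legs . a sofa has soft cushions . "
        "an apple is a fruit you can eat ."
    ).split()

    def context_vector(word, window=2):
        """Count the words occurring within `window` positions of `word`."""
        vec = Counter()
        for i, w in enumerate(corpus):
            if w == word:
                for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
                    if j != i:
                        vec[corpus[j]] += 1
        return vec

    def cosine(a, b):
        """Similarity of two count vectors (1.0 = identical contexts)."""
        dot = sum(a[k] * b[k] for k in a)
        norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
        return dot / (norm(a) * norm(b))

    # "chair" is closer to "sofa" than to "apple" purely from textual context.
    print(cosine(context_vector("chair"), context_vector("sofa")))
    print(cosine(context_vector("chair"), context_vector("apple")))

The representations only record which words keep company with which other words; the look, feel, and use of a chair never enter the picture, which is the point of the paragraph above.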

My caution about ChatGPT comes from an anecdote I heard a few years ago. The person at Tesla who headed the self-driving efforts emphatically stated that self-driving cars did not exist; Driver Assist was usable, but it was not ready for full self-driving. Driver Assist was reliable enough to be good at its job most of the time, but the small percentage of the time that it didn’t work really mattered, and the stakes were very high because moving vehicles can be dangerous. The real problem was that humans trusted the technology too much and at the wrong times. The same thing is now happening with ChatGPT. It only ‘understands’ an approximation of the meanings of the words it is generating, and the stakes could be high when we assume that what it says must be true, or when we use it to produce content that will in turn train the next generation of chatbots. We need to be vigilant with something that learns only an approximate meaning from text taken from the Internet, text which may or may not be factual and which represents only a fraction of the knowledge needed for holistic linguistic meaning.

But ChatGPT is here and the world is now different. Should schools use ChatGPT? Should they ban it completely? OpenAI does have a model that can detect ChatGPT-generated text, which teachers can use to check students’ essays, as if teachers didn’t have enough to do already (and it doesn’t work very well). We should all learn about things like ChatGPT and maybe even use them for educational purposes, but think about the implications of that. ChatGPT is often compared to a calculator, which seems fitting in some ways: it is a tool, yet students still spend time learning how to do things like long division by hand. Or perhaps it is just another tool, like a washing machine, that will free humankind from worrying about mundane things. And what about spell checkers? Those seem to do more good than harm, even for children who are still learning how to spell, right?

ChatGPT is different because the stakes are different than ever before, and they are potentially high: ChatGPT can generate text. That is, it can write, and writing is one of the most powerful things a human can learn to do well. Writing synthesizes experience, thinking, expression, and literacy all at once, allowing people to pin down thoughts, offload limited memory to the world for later use, record histories, and communicate across space and time. I have personally written and published a lot of scientific work, and I have even written and self-published a series of science fiction novels on the topic of AI. Yet, even after 40 years of life and many, many keystrokes, I feel like I am still only ankle-deep into becoming half decent at writing. If ChatGPT is doing the writing for us, then it is also doing some of the important thinking, and that is what sets it (and us) apart from washing machines.

Two goals of education are to teach kids how to think and how to communicate their thoughts through writing. Delegating thinking and writing to a machine that doesn’t truly understand language, but is really good at producing fancy-sounding grammatical sentences, is probably not ideal for developing students. Half of you probably thought, “Isn’t saying grammatically fancy nonsense just what politicians do?” If you thought that, then you are beginning to understand my point.

References

Bang, Yejin, Samuel Cahyawijaya, Nayeon Lee, Wenliang Dai, Dan Su, Bryan Wilie, Holy Lovenia, et al. 2023. “A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity.” arXiv [cs.CL]. arXiv. http://arxiv.org/abs/2302.04023.

Bender, Emily M., and Alexander Koller. 2020. “Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data.” In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 5185–98.

Dahlgren, Kathleen. 1976. “Referential Semantics.” University of California, Los Angeles.

Harnad, Stevan. 1990. “The Symbol Grounding Problem.” Physica D: Nonlinear Phenomena 42 (1–3): 335–46.

Ponari, Marta, Courtenay Frazier Norbury, and Gabriella Vigliocco. 2018. “Acquisition of Abstract Concepts Is Influenced by Emotional Valence.” Developmental Science 21 (2). https://doi.org/10.1111/desc.12549.

Searle, John R. 1980. “Minds, Brains, and Programs.” The Behavioral and Brain Sciences 3 (03): 417.
