ChatGPT hype in the Washington Post

Emily M. Bender
10 min read · Dec 16, 2022

--

The Washington Post generally has better than average tech coverage, I’ve found (less filled with hype, more likely to focus on the way in which tech companies wield power), but their December 10th piece on ChatGPT is a study in AI hype. In fact, so much of it is hype, I’m struggling with how to write this blog post while sticking to fair-use quotes. I encourage you to open the WaPo article and follow along as you read my commentary.

Photo of a red flag on the beach with stormy seas in the background.

The piece opens with a heart-warming story of a clever use of GPT-3 for paraphrasing sketchy notes into “business style” emails. But then comes 🚩 #1: a jump from that specific technology to effectively claiming that ChatGPT is the stuff of science fiction:

A machine that talks like a person has long been a science fiction fantasy, and in the decades since the first chatbot was created, in 1966, developers have worked to build an AI that normal people could use to communicate with and understand the world. Now, with the explosion of text-generating systems like GPT-3 and a newer version released last week, ChatGPT, the idea is closer than ever to reality.

GPT-3 is able to mimic different styles of text, and is being deployed in a useful way. (The article doesn’t say, but I sure hope the small business owner using it was proofing the emails before they got sent!) But this is “talks like a person” only in a very narrow sense, and it is a far cry from something people could use to “communicate with and understand the world”.

In the next paragraph, 🚩 #2: a quote from a tech investor (i.e. someone who definitely is selling something and will profit from public credulity about tech) calling it “magic”:

“It feels very much like magic,” said Rohit Krishnan, a tech investor in London.

Then two paragraphs trumpeting “great strides in AI-generated text tools”, OpenAI making this “available to the masses”, and ChatGPT “remembering what was said earlier, explaining and elaborating on its answers, apologizing when it gets things wrong.” (🚩 #3)

That is immediately followed by 🚩 #4, quoting someone (OpenAI’s CTO) who likens the system to a “kid”:

“Essentially it’s learning like a kid. … You get something wrong, you don’t get rewarded for it. If you get something right, you get rewarded for it. So you get attuned to do more of the right thing.”

What a grossly reductive description of how young people learn!

Then we get a brief summary of all of the hype that OpenAI managed to generate by proxy as people played with the tool, with pointers to various examples, before moving on to 🚩 #5, again quoting tech execs and VCs (those folks who are selling something) on their vision for the future:

Some tech executives and venture capitalists contend that these systems could form the foundation for the next phase of the web, perhaps even rendering Google’s search engine obsolete by answering questions directly, rather than returning a list of links.

This idea has been floated before, quite notably by Google. Chirag Shah and I break down why it’s a terrible idea in our CHIIR 2022 paper “Situating Search”. (Public-facing presentations: MIT Tech Review, UW News, IAI News)

Finally, ~700 words into the piece, the WaPo does reference the core problems with these tools (they spew biased output and they make shit up) and does quote Arvind Narayanan:

It can still be a powerful tool for tasks where the truth is irrelevant, like writing fiction, or where it is easy to check the bot’s work, Narayanan said. But in other scenarios, he added, it mostly ends up being “the greatest b — -s — — er ever.”

… but these points are introduced as “worries” and are buried deep in the article, after which they get right back to the hype, under the heading “Truth and Hallucination”. They give a litany of “AI tools designed to tackle creative pursuits with humanlike precision” and praise ChatGPT for the “uncanny inventiveness of its prose”.

Side note: “Hallucination” is a terrible term here, and I wish people would stop using it. Among other things, it suggests that ChatGPT (or whatever system is to hand) *experiences* and *perceives things*.

The WaPo article returns to text paraphrasing uses, before more quotes from tech evangelists (this time a professor) prognosticating about their imagined future (🚩 #6):

But tools like ChatGPT have helped people see for themselves how capable the AI has become, said Percy Liang, a Stanford computer science professor and director of the Center for Research on Foundation Models.

“In the future I think any sort of act of creation, whether it be making PowerPoint slides or writing emails or drawing or coding, will be assisted” by this type of AI, he said. “They are able to do a lot and alleviate some of the tedium.”

The phrasing “how capable the AI has become” is particularly 🚩. First, “the AI” suggests a thinking autonomous entity. It’s not. Better terms are SALAMI and mathy math. Second, “how capable” is too vague and leaves open the idea that there is a general capability here, rather than just manipulation of word forms. But even aside from those points, where’s the journalistic distance here? Where’s the pushback on grandiose claims about “any sort of act of creation”?

They return to the core problems with trying to use this tech for many of the proposed purposes, but these are framed as “trade-offs” (🚩 #7), not as a fundamental mismatch between the tech and the proposed tasks (like replacing a search engine):

ChatGPT, though, comes with trade-offs. It often lapses into strange tangents, hallucinating vivid but nonsensical answers with little grounding in reality. The AI has been found to confidently rattle off false answers about basic math, physics and measurement; in one viral example, the chatbot kept contradicting itself about whether a fish was a mammal, even as the human tried to walk it through how to check its work.

They do include a mention of StackOverflow’s very sensible decision to ban ChatGPT-generated answers because of their “high rate of being incorrect” — but then they jump right back to the hype:

But for all of the AI’s flaws, it is quickly catching on. ChatGPT is already popular at the University of Waterloo in Ontario, said Yash Dani, a software engineering student who noticed classmates talking about the AI in Discord groups. […] “I’ve noticed a lot of students are opting to use ChatGPT over a Google search or even asking their professors!” said Dani.

The WaPo article then continues with a long story (several paragraphs) of someone using ChatGPT to draft a way to break it to their 6-year-old that Santa is made up. Ending with:

She has not shown her son the letter yet, but she has started experimenting with other ways to parent with the AI’s help, including using the DALL-E image-generation tool to illustrate the characters in her daughter’s bedtime stories. She likened the AI-text tool to picking out a Hallmark card — a way for someone to express emotions they might not be able to put words to themselves.

As a use case, this seems relatively safe (someone searching for ideas on how to say something, in a situation where they would be able to vet the words for appropriateness and truthfulness). But also, not earth-shattering. And the Hallmark card analogy is particularly apt: ChatGPT’s output is frequently anodyne.

Then, under the heading “May occasionally produce harm”, the WaPo article briefly explains how ChatGPT was trained, and continues to be trained, by using beta testers as free labor (“Anyone using ChatGPT can click a ‘thumbs down’ button to tell the system it got something wrong.”).

Then comes a quoted assertion from OpenAI’s CTO that this worked (no supporting evidence, just their claim; 🚩 #8), followed by an AI researcher’s false claim of “understanding” (🚩 #9):

Murati said that technique has helped reduce the number of bogus claims and off-color responses. Laura Ruis, an AI researcher at University College London, said human feedback also seems to have helped ChatGPT better interpret sentences that convey something other than their literal meaning, a critical element for more humanlike chats. For example, if someone was asked, “Did you leave fingerprints?” and responded, “I wore gloves,” the system would understand that meant “no.”

Okay, and finally we get to the second core issue: the way these systems pick up biases from their garbage training data and are prone to spit them right back out:

But because the base model was trained on internet data, researchers have warned it can also emulate the sexist, racist and otherwise bigoted speech found on the web, reinforcing prejudice.

OpenAI has installed filters that restrict what answers the AI can give, and ChatGPT has been programmed to tell people it “may occasionally produce harmful instructions or biased content.”

Of course, those filters are flimsy. But rather than reporting on this as a core flaw of the technology, WaPo tells us that “some people” have found ways to bypass them.

Some people have found tricks to bypass those filters and expose the underlying biases, including by asking for forbidden answers to be conveyed as poems or computer code.

Excellent quote from Deb Raji; I’m sorry it’s buried so deep in this article:

Deb Raji, an AI researcher and fellow at the tech company Mozilla, said companies like OpenAI have sometimes abdicated their responsibility for the things their creations say, even though they chose the data on which the system was trained. “They kind of treat it like a kid that they raised or a teenager that just learned a swear word at school: ‘We did not teach it that. We have no idea where that came from!’” Raji said.

And some good reflections from Steven Piantadosi:

Steven Piantadosi, a cognitive science professor at the University of California at Berkeley, found examples in which ChatGPT gave openly prejudiced answers, including that White people have more valuable brains and that the lives of young Black children are not worth saving.

“There’s a large reward for having a flashy new application, people get excited about it … but the companies working on this haven’t dedicated enough energy to the problems,” he said. “It really requires a rethinking of the architecture. [The AI] has to have the right underlying representations. You don’t want something that’s biased to have this superficial layer covering up the biased things it actually believes.”

They quote OpenAI CEO Sam Altman’s response to Piantadosi’s very damning exposé of the kind of trash the system will output: “please hit the thumbs down on these and help us improve!” … with no journalistic commentary on that exchange. Maybe to some of the WaPo’s audience it is clear that this is neither a serious nor an appropriate response (to me, Altman comes off as a clown), but I’d rather this weren’t left between the lines.

This is followed by two more paragraphs of breathless AI hype, quoting AI boosters about what they imagine is coming next.

Some have argued that the cases that go viral on social media are outliers and not reflective of how the systems will actually be used in the real world. But AI boosters expect we are only seeing the beginning of what the tool can do. “Our techniques available for exploring [the AI] are very juvenile,” wrote Jack Clark, an AI expert and former spokesman for OpenAI, in a newsletter last month. “What about all the capabilities we don’t know about?”

Krishnan, the tech investor, said he is already seeing a wave of start-ups built around potential applications of large language models, such as helping academics digest scientific studies and helping small businesses write up personalized marketing campaigns. Today’s limitations, he argued, should not obscure the possibility that future versions of tools like ChatGPT could one day become like the word processor, integral to everyday digital life.

Aside from treating the prognostications of people who, again, are selling something, as news (🚩 #10), WaPo here is platforming mysticism about AI (“What about all the capabilities we don’t know about?”; 🚩 #11) and unfounded rhapsodizing about imagined future technology (“the possibility that future versions of tools like ChatGPT could one day…”; 🚩 #12).

They provide a quote from Mar Hicks, another good one, but not enough to undo the platformed hype of the previous paragraphs:

The breathless reactions to ChatGPT remind Mar Hicks, a historian of technology at the Illinois Institute of Technology, of the furor that greeted ELIZA, a pathbreaking 1960s chatbot that adopted the language of psychotherapy to generate plausible-sounding responses to users’ queries. ELIZA’s developer, Joseph Weizenbaum, was “aghast” that people were interacting with his little experiment as if it were a real psychotherapist. “People are always waiting for something to be dazzled by,” Hicks said.

This is followed by three paragraphs about people worrying about students using this system to “cheat” (aka waste everyone’s time by turning in essays they didn’t write), ending with the hyperbolic, criti-hype statement:

It is like there’s “this hand grenade rolling down the hallway toward everything” we know about teaching, he said.

(What a narrow view of what teaching is! The ability to cheaply synthesize convincing but ungrounded text is a problem, but I don’t think students using it to cheat is the core example of that.)

Next, nauseatingly, they report on someone supposedly “asking” ChatGPT for its “opinion” (🚩 #13):

ChatGPT itself has even shown something resembling self-doubt: After one professor asked about the moral case for building an AI that students could use to cheat, the system responded that it was “generally not ethical to build technology that could be used for cheating, even if that was not the intended use case.”

That’s not self-doubt. That’s just another example of a language model doing what it is built to do: output a probable sequence of words, given some input. It’s not news and it’s not worth anyone’s time to read.
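
To make that concrete, here is a toy sketch (entirely my own illustration; the vocabulary, the probabilities, and the code have nothing to do with OpenAI’s actual system) of what “output a probable sequence of words, given some input” boils down to:

```python
import numpy as np

# Tiny made-up vocabulary; a real system has tens of thousands of word pieces.
vocab = ["it", "is", "not", "ethical", "to", "build", "cheat", "."]

def next_word_distribution(context):
    # A real language model computes these scores with a huge neural network
    # trained on web text; here we just fake some scores so the example runs.
    rng = np.random.default_rng(abs(hash(tuple(context))) % (2**32))
    scores = rng.normal(size=len(vocab))
    return np.exp(scores) / np.exp(scores).sum()  # softmax -> probabilities

def generate(prompt, n_words=8):
    context = prompt.split()
    for _ in range(n_words):
        probs = next_word_distribution(context)
        # Pick a plausible-looking next word form and append it. That is the
        # whole trick: no beliefs, no opinions, no "self-doubt".
        context.append(str(np.random.choice(vocab, p=probs)))
    return " ".join(context)

print(generate("is it ethical to"))
```

A real model’s probabilities are tuned well enough that the output reads as fluent, which is exactly why people project intention onto it; but under the hood it is word-form prediction all the way down.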

… and then a few final paragraphs about the use case from the beginning.

I counted 13 🚩s in this one, and that’s probably being kind. I’d love to see more journalism that digs into the issues that folks like Deb Raji and Mar Hicks are surfacing. Where is the critical look at how tech companies evade responsibility for the way they build systems that can do harm at scale? Where is the analysis of the effects of people being dazzled by technology, how the people who stand to make money off of this amp up that dazzle, and how the media itself so often refuses to look behind the curtain?

--

Emily M. Bender

Professor, Linguistics, University of Washington // Faculty Director, Professional MS Program in Computational Linguistics (CLMS) // faculty.washington.edu/ebender