On NYT Magazine on AI: Resist the Urge to be Impressed

Context

tl;dr

On asking the right questions

Buying into the hype

GPT-3 has been trained to write Hollywood scripts and compose nonfiction in the style of Gay Talese’s New Journalism classic ‘‘Frank Sinatra Has a Cold.’’

Others have fed the software prompts that generate patently offensive or delusional responses, showcasing the limitations of the model and its potential for harm if adopted widely in its current state.

So far, the experiments with large language models have been mostly that: experiments probing the model for signs of true intelligence, exploring its creative uses, exposing its biases.

[Embedded tweet from Ilya Sutskever, Feb. 9, 2022: “it may be that today’s large neural networks are slightly conscious”]

there was a sense that the long ‘‘A.I. winter,’’ the decades in which the field failed to live up to its early hype, was finally beginning to thaw.

But GPT-3’s intelligence, if intelligence is the right word for it, comes from the bottom up: through the elemental act of next-word prediction.

On the side of emergent intelligence, a few points are worth making. First, large language models have been making steady improvements, year after year, on standardized reading comprehension tests.

‘‘You’ll see that it puts the arms and the legs in the right place,’’ Murati points out. ‘‘And there’s a tutu, and it’s walking the dog just like it was a human, even though it’s a baby radish. It shows you that GPT-3 really has quite a good conception of all the things that you were asking it to combine.’’

In a way, you can think of GPT-3 as a purely linguistic version of the Cartesian brain in a vat or in a ‘‘Matrix’’-style cocoon:

GPT-3 and its peers have made one astonishing thing clear: The machines have acquired language.

Perhaps the game of predict-the-next-word is what children unconsciously play when they are acquiring language themselves: listening to what initially seems to be a random stream of phonemes from the adults around them, gradually detecting patterns in that stream and testing those hypotheses by anticipating words as they are spoken. Perhaps that game is the initial scaffolding beneath all the complex forms of thinking that language makes possible.
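For readers who want to see what ‘‘predict-the-next-word’’ amounts to mechanically, here is a minimal sketch using the small, publicly available GPT-2 model via the Hugging Face transformers library (my stand-in; the article is about GPT-3, which is only reachable through OpenAI’s API). All the model produces is a probability distribution over possible next tokens:

```python
# Toy illustration of "predict the next word": score candidate
# continuations of a prompt with a small public language model.
# GPT-2 is used here as a stand-in for GPT-3.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The children listened to the adults and slowly learned to"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# Probability distribution over the vocabulary for the *next* token only.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {prob:.3f}")
```

That is the entire objective being celebrated above: ranking likely continuations of a string, not checking any of them against the world.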

All that glitters is not gold

GPT-3 has been trained to write Hollywood scripts and compose nonfiction in the style of Gay Talese’s New Journalism classic ‘‘Frank Sinatra Has a Cold.’’

For instance, even without the kind of targeted training that OpenAI employed to create Codex, GPT-3 can already generate sophisticated legal documents, like licensing agreements or leases.

… GPT-3’s recent track record suggests that other, more elite professions may be ripe for disruption. A few months after GPT-3 went online, the OpenAI team discovered that the neural net had developed surprisingly effective skills at writing computer software, even though the training data had not deliberately included examples of code.

If you gave 100 high school students the same prompt, I doubt you would get more than a handful of papers that exceeded GPT-3’s attempt. And of course, GPT-3 wrote its version of the essay in half a second.

OpenAI hagiography

‘‘Here is the underlying idea of GPT-3,’’ Sutskever said intently, leaning forward in his chair. He has an intriguing way of answering questions: a few false starts — ‘‘I can give you a description that almost matches the one you asked for’’ — interrupted by long, contemplative pauses, as though he were mapping out the entire response in advance.

And once again, all the evidence suggested that this power was going to be controlled by a few Silicon Valley megacorporations.

The agenda for the dinner on Sand Hill Road that July night was nothing if not ambitious: figuring out the best way to steer A.I. research toward the most positive outcome possible,

Today, roughly a fifth of the organization is focused full time on what it calls ‘‘safety’’ and ‘‘alignment’’ (that is, aligning the technology with humanity’s interests)

But beyond the charter itself, and the deliberate speed bumps and prohibitions established by its safety team, OpenAI has not detailed in any concrete way who exactly will get to define what it means for A.I. to ‘‘benefit humanity as a whole.’’ Right now, those decisions are going to be made by the executives and the board of OpenAI — a group of people who, however admirable their intentions may be, are not even a representative sample of San Francisco, much less humanity. Up close, the focus on safety and experimenting ‘‘when the stakes are very low’’ is laudable. But from a distance, it’s hard not to see the organization as the same small cadre of Silicon Valley superheroes pulling the levers of tech revolution without wider consent, just as they have for the last few waves of innovation.

LLMs aren’t inevitable and the “wider web” isn’t representative

L.L.M.s have even more troubling propensities as well: They can deploy openly racist language; they can spew conspiratorial misinformation; when asked for basic health or safety information they can offer up life-threatening advice. All those failures stem from one inescapable fact: To get a large enough data set to make an L.L.M. work, you need to scrape the wider web. And the wider web is, sadly, a representative picture of our collective mental state as a species right now, which continues to be plagued by bias, misinformation and other toxins.

Not just an academic debate

If the existing trajectory continues, software like GPT-3 could revolutionize how we search for information in the next few years. […] if the GPT-3 true believers are correct, in the near future you’ll just ask an L.L.M. the question and get the answer fed back to you, cogently and accurately.

This is not just an esoteric debate. If you can use next-word-prediction to train a machine to express complex thoughts or summarize dense material, then we could be on the cusp of a genuine technological revolution where systems like GPT-3 replace search engines or Wikipedia as our default resource for discovering information.
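To make concrete what ‘‘just ask an L.L.M. the question’’ would look like, here is a hedged sketch of that pattern, again substituting the small public GPT-2 for a GPT-3-class model (my substitution, not the article’s setup). Note that nothing in this pipeline consults a source or verifies anything; ‘‘cogently’’ is doing all the work and ‘‘accurately’’ is doing none of it:

```python
# Sketch of the "ask the model instead of a search engine" pattern,
# using small, public GPT-2 as a stand-in for a GPT-3-class model.
# The continuation is whatever next-word prediction finds most
# plausible; there is no check that it is factually correct.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

question = "Q: What is the capital of Australia?\nA:"
inputs = tokenizer(question, return_tensors="pt")

output_ids = model.generate(
    **inputs,
    max_new_tokens=20,
    do_sample=False,                    # greedy: most likely continuation
    pad_token_id=tokenizer.eos_token_id,
)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Greedy decoding returns the most plausible-sounding continuation, which is not the same thing as a correct answer, and that gap is exactly what is at stake in this debate.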

But if the large language models are ultimately just ‘‘stochastic parrots,’’ then A.G.I. retreats once again to the distant horizon — and we risk as a society directing too many resources, both monetary and intellectual, in pursuit of a false oracle.

On being placed into the “skeptics” box

Some skeptics argue that the software is capable only of blind mimicry — that it’s imitating the syntactic patterns of human language but is incapable of generating its own ideas or making complex decisions, a fundamental limitation that will keep the L.L.M. approach from ever maturing into anything resembling human intelligence.

How can we determine whether GPT-3 is actually generating its own ideas or merely paraphrasing the syntax of language it has scanned from the servers of Wikipedia, or Oberlin College, or The New York Review of Books?

Some people argue that higher-level understanding is emerging, thanks to the deep layers of the neural net. Others think the program by definition can’t get to true understanding simply by playing ‘‘guess the missing word’’ all day. But no one really knows.

The very premise that we are now having a serious debate over the best way to instill moral and civic values in our software should make it clear that we have crossed an important threshold.

We’ve never had to teach values to our machines before.

‘‘So long as so-called A.I. systems are being built and deployed by the big tech companies without democratically governed regulation, they are going to primarily reflect the values of Silicon Valley,’’ Emily Bender argues, ‘‘and any attempt to ‘teach’ them otherwise can be nothing more than ethics washing.’’

Should we build an A.G.I. that loves the Proud Boys, the spam artists, the Russian troll farms, the QAnon fabulists? It’s easier to build an artificial brain that interprets all of humanity’s words as accurate ones, composed in good faith, expressed with honorable intentions. It’s harder to build one that knows when to ignore us.

And if large language models are in our future, then the most urgent questions become: How do we train them to be good citizens? How do we make them ‘‘benefit humanity as a whole’’ when humanity itself can’t agree on basic facts, much less core ethics and civic values?

Conclusion

[Image: a macaw, close up on its face in profile. Source: https://www.maxpixel.net/Yellow-Red-Green-Hybrid-Macaw-Bird-Orange-Parrot-943228]

The first few times I fed GPT-3 prompts of this ilk, I felt a genuine shiver run down my spine. It seemed almost impossible that a machine could generate text so lucid and responsive based entirely on the elemental training of next-word-prediction.

It’s important to stress that this is not a question about the software’s becoming self-aware or sentient. L.L.M.s are not conscious — there’s no internal ‘‘theater of the mind’’ where the software experiences thinking in the way sentient organisms like humans do. But when you read the algorithm creating original sentences on the role of metafiction, it’s hard not to feel that the machine is thinking in some meaningful way.
