Unsettling Something: Computer Generated Poetry

Why do we mistake computer generated poems for the work of humans?

Kirsten Menger-Anderson
Counter Arts


“Computer generated poem” image generated by DALL-E mini

From Wordsworth’s “spontaneous overflow of powerful feelings” to Eavan Boland’s “figure in which secret things confide,” poetry is often defined by — and extolled for — its ability to convey human emotion. What, then, does it mean that we can’t distinguish poems penned by humans from those generated by machine?

Indeed, researchers Nils Köbis and Luca D. Mossink at the University of Amsterdam have found that people cannot tell AI-generated poems from those written by amateur poets, or even by well-known professionals, provided a human first selects the best poem from a set of machine-generated verses. Have machines become as talented as our poets? Or are other forces at play?

A Brief History of Computer Generated Poetry

The idea of computers writing poetry is not new. Oulipo, a loose-knit group of writers and mathematicians, used computers to generate sonnets as early as 1961. In the late 1960s, Margaret Masterman and Robin McKinnon Wood wrote a program to compose haiku. In the 1980s, the ALAMO group created Rimbaudelaires, a program that replaced the words in a Rimbaud sonnet with words from Charles Baudelaire’s lexicon. Around the same time, the program Racter authored The Policeman’s Beard is Half Constructed, billed as “The First Book Ever Written by a Computer,” which contains poems with lines such as:

Blue potatoes are ungainly things
As are red and purple lamb chops

Other mechanical poets include McPoet, Brekdown, Hafez, PoeTryMe, and PoetryCreator, a version of which remains online. (I tried it and generated nine lines ending with “Spare me your cool bauble or I shall chew.”)

“A brief history of computer generated poetry” computer generated image by DALL-E Mini

The computer generated poems in Köbis and Mossink’s study were written by OpenAI’s GPT-2, which, just over two years ago, inspired the Guardian headline “New AI fake text generator may be too dangerous to release, say creators.” The professional poets it was pitted against included Maya Angelou and Hermann Hesse, whose opening lines were also used to prompt the computer’s verse (you can see the poems and try the test yourself, here).

A similar mash-up of human and machine work is found in Kane Hsieh’s Transformer Poetry, a book of GPT-2 generated poems that includes poems prompted by the first stanzas of Elizabeth Bishop’s “One Art” and Maya Angelou’s “Still I Rise,” among others. “This book is not meant to have any scientific or literary value,” notes the introduction. However, Hsieh may not have anticipated that such poetry could be used in experiments testing whether humans can tell who wrote what.

Applying a Turing Test

Applying a Turing test to poetry is also not new. In a Turing test, a human judge tries to determine which of two subjects is a person and which is a machine (and, by extension, whether the machine can be called intelligent). Raymond Kurzweil describes such an experiment in his 1990 book, The Age of Intelligent Machines. Using a set of poems written by humans and by his program, the Kurzweil Cybernetic Poet, he found that people identified the author correctly more often than not.

Six years later, Lawrence Andrew Koch conducted a similar test as part of his master’s thesis. His subjects, all literature and creative writing students and professors at the University of Montana, tried to determine which poems were composed by computer (using one of seven different poetry generators) and which by a human poet. His judges answered correctly less than half the time, and almost one-third of respondents mistook T.S. Eliot for a computer:

Red river, red river,
Slow flow heat is silence
No will is still as a river
Still, Will heat move
Only through the mocking-bird
Heard once? Still hills

Koch also observed a disconnect between our attitude towards computer poets and our inability to recognize their work:

I was approached by many aspiring poets who stated in no uncertain terms that “a computer cannot write poetry.” Interestingly, the same persons who expressed such strong opinions could not determine decisively whether a human or computer generated the stanzas in my survey.

In 2013, Oscar Schwartz and his friend Benjamin Laird created an online version of the test, which was taken by thousands of people. Some of the computer generated poems were mistaken for human work the majority of the time, Schwartz reports. And Gertrude Stein’s poem “Red Faces,” like T.S. Eliot’s work, was easily mistaken for the work of a machine:

Red flags the reason for pretty flags.
And ribbons.
Ribbons of flags
And wearing material
Reason for wearing material.

More recently, Deep-speare generated Shakespearean sonnets that people with ‘no expertise in poetry’ mistook for the real deal (though an assistant professor of literature was not fooled). And between 2016 and 2018, Dartmouth College’s “Turing Tests in the Creative Arts” invited ‘any and all comers’ to pit their generated creations against human work.

Köbis and Mossink added a new twist to their Turing tests by offering a financial incentive for accuracy. Yet even knowing that correct responses would be rewarded, participants failed to reliably detect computer generated poems.

Have Computers Mastered Verse?

It is worth noting that Turing himself reflected on machines’ eventual mastery of verse:

I do not see why it (the machine) should not enter any one of the fields normally covered by the human intellect, and eventually compete on equal terms. I do not think you even draw the line about sonnets, though the comparison is perhaps a little bit unfair because a sonnet written by a machine will be better appreciated by another machine.

Turing’s idea of a computer appreciating computer generated poetry has a certain charm. The poet Howard Nemerov made a similar point in his 1967 essay “Speculative Equations: Poems, Poets, Computers”:

This is not to say that computers, even present ones ‘can’t write poetry.’ For nine-tenths of the poetry in the world, past and contemporary, is of a dullness and mechanic servility so appalling that if it has to be written at all it certainly ought to be written by computers, although only on condition that other computers be instructed to read it.

‘Mastering Verse’. Computer generated image created with DALL-E mini

Throughout his piece, Nemerov crisply observes that the bulk of generated text belongs in the trash bin, but he also admits that it can be moving. “I have still to ask myself what has happened,” he writes, “what is the nature of the transaction that has taken place among myself, the computer and the language.”

Similarly, one might ask how a room full of students could find meaning in a list of author names left over from a reading assignment, yet Stanley Fish documents exactly this phenomenon in his essay “How to Recognize a Poem When You See One.” Once Fish told his class the list was a religious poem, the students set to work: the name ‘Jacobs’ was ‘explicated as a reference to Jacob’s ladder,’ and the name ‘Thorn’ was interpreted as ‘an allusion to the crown of thorns.’

“Interpreters do not decode poems,” Fish writes, “they make them.”

Over the last several years, computers have begun to earn recognition for their reading skills, outscoring humans on some reading comprehension tests, though only on narrowly defined tasks. But the computer does not read as we do. It can find the answer to a question, but it does not interpret poetry (or homework assignments), nor is it moved in the way Nemerov describes.

People can be moved whether a text’s author is a human or a machine, and we might even be delighted by something as seemingly dull as a list. Perhaps what the Turing test reveals, at least when it comes to poetry, is not that computers have human creativity or that their work draws from a deep emotional well, but that humans aren’t very good at judging either. And maybe the very idea of a computer striving to generate text that passes as human-authored is itself problematic: fears about GPT-2’s misuse range from fake reviews to fake news to fraudulent academic essays. Those fears, after all, are why its creators withheld the full model when they announced it in early 2019.

Should We be Concerned?

Concerns about GPT-2 generated poetry are nuanced. Last year, Dan Rockmore, one of the Dartmouth Creative Turing test organizers, reflected on what he calls the “boringness” of creative Turing tests, prominently quoting computational poet Allison Parrish, who put it bluntly: “I think that imitation is the most boring thing you can do with a computer.”

Long before the possibility of computer generated poetry was realized, Ada Lovelace herself noted machines’ lack of originality: “the Analytical Engine has no pretensions to originate anything. It can do whatever we know how to order it to perform.”

Recently, researchers looked closely at GPT-2 generated text and discovered that passages can contain ‘verbatim text sequences’ from the source texts used to train the model. Beyond the privacy issues this raised (generated text contained the names of real people and their contact information), the finding showed that the model had an uncanny ability to memorize.

Meanwhile, the technology continues to evolve. GPT-3 — the next generation of the model used in Köbis and Mossink’s experiments — arrived in 2020, far larger and trained on a corpus of 200 billion words. When fed the beginning of chapter 3 of Harry Potter and the Philosopher’s Stone, the GPT-3 model went on to generate the first 240 words of the text, exactly as Rowling penned them.

Unperturbed, GPT-3 natters on. Along with generating news stories that evaluators had trouble distinguishing from human-authored ones, the model composed a poem in the style of Wallace Stevens. “Generated Poem 1” ends with the lines:

The yellow of the sun is no more
Intrusive than the bluish snow
That falls on all of us. I must have
Grey thoughts and blue thoughts walk with me
If I am to go away at all.

Poet Andrew Brown, who noted that GPT-3’s work was ‘worth editing’ more often than not, asked the model to write a poem “from the point of view of a cloud looking down on two warring cities.” The generated text opens with the line, “I think I’ll start to rain.”

The model also generates racist and sexist language, and in one test it told a simulated suicidal patient to kill themselves. Yejin Choi, a computer scientist at the University of Washington and the Allen Institute for Artificial Intelligence, expressed concerns:

“What we have today…is essentially a mouth without a brain.”

Our ‘Wonderful Machines’

To be fair, our fascination with composing texts without skill or thought predates the rise of computer models. Jonathan Swift’s Gulliver’s Travels, published in 1726, details a ‘wonderful machine’ that permits “the most ignorant person, at a reasonable charge, and with a little bodily labour” to “write books in philosophy, poetry, politics, laws, mathematics, and theology.” Tristan Tzara’s “How to Make a Dadaist Poem” (1920) advises the poet to find a newspaper, ‘carefully cut out each of the words’, and put them in a bag. “Shake gently,” Tzara writes. “Next take out each cutting one after the other. Copy conscientiously in the order in which they left the bag.”
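Tzara’s recipe is, in effect, an algorithm: split a text into words, put them in a random order, and copy them out. The short Python sketch below is one way to express it, not anything Tzara or the researchers above wrote. The function name `dada_poem` and the sample sentence are hypothetical stand-ins, and Python’s `random.Random.shuffle` plays the part of shaking the bag.

```python
import random
from typing import Optional


def dada_poem(article: str, seed: Optional[int] = None) -> str:
    """Follow Tzara's recipe: cut out the words, shake the bag, copy them out."""
    # "Carefully cut out each of the words": split the text on whitespace.
    cuttings = article.split()
    # "Shake gently": put the cuttings in a random order. A seed makes the
    # shake repeatable, something Tzara's paper bag could not offer.
    bag = random.Random(seed)
    bag.shuffle(cuttings)
    # "Copy conscientiously in the order in which they left the bag."
    return " ".join(cuttings)


# A made-up stand-in for a newspaper article (hypothetical example text).
sample = "The city council voted on Tuesday to repair the old river bridge"
print(dada_poem(sample, seed=7))
```

Every word of the article survives; only the order changes. Any meaning a reader finds in the result is supplied by the reader, which is rather the point.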

A ‘wonderful machine’. Computer generated image created with DALL-E mini

History has not always been receptive to these notions. The invention Jonathan Swift describes is a work of satire, and Tzara’s conclusion, “And there you are, an infinitely original author of charming sensibility, even though unappreciated by the vulgar herd,” speaks of a society that shuns such efforts.

Today, we are enchanted by our mindless mechanical writers. Computers can imitate us well enough that we wonder: Who wrote this poem? Human? Machine?

Toward the end of their study, Köbis and Mossink write, “we emphasize that the results do not indicate machines are ‘creative.’” They note that “one of the main functions of creativity in general and in poetry in particular is the expression of (deep) emotions, a feat that machines lack (so far).”

So, why do we mistake a machine’s poetry for human work? It’s possible that we see a rhyme or a compelling image and look no further. It’s possible that our reading is merely superficial. It’s also possible, as Fish illustrated, that we are too good at finding meaning, however meaningless a text might be.

“You want a poem to unsettle something,” the poet Tracy K. Smith has said. “There’s a deep and interesting kind of troubling that poems do, which is to say: ‘This is what you think you’re certain of, and I’m going to show you how that’s not enough. There’s something more that might be even more rewarding if you’re willing to let go of what you already know.’ ”

Perhaps letting go is a quality that — like creativity — belongs only to humans. Perhaps, in the end, letting go is how we become both rewarded and vulnerable, and why we can be moved by the words of strangers, whether flesh or silicon.