Generative Composition: A Primer

So you want to make a Twitter bot

nick barr
Algorithms and Authorship

--

Introduction

I recently encountered Crummy, a distinguished website run by Leonard Richardson. There’s a lot to dive into, but something that immediately stood out to me was Richardson’s presentation on “Writing Aliens,” which lays out 3 techniques (aliens, in Richardson’s words) for programmatically generating text.

Richardson’s roundup is probably the best I’ve seen on the subject, so I thought I’d repost it here in a more concentrated format.

This post is for anyone interested in natural language generation, whether it’s for a poem or a Twitter bot or a readymade email explaining why you’re not coming in to the office. It is conceptual more than technical; I’ve sketched each technique in a few lines of Python along the way, but I recommend programmers check out the links at the bottom of this post.

Getting Started

Pretty much any text generator relies on training data, a corpus of text from which the generator establishes first principles.

For our purposes, let’s go with a really simple source text: 3 separate rejection letters from 3 poetry publishers (kudos to the wonderful Rejection Wiki). Keep in mind that the typical source text for serious projects is about 100,000,000 words.

We’re not going to be able to keep anything from this submission, we’re sorry to say. Thank you, though, for letting us have a chance with your work. Yours, The Editors at Poetry Magazine.

We’ll have to pass on this submission, sorry to say — but we enjoyed reading it. Thank you for letting us have a chance with your work. Warmly, The Editors at American Poetry Review.

Thank you for giving us the opportunity to read your manuscript. After much consideration, we feel unable to use it for publication. We regret that the large volume of submissions precludes a more personal reply. The Editors at Paris Review.

Our goal is to use this source text to programmatically generate a rejection letter of our own.
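To keep the sketches below concrete, here’s that same source text as a small Python list. The variable name corpus is my own choice (not Richardson’s), and the later snippets reuse it:

# The 3 rejection letters above, stored as one string per letter.
corpus = [
    "We're not going to be able to keep anything from this submission, "
    "we're sorry to say. Thank you, though, for letting us have a chance "
    "with your work. Yours, The Editors at Poetry Magazine.",
    "We'll have to pass on this submission, sorry to say — but we enjoyed "
    "reading it. Thank you for letting us have a chance with your work. "
    "Warmly, The Editors at American Poetry Review.",
    "Thank you for giving us the opportunity to read your manuscript. "
    "After much consideration, we feel unable to use it for publication. "
    "We regret that the large volume of submissions precludes a more "
    "personal reply. The Editors at Paris Review.",
]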

Techniques

Now that we have our source text, there are a number of techniques that we can employ to do interesting things with it. Richardson breaks them into 3 categories: “Duchamp,” named for its Dadaist output, “Markov,” named for its use of probabilistic Markov chains, and “Queneau,” named for that writer’s experiments in recombination. We’ll explore each of these techniques.

The Duchamp Technique

The Duchamp technique is essentially equivalent to Burroughs’ cut-up method. We take the source text, carve it into slices, and re-assemble those slices arbitrarily.

In this method, as in all of these methods, it’s left to the writer to decide the “order,” or size, of those slices. Do we want to slice up the source text by sentence? By word? Physically, into thirds? The results vary accordingly.

Let’s say we want a 3-sentence rejection letter, slicing up the source text by word. Here’s what we might come up with:

We sorry pass be with have use manuscript. To use sorry consideration, yours, the us editors. American submission, warmly, for to volume this this to much use we sorry on review we’re keep us use anything you magazine: we’ll be review we’re the after have pass for editors.

The output is absurd, consistent with the Dada movement that Duchamp helped establish.
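Here’s a minimal sketch of the Duchamp technique in Python, reusing the corpus list from above; slicing by word, and the sentence lengths, are arbitrary choices of mine:

import random
import re

def duchamp(corpus, n_sentences=3):
    # Slice the source text into word-sized pieces.
    words = [w.lower() for text in corpus
             for w in re.findall(r"[\w'’]+", text)]
    # Reassemble the slices arbitrarily.
    random.shuffle(words)
    sentences = []
    for _ in range(n_sentences):
        length = random.randint(5, 12)  # arbitrary sentence length
        chunk = [words.pop() for _ in range(length) if words]
        sentences.append(" ".join(chunk).capitalize() + ".")
    return " ".join(sentences)

print(duchamp(corpus))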

The Markov Technique

Markov chains use probability to predict the next word based on the current word. If we look back at our source text, we find the following frequency distribution for the words that come after “to”:

wordsThatFollow(“to”) = [“say”: 2, “be”: 1, “keep”: 1, “pass”: 1, “read”: 1, “use”: 1]

So if we generated text with the Markov technique and started with the word “to,” “say” would be the slight favorite to be the next word. Then we’d look up the same kind of frequency distribution for “say” and go from there. Here’s a 3-sentence rejection letter à la Markov:

After much consideration, we enjoyed reading it. Thank you for publication. We enjoyed reading it.

The advantage of the Markov technique is that it makes some sense, and makes even more sense when you change the order from a unigram (a single word) to a bigram (a word pair), so that each next word is predicted from the previous two words rather than just one.

The disadvantage of the Markov technique is seen in the rejection letter above. It easily slips into rehashing the source text, and risks repeating itself. Markov chains work better with very large corpora, and even then usually require various smoothing techniques to handle improbable outcomes. Note that probability weighting is not a strict requirement of a Markov technique; the algorithm behind my app Today: A Text Adventure in Space is based on a Markov chain that treats all possible outcomes equally.
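Here’s a minimal sketch of a unigram Markov generator along these lines, again reusing corpus; the start word and the length cutoff are arbitrary choices of mine:

import random
import re
from collections import Counter, defaultdict

def build_chain(corpus):
    # Map each word to the list of words observed to follow it;
    # duplicates in that list act as probability weights.
    chain = defaultdict(list)
    for text in corpus:
        words = re.findall(r"[\w'’]+", text.lower())
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain

def markov(corpus, start="thank", n_words=20):
    chain = build_chain(corpus)
    word, output = start, [start]
    for _ in range(n_words):
        followers = chain.get(word)
        if not followers:  # dead end: nothing ever followed this word
            break
        word = random.choice(followers)  # weighted by observed frequency
        output.append(word)
    return " ".join(output).capitalize() + "."

print(Counter(build_chain(corpus)["to"]))  # reproduces the table above
print(markov(corpus))

Because random.choice draws from a list that still contains duplicates, “say” really is twice as likely as “be” to follow “to.”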

The Queneau Technique

The third method for generating text is the Queneau technique, named after a co-founder of the Oulipo circle. Queneau wrote a set of 10 sonnets called Cent mille milliards de poèmes, in which each sonnet has the same rhyme structure, so any line from one sonnet can stand in for the corresponding line of another. Since a sonnet has 14 lines and each line can come from any of the 10 sonnets, there are 10^14 poems that can be generated by recombining the lines; there’s a nice web implementation illustrating this here.

The Queneau technique requires defining an underlying structure or skeleton for the output, and then finding valid inputs. In the case of our rejection letter, we could simply say that our skeleton is 3 sentences, and so we’d take the 1st sentence from one rejection letter, the 2nd sentence from another, and the 3rd from another:

We’ll have to pass on this submission, sorry to say — but we enjoyed reading it. Thank you, though, for letting us have a chance with your work. The Editors at Paris Review.

Here the output is almost indistinguishable from human writing, mainly because the source text is so conducive to the Queneau technique. And, as always, we could have opted for a more granular order to make things more chaotic; for example, taking the 1st word of one rejection letter, the 2nd word of another, and so on.
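Here’s a minimal sketch of the Queneau technique with that 3-sentence skeleton, again reusing corpus; the naive sentence-splitting regex is my own shortcut:

import random
import re

def queneau(corpus):
    # Split each letter into its sentences (naively, on end punctuation).
    letters = [re.split(r"(?<=[.!?])\s+", text) for text in corpus]
    # The skeleton: one slot per sentence position; fill each slot
    # with that sentence from a randomly chosen letter.
    n_slots = min(len(sentences) for sentences in letters)
    return " ".join(random.choice(letters)[i] for i in range(n_slots))

print(queneau(corpus))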

Implementation

Ultimately, these techniques are not mutually exclusive and the most interesting text generators likely employ some combination of them. And they are not totally exhaustive, although they come pretty close: the Queneau technique is concerned with structure, the Markov technique is concerned with sequence, and the Duchamp technique is concerned with neither. Think about the size of your training data and your goals for the project when choosing a technique, and if all else fails go with all 3 and use the I Ching to toggle between them ☺
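Or, in lieu of the I Ching, a last sketch that assumes the duchamp, markov, and queneau functions defined above:

import random

# Pick one of the three techniques at random and run it.
technique = random.choice([duchamp, markov, queneau])
print(technique(corpus))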
