A Minimal Syntax For Quantum Text
Subcutanean, my upcoming novel that changes each time it’s printed, works like this: there’s a master text with the whole story, occasionally splitting into alternate versions and variants at the level of words, sentences, or even whole scenes. Each time the book is “rendered” for a new reader, a single option from each set of variants is randomly chosen, resulting in one particular version of the story. I wrote earlier about the aesthetics of why you’d do this, but the how is interesting too.
I call this kind of writing “quantum authoring,” because the author must hold all the possible versions in their head at once and keep each one interesting and consistent. Unfortunately, this kind of writing is often intertwined with programming or other mentally exhausting tasks, like operating a complex tool or remembering a finicky syntax. For this project, I wanted to write in a format that was as lightweight and unobtrusive as possible, so I could keep my brain entirely in “writing” mode while working on content. What I came up with was a minimal format called .quant, and I want to talk a bit about why I made it and what it’s good for.
The .quant format had a couple of fairly simple requirements. First, I didn’t want it to distract from my creative process. Having written a lot of procedural text over the years (usually for games), I know there’s nothing worse than trying to be creative while you’re also trying to remember syntax, mistyping special characters, or fighting with compiler errors.
By contrast, I wanted .quant writing to be as simple to type as possible. I also wanted it to be as close to idiot-proof as possible, for a very important reason. The Subcutanean generator automatically exports a new print-ready PDF from the master file each time a copy of the book is ordered; that PDF is then uploaded to a print-on-demand service, turned into a physical book, and shipped off, all without manual human intervention. That makes any errors much more galling. Normally the worst-case scenario for some broken procedural text is that a player sees a mistake in a momentary message. Here, that error would be preserved forever in the pages of the book, or in the worst case would corrupt its entire text, resulting in an unhappy buyer and an expensive replacement.
These two requirements (a minimalist syntax and minimizing the chance of error) led to some early constraints. First, I decided there would be no routines or GOTOs in the format. This was an easier decision than it would have been for a game, because Subcutanean has no branching or interaction: the plot proceeds chapter to chapter in the same way for each reader, varying not in the overall structure but in the way individual scenes play out, and in which particular details are revealed or omitted. I also didn’t need any reusable pieces of text that might have to show up in different contexts or situations, as one would generally need for an interactive work, which made this simplification easier.
Second, I realized I needed two major kinds of variation: simple alternatives that didn’t need to be reasoned over or remembered, and choices that would impact text in multiple places. The latter implies variables, which can be set and later checked, so I needed to account for that.
I considered a number of existing solutions for procedural text authoring, but in part because my use case is so particular, none of them quite met my needs. Languages like Ink, designed for games with explicit choice points, weren’t really appropriate, as these are essentially centered around GOTOs as a paradigm.
Tracery is a popular language for procedural text, but is optimized for writing long chains of nested expansions. This means most of what you’re doing is defining keys meant to be expanded elsewhere, which was more heavyweight than I needed: a major desideratum was being able to read through the dynamic text along with the static text, composing and editing both together within the same flow.
I’m also very familiar with Inform 7’s way of handling variant texts, and thought about basing my compiler around its syntax. But it’s also a bit heavyweight for my use case, in part because of its natural language paradigm, and in part because of the more powerful control it offers over different things to do with textual variants.
Other tools were also unsuitable for various reasons, such as requiring an IDE rather than working with plain text files. I did in fact find things quite similar to what I was looking for, such as the JavaScript library Bracery (which starts off with a very similar syntax to the one I ended up using, before getting more complex). But ultimately I decided to roll my own Python tool that would work with the rest of the tech stack I needed to make this project happen.
In the .quant format as it ended up, the most common use case is the simple inline variant. These are indicated like this:
Square brackets were chosen (as were all control symbols) for the unlikelihood that they’ll appear in regular prose. They save a shift keystroke compared to curly braces. Pipes are better than slashes (which do sometimes appear in prose) and in most fonts stand out a bit above and below the line, making them more visually obvious. Note that I also made a syntax highlighter for Sublime Text, seen in these screenshots, as the first line of defense against obvious syntax errors like forgetting a bracket.
A single bracketed text will either be printed or not, at random: this is the same as [text|] but slightly more elegant. For instance, “…I [almost] forgave him” might result in “…I almost forgave him” or “…I forgave him.”
(Technically, written this way the null option above would have two spaces between I and forgave. Because I knew my output was LaTeX code which ignores extra whitespace, I knew I could likewise ignore this issue: the same would also have been true if my output was HTML. In other contexts (like Inform 7, for instance) one generally needs to spend more time getting the exact position of the brackets right because spacing is preserved in the output. I did still have to worry about punctuation joins and so on — in a later post I’ll talk about a separate tool I built to help catch those errors.)
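As a rough illustration of the two rules so far, here is a minimal Python sketch. It is not the actual Subcutanean code: the function name render and the regular expression are assumptions made for this example, and it only handles flat, un-nested brackets.

```python
import random
import re

# Matches one bracketed group with no nested brackets inside it.
BRACKET = re.compile(r"\[([^\[\]]*)\]")

def render(text, rng):
    """Replace every [a|b|c] group with one randomly chosen option."""
    def pick(match):
        options = match.group(1).split("|")
        if len(options) == 1:
            # A lone [text] behaves like [text|]: printed or not.
            options.append("")
        return rng.choice(options)
    return BRACKET.sub(pick, text)

if __name__ == "__main__":
    print(render("…I [almost] forgave him", random.Random()))
```

When the empty option is picked, this sketch leaves the doubled space in place, which is harmless for LaTeX or HTML output as described above.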
Sometimes you want certain alternatives to appear more or less often. I thought about whether I needed this for Subcutanean (it felt conceptually purer in some ways to keep the selection entirely random) but I decided I did want to allow for the possibility of some texts that were rare or even very rare, appearing only in one or two books out of a hundred. It would be easy to get bogged down with possibilities here: Inform, for instance, supports various kinds of randomness like “with decreasingly likely outcomes” that make each option less likely to be selected than the one before it. But I decided I wanted to deploy this in specific places with tight control over distributions, so ultimately I just went with a simple method of directly specifying numeric probabilities for each option, in the situations where I needed to do so.
The numbers must always sum to 100, except that the final number can be omitted to assume the remaining distribution space: in this example leaving off the 5> would still have the effect of a 5% chance of choosing “C-Dog”. The parser can then complain if the numbers don’t add up. This exactness gets annoying for very long lists of variants but I didn’t anticipate needing very many of those. In fact the current draft has only one instance of a very long list that specifies probabilities.
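The sum-to-100 rule is easy to sketch in Python. This is a guess at the behavior described above, not the real parser: the names parse_weighted and choose are invented here, and apart from “C-Dog” the option texts in the usage line are made up.

```python
import random

def parse_weighted(body):
    """Split 'N>text|M>text|text' into (weight, text) pairs summing to 100."""
    parts = body.split("|")
    weights, texts = [], []
    for i, part in enumerate(parts):
        prefix, sep, rest = part.partition(">")
        if sep and prefix.strip().isdigit():
            weights.append(int(prefix))
            texts.append(rest)
        elif i == len(parts) - 1:
            # The final option may omit its number and take what is left.
            weights.append(100 - sum(weights))
            texts.append(part)
        else:
            raise ValueError(f"option {i + 1} has no probability: {part!r}")
    if sum(weights) != 100 or min(weights) <= 0:
        raise ValueError(f"probabilities must be positive and sum to 100: {weights}")
    return list(zip(weights, texts))

def choose(body, rng):
    """Pick one option text according to its declared probability."""
    pairs = parse_weighted(body)
    return rng.choices([t for _, t in pairs], weights=[w for w, _ in pairs], k=1)[0]

if __name__ == "__main__":
    # "Chris" and "Christopher" are hypothetical options for illustration.
    print(choose("80>Chris|15>Christopher|C-Dog", random.Random()))
```

Raising an error on a bad sum, rather than quietly renormalizing, matches the goal of catching mistakes before they reach a printed book.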
The last major part of the syntax was to control random selections that would affect multiple pieces of text: setting variables.
These are explicitly defined so the compiler can catch typos or mismatches when they’re used, and they’re not case sensitive because case sensitivity is dumb. For my particular use case I didn’t need anything more complex than booleans or enums, which the two examples above demonstrate: for each rendering, BreakupSubplot will be randomly true or false, and either verbose or taciturn will be true; if there were more options here, only one of them would be true on any given run. (You can also assign probabilities to variable assignments: [DEFINE 25>verbose].)
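Here is one way the DEFINE behavior described above could be sketched in Python. The function name define and its return shape are assumptions for this example, and the body strings passed in are what would follow the word DEFINE inside the brackets. Weighted multi-option enums are deliberately left out to keep it short.

```python
import random

def define(body, rng):
    """Evaluate a DEFINE body such as 'verbose|taciturn' or '25>verbose'.

    Returns a dict mapping lowercased variable names to True/False.
    """
    names, weights = [], []
    for part in body.split("|"):
        prefix, sep, rest = part.partition(">")
        # Names are stored lowercased so lookups are case-insensitive.
        names.append((rest if sep else part).strip().lower())
        weights.append(int(prefix) if sep else None)
    if len(names) == 1:
        # Boolean: true half the time unless a percentage is given.
        chance = 50 if weights[0] is None else weights[0]
        return {names[0]: rng.random() * 100 < chance}
    if any(w is not None for w in weights):
        raise NotImplementedError("weighted enums are left out of this sketch")
    # Enum: exactly one of the names is true on any given run.
    chosen = rng.choice(names)
    return {name: name == chosen for name in names}

if __name__ == "__main__":
    rng = random.Random()
    print(define("BreakupSubplot", rng))
    print(define("verbose|taciturn", rng))
```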
Text can then be gated based on whether a variable is true by starting with the variable reference:
To reduce the possibility of a word meant to be printed getting confused for a variable reference (by either the parser or the writer), a variable is allowed to appear in exactly two places: immediately after a DEFINE, or immediately before a >. The distinct separator character provides precise control over leading spacing (compared to a potentially ambiguous syntax, something like [@verbose rather ebullient…]) and makes it easier to catch any spurious uses: > in turn must be preceded by a number or a recognized variable.
I thought long and hard about whether I should add conditional logic to this syntax, for instance to have text only printed when both BreakupSubplot and verbose are true. I finally decided not to allow this, in part because of the much greater likelihood of introducing authoring errors this way, but mostly out of a sense of aesthetic purity. After many of my past projects with exceedingly complicated procedural text, I thought it would be a nice exercise to keep all the randomness at a single hierarchical level. No spending time writing complex nested prose that might only be seen by a tiny percentage of readers; no compounded branches multiplying the number of possibility states I needed to cover for a single sentence. If I wrote five variants (hand-selected usages of probability aside), there’d be a one in five chance that any given one of those pieces of text would be seen. Simple and clean.
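The “number or recognized variable” rule lends itself to a simple lint pass. The Python sketch below is illustrative only: check_gates is a made-up name, gated text is assumed to be written like [verbose>some text], and the set of known variables is assumed to have been collected from earlier DEFINE lines.

```python
import re

# Matches one bracketed group with no nested brackets inside it.
BRACKET = re.compile(r"\[([^\[\]]*)\]")

def check_gates(text, known_vars):
    """Report every '>' not preceded by a number or a defined variable."""
    known = {name.lower() for name in known_vars}
    problems = []
    for match in BRACKET.finditer(text):
        body = match.group(1)
        if body.startswith("DEFINE "):
            continue  # definitions are handled separately
        for option in body.split("|"):
            prefix, sep, _ = option.partition(">")
            if not sep:
                continue  # plain option with no gate or probability
            token = prefix.strip()
            if not (token.isdigit() or token.lower() in known):
                problems.append(f"unrecognized {token!r} before '>' in [{body}]")
    return problems

if __name__ == "__main__":
    sample = "He was [verbos>rather ebullient|taciturn>quiet]."
    for problem in check_gates(sample, {"verbose", "taciturn"}):
        print(problem)
```

Run on the sample line, this flags the misspelled verbos while accepting taciturn, which is the kind of typo that explicit definitions are meant to catch.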
I felt very smug about this until I immediately hit on a situation that required compound decisions after all, and had to go back to the drawing board. More about that in my next post.