Whither Literate Programming (1)

Bob Myers
Aug 31, 2018

It’s been more than three decades since the legendary Donald Knuth invented literate programming. Its promise was immense: software that was more reliable, more understandable, more maintainable, and of far greater teaching value; software that targeted both humans and computers.

What is literate programming? How has literate programming evolved? Why hasn’t literate programming taken off? Where is it headed?

In this three-part series, we’ll take a detailed look at these questions. This first part introduces what literate programming is. (The second part is an overview of the challenges with literate programming.)

What is literate programming?

Let’s define literate programming.

Literate programming refers to melding a descriptive narrative and computer code into a single document, from which both human-friendly documentation and computer-readable files can be created.

Let us quote from Knuth’s paper on literate programming:

The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that reinforce each other.

As Knuth says, the document does not have to, and in fact probably should not, follow the lexical order of the source code, whatever that is; instead, the whole idea is that its order should be most meaningful to human readers (he calls this the “stream of consciousness” order). The process of creating the document is called weaving; the process of creating the compilable files is called tangling.
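To make tangling concrete, here is a minimal sketch in Python. It assumes a simplified subset of noweb-style markup: a line of the form `<<name>>=` opens a named code chunk, a line starting with `@` closes it, and `<<name>>` on a line of its own pulls another chunk in. The function names `parse_chunks` and `tangle` and the sample document are made up for illustration; this is not how WEB or noweb are actually implemented.

```python
import re

CHUNK_DEF = re.compile(r"^<<(.+?)>>=\s*$")      # opens a named code chunk
CHUNK_REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")  # a reference on its own line


def parse_chunks(source):
    """Collect the code lines of every named chunk; narrative is skipped."""
    chunks = {}
    current = None  # name of the chunk being read, or None while in narrative
    for line in source.splitlines():
        definition = CHUNK_DEF.match(line)
        if definition:
            current = definition.group(1)
            chunks.setdefault(current, [])  # repeated definitions append
        elif line.startswith("@"):
            current = None  # "@" closes the chunk; narrative resumes
        elif current is not None:
            chunks[current].append(line)
    return chunks


def tangle(chunks, root="*", indent="", stack=()):
    """Expand chunk references recursively, yielding lines in compiler order."""
    if root in stack:
        raise ValueError(f"circular chunk reference: {root}")
    output = []
    for line in chunks[root]:
        reference = CHUNK_REF.match(line)
        if reference:
            extra, name = reference.groups()
            output.extend(tangle(chunks, name, indent + extra, stack + (root,)))
        else:
            output.append(indent + line if line else line)
    return output


# Hypothetical literate document: the interesting part comes first,
# the scaffolding the interpreter needs comes last.
DOCUMENT = """\
The heart of the program is the greeting itself, so we present it first.
<<greet>>=
print(f"Hello, {name}!")
@
Only afterwards do we show the scaffolding that the interpreter needs.
<<*>>=
def main(name):
    <<greet>>

main("world")
@
"""

program = "\n".join(tangle(parse_chunks(DOCUMENT)))
print(program)
```

The narrative introduces the greeting before the function that contains it, yet the tangled output places it inside `main`, where the interpreter expects it. Weaving would take the same document and typeset the narrative and the chunks together, in the order the author wrote them.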

In a 2008 interview, Knuth also claims increased productivity as well as more happiness:

Not only has it enabled me to write and maintain programs faster and more reliably than ever before, and been one of my greatest sources of joy since the 1980s — it has actually been indispensable at times.

Recent LP works

In addition to various works published by Knuth in what he called WEB, his initial implementation of literate programming, there are several outstanding examples of literate programming. Perhaps the most ambitious is the CGI book called “Physically Based Rendering” by Pharr and Humphreys, which is famous for being the only book ever to have won an Academy Award (Knuth said it should have also won a Pulitzer):

Example of literate programming from “Physically Based Rendering”

Other books cited as great examples of literate programming style include “C Interfaces and Implementations” by David R. Hanson:

Literate programming example from “Lisp in Small Pieces”

The book “Lisp in Small Pieces” is sometimes cited as an example of literate programming, but it seems to really be narrative interspersed with code fragments for didactic purposes.

One post in a Hacker News thread from three years ago described the thoroughgoing use of literate programming at a major defense contractor, with excellent results. The poster notes:

It forced us to keep things clean, modular and simple. It meant doing everything took longer the first time, but at the point of actually writing the code, the coder had a really good picture of exactly what it had to do and exactly where it fitted in to the grander scheme. There was little revisiting or rewriting, and usually the first version written was the last version written. It also made debugging a lot easier.

What is NOT literate programming?

In this section, we review various alternatives and point out why they are not literate programming in the strict sense of the word.

Docco

Systems such as docco already do something superficially similar to literate programming. docco is a way to present nicely formatted output of comments, alongside or interspersed with source code. It not only treats the comments as Markdown, meaning you can apply styles, but also applies automatic syntax highlighting to your code. Underscore’s annotated source code is a classic, well-executed example of using docco, which supports virtually any language. There are dozens of variations of this product. It’s a great piece of work, but it’s not literate programming, most basically because it’s missing the key feature of re-ordering and transclusion.
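The core of a docco-style tool can be sketched in a few lines of Python. The sketch below assumes `#` line comments, and the function name `split_sections` and the sample source are invented for illustration. It only pairs each run of comments with the code that follows it; it is not docco's actual implementation, which also renders the comments as Markdown and highlights the code.

```python
def split_sections(source, comment_prefix="#"):
    """Pair each run of comment lines with the code that follows it."""
    sections = []  # (prose, code) pairs, in source order
    prose, code = [], []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith(comment_prefix):
            if code:  # a comment that follows code starts a new section
                sections.append(("\n".join(prose), "\n".join(code)))
                prose, code = [], []
            prose.append(stripped[len(comment_prefix):].strip())
        else:
            code.append(line)
    if prose or code:
        sections.append(("\n".join(prose), "\n".join(code)))
    return sections


# Hypothetical annotated source file.
SOURCE = """\
# Compute the **area** of a circle.
import math

def area(radius):
    # Square the radius, then scale by pi.
    return math.pi * radius ** 2
"""

for prose, code in split_sections(SOURCE):
    print("PROSE:", prose)
    print("CODE:")
    print(code)
```

The sections come out in exactly the order in which they appear in the source file. Nothing in this scheme lets the author present a fragment earlier or later than the compiler sees it, or include one fragment inside another.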

Jupyter

What about systems like Jupyter, which provides a great notebook format mixing narrative and code, with the added bonus that code can be executed right in the “notebook”, with the output, including visualizations, visible in real time? Although its roots are in the Python world, Jupyter now supports other data-focused languages such as R and Julia. Jupyter and similar systems, such as ObservableHQ (coming from the d3 world), are wonderfully useful, and they are becoming wildly popular as a way for data scientists to do and present their work. Still, we should distinguish them from literate programming. They may contain little programs, but they are not intended to be a programming system. The Wolfram Language (the language of Mathematica) also appears to fall into this “live notebook” category.

JSDoc

JSDoc is a documentation generation system designed to produce stand-alone documentation of interfaces and APIs. JSDoc is great, and virtually all other languages have similar systems, including of course its ancestor Javadoc. There are other similar systems to do things like auto-generating REST API documentation. But we can’t call these literate programming. They are automated documentation generation tools.

Literate CoffeeScript and Literate Haskell

CoffeeScript offers a literate programming mode, in which narrative material is mixed with code (identified by being indented) in files with the .litcoffee extension. However, purists would say this is nothing more than a syntactic alternative for comments, and that it fails to meet the basic literate programming criterion of allowing the author to re-order the program to present it in a more human-friendly fashion.

Haskell offers a well-regarded literate programming mode. (Separately, a tool called Haddock generates documentation from annotated source code.) The literate mode offers two solutions to the classic problem of how to distinguish narrative from code when doing literate programming: “Bird notation”, where each line of code is prefixed with a “>”, and LaTeX notation, where code is surrounded by \begin{code} and \end{code} pairs; other formatting unfortunately requires knowledge of LaTeX as well.
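As a rough illustration of the two notations, the Python sketch below strips the narrative from a literate Haskell document and keeps only the code. The function name `unlit` and the sample documents are chosen for illustration, and the logic is a simplification: among other things, it drops the “>” marker rather than replacing it with a space, and it does not check the blank lines expected around Bird-style code.

```python
def unlit(source):
    """Return only the code lines of a literate Haskell document."""
    code = []
    in_latex_block = False
    for line in source.splitlines():
        if line.strip() == r"\begin{code}":
            in_latex_block = True
        elif line.strip() == r"\end{code}":
            in_latex_block = False
        elif in_latex_block:
            code.append(line)
        elif line.startswith(">"):
            # Bird notation: drop the marker and the single space after it
            code.append(line[2:] if line.startswith("> ") else line[1:])
    return "\n".join(code)


# The same hypothetical document, once in each notation.
BIRD = """\
The factorial function is defined by recursion.

> factorial :: Integer -> Integer
> factorial 0 = 1
> factorial n = n * factorial (n - 1)

That is all there is to it.
"""

LATEX = r"""
The factorial function is defined by recursion.

\begin{code}
factorial :: Integer -> Integer
factorial 0 = 1
factorial n = n * factorial (n - 1)
\end{code}
"""

print(unlit(BIRD))
assert unlit(BIRD) == unlit(LATEX)
```

In both notations the extraction is a plain line filter: code reaches the compiler in the same order in which it appears in the document.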

However, purists would again object to calling this feature, as useful as it is, “literate programming”. They would point to the lack of any ability to include fragments of code in other fragments, or the full-fledged ability to present the code in a different order than it is to be presented to the compiler.

For R aficionados, there is the Sweave package (also LaTeX-dependent) and its successors, including knitr. And there are many others.

Currently, the only tools that really qualify as literate programming in the strict Knuthian sense of the word are all quite long in the tooth. One example is noweb, a simplified, language-independent tool from the 1990s in the tradition of WEB and CWEB (a C language version of Knuth’s original WEB system).

In the second part of this series, we will take up the question of why literate programming has gotten so little traction over the decades. In the third part we will move on to discussing future directions for literate programming.
