What is literate programming? How has literate programming evolved? Why hasn’t literate programming taken off? Where is it headed?
In the first article in this three-part series, we addressed the first of these questions, learning what literate programming is, something of its history, and the great promise it holds. Let us recall the definition of literate programming:
Literate programming harmonizes the human and machine aspects of programming into an integral whole which best expresses and communicates a software solution.
But something went horribly wrong. The Google Trends chart below shows starkly the inexorable decline in interest in literate programming.
From a high point in 2004 — and remember that at this point a lot of the air had already gone out of the literate programming balloon — interest has declined to almost zero, relatively speaking.
What went wrong? That is the topic of this second article in our series on literate programming. Let’s try to identify some of the causes of the virtually complete lack of momentum in recent years. Some of these reasons may be intrinsic to the activity of programming and programming behavior in any era; others may have been introduced, or aggravated, by recent changes in the environment.
Once we understand better what happened with literate programming, and the factors affecting its adoption (or lack thereof), we will be ready to propose a new generation of literate programming concepts and tools — and a new series of use cases — which will finally allow us to realize the potential of Knuth’s conceptual breakthrough. That will be the topic of the third and final article in this series.
Now, with no further ado, let us proceed to examine the factors that have slowed, and could continue to slow, the spread of literate programming.
It solves problems that don’t need to be solved anymore
To understand the genesis of literate programming, and the reasons it failed to catch on, let’s briefly review some historical background.
First, in 1984 there was no world-wide web or even really PCs for that matter; “documentation” was printed on paper. Knuth, of course, was a pioneer in computer-based typesetting with his TeX system, and so it is not surprising that his initial implementation of literate programming was designed to produce TeX documents. That meant, unfortunately, that users had to know TeX to some extent, and that the source documents were littered with incomprehensible TeX commands, so people had to write things like this:
Second, it turns out that at the time Knuth was programming in Pascal, an early modern language that could be said to be based on ALGOL (the progenitor of high-level structured programming languages), and was later made famous by the ubiquitous Turbo Pascal. However, Pascal imposed a number of limitations. For instance, it had no preprocessor or `#define`-like capability such as found in C, and it had restrictions on things like the order of declarations. Knuth’s literate programming initiative, which he called `WEB`, can be viewed in part as being a kind of Pascal preprocessor to make up for these limitations.
Some critics of literate programming say that modern programming environments no longer have these kinds of issues, which has made something like Knuth’s `WEB` less necessary. That’s just one of the many criticisms of literate programming, which we go over in the following sections.
Literate programming is too much work and/or not necessary
Contrary to Knuth’s assertion in the quote given in the first article in this series that literate programming lets him program faster, it’s a fair assessment that adopting a literate programming approach might actually require more time — in the worst case, perhaps as much as double. Of course, the notion is that this extra investment of time will repay itself many times over the coming years, by saving on maintenance costs, or serving as great teaching material for new folks, but unfortunately not only developers, but managers and business people as well, tend to be more interested in how fast they can get something working and shipped.
I don’t need no stinking comments
This problem is aggravated by a particular mentality found in some quarters of our industry: that no kind of documentation, or comments, is really necessary in the first place, because, after all, real developers write “self-documenting code”. In a variation of this line of thought, real developers can figure out any code. Some people even say that if you need to understand what code does you can just ask the person who wrote it (if they’re still with the company, of course).
In its extreme incarnation, this mentality actually claims that comments are evil and should be abolished. After all, why should I adorn a line saying `initialize();` with a comment saying `// Initialize variables`? These “antidocs” people variously claim that comments, in addition to being superfluous, are subject to code rot (which of course they are, unless someone keeps them up to date); that “wrong documentation is worse than no documentation”; or that comments are actually a way of covering up for poor coding practices, and in that sense may even indirectly promote such poor coding practices, and on and on.
People who don’t believe in commenting their code are obviously not going to be the least bit interested in literate programming, which can be viewed as commenting on steroids. The prevalence of this attitude is definitely a factor working against restoring any level of interest in literate programming.
I personally believe that a lot of the blame for the demise of literate programming can be laid at the feet of the folks making IDEs. As programmers, we spend most of our time these days living inside IDEs. We’ve gotten spoiled by their auto-complete features, in-line error indications, and integrated build/run/debug features. We use them for everything from downloading libraries to sending commands to our phones to collaborative editing. Few if any programmers are going to leave the warm embrace of their IDE and fire up a text editor to enter cryptic commands in some literate programming language.
Yes, the IDEs do offer plug-ins. But in most cases, the plug-in architecture does not expose the kind of functionality one would need to implement literate programming in the IDE. If the IDE talks to some kind of language service in the background, which reports syntax errors, plug-ins may not be able to easily get their hands on that information in order to display error annotations in the literate programming version of the file.
Let’s be clear: there is no specific reason why IDEs could not offer literate programming features. The IDE makers simply don’t feel it’s a priority. As a result, an entire generation of engineers has learned to program in an environment where they couldn’t use literate programming even if they wanted to. To be fair to the people making IDEs, there is a chicken-and-egg problem here; they don’t want to provide a feature no one is asking for, yet it is the very presence of the feature which could help restore the popularity of literate programming.
Perhaps one problem is that the people building IDEs are confused about what flavor of literate programming to implement. No one wants to go back to Knuth’s arcane original notation, which lives on in noweb. Certainly no one other than hold-outs in academia wants to go anywhere near literate programming that involves LaTeX. But the IDE folks are smart and can figure this out. Markdown, or some variant thereof, is a perfectly serviceable, widely-used format for textual narrative, and in fact is already used in several of the few isolated experiments on literate programming being done by zealots to this day; for output, HTML is probably enough, but it’s not hard to also support other formats such as `.epub`, in case someone wants to curl up in bed with your program.
VSCode, wherefore art thou?
By definition, literate programming requires an extra build step — the so-called “tangle” step of extracting and re-arranging the program code in the literate programming source into compilable or executable form. Although tangling usually involves nothing more than a quick command invocation, it is nevertheless yet another step to worry about and potentially go wrong — who wants today’s deployment to fail because of a lint error from some literate programming Markdown syntax?
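To make the tangle step concrete, here is a minimal sketch in Python. It assumes a convention of our own choosing, not one taken from any particular tool: the literate source is Markdown, code lives in fenced blocks tagged with a language, and tangling simply concatenates those blocks in document order. The names `tangle`, `FENCE`, and `SAMPLE` are hypothetical, and the re-arranging of named chunks that Knuth-style tools perform is deliberately left out.

```python
import re

TICKS = "`" * 3  # built at run time so this listing can itself sit inside a fence

# Assumed convention: an opening fence carries a language tag, a closing fence is bare.
FENCE = re.compile(
    rf"^{TICKS}(\w+)[ \t]*\n(.*?)^{TICKS}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def tangle(markdown: str, language: str) -> str:
    """Return the code from every fenced block tagged `language`, in document order."""
    return "".join(body for tag, body in FENCE.findall(markdown) if tag == language)


# A tiny literate source: narrative interleaved with two Python chunks.
SAMPLE = "\n".join([
    "First we greet the reader.",
    TICKS + "python",
    "print('hello')",
    TICKS,
    "Then we say goodbye.",
    TICKS + "python",
    "print('bye')",
    TICKS,
])

if __name__ == "__main__":
    print(tangle(SAMPLE, "python"), end="")
```

Even a toy like this shows where the extra build step comes from: the narrative file is no longer something the compiler or interpreter can consume directly, so something has to run before every build.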
In an era of source maps, which allow us to map compiled and transpiled code back to the original source, or transformed CSS back to its pre-processor origins, there is a risk that we will be unable to map the code being executed back to the literate program we wrote. Current approaches, such as they are, don’t address this issue at all.
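As a rough illustration of what such a mapping would involve, the following Python sketch extracts fenced code from a Markdown literate source and records, for each extracted line, the line it came from. This is not the real source map format, only a plain list of line numbers; the function name `tangle_with_line_map` and the fence convention are assumptions made for the example.

```python
TICKS = "`" * 3  # built at run time so this listing can itself sit inside a fence


def tangle_with_line_map(markdown: str, language: str):
    """Extract code lines from fences tagged `language`.

    Returns (code, line_map), where line_map[i] is the 1-based line number
    in the literate source of line i + 1 of the extracted code.
    """
    out_lines = []
    line_map = []
    inside = False
    for lineno, line in enumerate(markdown.split("\n"), start=1):
        stripped = line.strip()
        if not inside and stripped == TICKS + language:
            inside = True            # opening fence for our language
        elif inside and stripped == TICKS:
            inside = False           # closing fence
        elif inside:
            out_lines.append(line)   # a code line: keep it and remember its origin
            line_map.append(lineno)
    return "\n".join(out_lines), line_map


SAMPLE = "\n".join([
    "Greeting first.",       # line 1
    TICKS + "python",        # line 2
    "print('hello')",        # line 3
    TICKS,                   # line 4
    "Farewell second.",      # line 5
    TICKS + "python",        # line 6
    "print('bye')",          # line 7
    TICKS,                   # line 8
])

code, line_map = tangle_with_line_map(SAMPLE, "python")
```

With a table like `line_map` in hand, an error reported on line 2 of the extracted code could be traced back to line 7 of the literate source. A real tool would need to emit this in a format that debuggers and browsers understand.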
Finally, it may be an issue that the tangling step also takes time; it should be quick, but every second counts.
Recent attempts to provide literate programming tools around the web stack don’t really address these tooling issues in a convincing way.
Literate revision management
Knuth invented literate programming in the previous century. Although literate programming has many advantages that have withstood the test of time, there have also been massive changes in the computing world, some of which might make literate programming less pertinent. One of those is the advent of version control.
Knuth did not have any version control at all that we know of, much less “commit messages” or “pull requests”. Commit messages, if written carefully, can provide a useful kind of documentation for a system, with the added advantage that they are ordered chronologically, so the most recent ones come up first, with the older ones aging away gracefully. `git` and other version control systems can search commit messages, and we can find all the files involved in a particular commit whose message we have found.
This approach to consciously using commit messages for documentation has been called “literate version control”. In addition to commit messages, modern development methodologies, repository management systems, and issue and bug tracking systems will capture code review comments and bug discussions, all of which represent a kind of documentation, which has the advantage almost by definition of being mostly about the things we really care about. On the other hand, programmers who are not inclined to comment their code, much less program in literate programming style, are likely to also be the ones who write useless commit messages like “Fixed bug”.
Modern features in programming languages
As alluded to above, one of Knuth’s motivations in developing `WEB` was the lack of macros in Pascal, and other language restrictions. Today’s languages generally do provide macro or meta-programming facilities. Chunks of code can be factored out into separate functions within the parent function (in many languages, at least; actually this was true for Pascal as well), and given meaningful names, with very low run-time overhead. Some engines may even inline them for you, and some IDEs provide this kind of “refactoring” feature, so in this view there is no need to rely on literate programming to manage our “chunks” for us.
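Here is a small Python sketch of what this argument has in mind. The function and its helpers are invented purely for illustration: each nested function plays the role of a named “chunk”, and the last line of the parent reads like the outline a literate program would give.

```python
def word_frequencies(text: str) -> dict[str, int]:
    """Count how often each word occurs, ignoring case and punctuation."""

    def split_into_words(raw: str) -> list[str]:
        # Chunk 1: lower-case the text and treat anything non-alphanumeric as a separator.
        cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in raw)
        return cleaned.split()

    def count(words: list[str]) -> dict[str, int]:
        # Chunk 2: tally the occurrences of each word.
        counts: dict[str, int] = {}
        for word in words:
            counts[word] = counts.get(word, 0) + 1
        return counts

    # The body of the parent is now just the outline of the algorithm.
    return count(split_into_words(text))


if __name__ == "__main__":
    print(word_frequencies("The cat and the hat."))
```

What this style cannot do, and what literate programming can, is present the chunks in whatever order best suits the explanation, with narrative around them.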
It’s safe to say that one key development in software over the last decades has been a huge focus on testing, testing frameworks, testing methodologies, and testing tools. We use the term “test-driven development” here in its broadest sense — development which takes tests seriously and devotes a meaningful amount of its time to developing and maintaining tests. Literate programming has never provided a convincing story about how it would co-exist with today’s test-centric priorities.
Speed of change
For software developers, the world is changing much more rapidly than ever before. Software can now be modified and deployed in hours, rather than months or years. Our modern culture and business environment have grown more dynamic, demanding that software change faster to keep up. We no longer have the luxury of sitting in our armchairs tweaking pixels and internal program structure. For literate programming, the question is whether things are now moving too fast to accommodate it, or whether literate programming is actually a savvy way to deal with the increased rate of change.
It’s now 2018, nearly two decades after the Agile Manifesto. On the one hand, that manifesto says
We are uncovering better ways of developing software by doing it and helping others do it.
“Helping others do it” sure sounds like something that literate programming could help with. And the twelve principles include:
Continuous attention to technical excellence and good design enhances agility.
which definitely sounds like something that literate programming could contribute to, since literate programming could certainly be considered one aspect of “technical excellence”, and makes design explicit, communicates it clearly, and facilitates peer review. On the other hand, the manifesto goes on to say:
We have come to value working software over comprehensive documentation.
Depending on how you look at it, that could be taken as a criticism of literate programming, to the extent you view it as “comprehensive documentation” which slows down getting to “working software”.
So which is it? Is agile supportive, not supportive, or indifferent to literate programming? One insightful contributor to a Hacker News thread from 2015 put it well:
I don’t think literate programming is incompatible with the original agile manifesto, but I think it wouldn’t survive in what that seems to have turned into.
This article is not about justifying, or criticizing, or commenting on the evolution of agile or scrum. Suffice it to say that moving toward literate programming is almost certainly going to require a change in mindset for the manager used to locking down features which are going to get shipped next Tuesday or else, or getting evaluated on whether or not they are shipped.
The demise of program design
Literate programming is about expressing and communicating program design, in addition to individual pieces of code. (As such, it is also a great way to share, review, and collaborate on designs.) By “program design”, we refer to internal design: the structure of the modules and components that make up the solution, and their interfaces with each other and the outside world.
Unfortunately, the past decades have seen an inexorable decline in the practice of internal design. Far too many programmers these days believe that the first step in software is to sit down right away in front of the computer and start typing code, instead of first thinking, which is what program design is about. It’s also hard to do too much thinking or internal design when your manager is breathing down your neck about the release next Tuesday. It’s hard to get into an internal design mentality when none of your co-workers, or engineering managers who should know better, care about it or even know how to do it. A generation of programmers who have learned to program by copying and pasting fragments of jQuery is not suddenly going to adopt an internal design mentality.
Literate programming has the potential to stand in an interesting feedback loop with a revival in internal design. It can both benefit from a greater focus on internal design; and by its very nature, can contribute to that focus.
A modular world
In days of yore, monolithic programs were the rule, going all the way back to the 1000-card deck of punched cards that made up your FORTRAN program. Programming languages and development environments did not support breaking systems down into modules, or communication among modules, at least not to the extent we know today. Literate programming as we know it has never really addressed the modularization of the world. How did individual literate programs relate to each other, or refer to each other? Literate programs like TeX appear to have basically been one monstrous file, perhaps with simple inclusion mechanisms, which when “woven” turned into the TeX book.
A successful, modern incarnation of literate programming is going to have to fully support our current module-based world.
This article has been about the reasons for literate programming not taking off, some old and some new. However, I believe that none of these barriers are impossible to overcome, and in fact that we need literate programming today more than we ever have. In the next installment in this series, I will present my scenarios for the future of literate programming.