Ben Fry
Oct 16, 2016

I was asked to write an essay about this project for an upcoming book. Below is an adaptation of it for Medium, and I look forward to sharing the book when it’s released.

About a decade ago, a friend was telling me that during her doctoral research in genetics, she went back and read Charles Darwin’s On the Origin of Species. She noted that the book had changed significantly over the course of many editions that were released over the years. To her, it showed Darwin still toiling with his theory of evolution — still unsure of its exact contours as he tweaked and revised its specifics.

The ongoing changes were especially striking because Darwin had initially rushed to publish his theory after receiving a manuscript from Alfred Russel Wallace that contained remarkably similar ideas. Wallace did not have the same social or scientific stature as Darwin, and the result was that both their papers were first read publicly at the same meeting of the Linnean Society. Had the more eminent Darwin taken liberties with Wallace’s ideas, not fully understanding them at first but continuing to revise his own publication in the years that followed as he absorbed the mechanisms of natural selection? This idea appealed to me in a truth-seeking, if mildly paranoid, way, however unlikely it might be that 150 years of academic scholarship wouldn’t have uncovered such scientific malfeasance.

But potential conspiracies aside, the more compelling fact remained that in these multiple editions of The Origin of Species, we had the opportunity to understand how it changed over time. The first edition weighed in at around 150,000 words, and by the sixth, the manuscript had grown to over 190,000. All six editions totaled approximately a million words; what does that kind of output look like?

With that question in mind, I began writing software to assemble the changes and display them, in order to start understanding the overall structure of the differences and find an indication of where to go next. This is the approach I use when doing information design with code: create a software “sketch” that provides a first picture, respond to that, create another image, and so on. The first sketch usually starts with something obvious — in this case, just depicting all of the words at once. But with that in hand, the data starts to reveal itself, and you can further iterate and refine with more software sketches based on what appears in each subsequent iteration.

Happily for me, the six editions had been recorded digitally by Dr. John van Wyhe and his colleagues at Darwin Online. These HTML transcripts were a more usable starting point than other publicly available sources which had errors, weren’t in a consistent format, or lacked all six editions.

“Specious,” a software sketch that depicts all six editions of Darwin’s text in a single application (2008)

In this initial sketch, each edition is shown as a column of text that extends down the screen. On the left-hand side, all six editions are displayed in miniature. The viewer can drag their mouse across these tiny versions to jump to any location in the book, and each of the six columns of text will synchronize to that same location across all of the editions.
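One way to think about that synchronization is as a mapping from a word position in one edition to the corresponding position in another. The following is a minimal sketch of that idea in Python, not the code behind the original piece: the function name `map_position` is hypothetical, and using the standard library's `difflib` to align the two texts is my assumption for illustration.

```python
import difflib


def map_position(source_words, target_words, index):
    """Map a word index in one edition to the matching index in another.

    Hypothetical helper for illustration only. Words that exist in both
    editions map directly; a word that was removed maps to the position
    just after the last shared passage that precedes it.
    """
    matcher = difflib.SequenceMatcher(a=source_words, b=target_words, autojunk=False)
    fallback = 0
    for a_start, b_start, size in matcher.get_matching_blocks():
        if a_start <= index < a_start + size:
            # The word survives in the target edition: same offset in the block.
            return b_start + (index - a_start)
        if a_start + size <= index:
            # Remember where the most recent shared passage ended.
            fallback = b_start + size
    return min(fallback, max(len(target_words) - 1, 0))


# Toy word lists, not Darwin's text.
earlier = "a b c d e".split()
later = "x a b c y d e".split()
position = map_position(earlier, later, 3)  # index of "d" in the earlier list
```

Applying the same mapping from the edition under the mouse to each of the other five would keep all six columns pointed at the same passage, even where text has been inserted or cut in between.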

Each column also shows the changes between that edition and the previous. In the style of the Track Changes feature in a word processing program, text that is removed in one edition is shown in red and crossed out, while additions are shown in blue.
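A word-level comparison of this kind can be sketched in a few lines of Python. This is an illustration under my own assumptions rather than the method used in the original software: the function name `track_changes` and the tag names are hypothetical, and the alignment comes from the standard library's `difflib.SequenceMatcher`.

```python
import difflib


def track_changes(previous: str, current: str) -> list[tuple[str, str]]:
    """Return (tag, text) runs, where tag is 'same', 'removed', or 'added'.

    Hypothetical helper: 'removed' runs would be drawn in red and crossed
    out, 'added' runs in blue, and 'same' runs in the normal text color.
    """
    old_words = previous.split()
    new_words = current.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    runs = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            runs.append(("same", " ".join(old_words[i1:i2])))
            continue
        # 'replace', 'delete', and 'insert' are split into removed/added runs.
        if i1 < i2:
            runs.append(("removed", " ".join(old_words[i1:i2])))
        if j1 < j2:
            runs.append(("added", " ".join(new_words[j1:j2])))
    return runs


# Toy sentences, not Darwin's text.
runs = track_changes("the quick brown fox", "the slow brown fox jumps")
for tag, text in runs:
    print(f"{tag:8} {text}")
```

Comparing whole words rather than characters keeps the markup readable: a changed verb shows up as one crossed-out word and one new word instead of a scatter of single-letter edits.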

Detail of the six editions as shown in the original software

Along with providing access to the entire million words of text at once, this view shows several patterns beginning to emerge. In the tiny columns at the left, we can see the book growing longer as it adds a hefty 40,000 words over the six editions. The fifth column reveals large portions of blue and red. This is due to the lapse in time between the fourth and fifth editions, during which Darwin was working on translations and therefore revising the work in other languages; he then returned to his native text to incorporate the many interim edits.

The sixth edition shows deletions from all over the book, and an enormous addition in the center — a new chapter in which Darwin engages with criticisms of his theory. In previous editions, he had given brief responses to his critics, but in the sixth edition, he excised these responses from other parts of the book and placed them in a chapter of their own. Perhaps Darwin wanted to address the criticisms more directly: maybe recognizing his own mortality, or more simply, that this might be his final edition. Previously, he had relied upon other colleagues to engage in critique and debate about his theory; now it was time for him to be an active participant in the discussion.

However, this representation of the text, while answering the initial question about what a million words might look like, and revealing some of the structure of the edits themselves, still wasn’t enough to convey the story.

The idea of ongoing refinement of a theory isn’t necessarily surprising in itself — any scientist, or even anyone who’s spent time with scientists, will observe that each small victory of incremental progress in one’s field simply opens wide an even larger chasm of unknowns.

For every answer, every bit of knowledge gained, a dozen new questions present themselves. Science is not something to be ultimately “known” or otherwise “figured out.” Instead, just like the universe it studies, science is in a constant state of expansion.

Even the six editions of the book considered here only represent the English-language editions released during Darwin’s lifetime. Since Darwin sometimes participated in translation projects, editions in other languages weren’t so much direct translations as they were interim steps between the updates released by his British publisher.

A reprint of Peckham’s variorum text from 1959

In 1959, coinciding with the centennial of the release of Darwin’s manuscript, author Morse Peckham collected all six editions into a single “variorum” text. Peckham painstakingly created a reference system that denotes the modifications and changes between editions. He enumerated every sentence from every edition, copied them onto index cards, and then assembled the cards into a final text. In the preface, Peckham describes his goal:

“It appeared to me unquestionable that it was impossible to write the early history of the development of Darwinian and evolutionary thought as affected by the Origin unless the student had a variorum text. And so, since biologists and even historians of science have more important things to do, and because the task seemed to call for the efforts of just such a harmless drudge as a student of literature, I undertook to create a variorum text.”

Referencing a specific example of a change in the text itself, Peckham writes, “Without a variorum text, such a fact — and there are dozens of equal or greater importance — cannot even be known.”

Peckham’s book is a remarkably hefty 816 pages, and at almost two inches thick, doesn’t inspire a casual read. But what an incredible achievement to have it in one place. It’s a testament to his patience, and also reminds me of my own lack of patience: it’s one of the things I love about writing software. I appreciate the ability to test design ideas in a medium that allows me to quickly understand even massive sets of data like this one. What would take me a few hours or days, maybe even a couple weeks to perfect, no doubt stretched into months and years for Peckham.

In this less-than-wieldy printed form, the fascinating story of the edits and evolution of Darwin’s thoughts on evolution is lost to the world of academics and historians of science. One Darwin scholar even suggested that it was silly to pursue my own project, claiming that it was only of interest to a narrow group of about five hundred people who attend a specific conference each year. When I told them that the point was to bring Darwin’s edits to a larger audience, and that that was the nature of information design, this scholar even grew a little suspicious! But I suppose they can’t be blamed: it was 2008 and everyone seemed to want a piece of Darwin, what with his 200th birthday coming up as well as the 150th anniversary of the publication of Origin.

And what is the nature of the changes themselves? My initial representation showed them to be split between a few main categories, a more precise detailing of which is given by scholar Barbara Bordalejo, introducing her creation of a contemporary online variorum. She categorizes the changes as:

  • Depersonalization — dozens of “I think,” “I presume,” and similar constructions were removed, resulting in a more objective tone
  • Reinforcement — removing hesitation and being more precise or forceful with statements
  • Objectivization — excising colloquial phrasing like “often” and “just”
  • Clarification — making phrases clearer or otherwise improving sentence structure
  • Updating — incorporation of emerging theories and ideas, or simple changes to date references
  • Semantic changes — variations of an argument, whether using a different example or undertaking more involved reconstructions

One of the most remarkable changes Bordalejo cites refers to the impact the book was having in its own time. The 1861 text states, “The great majority of naturalists believe that species are immutable productions, and have been separately created.” Just eight years later, the fifth edition claims, “Until recently the great majority of naturalists believed that species were immutable productions, and had been separately created.” The addition of “until recently” represents a kind of self-awareness usually only seen in epilogues and retrospectives many years after an author’s passing.

A few months after building the initial version of the piece, I found myself in Cambridge (UK) for a workshop organized by Greg McInerny at Microsoft Research. Over drinks the first evening, Greg asked what I’d been working on recently; I said that one of my latest projects was visualizing how Darwin’s Origin of Species changed over time. His eyes grew wide, he paused, and he said something like “you’re kidding” (though it may have been a little more colorful than that). As it turned out, he was working on the same theme with Stefanie Posavec. They completed their piece some time later, depicting the changes as lovely branching trees — a kind of homage to Darwin’s lone diagram in the book. And in the months that followed, Greg took to signing his emails “Wallace.”

The only diagram or image in The Origin of Species, a tree depicting divergence (source)

The Microsoft conference was in April. I’d been asked to provide a piece for an exhibition that August, which made for a good excuse to revisit the exploratory tool I’d already built. To make it suitable for an installation, the artwork would need to be self-descriptive enough to make the story of the edits quickly apparent, while also beautiful enough to belong in a gallery setting: both striking from a distance and engaging close-up. The original tool wasn’t about the story so much as a survey of what the data — that million words — contained, so it was necessary to simplify and do a better job of drawing the viewer into the piece.

The resulting visualization begins with the entire text of the first edition on screen, in gray. Users can move the mouse around the miniaturized paragraphs to reveal a magnified version of individual sentences. Over time, each successive edition is added to the screen. As a portion of the first edition is replaced with a sentence from the second edition, the new text is colored red. Next, sentences from the third edition are added in purple, fourth in orange, fifth in blue, all through the sixth and final edition in green.
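The coloring rule can be sketched as a lookup of the edition in which each sentence first appears. This is a minimal Python sketch under stated assumptions, not the code behind the piece: the names `EDITION_COLORS` and `color_by_first_appearance` are hypothetical, and matching sentences by exact string equality is a simplification, since real text would likely need a more forgiving alignment.

```python
# Colors in edition order, as described above: first through sixth.
EDITION_COLORS = ["gray", "red", "purple", "orange", "blue", "green"]


def color_by_first_appearance(editions: list[list[str]]) -> list[tuple[str, str]]:
    """Pair each sentence of the latest edition with the color of the
    earliest edition that contains it verbatim.

    Hypothetical helper: `editions` is a list of editions in order, each
    given as a list of sentence strings.
    """
    first_seen: dict[str, int] = {}
    for index, sentences in enumerate(editions):
        for sentence in sentences:
            # setdefault keeps the earliest edition index for each sentence.
            first_seen.setdefault(sentence, index)
    return [(s, EDITION_COLORS[first_seen[s]]) for s in editions[-1]]


# Placeholder sentences, not Darwin's text.
toy_editions = [
    ["Sentence A.", "Sentence B."],
    ["Sentence A.", "Sentence B, revised."],
    ["Sentence A.", "Sentence B, revised.", "Sentence C."],
]
colored = color_by_first_appearance(toy_editions)
```

Passing only the first two or three editions in place of the full list gives the intermediate states, which is roughly how the animation of successive editions could be stepped through.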

Detail of the large format poster (2009)

When used in an installation, a large format poster accompanies the on-screen interactive version. Making use of the high resolution of print, it’s possible to read the entire text of the book, though remarkably tedious for 190,000 words in a 6-point font. But the print serves as a reminder of what’s behind the coarser resolution of the computer screen — the high-fidelity typography of the entire tome.

Six color offset print (2015)

After receiving several requests, we later created a version of the piece as a 24" x 36" offset print for personal use. With six-color printing and 3-point Bell Centennial (Matthew Carter’s incredible typeface, designed for phone books in the mid-1970s), we’re able to condense the entire book into something that’s the size of a typical poster, but still readable.

For those uninterested in using a magnifying glass, a full-color book, set to mimic the typography of the 19th-century original, uses the same color-coding scheme.

(Balancing outside interest in these projects and the fact that they’re not part of our day-to-day work, we decided it was an opportunity to give away the proceeds and support causes we felt strongly about.)

We’ve also built versions of the software more suitable for a classroom, by creating an interface that supports primary tasks of reading, searching, and annotating—rather than just introducing viewers to the story of the changes. Of course it’s also possible to apply the methods seen here to other texts, and some of the most interesting examples come from outside the humanities, in fields where it’s necessary to understand the evolution of unmanageably large documents.

In the revisions of Darwin’s Origin of Species we see the humanity of struggling with a scientific idea — one so well-known that 150 years later, with fields like biology and genetics reliant on its truths, it’s often hard to imagine that the ideas were ever in question. Especially outside the scientific discipline, it’s easier to assume that the theory of evolution has always been known — as if it arrived, fully-formed, on stone tablets. Perhaps it’s Darwin’s familiar beard that makes us mistake him for Moses, but the truth is so much more subtle and fascinating — and speaks to the whole of science itself.

Ben Fry

Founder @FathomInfo, co-founder @ProcessingOrg, lecturer @MIT, avoids having picture taken.