This is the text of a speech I gave as the opening address to the Library of Congress’s Digital Preservation 2014 conference July 22 in Washington, DC. The audience was composed of professional archivists, technologists, and others who work in museums, libraries, universities, and institutions charged with what we generally term “cultural memory.” Where does software fit into that? What does it mean to think of software as a made thing, a wrought human artifact to be preserved, and not just as an intangible, ephemeral trellis of code? Should there be a software canon? Those are the questions I wanted to pose. I’ve retained the verbal character of the original text which was also accompanied by some 50 images, only a few of which are reproduced here.
This “Earthrise” was photographed by the Lunar Orbiter on August 23, 1966. It predates the familiar and iconic Earthrise image by two years. The restored first photo was released to the public on April 22, 2014, as part of an effort aimed at recovering imagery from the robotic probe, the first American spacecraft to orbit the moon. To date, some 2000 of the Orbiter’s shots have been rescued from nearly 50 years of dormant tape storage by a team working out of a former McDonald’s near NASA Ames in Silicon Valley.
Two days after the publication of the Lunar Orbiter image, on April 24, 2014 came another press release, also concerning the recovery of images from obsolescent media, but of a rather different sort. A group based at Carnegie Mellon had identified and retrieved computer-generated graphics created by Andy Warhol on an Amiga 1000 personal computer in 1985. Acting on a hunch from new media artist and Warhol authority Cory Arcangel, the group forensically imaged floppy diskettes at the Andy Warhol Museum. After some elaborate intermediary steps, including reverse engineering the proprietary format in which the files were originally created and stored, the previously unseen images were released to the public. As befits a find of this magnitude—a dozen new Warhols!—press coverage was extensive, in all the major media outlets.
Two days after that came yet another bonanza for digital preservation. April 26, 2014. This was the day of the already legendary Atari excavation in the New Mexico desert. For those of you who don’t know the story, for decades rumor had had it that the Atari Corporation, facing financial ruin in the wake of the disastrous release of the notoriously awful E.T. game, had dumped thousands, or tens of thousands, or maybe hundreds of thousands of game cartridges in a landfill outside Alamogordo as a means of disposing of unsalable product. Earlier this year a group of documentary filmmakers obtained the necessary permissions and permits, hired the heavy equipment, and started to dig. Within hours they had found what they were looking for. The photographs that quickly blanketed the Web are striking, what media theorist Steven Jackson would perhaps call “broken world” imagery, resonant for the familiar shapes and artwork distorted, eroded, corroded, by soil and environmental agents. No lost prototypes or other rarities were found—in fact, the games themselves are all widely available and have been playable for years thanks to the retro community. In contrast to both the Lunar Orbiter project and the Warhol images, this was not a discovery about content. It was about aura and allure, sifting the literal grit of what we now widely acknowledge as “digital materiality.”
I begin with this miraculous seven-day stretch in part simply to celebrate these pathbreaking achievements and the widespread public visibility that accrued from them. Digital preservation makes headlines now, seemingly routinely. And the work performed by the community gathered here is the bedrock underlying such high-profile endeavors. But I also want to set the stage for one more news cycle that followed several weeks later, one which garnered similar levels of publicity but speaks to a rather different dynamic: not the discovery and release of dramatic new content, and not the aura of actual artifacts excavated from the desert sands, but rather . . . something else, something quirkier, and arguably more intimate.
On May 13, in conversation with Conan O’Brien, George R.R. Martin, author of course of the Game of Thrones novels, revealed that he did all of his writing on a DOS-based machine disconnected from the Internet and lovingly maintained solely to run . . . WordStar. Martin dubbed this his “secret weapon” and suggested the lack of distraction (and isolation from the threat of computer viruses, which he apparently regards as more rapacious than any dragon’s fire) accounts for his long-running productivity.
And thus, as they say, “It is known.” The Conan O’Brien clip went viral, on Gawker, Boing Boing, Twitter, and Facebook. Many commenters immediately if indulgently branded him a “Luddite,” while others opined it was no wonder it was taking him so long to finish the whole Song of Ice and Fire saga (or less charitably, no wonder that it all seemed so interminable). But WordStar is no toy or half-baked bit of code: on the contrary, it was a triumph of both software engineering and what we would nowadays call user-centered design. The brainchild of programmer Rob Barnaby and MicroPro’s Seymour Rubinstein, WordStar dominated word processing on home computers for the first half of the 1980s, before losing out to WordPerfect, itself to be eclipsed by Microsoft Word. Originally a CP/M application that was later ported to DOS, WordStar was the software of choice for owners of the early “luggables” like the Kaypro computer and the Osborne 1. Writers who cut their teeth on it include names as diverse as Michael Chabon, Ralph Ellison, William F. Buckley, and Anne Rice (who also equipped her vampire Lestat with the software when it came time for him to write his own eldritch memoirs). WordStar was justifiably advertised as early as 1978 as a What You See Is What You Get word processor, a marketing claim that would be echoed by Microsoft when Word was launched in 1983. WordStar’s real virtues, though, are not captured by its feature list alone. As Ralph Ellison scholar Adam Bradley observes in his work on Ellison’s use of the program, “WordStar’s interface is modelled on the longhand method of composition rather than on the typewriter.” A power user like Ellison or George R.R. Martin who has internalized the keyboard commands would navigate and edit a document as seamlessly as picking up a pencil to mark any part of the page.
WordStar runs no less efficiently and behaves no differently in 2014 than it did in 1983. But if you’re running it today you must be a Luddite, or at the very least a curmudgeonly author of high fantasy whose success allows you to indulge your eccentricities! This is what was so fascinating (to me) about the public reaction to this seemingly recondite detail about Martin’s writing process: a specific piece of antiquarian software, WordStar 4.0 to be exact, is taken as a clue or a cue to the personality and persona of its user. The software, in other words, becomes an indexical measure of the famous author, the old-school command-line intricacy of its interface somehow in keeping with Martin’s quirky public image, part paternalistic grandfather and part Dr. Who character. We know, that is most of us of a certain age remember, just enough about WordStar to make Martin’s mention of it compelling and captivating. But what is WordStar? It is not content per se, nor is it any actual thing. (Or is it?) WordStar is software, and software, as Yale computer scientist David Gelernter has stated, is “stuff unlike any other.”
What is software then, really? Just as early filmmakers couldn’t have predicted the level of ongoing interest in their work over a hundred years later, who can say what future generations will find important to know and preserve about the early history of software? Lev Manovich, who is generally credited with having inaugurated the academic field of software studies, recently published a book about the early history of multimedia. That project, like my own on the literary history of word processing, presents considerable difficulties at the level of locating primary sources. Manovich observes: “While I was doing this research, I was shocked to realize how little visual documentation of the key systems and software (Sketchpad, Xerox Parc’s Alto, first paint programs from late 1960s and 1970s) exists. We have original articles published about these systems with small black-and-white illustrations, and just a few low resolution film clips. And nothing else. None of the historically important systems exist in emulation, so you can’t get a feeling of what it was like to use them.”
The emerging challenges of software preservation were explored just over a year ago at a two-day meeting at the Library of Congress called “Preserving.exe.” As Henry Lowood, himself a participant in that meeting (as was I), has subsequently noted, this was hardly the first attempt to convene an organized gathering around the subject. Notable earlier efforts included the 1990 “Arden House” summit, which boasted representation from Apple, Microsoft, and HP alongside the Smithsonian and other cultural heritage stakeholders for purposes of launching a “National Software Archives.” Nonetheless, more than two decades later, a roomful of computer historians, technical experts, archivists, academics, and industry representatives once again met to discuss what role the nation’s premier cultural heritage institutions, from the Library and the Smithsonian to the Internet Archive, ought to play in gathering and maintaining collections of games and other software for posterity. You can read various accounts of and responses to that meeting in the Preserving.exe report that was published on NDIIPP’s Web site. But more than a year out from the event itself, the report’s impact and uptake seem modest, and as recently as 2014 the preservation of executable content was not included in the NDSA’s annual agenda for action. Thus, even as libraries, archives, and museums now routinely confront the challenges of massive quantities of content in digital format, actual software—not George R.R. Martin’s document files, not the character data, but WordStar itself—remains a narrow, niche, or lesser priority.
Matthew Fuller puts it this way in his introduction to Software Studies: A Lexicon, a 2008 MIT Press volume on the subject: “While applied computer science and related disciplines … have now accreted half a century of work on this domain, software is often a blind spot in the wider, broadly cultural theorization and study of computational and networked digital media…. Software is seen as a tool, something you do something with. It is neutral, grey, or optimistically blue.” Software studies, as a sub-field of digital media studies, thus offers a framework for historicizing software and dislodging it from its purely instrumental sphere. Besides Manovich and Fuller, key names in software studies include Wendy Chun, Noah Wardrip-Fruin, Lori Emerson, Shannon Mattern, and yes, German media theorist Friedrich Kittler, who memorably proclaimed “there is no software.” I would now like to take a few moments to offer my own elaboration of a software studies framework by sketching some different reference points, vectors if you will, not so much for defining software, but for demonstrating the range of ways one might seek to circumscribe it as an object of preservation.
Software as asset. The legal perspective. In 1969, the US Justice Department opened an anti-trust suit against IBM, the result of which was that IBM “unbundled” the practice of providing programs—software—to its clients for free as part of its hardware operations. Instead, IBM introduced the distinction between System Control Programs and Program Products; the latter became a salable commodity. IBM’s unbundling decision is routinely cited as a catalyst for the emergence of software as a distinct area of activity within computer science and engineering at large. The point I would make here is that the object we call “software” is a legal and commercial construct as much as it is a technological one.
Software as package. The engineer’s perspective. Computer historian Thomas Haigh has argued that the key moment for conceptualizing software came when its originators began to think about “packaging” their code so as to share it with others. Haigh makes the analogy to envelopes for letters and shipping containers. In practice, “packaging” the software meant conceiving of the software object not just in terms of code, but also systems requirements, documentation, support, and even the tacit knowledge required to run it. “What turned programs into software,” Haigh concludes, “was the work of packaging needed to transport them effectively from one group to another.” Software becomes software, in other words, when it is portable.
Software as shrinkwrap. The consumer’s perspective. As Lowood has suggested, this is the model that has dominated institutional collecting to date, notably at places like Stanford with its Stephen Cabrinety collection and the Library of Congress’s own efforts at the Culpeper facility. The obvious appeal here is that most shrinkwrapped software is about the same size as a Hollinger box. Haha, no, I’m kidding of course. But the appeal is clearly that it is easy to visualize shrinkwrapped software as an artifact, and thus integrate it into collecting practices already in place for artifacts of other sorts. Nor is this a spurious consideration: on the contrary, the artwork, inserts, documentation, and so-called “feelies” that were part of what came in the box are vital to a history of software.
Software as a kind of notation, or score. Here we are talking about actual source code, and the musical analogy is more than casual. Thomas Scoville gives us a striking account of a programmer who conceives of his coding as akin to conducting a jazz ensemble: “Steve had started by thumping down the cursor in the editor and riffing. It was all jazz to him. The first day he built a rudimentary device-driver; it was kind of like a back-beat. Day two he built some routines for data storage and abstraction; that was the bass line. Today he had rounded out the rhythm section with a compact, packet-based communication protocol handler. Now he was finishing up with the cool, wailing harmonies of the top-level control loops.” Yet scores are also always of course artifacts, themselves materially embodied and instantiated.
Software as object. Here I deliberately use the word “object” in multiple valences, both to connote the binary executable as well as its resonance with object-oriented programming (itself a paradigm about modularity and reuse) and perhaps even the emerging philosophical discourse around so-called object-oriented ontologies, or “triple-O,” associated with figures like Graham Harman. (Triple-O advocates for the virtues of a non-correlationist worldview, one in which human actors are not the sole agents or foci or loci of experience.) As described on its Web site, the work of the National Software Reference Library at NIST is to “obtain digital signatures (also called hashes) that uniquely identify the files in the software packages.” To date, the NSRL contains some 20,000,000 unique hash values, thus delineating a non-human ecology of 20 million distinct digital objects. Yet the NSRL is not a public-facing collection, and it has no provisions for the circulation of software, nor does it facilitate the execution of legacy code.
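To make the NSRL’s notion of a “digital signature” concrete: identifying a file means reducing its bytes to a fixed-length cryptographic digest. Here is a minimal sketch in Python; the function name is mine, not NIST’s, though the NSRL’s published reference data does record digests of this kind (SHA-1 and MD5 among them) for each file it catalogs:

```python
import hashlib

def nsrl_style_signatures(path):
    """Hypothetical sketch: compute the kinds of digests the NSRL
    records to uniquely identify a file. Reads in chunks so that
    arbitrarily large binaries can be fingerprinted."""
    sha1, md5 = hashlib.sha1(), hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            sha1.update(chunk)
            md5.update(chunk)
    return {"SHA-1": sha1.hexdigest(), "MD5": md5.hexdigest()}
```

Two files with the same digest are, for identification purposes, the same digital object; that is how twenty million hash values come to delineate twenty million distinct objects.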
Software as craft. The artisan’s perspective. Here I have in mind accounts of software development which deliberately position themselves in opposition to enterprise-level software engineering. Not Microsoft Word, but Scrivener (or for that matter, Medium). Mark Bernstein of Eastgate Systems puts it this way: “Your writing doesn’t come from a factory. Neither does artisanal software. These are programs with attitude, with fresh ideas and exciting new approaches. Small teams work every day to polish and improve them. If you have a question or need something unusual, you can talk directly to the people who handcraft the software.”
Software as epigraphy. Yes, I mean written in stone. Trevor Owens has noted the uncannily compelling feature of the tombstones in the classic game Oregon Trail: “What made the tombstones fascinating was their persistence. What you wrote out for your characters on their tombstones persisted into the games of future players of your particular copy of the game. . . . These graves enacted the passage of time in the game. Each play apparently happening after the previous as the deaths, and often absurd messages, piled up along the trail.” Likewise, consider the Easter Egg. One of the most famous is to be found in Warren Robinett’s graphical adaptation of Adventure for the Atari 2600 in 1979. As others have argued, this was an important and innovative game, establishing many of the conventions we use to depict virtual space today (such as exiting a “room” through one side of the screen, and emerging in a logically adjoining room on a new screen). At the time Atari’s programmers were not credited in the documentation for any of the games they worked on, so Robinett created an Easter egg that allowed a player to display his name on the screen by finding and transporting an all but invisible one-pixel object. This seemingly slight gesture in fact speaks volumes about shifting attitudes towards software as a cultural artifact. Does “code” have authors? Is software “written” the way we write a book? Robinett’s game will surely outlast him: is this a tombstone or title page?
Software as clickwrap. This is perhaps the dominant model today, combining the familiar online storefront with advanced DRM and cloud-based content distribution.
Software as hardware. Born of a conviction and commitment to authenticity that is doubtless partly chimerical, this is nonetheless a preservation model taking hold in some of our more liminal spaces, such as the Media Archaeology Lab at the University of Colorado and the Maryland Institute for Technology in the Humanities (MITH), a digital humanities center. Here [above], for example, is WordStar running on a working Kaypro we maintain at MITH. Similarly, Jim Boulton has done phenomenal work preserving the browser software of the early Web.
Software as social media. The GitHub phenomenon. You may think of GitHub as a place to stash your code. That’s not how they think of it, however. GitHub believes that the future of creativity, commerce, and culture is executable. At the Preserving.exe meeting, a representative from GitHub made the connection to such high-minded ideals explicit, declaring the software culture on the Web a new cultural canon and invoking the likes of Emerson, the Beowulf poet, and Adam Smith.
Software as background. New media artist Jeff Thompson has collected some 11,000 screenshots documenting every computer appearing (usually in the background) in every episode of the TV series Law & Order. We can learn much from such incidental popular representations of software. Compare, for example, the extremely realistic depictions of the applications the characters use in the movie adaptations of Stieg Larsson’s Millennium series to the fanciful and extravagant interfaces of many Hollywood blockbusters. What can such shifts and contrasts tell us about popular attitudes toward software?
Software as paper trail. The documentary perspective. In 2012 I spent a week in the archives at Microsoft in Redmond, Washington looking at the early history of Word. I spent almost all of my time looking at paper: specs, requirements, design documents, memos and correspondence, marketing research, advertising and promotional materials, press clippings, swag, memorabilia, and ephemera. Besides corporate entities such as Microsoft, documentary software archives are available at such institutions as the Charles Babbage Institute, the University of Texas, and the Strong Museum of Play, as well as, again, Stanford.
Software as service. Facebook and YouTube, but also future iterations of such formerly shrinkwrapped products as the Microsoft Office suite. What’s notable about the service model is that it’s also supplying some of the most promising models for preservation. The Olive project at Carnegie Mellon is exploring solutions for streaming virtual machines as a preservation strategy. Likewise, Jason Scott (you knew I would get there eventually, didn’t you?) has been doing some simply astounding work with the JSMESS team at Internet Archive, turning emulated software into Web content akin to embedded video and other standard browser features. Is this legal, you ask? Talk to the guy in the funny hat.
Finally, software as big data. Jason Scott again. Having ingested thousands of disk images and ROMs into the Internet Archive’s Historical Software Collection, Jason is now algorithmically analyzing them. A dedicated machine “plays” the games and runs the software 24/7, taking screenshots at intervals and storing these for posterity on the Internet Archive’s servers. Software, in other words, preserving software; machines preserving machines. Have you seen this movie before?
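The loop such a screenshot robot implies can be sketched in a few lines. To be clear, this is a hypothetical illustration and not the Internet Archive’s actual pipeline: grab_frame stands in for whatever emulator capture call is available, and the interval and naming scheme are my own assumptions.

```python
import time
from datetime import datetime, timezone
from pathlib import Path

def archive_screenshots(grab_frame, out_dir, interval_s=60, max_shots=None):
    """Call grab_frame() periodically (any callable returning image bytes,
    e.g. a capture of an emulator's display) and store each frame under a
    sequentially numbered, timestamped filename."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved, count = [], 0
    while max_shots is None or count < max_shots:
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        frame_path = out / f"frame-{count:06d}-{stamp}.png"
        frame_path.write_bytes(grab_frame())
        saved.append(frame_path)
        count += 1
        if max_shots is None or count < max_shots:
            time.sleep(interval_s)  # with max_shots=None this runs "24/7"
    return saved
```

The design choice worth noting is that capture is passed in as a callable, so the same archiving loop works whether the frames come from an emulator, a framebuffer, or a camera pointed at a CRT.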
So that’s what I’ve got, over a dozen different approaches in all. Doubtless there are others. I haven’t talked about software as apps or software as abandonware or as bits to be curated or as virtual worlds, for example. But underlying all of these different approaches, or “frameworks” as I have called them, is the more fundamental one of what it means to think of software as a human artifact, a made thing, tangible and present for all of its supposed virtual ineffability. Scott Rosenberg, whose book Dreaming in Code furnishes a masterful firsthand account of an ultimately failed software development project, says this: “Bridges are, with skyscrapers and dams and similar monumental structures, the visual representation of our technical mastery over the physical universe. In the past half century software has emerged as an invisible yet pervasive counterpart to such world-shaping human artifacts.”
We tend to conceptualize software and communicate about it using very tangible metaphors. “Let’s fork that build.” “Do you have the patch?” “What’s the code base?” Software may be stuff unlike any other, it may even be intangible, but it is still a thing, indisputably there as a logical, spatial, and imaginative artifact, subject to craft and technique, to error and human foible. Writing software is not an abstract logical exercise; it is art and design, intuition and discipline, tradition and individual talent, and over time the program takes shape as a wrought object, a made thing that represents one single realization of concepts and ideas that could have been expressed and instantiated in any number of other renderings. Software is thus best understood as a dynamic artifact: not some abstract ephemeral essence, not even just as lines of written instructions or code, but as something that builds up layers of tangible history through the years, something that contains stories and sub-plots and dramatis personae. Programmers even have a name for the way in which software tends to accrete as layers of sedimentary history, fossils and relics of past versions and developmental dead-ends: cruft, a word every bit as textured as crust or dust and others which refer to physical rinds and remainders.
Knowledge of the human past turns up in all kinds of unexpected places. Scholars of the analog world have long known this (writing, after all, began as a form of accounting—would the Sumerian scribes who incised cuneiform into wet clay have thought their angular scratchings would be of interest to a future age?). Software is no less expressive of the environment around it than any other object of material culture, no different in this way from the shards collected and celebrated by anthropologist James Deetz in his seminal study of the materiality of everyday life, In Small Things Forgotten. In the end one preserves software not because its value to the future is obvious, but because its value cannot be known. Nor are the myriad challenges and technical (or legalistic) barriers it presents, or the fear of loss, reason to hesitate. As Kari Kraus has noted in a paper published out of the NDIIPP and IMLS Preserving Virtual Worlds project, “preservation sometimes courts moderate (and even extreme) loss as a hedge against total extinctive loss. It also indirectly shows us the importance of distributing the burden of preservation among the past, present, and future, with posterity enlisted as a preservation partner. However weak the information signal we send down the conductor of history, it may one day be amplified by a later age.”
Long ago, before even the first dot-com bubble, Sumner Redstone opined, “Content is king.” We all cherish content, like a lost lunar Earthrise image or a newly discovered Warhol. Indeed, I would submit that this community has gotten pretty good at preserving at least certain kinds of digital content. But is content enough? What counts as context for all our digital content? Is it the crushed Atari box retrieved from a landfill? Or is it software, actual executables, the processes and procedures, the interfaces and environments that create and sustain content? Perhaps it is indeed time to revive discussion about something like a National Software Registry or Repository; not necessarily as a centralized facility like Culpeper, dug into the side of a mountain, but as a network of allied efforts and stakeholders. In software, we sometimes call such networks of aggregate reusable resources libraries. Software, folks: it’s a thing. Thank you.
This essay draws from some of my earlier writing on the topic, including this on George R.R. Martin and WordStar and this in Slate, as well as my contribution to the aforementioned Preserving.exe report.