Avoiding the Rear View Mirror

Keynote talk to Digital Editing Now conference at CRASSH, University of Cambridge, 7–9 January 2016: http://www.crassh.cam.ac.uk/events/26264

Slides for this talk are available at: http://www.slideshare.net/burgess1822/avoiding-the-rear-view-mirror

[SLIDE: McLuhan quotes]

I have taken my title today from one of the most famous aphorisms of Marshall McLuhan. McLuhan pointed out how, in using new technologies, our vision is restricted by our previous experience. We do not immediately grasp the potential of new technologies but interpret them in the light of what we know. McLuhan called this effect ‘the rear view mirror’. ‘When faced with a totally new situation’, McLuhan declared, ‘we tend always to attach ourselves to the objects, to the flavor of the most recent past. We look at the present through a rear-view mirror. We march backwards into the future’. Moreover, McLuhan suggested, we cling to the rear view mirror because the view it offers may be more comforting than confronting what is visible through the windscreen. In McLuhan’s words ‘Ordinary human instinct causes people to recoil from these new environments and to rely on the rear-view mirror as a kind of repeat or ricorso of the preceding environment, thus insuring total disorientation at all times. It is not that there is anything wrong with the old environment, but it simply will not serve as navigational guide to the new one’.

McLuhan’s overall analysis is unconvincing — he uses this perception to criticise the Communist Manifesto as a backward-looking document — but, as so often with McLuhan, the epigram, with its image of society approaching technology with an eye to the rear view mirror, is compelling. The rear view mirror effect can be seen at many points in the history of technology. When Lewis Cubitt was asked to design one of the earliest railway stations, the best template he could come up with was the imperial stables in St Petersburg. In naming parts of airplanes, we look backwards to the terminology used in ships. The history of text technologies provides many examples of the rear-view mirror effect. The best known is the way in which the first printed books were imitations of manuscripts. Another example of the rear view mirror is the way in which some early photographers used photography to create scenes which were like historical paintings. And of course the rear-view effect pervades our approach to digital technologies. One need only think of the way we retain a qwerty keyboard designed for typewriters, complete with a carriage return key, or our use of the metaphor of the library to describe collections of digital objects. In this context, the use of the term ‘edition’ to describe the procedures we adopt to deal with the representation of a variety of cultural objects in a digital form may be seen as another example of the rear view mirror effect, appealing back to the authority associated with celebrated book editions, to Erasmus and beyond.

The rear view mirror has fundamentally shaped many of the digital resources commonly used by humanities scholars today. Some of the most widely used packages such as Early English Books Online or the Burney Newspapers are produced by firms which began by producing microfilm products in the 1960s and 1970s, and the subscription and distribution models remain rooted in microfilm precedents. Moreover, initial work on the digitisation of the Burney Newspapers and the English Short Title Catalogue began as projects to facilitate easier access to microfilm. The Burney Newspapers project had its roots in experiments undertaken by the British Library to facilitate easier reader access to microfilm images. The production of a searchable resource was not the primary aim. Search was only added in order to facilitate easier movement around digital images, which helps explain why it was felt that such poor quality OCR was acceptable. In the case of Early English Books Online, the initial functionality was not much greater than that provided by microfilm, and the need for a searchable text spawned a separate project in the form of the Text Creation Partnership.

Another illustration of the use of digital technologies to preserve old formats is the calendar. The preparation of summaries known as calendars as a means of presenting the voluminous contents of administrative records has a venerable history stretching back to the seventeenth century and beyond. When programmes to improve access to the public records were set in hand in the late nineteenth century, priority was given to the publication of calendars of chancery records. However, the preparation and publication of such summaries was expensive both in manpower and printing costs. By the time of the publication of Roy Hunnisett’s Editing Records for Publication in 1977, recommendations for editorial procedure in calendars were driven by the need to reduce printing costs — to the extent that Hunnisett proposed that no post-1300 records should ever be printed in full because of the enormous printing cost. The high cost of publication of calendars and the fact that they diverted resources from the cataloguing of inaccessible records meant that the publication of record calendars had virtually died out by 1990. The way in which the web has been used to revive the moribund form of the calendar is one of the most striking examples of the rear view mirror effect in action. The web provides the opportunity to completely reimagine the way in which large series of records such as the Inquisitions Post Mortem, Ancient Petitions and Gascon Rolls are made accessible. Yet the projects making these records available are rooted in the venerable calendar traditions, even to the extent of sticking with Hunnisett’s recommendations, despite the fact that these were largely determined by the need to reduce printing costs.

In this context, it is difficult to escape the suspicion that the metaphor of the digital edition is also driven by the rear view mirror. This certainly seems to be the case in the way in which our online editions follow canonical forms, focussing on major works such as the Canterbury Tales or Beowulf or canonical authors such as Ben Jonson or Jane Austen. It is striking that, where funding is available for conventional printed editions of authors such as Shakespeare, Wordsworth or Robert Burns, this is preferred, as if the digital edition is somehow second-best or the preserve of geeky enthusiasts. The innovative components of digital editions, such as the incorporation of images, are done in comparatively crude fashion, perhaps reflecting our uncertainty as to where facsimiles sit in terms of editorial practice, as Kathryn discussed yesterday. Is our attachment to the idea of the edition itself an example of marching backwards into the future?

One of the major issues confronting those seeking to create digital resources since the advent of the world wide web in 1993 has been dealing with what one might call the digital overhead. Securing resources for the development of digital projects has been difficult because it has been necessary first to turn the printed or manuscript text into a digital form. This is expensive and requires considerable technical expertise and resources. The evidence of the benefits of digitisation has often been sketchy and compelling arguments for digitisation have not been articulated. Consequently, discussion of the potential of digital methods has become hopelessly enmeshed in the question of whether these additional costs are justifiable. However, now we have the legacy of more than twenty years of digitisation activity, we can contemplate projects that do not need to encounter this digitisation overhead — they can use existing digital data. Moreover, increasingly the primary materials with which humanities scholars are concerned are born digital. It is perhaps when we contemplate born digital material that we can best escape the rear view mirror effect. How will humanities scholars engage with large born-digital archives? Will they need anything that resembles an edition? Or are we thinking about something completely different? And how does that affect our view of our procedure with more historic materials?

[Slide: Wikileaks]

Consider for example the case of the Wikileaks material. This is clearly a cache of material which will be of major importance for future historians. This illustration shows a visualisation of one of the first releases of Wikileaks material, the field reports of the Iraq war published by the Guardian in October 2010. These war logs comprised over 391,000 records of military incidents in Iraq and provided the first authoritative information about the number of civilian casualties in Iraq. Dealing with this data, and the 92,000 rows of data of field reports from the Afghanistan war, posed new and challenging problems for the Guardian journalists who worked with it. One declared, ‘It is like panning for tiny grains of gold in a mountain of data’. Having reduced the field reports to a vast spreadsheet, the visualisation tool developed to navigate the spreadsheet was based on an interactive map the Guardian had created as a guide to the Glastonbury Festival. The journalists interrogating the spreadsheet felt that they were learning new skills, taking the first steps in what we are starting to call data journalism. The way in which these journalists worked very directly with this first release of Wikileaks data anticipates the methods future historians may also need to adopt with this material.
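To give a flavour of what working directly with such data involves, here is a deliberately minimal sketch in Python. The records, field names and figures below are invented for illustration — they are not drawn from the actual war logs — but the sketch shows the kind of first pass a journalist or historian might make over a spreadsheet of field reports: counting incidents by category and totalling casualties as raw material for a visualisation.

```python
from collections import Counter

# Hypothetical records in the style of leaked field reports.
# The category and casualty fields are invented for illustration.
reports = [
    {"date": "2006-07-01", "category": "IED explosion", "civilian_casualties": 3},
    {"date": "2006-07-01", "category": "Small arms fire", "civilian_casualties": 0},
    {"date": "2006-07-02", "category": "IED explosion", "civilian_casualties": 5},
    {"date": "2006-07-03", "category": "Checkpoint incident", "civilian_casualties": 1},
]

# Count incidents by category: a first step towards a chart or map.
incidents_by_category = Counter(r["category"] for r in reports)

# Total civilian casualties across all reports.
total_casualties = sum(r["civilian_casualties"] for r in reports)

print(incidents_by_category.most_common(1))  # most frequent incident type
print(total_casualties)
```

Trivial as it is, this is 'panning for gold' in miniature: the analytical work happens not in an edition but in whatever tools can aggregate and reshape the data.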

The Iraqi and Afghanistan logs were small-scale compared with what was to follow on Wikileaks. The US embassy cables leaked in 2010 comprised over a quarter of a million cables, some dating back to 1966. The leaks of Edward Snowden were to dwarf even these releases. In 2015, Julian Assange wrote that ‘Wikileaks has published 2,325,961 diplomatic cables and other US State Department records, comprising some two billion words. This stupendous and seemingly insurmountable body of internal state literature’, Assange continued, ‘which if printed would amount to some 30,000 volumes, represents something new. Like the State Department, it cannot be grasped without breaking it open and considering its parts. But to randomly pick up isolated diplomatic records that intersect with known entities and disputes, as some daily newspapers have done, is to miss ‘the empire’ for its cables’. In his account of the genesis of the ‘unauthorised autobiography’ of Julian Assange, Andrew O’Hagan describes his irritation at the apparently ineffectual efforts of Assange and his associates in editing and publishing material from the archive. But this underestimates the difficulty of dealing with such huge quantities of data. While initially efforts were made to remove names of security officials from the documents, they could still be identified from the text, so documents began to be released without redaction. Many of the controversies around Wikileaks were in fact editorial issues.

O’Hagan complains that Assange could easily have produced an anthology of Wikileaks material, and a selection of important leaks amounting to just over 100 pages was published as an appendix to the unauthorised autobiography. But Assange’s point that it is necessary to try and grasp the scale of the documentation and of US State Department activity is an important one. A partial solution offered by Assange was the publication of a series of regional studies using Wikileaks material to analyse US intervention in different countries which has appeared as The Wikileaks Files. Each of the chapters in The Wikileaks Files incorporated a large number of extended transcripts of Wikileaks material, but this still only gives a very partial sense of the quantities of material available. However, the way in which this publication erodes the boundaries between the presentation of primary source material and historical commentary is perhaps a significant straw in the wind.

There will never be an edition of the Wikileaks documents analogous to the calendar editions of the letters of the English medieval state. It will be necessary for historians to work directly with the Wikileaks data in just the way that the Guardian journalists had to, using spreadsheets, text wranglers, visualisations and whatever other tools are appropriate. As a result of this, the way in which historians demonstrate and share their conclusions will also have to change. A historical discussion of the Iraq war might well take the form of a series of visualisations. Would these be an edition or a commentary? They would of course be neither — this will be a new form of historical discourse for which we don’t yet have a name.

The impossibility of creating editions from born-digital materials is also apparent from e-mail archives. Editions of letters of rulers and politicians are at the heart of much historical research. When I started work at the British Library, I worked on the papers of the Dukes of Marlborough and had a daily opportunity to admire Henry Snyder’s three-volume edition of the correspondence between the 1st Duke of Marlborough and Lord Godolphin as a guide to the high politics of the period. For the later eighteenth century, it was possible for Aspinall to produce single-handed compendious editions of the correspondence of George III and George IV (although a larger scale online project has recently been announced by the Royal Archives and King’s College London). By the late nineteenth century, the expansion of information had become evident. The papers of William Gladstone in the British Library comprise approximately 160,000 documents, bound in 762 large volumes. These are still however a comparatively manageable amount of material. A single scholar might contemplate producing a calendar of this material as a large-scale project.

[Slide: Bush e-mail]

Contrast Gladstone’s papers with the e-mail archive of President George W. Bush. E-mail messages sent and received by each member of staff in the White House during Bush’s presidency are stored in the Electronic Records Archive of the US National Archives and form part of the Presidential Library. The system contains over 200 million e-mail messages. The electronic records for Bush’s presidency amount altogether to over 80 terabytes.

[Slide: Kress e-mails]

There are still many restrictions on access to this archive, but an example of an e-mail exchange is available on the presidential library web site. This consists of a small selection of e-mails by Sandy Kress, the Presidential Advisor on Education, concerning the drafting of the No Child Left Behind Act of 2001 — hardly the most contentious aspect of the Bush presidency. This tiny selection of e-mails illustrates the problems of modelling the presentation of such born-digital media on the precedents of the historic printed edition. We can trace from this small selection of emails detailed aspects of the drafting of the legislation, but have no sense of how it fits in with the wider pattern of Kress’s work or how Kress’s network fits into the wider White House. In order to assess this, we need not only the whole archive — all 200 million messages if possible — but we need to interrogate it in an electronic form. The type of method we might want to use would seek to exploit the rich metadata available in e-mails to see who is corresponding with whom, who is being copied into e-mails, when they are sent, how frequently, the distribution of subjects and so on.

The historian wanting to use the Bush e-mail archive to investigate his presidency will need (ironically) to use precisely the type of methods of analysing e-mail metadata which GCHQ and the NSA used to identify possible terrorist activity. The result of such analysis of email activity would probably be a network diagram — there are a number of free tools available to generate visualisations of your own email activity, such as Immersion, if you are interested. Historians will want to incorporate diagrams of this sort in their work. Indeed, such diagrams might provide the primary means of access to this material — which raises the question again of whether the analogy of an edition is the best means of describing how we will present such primary materials. While the Bush e-mail archive seems at the moment to present a large and intractable problem, it is on a much smaller scale than some other developments currently in train. The UK National Archives for example is considering how it archives the entire e-mail output of the British civil service.
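The underlying method is simpler than its intelligence-agency pedigree suggests. Here is a minimal sketch in Python, using only invented addresses, of how sender–recipient pairs might be extracted from e-mail headers and counted: each pair is an edge in a correspondence network, and the counts are exactly the raw material from which tools like Immersion draw their diagrams.

```python
from collections import Counter

# Hypothetical e-mail metadata: sender and recipients only.
# Note that no message bodies are needed for this kind of analysis.
messages = [
    {"from": "advisor@example.gov", "to": ["chief@example.gov"]},
    {"from": "advisor@example.gov", "to": ["chief@example.gov", "counsel@example.gov"]},
    {"from": "chief@example.gov", "to": ["advisor@example.gov"]},
]

# Each (sender, recipient) pair is an edge in the correspondence network.
edges = Counter(
    (m["from"], recipient) for m in messages for recipient in m["to"]
)

# Weighted degree: how many messages each person sent or received.
degree = Counter()
for (sender, recipient), weight in edges.items():
    degree[sender] += weight
    degree[recipient] += weight

print(edges.most_common(1))  # the strongest link in the network
```

Scaled up to 200 million messages the engineering is harder, but the logic is the same: the metadata alone reveals who talked to whom, and how much.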

The examples I have considered thus far are of administrative records, chiefly used by historians, and it could be argued as Hans Walter Gabler was stressing yesterday that documentary editing of this sort has always been different to the editing of literary texts (as the tradition of the calendar illustrates). However, e-mail archives are increasingly becoming of importance for literary studies. The editing of an author’s correspondence has a long and distinguished history, but is now starting to be transformed in just the way we have seen with Wikileaks and the Bush e-mails. The British Library, Bodleian Library and other institutions have begun collecting e-mail archives of authors. In 2011, the British Library’s remarkable pioneering curator of digital manuscripts, Jeremy John, archived 40,000 e-mails of the poet Wendy Cope. Cope’s archive is on a much smaller scale than that of President Bush, but raises just as many issues. Historically, the British Library has catalogued each letter in its collections individually, but it would clearly be absurd to contemplate this for 40,000 e-mails. Again, the best way of exploring this archive will be to make use of the metadata which has already been provided. Rather than an edition or catalogue as intermediary, we need to exploit the information embedded in the e-mails themselves.

[Slide: Irvine Welsh twitter stream]

But, we are constantly told, e-mail is in decline as a means of communication (although my inbox doesn’t seem to show any sign of this). Maybe we would want to collect more than an author’s e-mails. For certain authors, for example, their twitter stream might be of interest. A good example might be the twitter stream of Irvine Welsh, the author of Trainspotting. Irvine Welsh’s twitter stream is interesting because Welsh writes a large proportion of the tweets himself and describes his everyday life rather than engaging in commercial promotion. A moment’s glance at Welsh’s twitter stream shows that it is potentially a very valuable source for those interested in his life and work. Twitter for example reveals that Welsh’s Christmas reading was Shaun Ryder’s book on UFOs, ‘What Planet Am I On?’ and conveys the surprising image of Irvine Welsh being prevailed upon by his mother to watch Downton Abbey. And we have a characteristic Irvine Welsh view of Christmas in Scotland: ‘Christmas in UK is ideal break from Christianity. Despite media propaganda, never heard the word ‘Jesus’ in any private or public house’.

Again there is the problem of bulk. Welsh has produced to date over 61,000 tweets. Moreover, the information about Welsh in these tweets goes beyond the 140 characters of the tweet itself. In fact, there is more metadata than text in a tweet, since each tweet contains 150 metadata points, describing the time, place, language, account details and so forth of the tweet. In the case of Irvine Welsh, his movements and use of twitter would obviously be of biographical interest, while an analysis of his followers would provide a lot of information about reception of his work. Curation of twitter feeds is difficult, particularly since Twitter has recently restricted third-party access to its data. Nevertheless, it is clear that in order to fully exploit the richness of the metadata provided by the twitter stream of an author like Irvine Welsh, researchers would need to work directly with the twitter data. An anthology of selected tweets of Irvine Welsh or indeed Salman Rushdie, Margaret Atwood or Brett Easton Ellis — all authors who are active on Twitter — would be of very little value, but the twitter data for all these authors could be of great value for researchers.
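Because tweets are delivered as structured data, that metadata can be interrogated directly. The sketch below is purely illustrative — the tweets are invented, in a radically simplified JSON form, and real tweets carry far more metadata fields than shown — but it suggests how a researcher might aggregate a twitter stream's timestamps into a pattern of posting activity, the sort of evidence a biographer might read alongside the tweets themselves.

```python
import json
from collections import Counter

# Hypothetical tweets in a simplified JSON form; real tweet objects
# contain far richer metadata (place, language, client, account, etc.).
raw = '''[
  {"created_at": "2014-12-25T09:15:00", "lang": "en", "text": "Christmas reading sorted."},
  {"created_at": "2014-12-25T21:40:00", "lang": "en", "text": "Watching Downton Abbey."},
  {"created_at": "2014-12-26T09:05:00", "lang": "en", "text": "Boxing Day run."}
]'''

tweets = json.loads(raw)

# Aggregate posting activity by hour of day, using only the timestamps.
by_hour = Counter(t["created_at"][11:13] for t in tweets)

print(by_hour.most_common(1))  # the author's most active hour
```

An anthology of tweets discards precisely this layer of evidence; working with the data preserves it.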

[Slide: Rushdie Mac]

The archive of Salman Rushdie suggests another future aspect of editorial activity. In the case of commercial or government archives such as civil service e-mails, it is possible to contemplate managing curation and preservation, if these requirements are given sufficient priority by IT service managers. In the case of private papers, however, individuals like Salman Rushdie are unlikely to be particularly systematic in their approach to digital preservation or migration. Yet the word processing files of authors like Salman Rushdie are precisely the sort of data to which editors will in future need to have access. Already curators of special collections are beginning to encounter the problem of the carrier bag containing moribund computer discs. In 2009, John Updike sent 50 five and a quarter inch discs to the Houghton Library. When Salman Rushdie’s archive was acquired by Emory University, it included four Apple computers containing altogether 18 gigabytes of data including word processing files for many of his most important works. However, one of these computers hadn’t worked since Coke had been spilt on it. The recovery of the data from these computers has been a major piece of digital forensics work, but is surely just as justifiable as the type of expensive paper conservation that is frequently required for new acquisitions of literary archives. Moreover, the Salman Rushdie digital archives project has been able to recreate the type of computing environment that Rushdie used in 1992 and to present his files through it.

This sort of digital forensics work on literary texts is already becoming more commonplace, and it seems likely that this type of specialist forensic work will start in the future to occupy more of the space currently occupied by editorial practice. But the examples I have given so far have all been of new born-digital materials. I think we would all fervently hope that the future cultural landscape is not dominated by the born digital, and that our written heritage continues to play an important part. And surely here the historic format of the edition, digitally revitalised, will continue to be important. Certainly much of the experience to date suggests that the roots of digital scholarly editing in historic practice are still evident and will continue to be important. However, I’d like to suggest that this is beginning to change.

Hitherto, we have tended to conceive of our digital activity in a book-like form, partly because of the need for funding and other administrative purposes to package our digital activities into discrete book-like bundles. Digital scholarly editing 1.0 gave us a digital shelf of editions of famous authors: Beowulf, Chaucer, Langland, Ben Jonson, Jane Austen, Johnson’s Dictionary and so on. We have produced packages that are shaped awfully like books then expressed surprise that they have been used like books. However, the web is not about producing discrete book-like packages. It is about building huge resources through linking and sharing. There are arguments about the extent to which commercial intervention is fracturing this, and I share the dystopian concerns about the future of the web in a Mark Zuckerberg universe. However, the past twenty years have shown us how a shared web, built from the bottom up, can quickly create huge resources like a vast coral reef.

A familiar example of the way in which online services constantly develop and grow by linking is the library catalogue. The catalogue activity is a continuous one, and as library catalogues were linked first to form regional, then national and finally international consortia they have created new entities which have grown beyond a conventional library catalogue. The way in which the ESTC has grown from a specialist subscription database to a resource which underpins many key digital resources in the humanities illustrates the point. Perhaps then we need to think not in terms of circumscribed projects based around individual authors or themes, but rather on the development of large-scale shared resources.

This is after all what has happened with the Text Creation Partnership. I must admit that when I first heard about the project to create TEI transcriptions of large parts of the images in Early English Books Online, I thought it was a strange way of proceeding — retro-fitting searchable text which perhaps should have been there in the first place. But this was very much a blinkered project management view, and I can now see that the Text Creation Partnership activity is part of the way in which digital resources like EEBO will grow and develop — just as library catalogues have done — to form bibliographical reefs.

[Slide: PCA from Wine Dark Sea]

The Text Creation Partnership shows how the issues I have described with born digital resources are starting to become pressing in the study of historical texts. The transcripts produced by the Text Creation Partnership are not editions — they simply present a TEI encoded version of the copy of the book presented in EEBO. Yet the availability of these transcripts must raise questions as to how far anyone contemplating digital editions of authors represented in EEBO makes use of the assets already created by the TCP. Moreover, just as researchers will much rather work with born-digital data, so I feel that increasingly they will want to visualise, mine, mash, slice and dice the data that is now already available in such vast quantities in the TCP. In the summer, I was very privileged to attend the second Early Modern Digital Agendas summer institute funded by the NEH at the Folger Shakespeare Library, directed by Professor Jonathan Hope. Over the course of a heady three week period, the delegates used a variety of methods ranging from social network analysis to text mining to interrogate the data in the Text Creation Partnership. They looked for example at how social network analysis can be used in the study of different types of plays to assess the characteristics of such different genres as comedy, tragedy, history and so on.

Among the presentations at the institute were discussions of methods of annotating and enhancing the Text Creation Partnership data, and that will clearly be a next step. Rather than preparing discontinuous and disconnected editions from TCP data, it might be much more fruitful to continue to develop and enhance the TCP itself. As the TCP grows and further resources inspired by the TCP appear, such as Laura Mandell’s parallel work with eighteenth-century books, it may be that our focus as a community will increasingly be around shared activity to annotate, enhance and share these large-scale strategic resources. Moreover, it may be that increasingly we see these resources not as textual data but as images. My own particular starting point in thinking about editorial practice has been my long-held ambition to produce an edition of the records of the English Peasants’ Revolt of 1381. Images of many of the records I would like to edit are held in the vast Anglo-American Legal Tradition database at the O’Quinn Law Library of the University of Houston, which contains over eight million images of medieval legal records in the National Archives. My thinking about my 1381 edition increasingly revolves around ways of identifying relevant material in the Anglo-American Legal Tradition resource, and it may be that this is another example of the type of organic development of resources which I would like to see replace the edition.

If we take our eyes off the rear-view mirror, there can be no doubt that what is coming towards us are very large-scale born-digital resources. Archiving the web is only the beginning. Dealing with government e-mail archives poses intellectual and methodological challenges which make archiving the web seem like chicken feed. As we develop methods to undertake research on these materials, we will naturally want to use these methods on the digital archives which have been created at such labour and cost for historic materials. We will become increasingly concerned with dealing with large-scale textual resources such as the TCP, Eighteenth Connect data or newspaper archives. In this context, we may become increasingly concerned with annotating image archives.

I wouldn’t go as far as to predict the death of the edition. It is evident that, for the foreseeable future, we are going to be in a mixed economy of reading, with a vibrant market for printed books for leisure purposes, while digital outputs address more specialist needs. Within that context, there will continue for the foreseeable future to be a need for editions which present authoritative views of canonical texts. But I think that the way in which we explore those texts in a digital format will change radically, and the issues presented by born-digital materials give a sense of what lies around the corner.

--

Andrew Prescott
Digital Transformations: Talks and Presentations

Digital humanities enthusiast at University of Glasgow. AHRC Digital Transformations fellow. Writes on manuscripts and history. Tweets in personal capacity.