The future of past news: Archiving online news

By Sharon Ringel

On the 15th and 16th of November, archivists, journalists, researchers, information specialists and program analysts gathered for “Dodging the Memory Hole 2017: Saving Online News,” a conference by the Journalism Digital News Archive of the Donald W. Reynolds Journalism Institute at the University of Missouri. It was the fifth annual conference, but there was something powerful in this year’s location: the Internet Archive.

In a converted San Francisco church that serves as the IA (Internet Archive) headquarters, Brewster Kahle gave an almost spiritual keynote. “Providing universal access to knowledge can be one of the greatest things humankind can achieve… be a better library. No more ‘Post-Truth’, ‘alt-facts’,” he preached to community members seated on the old church benches. His audience, however, was small and made up of information experts rather than journalists. How is it, then, that both news organizations and journalists themselves seem oblivious to the fact that many things out there on the web won’t be preserved for the future, and without a proper archiving plan, online news — the first draft of history — will become inaccessible?

Among the various presentations during the two-day conference, speakers introduced the “Dodging the Memory Hole” community to archiving tools for online news. A few examples:

  • Mark Graham, director of the Wayback Machine, explained the “Save Page Now” option on its new browser extension for Google Chrome, Safari and Firefox. The extension also lets users scroll back through older versions of the same page, all in one click.
  • Art Pasquinelli introduced the open-source program LOCKSS, which is based on the principle that “lots of copies keep stuff safe.”
  • Ilya Kreymer and Anna Perricci demonstrated recording online news sites with Webrecorder, which can also capture interactive, video and audio news content.
  • Will Crichton presented a demo of Esper, a new tool he is developing on top of the Television Archive at the IA. It lets users search by the content of the video itself, not just by its metadata, description and title.

But as Kathleen A. Hansen and Nora Paul, co-authors of Future-Proofing the News: Preserving the First Draft of History, argued, wonderful as these technological leaps may be, the important thing is that people actually use them. Furthermore, the news organizations and journalists who produce these materials are not accountable for saving them. In the past, news librarians oversaw preservation within the news organization, but news librarians are an endangered species, said Hansen and Paul. When news organizations ran into financial distress, the first people they let go were the news librarians. One example of this negligence is that the earliest surviving edition of the New York Times’s homepage can only be accessed through the Wayback Machine. The first version saved is from November 1996, but the Times webpage launched in January of that year.

Twenty years later, the Times is working on its web archiving. Justin Heideman, senior software engineer at the Times, introduced the organization’s strategy for the 7.4 million pages of the Times online. At last year’s conference, the Times technology team presented TimesMachine, an online archive that launched in 2014 and consists of digitized newspapers published between 1851 and 1980. The team is currently working on preserving the paper’s born-digital news content.

However, the Times is a rare example. Other news organizations — especially small ones — do not have the resources required for archiving and therefore rely on outsourced archiving of their content by third parties such as the IA.

Journalists rely on archives now more than ever. Archives are not just about (or for) the past; they are (crucially) for the present and the future. Journalists need archives for investigative reporting, for fact checking and validation, and for the preservation of their own sources and reporting. What we write on the internet is not inscribed in stone, so journalists must be able to preserve and access their sources in order to make solid arguments. From an archivist’s perspective, journalists need to learn how to write metadata and tag their content, information that will make it easier to find and use in the future. News organizations can also benefit from archiving by repackaging and reselling old content they own. The news archive is a journalistic product and a cultural resource, and should be treated and preserved accordingly.

Sharon Ringel, PhD, is a visiting scholar at Columbia Journalism School and a Tow fellow. Her research focuses on the ways digital archiving practices are implicated in the narratives that future memory agents will be able to produce.