Monks work in a scriptorium, the medieval, offline version of the Internet Archive. Painting by John White Alexander, 1896. Public domain. http://commons.wikimedia.org/wiki/File:Manuscript-Alexander-Highsmith.jpeg

Reminder: the Internet Archive is really important


If you’re like me and work in the higher education or scientific communities, you probably woke up to your Twitter feed abuzz with commentary on an advice column published in Science Careers’ online magazine. Essentially, a scientist wrote in asking what to do about her PI (principal investigator) constantly looking at her chest. The advice was to grin and bear it. Twitter and the blogosphere erupted, and rightly so.

An intrepid and forward-thinking librarian (me), who was admittedly paying more attention to Twitter than to his cataloging at that moment, managed to use the Internet Archive’s Wayback Machine to archive the page for posterity. Why? I saw what was coming: Science would issue a muted apology and take the column down. The primary source (the advice post) would be lost to the web forever. I was not motivated by vindictiveness or righteousness. As a librarian, I was simply concerned with preserving a primary source documenting the status quo of power dynamics in the scientific and research communities.

Because an archived copy now exists on the Wayback Machine, New York Magazine was able to run a story about the whole affair and quote the primary source (the advice column) directly, instead of relying on secondary sources (tweets and blog posts). This is a new way of doing scholarship and journalism built on its old tenets (primary and secondary sources), but we have to be proactive about it.

According to New York Magazine, the archived version is a reminder that the Internet never forgets. This is not true. Had I not submitted the link to the Wayback Machine when I did, it’s doubtful the original post would have been captured. The column went live on Monday, June 1, and by later that same day Science had already taken it down. Science smartly published its apology and “we took it down” notice at the same URL. They wrote over it. Whitewashed it, perhaps.

Science could still have the archived copy removed under the Internet Archive’s (rightly provided) policies on intellectual property claims. I would not be surprised if they tried, but I hope for the best nonetheless. I believe preserving something like this on the Internet Archive is fair use, but that’s not for me to decide.

The Wayback Machine made two more captures (I don’t know whether they were machine- or human-driven), but both recorded the editor’s response rather than the original column, because the original URL now serves only the apology.

So when you come across something on the web that you think should be preserved, preserve it. It’s easy. Go to https://archive.org/web/ and look for the “Save Page Now” option. Enter the URL into the box and hit the button. You did it! You preserved the web! You’ll be supporting future scholarship, and you’ll be preventing people from literally rewriting history.
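If you would rather script this than use the web form, the same request can be sent programmatically. Here is a minimal Python sketch, assuming the Wayback Machine still accepts a plain GET request to `https://web.archive.org/save/` followed by the full target URL. The function names and the User-Agent string are hypothetical, and the Internet Archive may rate-limit anonymous requests or change this interface at any time.

```python
import urllib.request

# Assumed endpoint format: the target URL is appended verbatim to this prefix.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_page_now_url(target: str) -> str:
    """Build the (assumed) Save Page Now request URL for an absolute http(s) URL."""
    if not target.startswith(("http://", "https://")):
        raise ValueError("target must be an absolute http(s) URL")
    return SAVE_ENDPOINT + target


def request_capture(target: str, timeout: float = 60.0) -> int:
    """Ask the Wayback Machine to capture target and return the final HTTP status.

    urlopen follows redirects and raises urllib.error.HTTPError on 4xx/5xx
    responses (for example, when the service rate-limits the request).
    """
    req = urllib.request.Request(
        save_page_now_url(target),
        headers={"User-Agent": "save-page-now-sketch/0.1"},  # hypothetical UA string
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    import sys

    # Usage: python save_page.py https://example.com/some/page ...
    for url in sys.argv[1:]:
        print(url, request_capture(url))
```

Either way, it is worth opening the resulting Wayback Machine page afterward to confirm that the capture shows the content you meant to preserve.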

You can also drag the Wayback Machine bookmarklet to your bookmarks toolbar and capture a page even more easily!

Please note that you can’t preserve everything on the web. The Internet Archive’s FAQs explain why.

But you should also consider donating to the Internet Archive. They estimate that, as of December 2014, the Wayback Machine held 9 petabytes of data and was growing by approximately 20 terabytes per week. It is literally the biggest library in the world. That’s an absurd (and probably undercounted) amount of data, and maintaining that much data costs a lot of money. So consider a donation. It’s for the future, a future that you might not envision but that someone else will be thankful for later.