Using the Internet Archive to cite websites

Achintya Rao
Dec 13, 2016

[I started composing this piece while I was at #OpenCon, but didn’t manage to finish it then. I was reminded of my languishing draft when I came across a tweet pointing to Timothée Poisot’s post on whether we should cite blog posts or not. I won’t go into the many reasons I respectfully disagree with Timothée (for that is a very different post from the one I’ve written below), but I will point out that he unfortunately ignores the many kinds of sources social scientists have to cite that aren’t papers or books: opinion pieces or official policies, for example, some of which might only be available on a website or, wait for it, a blog post. So, if we do have to cite a blog post, how should we go about it?]

When writing a research paper, I often cite websites. Anyone who has had to deal with the myriad citation styles out there knows how frustrating it can be to cite a paper correctly, never mind a website. Websites are especially tricky because, unlike a PDF or a physical book, the content at a particular URL might change between the time we cite it and the time someone looks it up to verify our statement.

Take this page on CERN’s website, which states:

Some 12,000 visiting scientists from over 70 countries and with 120 different nationalities — half of the world’s particle physicists — come to CERN for their research.

Clearly, the numbers quoted here may change over time. How, then, do we cite such a resource?

The current practice is to explicitly mention that the information we’re presenting was available on the website on a particular date by, say, appending “Accessed on 13 December 2016” to the end of our citation. But this isn’t a particularly good solution, since it tells our readers when we looked at the page but doesn’t let them see what the page said at the time.

I propose that we exploit the power of the Internet Archive’s Wayback Machine and have our citations point to URLs with persistent content. Here’s that same page on CERN’s website, but captured by the Wayback Machine as it appeared on 13 December 2016. Therefore, instead of merely saying “Accessed on…”, let us add “Archived at” information to our citations. MLA has even suggested how we might do so, in this FAQ on the Internet Archive’s website:

How do I cite Wayback Machine urls [sic] in MLA format?

This question is a newer one. We asked MLA to help us with how to cite an archived URL in correct format. They did say that there is no established format for resources like the Wayback Machine, but it’s best to err on the side of more information. You should cite the webpage as you would normally, and then give the Wayback Machine information. They provided the following example:

McDonald, R. C. “Basic Canary Care.” _Robirda Online_. 12 Sept. 2004. 18 Dec. 2006 [http://www.robirda.com/cancare.html]. _Internet Archive_. [http://web.archive.org/web/20041009202820/http://www.robirda.com/cancare.html].

They added that if the date that the information was updated is missing, one can use the closest date in the Wayback Machine. Then comes the date when the page is retrieved and the original URL. Neither URL should be underlined in the bibliography itself. Thanks MLA!
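If you keep your references in a script or spreadsheet export, the pattern in MLA’s example could be assembled automatically. What follows is only a rough sketch: the function `citeWithArchive` and its field names are my own invention for illustration, and the underscores simply mirror the italics markers as they appear in the example above. Check the output against whatever style guide you actually have to follow.

```javascript
// Sketch only: citeWithArchive and its field names are hypothetical,
// modelled on the single MLA example quoted in the Internet Archive FAQ.
function citeWithArchive(c) {
  return (
    `${c.author} “${c.title}.” _${c.site}_. ` +
    `${c.updated}. ${c.retrieved} [${c.url}]. ` +
    `_Internet Archive_. [${c.archivedUrl}].`
  );
}

// The values below are taken from the MLA example itself.
const citation = citeWithArchive({
  author: 'McDonald, R. C.',
  title: 'Basic Canary Care',
  site: 'Robirda Online',
  updated: '12 Sept. 2004',   // date the page was last updated
  retrieved: '18 Dec. 2006',  // date we looked at the page
  url: 'http://www.robirda.com/cancare.html',
  archivedUrl:
    'http://web.archive.org/web/20041009202820/http://www.robirda.com/cancare.html',
});

console.log(citation);
```

Following MLA’s advice, if the page carries no “updated” date, the closest capture date in the Wayback Machine can be passed as `updated` instead.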

To generate a Wayback Machine URL, first create a bookmarklet in your browser with the following code [source]:

javascript:void(window.open('https://web.archive.org/save/'+location.href));

Now, when you’re on the page you want to archive, simply click the bookmarklet, and you’ll have a freshly minted URL pointing to content from the page as it was when you submitted it to the Wayback Machine. Add it to your citation to ensure that your readers always see exactly what you saw when you decided to cite the page.
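If you would rather handle these URLs in a script, here is a minimal sketch of the two URL shapes involved. The helper names `saveUrl` and `parseSnapshotUrl` are hypothetical and not part of any Internet Archive library. The sketch assumes the plain snapshot form seen in the MLA example, that is, `/web/` followed by a 14-digit timestamp (YYYYMMDDhhmmss) and then the original URL; snapshot URLs in other forms are simply rejected.

```javascript
// Sketch only: helper names are hypothetical.
const WAYBACK = 'https://web.archive.org';

// Builds the same address the bookmarklet opens, which asks the
// Wayback Machine to capture the given page.
function saveUrl(pageUrl) {
  return WAYBACK + '/save/' + pageUrl;
}

// Splits a snapshot URL of the assumed form
//   .../web/<14-digit timestamp>/<original URL>
// into its parts, or returns null if the URL does not match.
function parseSnapshotUrl(snapshotUrl) {
  const match = /^https?:\/\/web\.archive\.org\/web\/(\d{14})\/(.+)$/.exec(snapshotUrl);
  if (!match) return null;
  const t = match[1];
  return {
    timestamp: t,
    // Capture date as YYYY-MM-DD, handy for the "Archived at" note.
    capturedOn: `${t.slice(0, 4)}-${t.slice(4, 6)}-${t.slice(6, 8)}`,
    originalUrl: match[2],
  };
}

const example = parseSnapshotUrl(
  'http://web.archive.org/web/20041009202820/http://www.robirda.com/cancare.html'
);
console.log(saveUrl('http://www.robirda.com/cancare.html'));
console.log(example.capturedOn, example.originalUrl);
```

Pulling the capture date out of the snapshot URL means the “Archived at” date in a citation can be derived from the link itself rather than typed by hand.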

Go ahead and edit-/delete-proof webpages you want to cite!

Oh, and don’t forget to donate to the Internet Archive to support all the amazing work they do.

Achintya Rao

#particlephysics #scicomm for @CMSexperiment at @CERN • PhD student at @SciCommsUWE • research summaries at @apostilb • PGP: https://keybase.io/RaoOfPhysics