I first spoke with Mark Graham, director of the Internet Archive’s Wayback Machine, on May 20, in connection with my post about the problem of internet link rot, “What are we going to do about those rotten links?”. Mark contacted me through Twitter to provide some context for my article, explaining that the tool I had mentioned that keeps Wikipedia links updated was developed by his team, not by Wikipedia; that the Internet Archive is a not-for-profit organization; and that the 20 million URLs archived each month from 290 Wikipedia sites are just part of the 1.5 billion stored each month.
The scale of the operation and Mark’s willingness to discuss the project led me to contact him again when I was finishing my book, “Viviendo en el futuro” (“Living in the future”, hopefully available soon in English), which has 498 links in its footnotes. That will not surprise regular readers, who know I like to include references to my sources. This obsession with links has nothing to do with SEO; it is a way to keep my own file of references organized for when I need to retrieve information for my classes, articles or conferences. I guess I’m the biggest user of the search tools on my own page. I contacted Mark to make sure that somebody reading the book in a few years would not come up against a 404 Not Found when checking the footnotes.
Mark answered me immediately and gave me a very simple way to solve the problem: “The best way is a Google Sheet, with the links in column A”. I did this, and in a matter of hours, three additional columns appeared on the same spreadsheet: one with the status of the request, another with the corresponding error if one had occurred while trying to retrieve the page, and another with the permanent link to the archived copy. The procedure failed on only around 45 pages, and for those, Mark simply offered me the use of one of his recently developed tools, a public form to register any page on the Internet Archive. I did so (it took me less than a couple of hours), and I was able to provide Planeta, my publishers, with a collection of permanent URLs to put in place of the ones I had originally referenced in my footnotes.
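For anyone curious about what such a spreadsheet looks like in code, here is a minimal sketch in Python. It is not the tool Mark's team runs on the Google Sheet, whose internals I don't know: it only asks the Wayback Machine's public availability endpoint (`https://archive.org/wayback/available`) whether a snapshot already exists, and builds the same kind of row, with a status, an error and an archived link. The function names and the status labels ("archived", "missing", "error") are my own assumptions, chosen for illustration.

```python
import json
import urllib.parse
import urllib.request

# Public Wayback Machine endpoint that reports the closest snapshot of a URL.
AVAILABILITY_API = "https://archive.org/wayback/available"


def lookup_snapshot(url, timeout=30):
    """Query the availability endpoint and return the decoded JSON reply."""
    query = urllib.parse.urlencode({"url": url})
    with urllib.request.urlopen(f"{AVAILABILITY_API}?{query}", timeout=timeout) as resp:
        return json.load(resp)


def build_row(url, lookup=lookup_snapshot):
    """Return (url, status, error, archived_url), one row per footnote link.

    The status labels are hypothetical; they only mimic the three extra
    columns described above.
    """
    try:
        data = lookup(url)
    except Exception as exc:  # network failure, malformed reply, etc.
        return (url, "error", str(exc), "")
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return (url, "archived", "", closest["url"])
    return (url, "missing", "no snapshot found", "")


def build_rows(urls, lookup=lookup_snapshot):
    """Build one row for each link in 'column A'."""
    return [build_row(u, lookup) for u in urls]
```

Calling `build_rows(["https://example.com/"])` would return a one-row list; any row marked "missing" or "error" corresponds to the kind of page I had to register by hand through the public form. The `lookup` parameter exists only so the logic can be tried without touching the network.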
Mark’s invaluable help earned him a thank you in the acknowledgments section of my book, and if I ever get to meet him in person, I’ll certainly try to invite him to dinner :-) Meanwhile, Planeta are now the first, unless someone tells me otherwise, to publish a book whose links are permanently archived in the Internet Archive, safe from link rot hell. This is a procedure that will undoubtedly become the norm for any reference work hoping to stand the test of time, and one I would strongly recommend for academic papers.
If the Internet Archive and the Wayback Machine didn’t exist, someone would have to invent them: they are essential tools for anyone who intends to create content with a reasonable lifespan, they are available to anybody who creates content on the internet, and they also improve Wikipedia, which now provides two-page previews of the books cited in its articles. The page archiving tool allows you to archive not only what you just wrote but also all the pages you linked to, turning something as modest as this simple personal site into something lasting: readers can recover links, and the result is a higher-quality website with fewer broken links, in which more things are saved from the ravages of time.
(In Spanish, here)