7. Cultural Alzheimer: 404s and the Unstoppable Linkrot
The very short lifespan of online content and resources


“At first glance, the wealth of information we create today should be a boon to archaeologists and historians of the future: they should have no problem to understand who we were and what we thought. After all, we document our lives in countless ways — from movies to blogs to podcasts like this one.
But nothing is as simple as it seems. As we shall soon see, the wealth of digital information we produce — thousands upon thousands of exabytes — will create new and unique challenges to these future archaeologists.”
(The Domesday Book — Curious Mind)
Notwithstanding the vast quantity of new information published online every day, a large and growing amount of valuable content is also disappearing from the Internet, day after day.
Because of this phenomenon, known as linkrot, a great deal of valuable online information becomes inaccessible. Much of it is lost for good.
Linkrot, “also known as link death or link breaking, describes the process by which hyperlinks (either on individual websites or the Internet in general) point to web pages, servers or other resources that have become permanently unavailable” (Wikipedia).
It should be the responsibility of human civilization as a whole to preserve our digitized information in a safe and reliable manner, or we risk losing much of our history, knowledge and data.
And that’s where curation plays a very important role.
Digital curation’s own mandate includes the “selection, preservation, maintenance, collection and archiving of digital assets” in ways and with technologies that can endure the test of time.
But until reliably preserving digital content becomes the norm, it is worth realizing how widespread this phenomenon is and how important it is to put a stop to it.


Linkrot has many different causes, including:
- content being moved and relocated without appropriate redirection mechanisms in place
- content being deleted “after the fact” due to publishing or editorial decisions
- legal or copyright connected issues
- change of domain name and URLs
- content being blocked by censoring or restrictive local government filters
- content being blocked or made inaccessible by corporate firewalls
- unpaid hosting fees
- sites being abandoned for lack of economic resources
- accidental expiration of domain name
- owner of the site, whether a person or a company, dies, gets bought or files for bankruptcy
- human errors in typing links
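Several of the causes above leave different traces when a link is requested: a moved page often answers with a redirect, a deleted one with a 404 or 410, and an expired domain or abandoned server may not answer at all. The Python sketch below shows one possible way to sort links into those buckets. The function names, the category labels and the User-Agent string are illustrative assumptions, not part of any tool mentioned in this guide. Some servers reject HEAD requests, so a real checker would likely need a GET fallback.

```python
import urllib.error
import urllib.request


def classify(status: int, redirected: bool = False) -> str:
    """Map an HTTP status code to a rough linkrot category (labels are illustrative)."""
    if status in (404, 410):
        return "gone"          # deleted, or moved without a redirect
    if status in (401, 403, 451):
        return "blocked"       # access restricted, possibly for legal reasons
    if 200 <= status < 300:
        return "moved" if redirected else "alive"
    return "error"             # anything else, e.g. 5xx server failures


def check_link(url: str, timeout: float = 10.0) -> str:
    """Request a URL and classify the outcome; network failures count as 'unreachable'."""
    try:
        request = urllib.request.Request(
            url, method="HEAD", headers={"User-Agent": "linkrot-sketch/0.1"}
        )
        # urlopen follows redirects, so compare the final URL with the requested one.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify(response.status, response.geturl() != url)
    except urllib.error.HTTPError as err:
        return classify(err.code)
    except (OSError, ValueError):
        # DNS failure, refused connection, timeout or malformed URL.
        return "unreachable"
```

Calling `check_link` on each outgoing link of a page returns one of the labels above. A link blocked by a firewall or local filter may look "unreachable" from one network and "alive" from another, so a single check is only a hint.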
Some data.
One study published in the journal Science reports that 13% of Internet references in scholarly articles were inaccessible after only 27 months.
See: Dellavalle RP, Hester EJ, Heilig LF, Drake AL, Kuntzman JW, Graber M, et al. Information science. Going, going, gone: lost Internet references. Science 2003 Oct 31;302(5646):787–788. DOI:10.1126/science.1088234
The Chesapeake Digital Preservation Group has found that of the original dataset of websites it began working with in 2008, “the content at dot-gov domains showed the highest increase in link rot.
More than 50 percent of the material posted to government domains disappeared from the original documented Web addresses,” according to the 2013 study.
The New York Times reported half the links referenced in Supreme Court opinions were victims of link rot. But the rest of the federal government and state governments are losing data, too.
See: 44 Percent of URLs from Original Data Set (2008) No Longer Work
(“Link Rot” and Legal Resources on the Web: A 2013 Analysis, supra note 8.)
“Unfortunately and disturbingly, the Supreme Court appears to have a vast problem with link rot, the condition of internet links no longer working. We found that number of websites that are no longer working cited to by Supreme Court opinions is alarmingly high, almost one-third (29%). Our research in Supreme Court cases also found that the rate of disappearance is not affected by the type of online document (pdf, html, etc) or the sources of links (government or non-government) in terms of what links are now dead. We cannot predict what links will rot, even within Supreme Court cases.”
Source: http://yjolt.org/something-rotten-state-legal-citation-life-span-united-states-supreme-court-citation-containing-inte
The good news on this front is that we already have at least three practical solutions to this major issue within reach. We just need to perfect them and make them available to every human being on the planet.
The first one is to start curating content seriously, moving beyond simple republishing to actual preservation and archiving.
The second one is to develop a federated wiki of web sites. “In a federated wiki, when you find a page you like, you curate it to your own server (which may even be running on your laptop). That forms part of a named-content system, and if later that page disappears at the source, the system can find dozens of curated copies across the web.”
There are many pros and cons to this approach, but it is certainly worth looking into further.
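To make the named-content idea more concrete, here is a minimal Python sketch of an index that names each curated copy by a hash of its content and remembers which servers hold it. This is only an illustration of the general principle, not how any federated wiki is actually implemented. The class name `CopyIndex`, its methods and the example server names are all hypothetical.

```python
import hashlib
from collections import defaultdict


class CopyIndex:
    """Toy index of curated copies, keyed by a hash of the page content."""

    def __init__(self) -> None:
        self._holders = defaultdict(set)   # content name -> servers holding a copy
        self._names = defaultdict(set)     # original URL -> content names seen there

    def curate(self, source_url: str, content: bytes, server: str) -> str:
        """Record that `server` keeps a copy of `content` taken from `source_url`."""
        name = hashlib.sha256(content).hexdigest()
        self._holders[name].add(server)
        self._names[source_url].add(name)
        return name

    def find_copies(self, source_url: str) -> list[str]:
        """List every server known to hold some copy of the page at `source_url`."""
        servers = set()
        for name in self._names.get(source_url, ()):
            servers |= self._holders[name]
        return sorted(servers)


index = CopyIndex()
index.curate("http://example.org/page", b"<h1>Hello</h1>", "alice.example")
index.curate("http://example.org/page", b"<h1>Hello</h1>", "bob.example")
print(index.find_copies("http://example.org/page"))
```

Because the name depends only on the content, two people who curate the same page end up pointing at the same name, which is what would let a system locate surviving copies after the original disappears.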
The third one is biological. DNA will likely be our technological savior. DNA is in fact the perfect medium for preserving and archiving information for tens of thousands of years.
Just one gram of DNA can hold up to two gigabytes of information, and in 2011 a group of scientists demonstrated how to reliably store, archive and retrieve a dozen text, audio, image and video files from a DNA molecule.
See:
Consequences
Wikipedia reports: “To combat link rot, web archivists are actively engaged in collecting the Web or particular portions of the Web and ensuring the collection is preserved in an archive, such as an archive site, for future researchers, historians, and the public.
The largest web archiving organization is the Internet Archive, whose goal is to maintain an archive of the entire Web, taking periodic snapshots of pages that can then be accessed for free.”
Source: https://en.wikipedia.org/wiki/Link_rot
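Archived snapshots can also be looked up programmatically. The Python sketch below assumes the Wayback Machine availability endpoint at `https://archive.org/wayback/available`, which is understood to return JSON with an `archived_snapshots.closest` entry. Treat both the endpoint and the response shape as assumptions to verify against the Internet Archive's current documentation. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the Wayback Machine availability service.
API = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build the lookup URL; `timestamp` (e.g. '20130101') asks for the nearest snapshot."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return API + "?" + urllib.parse.urlencode(params)


def closest_snapshot(payload):
    """Return the snapshot URL from a decoded response, or None if nothing is archived."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(url, timestamp=None, timeout=10.0):
    """Ask the service for the closest archived copy of `url` (needs network access)."""
    with urllib.request.urlopen(availability_query(url, timestamp), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

A dead link found on a page could be passed to `lookup` together with the date the page was written, so that the reader is sent to the snapshot closest to what the author originally saw.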
- Much of the content that we publish online is likely to disappear within a relatively short amount of time. This has disastrous consequences for our ability to inform ourselves, for journalism, history and cultural heritage.
- The problem is even bigger inasmuch as a) we are overloaded by greater and greater quantities of information, and b) we do not realize how much precious information we are losing.
Opportunities
The overall emerging opportunity is not just to identify and preserve valuable content before it gets lost, but to actually organize and make sense of such resources in order to increase their value and benefit to the general public.
1) Opportunity for individuals and organizations to “preserve”, “archive” and organize key valuable content and resources before they are moved, deleted, abandoned or lost. (E.g.: Oldversion.com)
2) Opportunity for new tools and services that focus not just on collecting and organizing valuable existing content but also on preserving it in a reliable, everlasting fashion.
Resources
Tools and web services designed to help avoid linkrot and to store/archive digital content indefinitely.
- Amberlink.org
Free plugin for WordPress developed by Berkman Center creates a backup copy of any outgoing link from your website, so that if the site/page goes down or becomes inaccessible your readers can still see its contents.
- Perma.cc
Free web app helps scholars, journals, courts, and others create permanent archival copies of the web content they cite.
- Permamarks.net
Anonymous cloud archival space for any digital content allows for protected storage as well as for conversion and download into multiple formats including HTML, .zip, and, with a PRO paid version, also into PDF, ePub, Mobi, or as an image that can be viewed on many different types of devices (TV, eReader, etc.).
- Pinboard
For a very modest yearly amount Pinboard offers an archiving service which saves a copy of everything you bookmark, gives you full-text search, and automatically checks your account for dead links.
- Permanent Web Archiving Tools
Extra readings:
Learning from failure: The case of the disappearing Web site
Barone | First Monday
The Internet’s Dark Ages
The Atlantic
Can the Internet be Archived?
The New Yorker
The disappearing web: Information decay is eating away our history
Gigaom
Digital Information Preservation and the Domesday Project
Curious Mind (original transcript)
Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations
Harvard Law Review
The web is broken and we should fix it
Hapgood — Mike Caulfield


Thank you for reading.
I am Robin Good, an independent author / publisher with a terminal addiction: helping others effectively communicate, learn and market their ideas by exploring new ethical venues, innovative strategies and uncharted territories outside the mainstream.
Discover more about curation right here:
https://medium.com/content-curation-official-guide


Your comments, critiques, suggestions and ideas are all very welcome. Please do share them.
Thank you.
Robin