A Defense and Critique of the Internet Archive

Sansu the Cat
Politics & Discourse
Jun 19, 2020

It’s a common truism that once something is on the Internet, it stays there forever. I worry that this idea has made many complacent about the fact that the Internet is a rather mercurial and unstable place. This is evident in the infamous “Error 404” pages that we’ve all come across in our online searches. Every single day on the Internet, websites vanish, videos are pulled, and links are broken. Knowledge is lost and history becomes obscure. Amidst these shifting seas stands the Internet Archive (IA). Founded by Brewster Kahle in 1996 with the aim of providing “Universal Access To All Knowledge”, the site, which began as an archive of the Internet itself, has grown to include the goal of listing every book ever made. To this end, it has so far accumulated 330 billion web pages, 20 million books and texts, 4.5 million audio recordings, 4 million videos, 3 million images, and 200,000 software programs. IA is also not without controversy, with its Open Library (OL) lending out copyrighted e-books without purchasing the proper licenses. As a result, authors and publishers feel cheated out of their incomes, and a devastating lawsuit seeks to correct this apparent injustice. On the one hand, I feel that IA could do a lot more in terms of respecting writers, but on the other hand, IA getting sued out of existence would be a catastrophe for all.

Digital Lending, Controlled and Uncontrolled

From the very start, many publishers and authors took issue with the fact that OL did not lend out e-books like ordinary libraries. While traditional libraries pay publishers a license fee for a restricted e-book, OL lends out digital scans of physical texts for a maximum of two weeks. As a result, publishers, and by extension, their writers, miss out on royalties. IA defends this practice as “fair use” under the legal theory of Controlled Digital Lending (CDL), which allows it to lend out one digital file at a time for every physical copy it owns. Many writers’ groups, such as the Authors Guild (AG), criticized the CDL defense last year in an open letter, where, citing a Second Circuit court ruling from Capitol Records v. ReDigi, they wrote, “reselling the copyrighted works of others without authorization from the copyright owner was an unauthorized reproduction that directly competed with the ‘rights holders’ legitimate market, offering consumers a substitute for purchasing from the rights holders.’”

The straw that broke the camel’s back was OL’s National Emergency Library (NEL), which sought to ensure access to digital texts during the COVID-19 pandemic, when many libraries would be closed, by suspending waitlists. In other words, multiple readers could borrow a single book at the same time. Despite the criticism from AG that the NEL is not a “real library”, the project received a statement of public support from several universities and libraries, including MIT and Penn State. Chris Bourg, the director of MIT Libraries, praised NEL, writing, “In a global pandemic, robust digital lending options are key to a library’s ability to care for staff and the community, by allowing all of us to work remotely and maintain the recommended social distancing.” There have also been many testimonies to NEL’s positive impact, such as that of Benjamin S. Camden, a librarian from New Jersey, who said that access to the library helped him provide basic life support manuals for front line medical workers. You can disagree with NEL’s methods, but it undeniably met a need for many.

Regardless of such effusive praise, four major publishers, John Wiley & Sons, Hachette Book Group, HarperCollins, and Penguin Random House, sued IA for copyright infringement, writing, “IA does not seek to ‘free knowledge’; it seeks to destroy the carefully calibrated ecosystem that makes books possible in the first place — and to undermine the copyright law that stands in its way.” Douglas Preston, the president of AG, wrote in a wider public statement supporting the lawsuit, “The Internet Archive hopes to fool the public by calling its piracy website a ‘library’; but there’s a more accurate term for taking what you don’t own: ‘stealing’.” Novelist Alexander Chee likewise wrote on Twitter, “There is no author bailout, booksellers bailout or publisher bailout. The Internet Archive’s ‘emergency’ copyrights grab endangers many already in terrible danger.”

In response to the lawsuit, IA has ended its NEL two weeks early, which, to my mind, was probably the right thing to do. IA should have gone the traditional route of requesting licenses, which it could well afford to do, given that its gross revenue in 2018 was roughly $21 million, and that, as of 2017, its top six employees collectively earned over $1 million a year. IA claims that its OL allows authors who don’t want their works on the site to opt out. Novelist Colson Whitehead had success with this option, but other writers have not found things to be so smooth. Authors Victoria Strauss and Virginia Anderson both reported frustration with trying to get their books taken down from IA. Those suing IA make a rather persuasive case that its unauthorized CDL practices could do real harm to writers’ income. It must be said that writers are doing poorly financially. AG found that the median pay of full-time writers in 2017 was $20,300, and the median pay of part-time writers was $6,080. The pay of part-time writers has fallen 42 percent since 2009. AG has also found that publishers have lost an estimated $300 million due to online piracy, and IA is not completely innocent here.

The Internet Archive Is A “Real Library”

While the anger of many authors is justified, the ends do not always justify the means. Timothy B. Lee of Ars Technica has written that if IA loses their lawsuit, it could cost them billions in damages and potentially put them out of business. This must not happen under any circumstances. Whatever you may think of its copyright infringements, IA is quite possibly one of the most important websites ever made, and its destruction would represent an incalculable loss to our collective knowledge, culture, and history. Given that IA has ended the NEL, so too must this lawsuit. Bankrupting IA is not a viable solution that will help writers in need.

While I understand the complaints towards IA’s book scanning practices, I must object to its vilification. IA’s critics are trying to draw a clear line in the sand between “real libraries” and the “piracy” of IA, but the truth is a little more grey than this line of rhetoric reveals. IA was officially designated as a library by the state of California in 2007. IA is also a member of the International Internet Preservation Consortium, which is an association of libraries that works to archive and make accessible information online. Reducing IA to little more than The Pirate Bay with monocles ignores the many positive acts it has done in preserving our cultural heritage and making the wealth of the world’s knowledge more accessible to the average reader.

For example:

The Great 78 Project seeks to preserve 78rpm records from 1880 to 1960, with George Blood, the president of George Blood Audio, writing that “If we didn’t do this, 48,000 78s in a little library in Batavia, Illinois, may have been lost.” IA also has 11,215 Grateful Dead concerts, 2,975 items from the Brooklyn Museum, 140,000 items from the Metropolitan Museum of Art, and 15,000 items documenting the Occupy Movement. In 2008, IA worked with NASA to bring together all 21 NASA image collections into one place. IA also has 3,686 films in its Moving Image Archive. Its 3,000 hours of international TV news covering 9/11 remain one of the most comprehensive televised archives of the event. IA also holds 1,295 free courses, video lectures, and supplemental materials from universities in the United States and China.

In 2003, IA received a DMCA exemption from the Library of Congress to archive old software that often had a shelf-life of 10 to 30 years. Several sponsors over the years have provided IA with digitized books, including the Library of Congress, the Sloan Foundation, the University of Toronto, and Google Books. IA has also digitized 90% of all writings in Balinese, which are mostly written on palm leaves. As of 2013, IA had 4.4 million e-books, with 15 million downloads every month. The archive is also collecting physical copies of every book ever printed.

Finally, the most important feature of IA is the Wayback Machine, which archives billions of web pages that would otherwise have been lost to time. The Wayback Machine is also credited with helping to make Wikipedia more reliable, by adding its archived pages to any citations with missing or broken links. The loss of the Wayback Machine would be a gift to propagandists around the world, who could delete and revise things at will with no one being the wiser.

Nuances such as these are lost in the condemnations of IA made in the lawsuit and elsewhere. While it is fair to criticize certain practices of IA, insisting that it is little more than a piracy site is not very persuasive. This becomes even less persuasive when you compare OL to the popular e-book pirate sites. On most of these sites, you need only click a few links before downloading the entire book. On OL, you need to first log in, possibly be put on a waitlist, and borrow the text for a maximum of two weeks. Given that pirates are all about convenience, which option do you think most of them would rather choose? Now, simply because IA isn’t the go-to site for pirates doesn’t mean it hasn’t done damage to authors or that its unauthorized lending of in-copyright books is okay. I only suspect that it may not be the worst, or even the leading, offender in e-book piracy.

In fact, some of the offenses in the lawsuit are hardly even that. If one looks at the “Works” listed by the publishers as damaged by IA, a few curious names come up. They are Sylvia Plath (who died in 1963), Zora Neale Hurston (who died in 1960), C.S. Lewis (who died in 1963), and Laura Ingalls Wilder (who died in 1957). This isn’t to say that it’s okay to steal books from authors who have been dead for over half a century, but that their being on IA is hardly some sort of existential threat to publishing. Never mind that three of the texts listed in the suit, Their Eyes Were Watching God (1937), Little House on the Prairie (1935), and The Lion, the Witch and the Wardrobe (1950), are already public domain in Canada! Hurston, Wilder, and Lewis’s estates are probably losing more royalties from Canadians than from IA.

The Problem With “Real Libraries”

Now, as far as the “real libraries” are concerned, publishers have attacked them, too. In 2011, AG unsuccessfully sued HathiTrust, a consortium of university libraries, for trying to digitize their collection of “orphan works”, or books where no copyright holder could be found. It’s also worth noting that none of the potential owners of these orphan works joined AG in the lawsuit. (Ironically, on page 21 of the lawsuit, the publishers positively contrast HathiTrust’s book scanning project with IA’s.) Macmillan attempted to limit the number of e-books for new titles that libraries could lend out, with their CEO, John Sargent, accusing libraries, without evidence, of “cannibalizing sales”. Macmillan later abandoned the effort in March of this year. By the way, AG also supported Macmillan’s plan.

I agree with AG that for works which are copyrighted and in print, OL should purchase official e-book licenses. Given the dire financial state of writers, the responsible thing to do would be to see that publishers share in the profits. Why hasn’t IA worked with authors to see this done? AG claims that for three years, it offered to bring IA’s copying programs into compliance with copyright law, but that it was refused. Given, however, that we’re only hearing one side of the story, I’ll take AG’s claims here with a grain of salt. Especially since Kahle himself has claimed, in a lecture for the Long Now Foundation, that he wants authors to make money from his library, but that many publishers refused him:

“Publishing and libraries have always worked in parallel. The $3 billion to $4 billion that I mentioned before, that libraries spend on publishers products, is about 10 or 15 percent of the book publishing industry. That’s non-trivial and it keeps a whole level of guarantees for publishers to go and get books out there. But you know, we’re having a very difficult time, as a digital library, buying books from publishers.”

Copyright law, however, is not holy writ. I would argue that CDL is justified in cases where the book is rare, out-of-print, or legally unavailable. Call that “piracy” if you want, but I’ve long maintained that even piracy is justified under similar circumstances. AG does not seem to be particularly sympathetic to such arguments. In a statement made in opposition to CDL, they argue that the copyrights to most “orphan works” can easily be found and that CDL proponents are ignorant of publishers who want to bring old texts into the e-book format. AG has put forward a solution to re-licensing orphan works and out-of-print books through extended collective licensing (ECL) by means of mass digitization. ECL allows books to be used without requiring a negotiation with every individual rightsholder, and instead offers licenses for all books. This is presumably the “licensing scheme” that AG claims IA refused to adopt in their discussions.

I may be wrong, but I suspect that IA’s opposition to adopting AG’s “licensing scheme” came from a fear that it would mean the disappearance of orphan works and out-of-print books from its site. AG believes that passing ECL legislation would solve this issue, and that it is not too difficult to track down the copyright owners of orphan works. As evidence for this, they point out that they were able to look up an “orphan work” from HathiTrust’s list, and trace it back to author J.R. Salamanca. While their example does prove that HathiTrust’s searching methods were flawed at the very least, it doesn’t prove that reuniting all other orphan works with their owners would be just as simple. Conversely, Techdirt blogger Mike Masnick has suggested that this whole incident proves that HathiTrust’s scheme could work:

“After all, none of these books had been released digitally yet. The process involved the HathiTrust first trying to track down the authors, then the authors/works being put in a public list, which could be scrutinized by the public to see if any of them could show that the works weren’t orphans. And that’s exactly what happened. Even if you could have hoped that the original investigation was a bit better, it’s hard to argue that the system didn’t work here. It did.”

Where I differ with AG is that I think the digitization of orphan works and out-of-print texts should absolutely be “fair use” for lending until such time as the copyright holders are found or the book is back in print. Archivists need to act quickly to preserve books and can’t wait for publishers to pick up the slack. The burden should be on the publishers to ensure that these works are in print and available to the public. To this end they cannot always be trusted.

Consider that it took sixty years, I repeat, sixty years for Madeleine L’Engle’s Ilsa to go back into print. Books can vanish in less time than that. Are we to go up to sixty years, or more, without access to precious cultural artifacts because publishing companies are too slow to reprint them? I say no. Copyright holders cannot be trusted to curate our history, and considering that they can always opt out of any digitization program, we will always need some degree of copyright infringement to preserve them. Let me bring up two examples to prove my points:

While it isn’t inconceivable that you could save up to buy some of these books, the high cost can still be a barrier to many. The money spent on these used items, in any case, will never return to the original creators. The priority of copyright will always be profit, not preservation and accessibility. Sometimes, one comes at the expense of the other. I’d be more open to waiting for these works to fall into the public domain if our copyright terms in America weren’t so ridiculously long.

Competing With Free

AG promotes the idea that if people can get something for “free”, then they can’t be persuaded to pay for it. This was more or less articulated by their executive director, Mary Rasenberger, when she said, “If you can get anything that you want that’s on Internet Archive for free, why are you going to buy an e-book?” This is why their organization opposes CDL, because if the copyright is found or the book is re-printed, it will then have to compete with the free copies around the web. On the surface, this sounds sensible, but I imagine that the truth is more complex. Let’s take a look at other industries, shall we?

It’s very easy to watch anime for free, and yet the anime streaming service, Crunchyroll, has over a million subscribers. It’s very easy to get free music, and yet Spotify and iTunes are more popular than LimeWire. It’s very easy to pirate Stranger Things, Game of Thrones, and The Mandalorian, but millions subscribed to Netflix, HBO, and Disney Plus solely to access these shows. Nobody ever thought that we’d get men to start paying for porn again, but OnlyFans has proved otherwise. Through these examples, I want to show that consumers will pay for things they could easily get for free if the services are convenient and affordable. By constantly pointing the finger at pirates, big publishers are absolving themselves of the need to be more innovative and competitive online.

Indeed, the ones cheating authors out of royalties in these cases are the publishers. Instead of getting mad at HathiTrust or IA, who are merely trying to preserve and make accessible these rare books, AG should level more of this outrage at the publishers who aren’t reprinting these texts. Making free the books that no one would otherwise be reading can hardly be compared to stealing books which are sold on the free market. One act arguably takes money away from a product that is making real money. The other takes money away from a product that only makes potential money. And potential money is no money.

AG should also consider the possibility that readers might buy a work which was previously orphaned or out-of-print, because they had read it for free before. This is one of the ways in which libraries support authors. Readers who become fans of writers they read at the library will be more likely to support their work in the marketplace. The idea that readers won’t buy a book if there’s a free version available is more truism than truth. Readers can be open to persuasion. To paraphrase Techdirt’s Masnick, if you can’t compete with “free”, you can’t compete at all.

Piracy isn’t going away, but its effects can be managed through smart business practices on behalf of copyright holders. Instead of trying to abolish piracy, they should try to compete with it. An industry that finds itself existentially threatened by internet piracy, let alone libraries or used book sales, should probably rethink its business model. Books face greater competition now from television shows, superhero movies, streaming services, YouTube videos, blogs, podcasts, and social media. Not as many people have the attention spans for reading books, as novels and poetry no longer shape our culture as strongly as they once did. Even if e-book piracy were to vanish overnight, I suspect that publishing would still be suffering due to these wider factors.

While publishers and writers have vigorously gone after pirate sites, and they are right to, in the long term, it probably won’t achieve much. Pirates are very tech-savvy, and with every pirate site that’s taken down, another one comes up in its place. Lawsuits and DMCA notices won’t get us out of this. As Techdirt’s Masnick has argued, “Time and time again, studies and common sense have shown that piracy is the result of a failure in the marketplace to provide what consumers want, in terms of convenience, price and selection.”

In terms of keeping the publishing industry afloat, I have a few ideas outside of playing endless games of whack-a-mole with pirate sites. It’s long past time that we passed a universal basic income (UBI) for every citizen, no one left behind. UBI would allow writers more freedom to pursue their projects while also paying for their basic needs.

On top of that, I would also suggest a permanent Federal Writers Fund (FWF), similar to the Emergency Writers Fund started by PEN America in the wake of coronavirus. Any writer who has published work, or is under contract to publish work, should be eligible to receive between $500 and $1,000 a month. There should be some verification to ensure that someone doesn’t self-publish a book that’s one sentence repeated a thousand times, but for the most part, the barriers to entry should be low. Of course, it doesn’t have to completely rely on federal funds, but like PBS, could draw a substantial amount of its money from public donations. Small and indie publishers should also be eligible for such funds, if only to recoup their losses from piracy.

Publishers also need to wake up to the 21st century and create an e-book for every book in their catalogue. I’m not saying that it’s okay to pirate a book simply because it doesn’t have an e-book yet, or that the e-book format is always preferable, but providing a legal alternative would go a long way.

As far as other strategies go, I like two of the ideas suggested by AG in their 2018 Author Income Survey: 1) allowing authors to negotiate with Amazon, Google, and Facebook to equalize bargaining power, and 2) making publishers pay higher royalties on e-books and discounted ones. I don’t know how I feel about making all resellers of books pay royalties, though Amazon probably should pay royalties for in-print used books sold on their site. I also like AG’s idea of a pilot program for ECL by Congress, if only to see its effectiveness in bringing orphan works and out-of-print books back into commerce. We also need to foster a greater appreciation of writers in our culture. This means educating the public about the difficulty of writing books as well as the financial struggles that authors often have to endure. It also means increased support for our local libraries.

While the big publishers going after IA want to present themselves as moral crusaders out to defend the honor of their writers, their hands are hardly clean, either. In the first half of 2019, the Association of American Publishers (AAP) found that U.S. publishers made $6 billion in net revenue, which was an increase of 6.9% since 2018. AAP also found that in that same year, the publishing industry grew overall by 2%. That same year, Penguin Random House, one of the publishers in the lawsuit, recorded its highest first half in 12 years.

They say that a rising tide lifts all boats, but when the industry’s profits go up and writers’ profits go down, you have to question the extent to which IA is really part of the problem. Again, IA is not an innocent actor here, but there are deep-rooted problems with fairness in the publishing industry that IA has little control over. Many bestselling authors, like Philip Pullman of His Dark Materials, have long complained that publishers are cheating authors out of income. Never mind the fact that diversity in publishing has not improved over the past four years. Given these facts, the lawsuit against IA might be more about moral posturing than moral action.

Looking Ahead

As far as the ongoing lawsuit is concerned, the best case scenario would be a settlement that leads to a sort of compromise between the two sides. IA cannot be allowed to go bankrupt, but going forward, it should work to buy licenses from publishers like most libraries. The publishers need to permit IA to lend out e-books that are orphan works or out-of-print, until they can do the hard work of making these works accessible themselves. All concerned readers should put pressure on all sides to reach this end.

On social media, the debate around IA has unfortunately devolved into name-calling and personal attacks. Thoughtful conversations about trying to rein in the excesses of copyright while ensuring writers get fair pay are few and far between. People are more interested in celebrating the virtuousness of their own side instead of trying to find common ground. Dismissing anyone concerned about losing IA as wanting writers to starve isn’t productive. Harassing and threatening authors who are merely concerned about their livelihoods isn’t helpful, either. Cooler heads and open hearts need to lead the dialogue in navigating these complex issues of reading and writing in the digital age.

The truth about IA is more complicated than either side wants to admit. IA is neither the evil piracy site nor the flawless library that people say it is. IA is an institution of immense cultural value that engages in shady practices. We are, all of us, so terribly passionate on this issue because we are each of us bound by a love for literature, writers, and curation. Instead of turning this passion toward the destructive ends of shaming writers or destroying public services, we must use it to recreate the realms of publishing, lending, and preservation for readers and authors everywhere.


I write about art, life, and humanity. M.A. Japanese Literature. B.A. Spanish & Japanese. email: sansuthecat@yahoo.com