Why the Library of Congress Should Digitize the Nation’s Books

Jennifer Howard’s insightful recent EdSurge piece about the current state of the Google Books project has inspired this call for the Library of Congress to assume leadership of the book scanning project that Google initiated.

But first, let’s give Google its due. Google has made amazing strides in this project, most importantly by scanning at least 25 million books from various university libraries. No other entity would have gotten nearly this far. Google took the project very seriously, right down to building custom book scanners from scratch. And as Howard’s piece points out, the Ngram Viewer for tracking changes in word usage over time is an amazing and well-used tool. The Viewer owes its richness to that underlying corpus of text.

And yet, there was always something unsettling about entrusting such a mission to Google. Google is first and foremost a for-profit search company, not a cultural heritage institution, and for such a company money counts most of all. This is why it always would have been better for the Library of Congress (LC) to lead the effort to build the comprehensive digital library envisioned by Google. There would have been no risk of LC turning around and selling books after it had scanned them, and any such sales could have benefited authors alone. But Google was raring to go on this project as far back as 2002, when Google co-founder Larry Page approached his alma mater (Michigan) to discuss his idea of scanning its library’s books. The idea of mustering such resources and will within LC seemed quite far-fetched, and so throughout the 2000s I reluctantly supported Google’s taking the lead in book digitization.

All of those long-ago discussions were mostly theoretical and philosophical. After more than a decade of Google Books, it is a good time to take practical stock of where the effort stands. Jennifer Howard’s piece, along with a long read by James Somers in the Atlantic earlier this year, suggests that Google’s efforts have now stalled to some extent. Somers argues that the project is essentially dead, while Howard says that the “work continues” but at far less than full throttle. Either way, we are still a very long way from having every book in the world digitized and available to read online, which was the maximal version of Google’s vision. The truth is that only the Library of Congress (working with other national libraries) has the heft and clarity of mission to make it happen. To make that case, it’s important to note a few key milestones in the journey of Google Books.

  1. Google proved its fair use claim. Google Books works by showing snippets of text in response to a search query, not the full text of a book. Yes, the entire book is scanned, but the searcher does not get to read it online. The original vision was for Google to route readers to libraries or online bookstores, where they could borrow or buy the book. This struck me as a goldmine for authors and publishers, which is why it was so puzzling that both groups sued Google for copyright infringement. The libraries owned the books they were giving to Google for scanning, under the legal doctrine known as the right of first sale. And Google’s efforts upheld copyright rather than violating it, since the snippets were small and the entire book could never be read. Besides that, routing readers to bookstores could have generated new sales for otherwise obscure books. This seemed like a classic instance of biting the hand that feeds you. Google made a strong fair use claim under US copyright law, which allows access to copyrighted works under controlled conditions. The company prevailed in every court case, as it should have. Scanning and digitizing printed books in this way is now explicitly recognized as fair use under US law.
  2. Google is a for-profit company first and foremost. Eventually authors and publishers realized that Google Books could be good for them. They wanted to stop biting the hand that feeds them, so they worked out a settlement with Google. This settlement would have created a new “Book Rights Registry,” an escrow account that would have paid copyright holders who came forward to assert their copyright over a book that had been scanned. (One huge logistical problem is that copyright holders, especially of older books, are very hard to locate.) The settlement would also have set up a huge new marketplace in digitized books, with profits from sales of in-print books going to authors and profits from sales of out-of-print books going to Google. This would have included both direct sales of books and the licensing of a collection of out-of-print materials to libraries and universities for a fee. Here was what everyone had always feared: that Google would build a new profit center using out-of-print books, which is exactly what a library would not have done. This is why many people, including Robert Darnton of Harvard and Pamela Samuelson of Berkeley, spoke out against the terms of the settlement. The objections kept coming, and Judge Denny Chin ultimately rejected the settlement between the authors and Google. Judge Chin pointed out that the terms of the settlement were far more expansive than they needed to be to satisfy the authors’ original complaint, which was also the position of the US Department of Justice. But as Somers notes in the Atlantic, Chin was likely also swayed by the frequent criticisms of the proposed settlement. A proposed settlement is supposed to resolve tensions and balance interests, and that did not happen here. As Howard points out in EdSurge, Google “wasn’t ultimately able to resolve a persistent cultural challenge: how to balance copyright and fair use and keep everybody — authors, publishers, scholars, librarians — satisfied. That work still lies ahead.”
  3. The Library of Congress should now take up the torch. When Google Books began, LC was led by James Billington, a brilliant scholar but something of a Luddite. He was appointed during the Reagan administration and did not prove to be the right man to bring LC into the Internet age. LC’s current head, the visionary librarian Carla Hayden, is exactly the right leader for that job. Hayden sees the transformative potential of the Web in improving access to information. She is in the right place at the right time to take up the digitization mantle from Google and to help solve the thorny challenges that still remain. I hope that LC seizes this moment.