Aaron Swartz at a Boston Wikipedia Meetup, 2009.

Public Knowledge for the Public Good: Working Toward Digital Access in the Spirit of Aaron Swartz

By John Wihbey
Assistant Director for Journalist’s Resource at the Shorenstein Center on Media, Politics and Public Policy

In January 2013, after a brief but celebrated career as a programmer, technologist and Internet activist, Aaron Swartz committed suicide at age 26 on the eve of his trial in federal court. A year later I wrote about two ideas that guided his life’s work, academic open access and open government. With the second anniversary of his death having just passed, I’ve again been pondering questions and issues he raised during his short life.

Let’s take stock of just a few of the information “bottlenecks” that have yet to be resolved:

  • Data about and basic formulas for life-saving drugs are protected by intellectual property rights, effectively walling them off from the world’s poor.
  • The world’s scientific studies remain largely locked away in pricey journals. This has deeply personal costs for individuals: For example, patients with acute, long-term illnesses who want to inform themselves about their diseases cannot get access. This is true despite NIH’s admirable efforts to make studies more open.
  • Thinly resourced nonprofits working in the public interest, such as development-aid groups, are unable to access research that could inform their work.
  • Books that formerly could be lent easily as hard copies through libraries are effectively no longer available to patrons in the form many want, as full digital versions are locked down by publishing houses that fear piracy.
  • Government-funded studies that could fuel innovation and economic growth are often kept private.
  • The U.S. government’s most accessible research summaries on public-policy issues, produced by the Congressional Research Service, are not made public as a matter of policy. Nor are the intelligence community’s open-source reports on foreign affairs, the result of a relatively recent policy reversal.
  • Basic information about national security institutions and related government issues — not just sensitive or controversial secrets — is withheld from the public because of a culture of secrecy and over-classification.
  • At the federal, state and local level, essential knowledge and information remains poorly organized and barely visible.
  • Market-moving information about companies is available only upon request from the government, giving the advantage to sophisticated investors who know what to ask for (and how to exploit the Freedom of Information Act, which itself remains riddled with problems).

Despite the power of the Internet to “liberate” information, it is clear that many problems will take public policy solutions and a lot of hard work. On the political side, this means grassroots campaigns, such as the one that led to the White House’s open-access directive for publicly funded research. On the practical side, it means individuals contributing to Wikipedia and other knowledge platforms; hackathons related to data and openness; and the creation of sites for publishing documents and curating knowledge. Above all, it means adding things to the Web, organizing them, and fighting for them to be open, visible and searchable.

There will not be, one fine day, a single massive download that solves all access problems, lowers costs and maximizes knowledge to improve society.

The Human Knowledge Project

There are three main areas where access to knowledge remains contested: data and public information policy; academic research; and books and culture. In a digital era, some of the core goals of journalism, civic activism, academia, libraries and archives have collapsed into one another.

As Swartz’s story has continued to resonate with me, I’ve been talking with some people involved in related pursuits and keeping tabs on a few projects related to knowledge infrastructure.

Technology and copyright scholar John Palfrey helped found the Digital Public Library of America (DPLA), a hugely important effort to network databases and collections across the country and advocate for openness of materials. Now the chair of its board, he worries about “possible enclosure” as more people want e-books and online materials but digital rights management (DRM) policies and copyright increasingly inhibit access.

“In a general sense, the Internet era has resulted in more knowledge around the world at lower cost, and those are all good things,” Palfrey, author of the forthcoming BiblioTech: Why Libraries Matter More than Ever in the Age of Google, told me. “I think we’re at a moment, though, where we could go in a number of different directions. That trend could continue, or it’s possible depending on how the marketplace plays out, that some things could get more locked up than ever before.”

In The Googlization of Everything: And Why We Should Worry, University of Virginia professor Siva Vaidhyanathan has called for a grand “human knowledge project.” He writes that this will involve confronting impediments directly, changing minds and writing new laws: “We can’t just hope that some big, rich company will do it for us.”

Knowledge Workers in the Trenches

Perhaps information wants to be free. But the real question is: Who will make it free?

There are many types of knowledge “worker bees” who are buzzing away in the trenches. Aaron Swartz provides the model of the creative, if edgy, hacker. But Steven Aftergood, a kind of civil servant for government open access, is the professional, quotidian mirror image of Swartz. Aftergood runs Secrecy News, part of the Project on Government Secrecy at the Federation of American Scientists (FAS), a group founded in 1945 by members of the Manhattan Project. For decades, FAS was led by Jeremy Stone, son of the legendary investigative journalist I.F. Stone.

For 20 years, Aftergood has vacuumed up and published tens of thousands of pages of classified documents in the national security sphere. His daily email provides links to new documents hosted on FAS’s site as well as syntheses of the high points. Aftergood also surfaces a vast range of Congressional Research Service reports and documents — paid for with public dollars, but notoriously difficult to access — that can provide insight on all manner of policy issues. Developing good sources is key to this work, he says, but the sifting, organizing and educating is equally important.

“I think that I may be, deep down, not a spy but a librarian,” Aftergood told me. “I get a kind of satisfaction from collecting and assembling information and presenting it in an intelligible way. And more than that, I have a kind of an old-fashioned belief in an informed public in making our country work. I think we’re better off having thoughtful disagreements than any possible alternative.”

We can find Aftergood’s counterparts in other areas of digital life, from the Wikipedian army to Brewster Kahle at the Internet Archive, from Carl Malamud and his Public.Resource.org projects to academic open access expert Peter Suber, who has helped chart the course at Harvard for making more research available online to the public.

Then there’s the Sunlight Foundation. Like many other open-data organizations, it works to obtain, organize and crunch useful government data, providing tools to other organizations and in effect helping to subsidize a broad range of public-transparency work. The world of APIs — application programming interfaces — opens up the possibility of seeing data streams as a form of public infrastructure. “The data that we make available through our API clearly provides a strong underpinning for the field at large, making Sunlight key to the entire infrastructure of the entire field of people interested in open data,” Sunlight cofounder Ellen Miller told me.

The folks who run Open Secrets are in the same camp, as is MuckRock, the freedom of information project. And there are invaluable curators, too, such as Gary Price and Shirl Kennedy’s Full Text Reports and numerous digital academic projects in the sciences, social sciences and digital humanities, both old and new: for example, the National Security Archive at George Washington University; the Supreme Court legal archive Oyez, now hosted at Illinois Tech; and Northeastern University’s NULab for Texts, Maps and Networks.

To their credit, the Obama White House and its teams within OSTP have pushed hard in this direction, with initiatives such as Project Open Data. So have some state governments, counties and cities.

An increasingly capable set of news media players has also been building knowledge infrastructure, not only through accountability reporting and posting data and documents but also through the creation of news apps for the public good. Examples include ProPublica’s Dollars for Docs, the Texas Tribune’s government salary database, NPR’s Shale Play and The New York Times’ Guantanamo Docket. All serve the cause of knowledge access and the public interest.

Necessary but Not Sufficient

Of course, all of the knowledge in the world is of little consequence unless you have the proper tools, rights and skills. MIT’s Eric von Hippel, a leading scholar in the field of entrepreneurship and consumer innovation and author of Democratizing Innovation, argues that a blind belief in the power of information access overlooks key factors: The United States currently needs legal reform in order to “protect the right to innovate.” The ability to modify existing products and designs, or to build new things, can quickly run into patent issues and IP problems.

“Opening up data and open access — whether the government does it, or universities do it — is certainly a good thing,” von Hippel said to me in an interview. “What that does is that it gives individuals and groups the power to do something… But there are different components to empowerment. What you need in a broad range of [innovation] experiments is all of those rights — and access.”

Still, having that access — and continuing to build a movement and workforce to make universal knowledge a reality — is a necessary step.

Michael Morisy, a journalist who founded MuckRock, has pioneered an effort to request, post and organize hundreds of thousands of pages of government documents, making the FOIA process easier. These documents have a long-term value for the public, he says: “I’m always surprised when something that we published two or three years ago will suddenly become the most-viewed document on the site, because new information came out that made it particularly interesting.”

In the fall of 2010, Swartz became MuckRock user number 70. “Working with Aaron, what was delightful was just how creative he was in exploring resources, particularly public resources,” Morisy told me. “It’s something that we want to encourage.” In a blog post at MuckRock, “Aaron Swartz 1986–2013,” Morisy wrote:

For many of these requests, Aaron would literally wait years for a response, only to receive cryptic rejections or no response at all. For every time he raised the ire of federal investigators, and ultimately prosecutors, he had, indeed, taken the proscribed path dozens or hundreds of times. If Aaron’s methods and aims in freeing information were “radical,” then they were reactions to deep-rooted, systematic failures that often demanded radical responses.

Radical tactics, yes. But it is through building systematic knowledge solutions that, in the long run, we will produce the most radical change.

John Wihbey is Assistant Director for Journalist’s Resource at the Shorenstein Center on Media, Politics and Public Policy, at the Harvard Kennedy School. He writes a regular column for Nieman Journalism Lab and is a lecturer in journalism and multimedia at Boston University. Twitter: @JournoResource @wihbey.