FOIA is your API
Automating public records requests can change investigative journalism
Five years ago, I tried to automate my way out of a job.
I was working as a software engineer at the time, taking a kind of sabbatical from journalism. A friend at an investigative newsroom in Chicago asked if I wanted a side gig filing 500 state Freedom of Information Act requests with local governments across Illinois.
Sounded pretty ambitious. Then the thought crossed my mind: Could a computer program manage a large-scale FOIA campaign?
The idea seemed kinda sci-fi at first. But I realized all the pieces were there: public records laws allowing for email requests, open-source software to generate and organize FOIA letters, lists of public information officers to contact. Just write a form letter, I figured, plus some code to pull it all together.
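Here is one way the "form letter plus some code" idea might look. It is a minimal sketch that uses only Python's standard library. The Agency fields, the letter wording and the function names are hypothetical. The messages are built but not sent, because delivery depends on whatever mail service you use.

```python
from dataclasses import dataclass
from email.message import EmailMessage
from string import Template

# Hypothetical form letter; wording and placeholders are illustrative only.
LETTER = Template(
    "Dear $officer,\n\n"
    "Under the Illinois Freedom of Information Act, I request $records.\n\n"
    "Please reply to this address with the records attached.\n\n"
    "Thank you,\n$requester\n"
)


@dataclass
class Agency:
    """One row from a (hypothetical) list of public information officers."""
    name: str
    officer: str  # public information officer's name
    email: str    # FOIA contact address


def build_request(agency: Agency, records: str, requester: str, sender: str) -> EmailMessage:
    """Fill in the form letter for one agency and wrap it in an email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = agency.email
    msg["Subject"] = f"FOIA request: {agency.name}"
    # substitute() raises KeyError if a placeholder is missing a value.
    msg.set_content(
        LETTER.substitute(officer=agency.officer, records=records, requester=requester)
    )
    return msg


def build_batch(agencies, records: str, requester: str, sender: str) -> list:
    """Build one message per agency; sending is left to the caller."""
    return [build_request(a, records, requester, sender) for a in agencies]
```

Using `Template.substitute` rather than `safe_substitute` is a deliberate choice in this sketch. An incomplete row fails loudly before any letter goes out, instead of mailing an agency a letter with a blank where the officer's name should be.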
In technical terms, it helps to think of a computing concept known as an application programming interface, or API. APIs allow one application to request information from another.
From a legal standpoint, public records laws work the same way. Think about it: the federal Freedom of Information Act, and the various state-level versions of the law, serve as protocols that govern how public agencies respond to records requests.
FOIA is an API, I realized.
It works. I started out by hacking FOIAMachine to file batch requests. Later, I built FOIAmail to handle mass-scale public records campaigns. These frameworks have helped collect more than 4,000 responses to date through a combination of automated requests and manual follow-ups.
When I built FOIAmail in 2017, the Better Government Association used it to request payroll records from 929 local governments across Illinois. The results beat my own expectations. By the legal response deadline, 82 percent of the agencies, roughly 760 of them, had responded in some way.
FOIAmail uses Gmail and automation to help journalists FOIA faster. It generates messages automatically using the Gmail API, then labels incoming mail as the agencies respond. An auto-updating status report keeps track of each agency, its status and a log of correspondence. When a journalist verifies that a response is complete, the email attachments from that agency copy over to a shared file directory.
By design, the system requires manual checks to verify that a response is legit. It's a light touch, but it helps everyone sleep at night, knowing things have been checked out by human eyes. Automation only gets you so far. The best newsroom tools allow journalists to do their jobs better and faster, and to spend more of their time following their own instincts and verifying incoming information, whether it was collected by a person or a program.
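The bookkeeping described above could be sketched roughly as follows. This is not FOIAmail's code, and it makes no Gmail calls. It only illustrates a per-agency status log with a human verification step. The status names, the `RequestStatus` class and the five-business-day response window are all assumptions. Five business days is the general deadline under Illinois law, but this sketch ignores holidays and extensions.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Count forward `days` weekdays from `start`; holidays are ignored."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


@dataclass
class RequestStatus:
    """Hypothetical per-agency record: states go sent -> responded -> verified."""
    agency: str
    sent_on: date
    status: str = "sent"
    log: list = field(default_factory=list)  # (date, summary) pairs

    def record_reply(self, when: date, summary: str) -> None:
        """Log an incoming message; a reply alone never marks the request complete."""
        self.log.append((when, summary))
        if self.status == "sent":
            self.status = "responded"

    def verify(self) -> None:
        """A journalist confirms by hand that the response is complete."""
        if self.status != "responded":
            raise ValueError(f"{self.agency}: nothing to verify yet")
        self.status = "verified"


def status_report(requests, today: date, window: int = 5) -> list:
    """One row per agency: name, status, due date, overdue flag, messages logged.

    Only requests still in the "sent" state can be overdue; `window` is an
    assumed response deadline in business days.
    """
    rows = []
    for r in requests:
        due = add_business_days(r.sent_on, window)
        overdue = r.status == "sent" and today > due
        rows.append((r.agency, r.status, due, overdue, len(r.log)))
    return rows
```

Keeping `verify()` separate from `record_reply()` mirrors the point above: an incoming email moves a request to "responded," but only a person marks it complete.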
So that’s where we stand. FOIA automation is a thing now. This has led me to my next question: How can we use this concept to serve the public and advance journalism?
An opportunity to innovate
This is a complicated subject at the intersection of law, technology and journalism. Fortunately, I’m in a good position to explore this space.
This academic year, I am a John S. Knight Journalism Fellow at Stanford University, where I'm using FOIA automation and other tools to prototype systems that monitor public records critical to the public interest.
It helps that I’ve spent time in the trenches of local investigative journalism, doing the important but repetitive work of requesting and processing public records and analyzing and reporting the results.
Four years ago, I was hired as the first data editor of The Chicago Reporter, a small but fierce non-profit with a legacy of investigating issues of race and income inequality. I’ve learned a thing or two about public records laws and how to take a data-driven approach to reporting on civil and human rights. I know firsthand the limits of automation — and where technology can help journalists stay organized while they do the inevitable manual work.
So much of investigative journalism happens at the local level. The most crucial records — covering topics like criminal justice, economic development, education, elections, housing, parks and recreation, policing, taxation and transportation, to name just a few — can be found scattered throughout municipal offices across the country.
According to the U.S. Census Bureau, there are more than 89,000 local governments in America. They vary widely in size, but taken together, the scale is massive. These agencies may report summary statistics to state and federal authorities, but the stories are in the details. And the paper trail often leads back to a city hall or county courthouse.
This presents a dilemma for journalism. Local governments maintain records that hold a high value to the public interest. But those records come at a high cost in terms of the effort required to request, process and analyze various data and documents.
“The desire of government to keep some data secret means there may be significant transaction costs involved in pursuing a story, such as the hassle costs associated with using the Freedom of Information Act to extract documents from the government,” Stanford Communications Department Chair Jay Hamilton explains in his book, Democracy’s Detectives.
But there’s hope: “Computational journalism … can lower the costs of discovering watchdog stories, and make it easier (and more profitable) to tell stories in personalized and engaging ways.”
There’s an opportunity for automation to illuminate big-picture trends that may otherwise go unnoticed. What if we could identify the most important records kept at the local level, then collect, process and analyze them in an efficient and systematic way? How would that change the game of investigative journalism?
For instance, can we use these kinds of tools to keep tabs on official misconduct records at police departments across an entire state, or multiple states? Could we build an apparatus to identify voting rights violations in the run-up to elections? When we publish data-driven investigations, can we set up continuous monitoring to diagnose whether the situation improves over time, the way a doctor might review a patient's medical chart?
Let’s take a step back, though, and be realistic about things. Journalism doesn’t need tools so much as it needs solutions. To that end, automating FOIA and data workflows will only work if it meets the needs of reporters and editors who are out there doing work in the wild, so to speak.
As an editor who has worked on many research-intensive projects, I know that it all comes down to getting the story and getting it right. Technical tools should support operations, not distract from them. FOIAmail, for example, is designed to do one thing: collect the same type of records from many different agencies, with minimal setup and maintenance. It's important to keep things simple and to the point.
At Stanford, JSK Fellows start the year with a crash course in human-centered design. This includes mapping out the problem space to ensure we're focusing on the most important challenges facing journalism, conducting field interviews and qualitative research, and laddering to explore all the needs of stakeholders. This training, influenced by the university's d.school, helps fellows avoid the trap of solving a problem without consulting those affected, a well-known pitfall in design thinking summed up by the phrase "great landing, wrong airport."
With that in mind, I'm consulting with news organizations to design new approaches to large-scale investigations driven by data and documents. I also want to address challenges around processing and interpreting data, including designing an automation pipeline that can help investigative journalism scale.
During the winter quarter, I’m going to research how newsrooms manage and execute large-scale investigations based on public records. I’m interested in learning about pain points and success stories, including technical details and the practical side of things. If you’re interested in discussing this concept, drop me a line: firstname.lastname@example.org.