Steering Journalism Toward Data Science
How can new guidelines for deeper reporting on AI systems help more journalists investigate our digital world?
Inside an invite-only artificial intelligence conference at Stanford this fall, my ears perked up as a panelist delivered his blithe prediction: “It’s pretty clear that in the next decade, humans will release control to machines.”
We will? I asked myself. And who will cover the consequences?
After two decades working as a national investigative reporter and data journalist, I came to Stanford in September seeking to empower more reporters to investigate artificial intelligence systems.
Once left to the discretion of scientific researchers and software engineers, the vast societal impacts of machine learning on everything from our education system to criminal justice are now being hotly debated worldwide.
And at a time when news consumers are more divided than ever, it’s journalists’ responsibility to help our audiences grasp when an algorithmic model powers scientific breakthroughs, and when it spreads hate or fuels structural discrimination.
But how can beat reporters working in small newsrooms understand complex statistical models in simple terms, identify disparate impacts and explain such sweeping societal change to their local audiences?
As an inaugural Stanford John S. Knight Journalism and Institute for Human-Centered Artificial Intelligence Fellow this year, I’m designing a new set of journalistic standards to provide reporters and editors with best practices for algorithmic accountability reporting. I’m also taking courses in R and deepening my own data science skills to leverage AI and machine learning in my investigative work (more on that, and my openness to collaborate in building or reverse-engineering an algorithm, below).
It’s been moving to see my investigations and data-driven projects over the years for The Associated Press, The Washington Post and The Boston Globe prompt positive government change and spark industry reforms, on everything from immigration to cybersecurity.
And as one of few computationally trained journalists, I can see that most newsrooms lack the tools to understand how algorithms work, let alone to reverse-engineer them and explain their role in guiding key decisions in hiring, banking, law enforcement and medicine.
Consider one of many recent examples: academic researchers revealed that a large hospital in the United States was employing an algorithm that systematically privileged white patients over black patients — and that same algorithm is used in the care of 70 million patients nationwide.
Enabling more reporters to produce better, deeper and more compelling stories about algorithmic systems will have long-range benefits to society that go far beyond improving our daily news coverage.
How we explain and visualize these vast, scalable technologies — and the potential biases of the humans who build them — will have lasting impacts on law, public policy and beyond.
And not a moment too soon, since after my first quarter immersed at Stanford, I see change coming fast from all directions.
In Washington, the U.S. Department of Housing and Urban Development is floating new rules that would essentially prevent banks from being sued should their algorithms result in people of color being disproportionately denied housing.
Here in Silicon Valley, judges are already weighing how to assign responsibility when a driverless car company’s automated system kills its passenger.
Meanwhile, San Francisco supervisors moved to ban the use of facial recognition technology by police, saying that it could deepen racial injustice. And a California law that would replace cash bail with algorithmic risk assessments, to help decide which arrested individuals should be released before trial, was scheduled to take effect in October, though it is on hold pending a statewide referendum.
Which of these developments is “fair”? And how well do these tools actually work?
This is where the standards I’ll be developing this year will help journalists start to engage in algorithmic accountability reporting, beginning with a few simple questions such as:
— What was the source of the training data used in the algorithm?
— What parameters were used to guide the algorithm and how did it evolve with that data?
— How well did the model perform?
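For a reporter who manages to obtain a model's decisions alongside what actually happened, the last question can be roughed out in a few lines of Python. This is only a minimal sketch under stated assumptions: the records, the group labels "A" and "B", and the function names `rates_by_group` and `rate_ratio` are all hypothetical, and a real analysis would need far more data and statistical care. It compares how often each group is flagged by a model and how often people who did not have the predicted outcome were flagged anyway.

```python
from collections import defaultdict


def rates_by_group(records):
    """Summarize model behavior per group.

    records: iterable of (group, predicted, actual) tuples, where
    predicted and actual are 0 or 1. This layout is an assumption
    made for illustration, not a standard format.
    """
    counts = defaultdict(lambda: {"n": 0, "flagged": 0, "neg": 0, "false_pos": 0})
    for group, predicted, actual in records:
        c = counts[group]
        c["n"] += 1
        c["flagged"] += predicted
        if actual == 0:
            # Person did not have the outcome; count how often they were flagged anyway.
            c["neg"] += 1
            c["false_pos"] += predicted

    result = {}
    for group, c in counts.items():
        result[group] = {
            "flag_rate": c["flagged"] / c["n"],
            # None when the group has no actual negatives to measure against.
            "false_positive_rate": c["false_pos"] / c["neg"] if c["neg"] else None,
        }
    return result


def rate_ratio(rates, group_a, group_b, key="flag_rate"):
    """Ratio of one group's rate to another's; values far from 1.0 merit a closer look."""
    return rates[group_a][key] / rates[group_b][key]


# Made-up records for illustration only: (group, model prediction, real outcome).
records = [
    ("A", 1, 1), ("A", 1, 0), ("A", 0, 0), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 1, 0), ("B", 0, 0),
]

rates = rates_by_group(records)
for group, stats in sorted(rates.items()):
    print(group, stats)
print("flag rate ratio, B to A:", rate_ratio(rates, "B", "A"))
```

A lopsided ratio does not prove discrimination on its own, but it gives a reporter a concrete number to put to the people who built or bought the system.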
Some of these models are so complex that even those who are writing the code face challenges in describing how, precisely, their work is being implemented by machines.
Ultimately, asking these kinds of questions will help to reveal the assumptions that can be built into different types of algorithms, and how their outputs affect our daily lives. That challenging journalistic work, in turn, will increase transparency and boost understanding of how AI systems make the decisions they do.
So where to start? As computer scientist and digital activist Joy Buolamwini said at the same invite-only conference I attended earlier this quarter, prioritizing “the most marginalized and vulnerable groups when we’re thinking about optimizations” is a good place to begin.
I’m excited to keep working with the many coders, journalists and organizations that are exploring related issues on and off campus, and I am eager to get coding myself, particularly if you would like to collaborate in building or reverse-engineering an algorithm. If you are interested, involved, or want to offer your feedback, please get in touch via the comments below or drop me a line: email@example.com.