Delving into the state of data journalism

… while continuing to shape Facta

Elisabetta Tola
Journalism Innovation

--

“Science is not only the vault of stories and data we enjoy talking and reading about, but also a frame of mind that can help journalists improve their approach to finding facts, to evaluating them, to either confirming or discarding them.” That’s the core of what I was discussing in my last post. It is also the core around which I am building Facta, an independent center for the Mediterranean region that applies the scientific method to journalism.

In these weeks of working on concepts, design, budgets and strategy here at the Tow-Knight Center for Entrepreneurial Journalism, I have also been spending a lot of time thinking about, and articulating, why I do not feel satisfied with the state of journalism today.

I know, I know. There are tons of reasons the quality of journalism has been publicly debated in the last few years. Don’t even try to make me write the buzzwords; I do not want to go there. I think it is clear there are serious issues both with verification and with the ability to put the news in context with high-quality content. Plus, some recent media dramas have prompted public discussion and given more food for thought, at least within the groups of people who care about journalism, whether they work in the industry or support it. Just look at the situations at The Correspondent or The Markup, where outstanding media projects have faced public scrutiny for very different reasons, with potentially serious consequences for trust, credibility and even readers’ willingness to support similar ventures in the future.

Let me be clear: I love journalism. And I have a deep, deep respect for all those journalists who work hard and struggle to bring facts and significant stories to their readers’ attention; who keep working even when facing threats, when being mistreated or attacked by those in power, when they become the targets of hate campaigns, or when they risk or even lose their freedom or their lives. On these issues, by the way, great recent readings include a post by Jay Rosen on his PressThink blog; the Italian writer Roberto Saviano’s wake-up call; the latest data from the World Press Freedom Index; and this commentary in the Columbia Journalism Review. So I am not at all willing to be cynical. I am firmly convinced that press freedom is a fundamental value and that journalism needs to be protected.

But it is precisely because I love journalism that I am disappointed in many of its practices and routines. As it pertains to Facta, my disappointment arises from very specific reasons, which have to do mainly with how journalism has treated, and is treating, data and valuable information. I was a scientist for an important part of my life before moving into journalism. When I encountered data journalism in its early days, almost 10 years ago, I thought I had found the perfect combination: I could pair my scientific thinking with my journalistic skills to produce a type of journalism that was, above all, useful. But now, a few years down the road, I see that data often end up being used decoratively: a nice map or infographic to fill a page, lacking context and giving people little means to go deeper into the issues. To complicate things, the media are often a source of great confusion among facts, hypotheses, theories and opinions, with scarce knowledge of how facts and information are validated.

I cannot understand the lack of commitment and ability to take complex issues, analyze them with a solid methodology, and present them to readers, listeners or viewers in a way that allows them to use that information: to understand complicated problems and come up with potential solutions. Maybe I’m naive, but that’s what I expect journalism to do: offer high-quality information in the service of democracy; connect, bridge and contextualize.

So I looked up studies and research on the quality of data journalism, a practice that, as I said before, has become more and more widespread over the last 10 years. I started from a hypothesis based on experience and my own perception: with some exceptions, and very important ones, data journalism has generally not fulfilled its original promise to help us read reality in a more truthful way. I’d be happy to take quite different points of view into account, should they be supported by evidence. In the meantime, here is what I found.

At the end of March, The Guardian published an informative piece collecting the voices of Caelainn Barr, Mona Chalabi and Nick Evershed, The Guardian’s data editors in the U.K., U.S. and Australia respectively. (By the way, I have met Barr a few times, between the International Journalism Festival in Perugia years ago and the Centre for Investigative Journalism in London. She has an amazing track record in data journalism and is by far one of the best journalists working in the field today.)

The Guardian took the opportunity to discuss the state of data journalism exactly on the 10th anniversary of its Datablog, launched in March 2009 by Simon Rogers, undisputedly one of the pioneers of data journalism, who is now data editor on the News Lab team at Google and director of the Data Journalism Awards. In this 10-years-later piece, The Guardian’s data editors say a few things that clarify how data journalism is done today and what has changed since its origins. “The Datablog paved the way for the data projects team but the work we do today is very different,” Barr says. “Over the past decade our approach has evolved and now we amplify the stories we find in data by collaborating with specialist reporters to put human voices at the center of our stories.”

At the beginning, data journalism was quite a trial-and-error process, with lots of inventiveness added to the craft, since it was virgin territory. Evershed says The Guardian is “the one publication that really got me interested in data journalism, as it had a very hacker-punk-DIY approach in the early days. This made me think it was the sort of thing I could do even though I’d had no training in programming or data visualisation beyond the little I’d learned studying science.”

To strengthen that notion of how punk-DIY the approach was, there is the popular post “Anyone can do it. Data journalism is the new punk,” written by Simon Rogers back in 2012. It reads almost like a manifesto, and I still use it to introduce students to data journalism. But a fundamental difference compared with those early years is clearly that “in the early days there was more of an emphasis on making the data available,” Chalabi says. “We’d always create a Google spreadsheet with the numbers we had used to write the piece.”

This is not happening anymore, with a few exceptions. And I am really not happy about it, since I have been delving into those data for years and think that actually having the data is what makes data journalism useful beyond the storytelling.

Making the data available was indeed one of the first specific characteristics of data journalism when it took off in 2009, as Simon Rogers tells Letizia Gambini in this interview on the European Journalism Centre’s Medium blog. (By the way, the EJC has just launched a new website entirely devoted to data journalism, with lots of great resources for journalists.) “What if we just published this data in an open data format? No pdfs, just interesting accessible data, ready to use, by anyone. And that’s what we did with the Guardian’s Datablog … We started to realise that data could be applied to everything.”
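Just to make that idea tangible, here is a minimal sketch in Python of what “publishing the data, not a PDF” can look like in practice. The file and column names are hypothetical, invented for illustration; this is not The Guardian’s actual workflow.

```python
# A minimal sketch of "no PDFs, just accessible data": take the table
# behind a story, clean it, and publish it as a reusable CSV alongside
# the piece. File names and column names are hypothetical.
import pandas as pd

raw = pd.read_csv("source_release.csv")  # e.g. a government data release

clean = (
    raw.rename(columns=str.lower)                  # consistent headers
       .dropna(subset=["region", "spending_gbp"])  # drop unusable rows
)

# The chart goes in the article; this CSV is what readers can reuse.
clean.to_csv("story_data_open.csv", index=False)
```

The point is less the code than the habit: whatever table a chart is built on gets exported in an open format, so anyone can check or extend the analysis.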

In his words, Rogers conveys the excitement of what was a pioneering moment, when data projects were setting the standard for a different type of journalism and great collaborations began between journalists and technologists to improve the tools and practices for working with data. The Hacks/Hackers movement also took off in those years.

Rogers also gives his insight into the future. “We face a wider and increasingly alarming issue: Trust. Data analysis has always been subject to interpretation and disagreement, but good data journalism can overcome that. At a time when belief in the news and a shared set of facts are in doubt every day, data journalism can light the way for us, by bringing facts and evidence to light in an accessible way.” And I do agree that good data journalism can overcome distrust, or help in that direction.

But is good data journalism a widespread practice? Let’s see what Alberto Cairo, one of the most active and respected experts in data visualization and a tireless, enthusiastic data viz trainer, has to say about it. Back in 2014, Cairo wrote a post on Nieman Lab titled “Data journalism needs to up its own standards.” He talks about the then-recent buzz around “data” and “explanatory” journalism. “I’m talking about websites like Nate Silver’s FiveThirtyEight and Ezra Klein’s Vox.com,” he says, and also about new operations at traditional media, such as The New York Times’ The Upshot.

“There is a lot to praise in what all those ventures — and others that will appear in the future — are trying to achieve,” Cairo admits, soon adding, “But I have to confess my disappointment with the new wave of data journalism — at least for now.” And he lists concrete examples of why this is so: cherry-picking and carelessly connecting studies to support an idea; proxy variables used without careful analysis; the tendency to derive long-term linear predictions from nonlinear phenomena; and other flaws.

The Spurious Correlations project was meant as a fun way to look at correlations; note that the data are all absolutely genuine. Under Creative Commons license, via tylervigen.com
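To make the mechanics concrete, here is a toy sketch in Python with invented numbers (my own illustration, not Vigen’s actual data): two causally unrelated series that merely share an upward trend will show a near-perfect correlation.

```python
# A toy illustration of a spurious correlation: two independent series
# that both drift upward over time correlate almost perfectly.
# All numbers are made up for the example.
import numpy as np

rng = np.random.default_rng(42)
years = np.arange(2000, 2013)

# Two hypothetical quantities, each trending upward on its own.
cheese_per_capita = 29 + 0.4 * (years - 2000) + rng.normal(0, 0.2, years.size)
engineering_phds = 480 + 9.0 * (years - 2000) + rng.normal(0, 5.0, years.size)

r = np.corrcoef(cheese_per_capita, engineering_phds)[0, 1]
print(f"Pearson r = {r:.2f}")  # close to 1: strong, and meaningless
```

Detrending either series, or demanding a plausible causal mechanism, is what separates a finding from a punchline.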

Cairo concludes that these new efforts “promised journalism based on a rigorous pondering of facts and data, but they have offered some stories based on flimsy evidence — with a consequent pushback from readers.” He goes on to give advice, such as not working in isolation or in a rush, and collaborating more and more with scientists, who can put the data in context and explain them well. Which is what rigorous media do when working with data, as some of ProPublica’s most outstanding investigations show. (One of my favorites is Losing Ground, which comes from a very careful use of scientific data with lots of help from scientists and experts.) Cairo’s final argument is clear: “There’s a need for a journalism which is more rigorous and scientific.”

While I do agree with every bit of what Alberto Cairo says in this post, I can’t say if his disappointment still stands. Mine does. And a much more recent publication helps me explain why.

In January, Rodrigo Zamith, a media scholar at the University of Massachusetts Amherst, published a piece of research in the journal Digital Journalism titled “Transparency, Interactivity, Diversity, and Information Provenance in Everyday Data Journalism.” Zamith analyzed a corpus of 150 data journalism articles produced by The New York Times and The Washington Post in the first half of 2017. His is one of the few studies actually done on the articles themselves; most recent research in the field relies on interviews with data journalists and media outlets.

Zamith, by contrast, quantitatively evaluates “story characteristics linked to the concepts of transparency, interactivity, diversity and information provenance.” These elements, he says when reviewing previous studies, “have been linked to trust in news media, and as essential components to a response to declines in trust in institutions like journalism within liberal democracies.”

However, his findings show that more than 87% of the articles contained no link to the data. The Washington Post linked to the full set of data used in fewer than 8% of the stories. The New York Times provided only partial links to some data in 6% of the cases.

In regard to interactivity, viewed according to various studies as “a key enabler of participatory transparency as well as an affordance that distinguishes online media from its analog counterparts,” again Zamith found that over 80% of the stories from both outlets offered none.

Finally, in terms of the data used, the major issue is that, “notably, scholars have routinely found that journalists rely on publicly accessible data from institutional sources, and especially from governmental sources,” while the use of self-collected data is rare. His findings confirm this lack of source diversification. And yet, let me add, scientists and academics publish a great deal of data that can be accessed and reused. But these data rarely make it into the media.

One of the most significant and precise data visualizations of all time — Charles Minard’s 1869 chart showing the successive losses of men in Napoleon’s 1812 Russian campaign army, their movements, as well as the temperature they encountered on the return path. Via Wikimedia, under CC license.

Now, where does all this leave us? With many key questions, such as those posed by Jonathan Gray and Liliana Bounegru in the second edition of the Data Journalism Handbook, to be published this year. In the introduction, Gray and Bounegru reflect on what has changed in the practice of data journalism. They trace how journalists moved from the initial, enthusiastic data experiments to the huge investigations based on massive leaks in the years that followed, and on to the recent failures to foresee the results of Brexit and the 2016 U.S. presidential election.

Such huge changes lead Gray and Bounegru to say: “Data does not just provide neutral and straightforward representations of the world, but is rather entangled with politics and culture, money and power. Institutions and infrastructures underpinning the production of data — from surveys to statistics, climate science to social media platforms — have been called into question. Thus it might be asked: Which data, whose data and by which means?” They then list other questions, anticipating that the Handbook won’t try to answer them directly: “Instead of treating the relevance and importance of data journalism as an assertion, we treat this as a question which can be addressed in multiple ways.” And now we are curious to see how the various contributors will provide insights and ideas.

I remain convinced that data journalism can and should do better. For me (and maybe I am too idealistic overall), the basic point is that data can be enormously useful, if and when it is used to serve the needs of communities. Which is not so far from what science does: developing knowledge to improve the human condition. Journalism, being a cornerstone of democracy, should follow the same principle. And data can play a crucial part. But we need to treat it with respect, rigor and expertise. It should not be merely decorative; it is one of the essential components that allow us to understand the world.
