What big data leaks tell us about the future of journalism — and its past

This article was first published in the Internet Policy Review.

The Panama Papers have been described as the “biggest leak in the history of data journalism”, with approximately 11.5 million documents provided by an anonymous source to the German newspaper Süddeutsche Zeitung. For all the things that made the Panama Papers exceptional, they also represent the latest step in a development that started roughly in 2010, when WikiLeaks cooperated with The New York Times, The Guardian and Der Spiegel to publish the Afghan war logs. Compared to the Panama Papers, this leak was tiny: a spreadsheet with 92,201 rows describing military events (Rogers, 2011). However, its release initiated a pattern that has been replicated in almost every major leak that followed. When we compare the Afghan war logs with the Panama Papers, one of the most interesting aspects is not what has changed, but what has stayed the same. With the exception of the Snowden leaks (where things worked out slightly differently), every major leak since the Afghan war logs has included:

  1. An anonymous source providing the leak through encrypted channels.
  2. An independent organisation (then WikiLeaks, now the International Consortium of Investigative Journalists, ICIJ) that acts as a mediator, giving selected newsrooms in different countries exclusive access to the leak and facilitating their collaboration.
  3. A subsequent cross-national collaboration, with each newspaper covering the aspects of interest to its national audience, combined with a simultaneous international release date to ensure greater (international) impact.
  4. A volume of data so large that newsrooms are challenged to employ and advance data journalism techniques in order to analyse the data, filter out relevant aspects, and tell stories.

When WikiLeaks was at its ‘peak’ in 2010, with the release of the war logs and ‘Cablegate’, there was much debate about its relationship with journalism. Is or was WikiLeaks a journalistic institution, or merely a ‘source’ for traditional journalists? I suggest the Panama Papers have demonstrated that Beckett was right when he pointed out that this debate was “really a debate about what journalism is or is becoming. Instead of asking whether WikiLeaks is journalism or not, we should ask ‘what kind of journalism is WikiLeaks creating?’ The challenge to the rest of journalism is to come up with something as good if not better” (2011, emphasis added). In many ways, the work and discourse around the Panama Papers read as just that: an attempt not to copy WikiLeaks, but to adapt the practices listed above while simultaneously maintaining and expanding long-standing journalistic practices and identities.

‘Normalising’ leaking

Most obviously, news media have adopted technologies similar to those WikiLeaks provided. This is best illustrated by SecureDrop, a whistleblower submission system developed by the Freedom of the Press Foundation. More generally, there is growing awareness of online security and increasing adoption of encryption tools among journalists. A few years ago, Glenn Greenwald (2014) almost missed one of the most significant leaks in history because he didn’t want to bother with PGP email encryption. During the Panama Papers investigations, a wide range of tools was used to secure the whistleblower, the leak itself and the ongoing investigation, on a scale that was hard to imagine just five years ago. Moreover, WikiLeaks’ role as a mediator that organises access and collaborations has been taken up by the ICIJ, a journalistic organisation that has been around much longer than WikiLeaks. Unlike WikiLeaks, it did not just provide (exclusive) access to the leak, but also developed tools and platforms that help journalists cooperate on a much larger scale.

Beyond the adoption and development of new technologies, journalists have also integrated leaking into their traditional working routines and ethics. This has been most visible in the debate over the release of raw documents. From the beginning, transparency advocates, and WikiLeaks in particular, were disappointed because Süddeutsche Zeitung and the ICIJ refused to release the ‘raw’, unedited documents in full so that others would be able to carry out their own investigations. Doing so, the journalists argued, would breach the law and be unethical:

We are not going to release the raw data and we have valid reasons to do so. The source decided to give the data to journalists and not, f.e., to Wikileaks. As journalists, we have to protect our source…And as responsible journalists we also stick to certain ethical rules: You dont [sic] harm the privacy of people, who are not in the public eye. (Source)

Note the contrast between WikiLeaks and the radical type of transparency it stands for (Bodó, 2014) versus ‘responsible journalists’ who only publish what is in the public interest. As the ICIJ director Gerard Ryle told Wired: “We’re not WikiLeaks. We’re trying to show that journalism can be done responsibly” (quoted in Greenberg, 2016). In an attempt to re-establish the authority of professional journalism, news media are trying to move the concept of leaking away from radical Anonymous-style transparency advocacy and into traditional journalistic working routines and ethics.

When we look at how journalists used to react to other potentially ‘disruptive’ technologies like blogging or user-generated content in the past, this rhetoric is hardly surprising. Journalists have a tendency to ‘absorb’ practices that threaten to undermine their professional autonomy “into conventional hierarchies of newsgathering” (Wahl-Jorgensen 2014, p. 2588), a routine that has been described as ‘normalization’ (Singer, 2005). Rather than seeing such practices as opportunities to fundamentally rethink journalism and the way news is ‘made’, journalists tend to rationalise them in a way that maintains their traditional role as gatekeepers of publicly relevant information. The classic example is blogging, which originally appeared as a threat to journalism’s role as a gatekeeper. Today, it is common for news media and individual journalists to have their own blogs.

Alternatives to WikiLeaks

The Panama Papers demonstrate how much news media have normalised leaking since WikiLeaks ‘disrupted’ journalism in 2010. To be clear, normalising leaking doesn’t mean that huge leaks have become ‘normal’, but that the way journalists deal with and rationalise those leaks has been ‘routinised’ and fitted into their professional identity. As Beckett has pointed out, to a large extent this meant coming up ‘with something as good if not better’ than WikiLeaks, i.e. becoming a viable alternative that whistleblowers can turn to for sharing their leaks. By not releasing the unedited documents, journalists of the Süddeutsche Zeitung and the ICIJ emphasised that they deal with leaks ‘responsibly’. The message to potential whistleblowers: you can share your leaks with journalists to expose wrongdoing without causing harm. Moreover, leaking to journalists instead of WikiLeaks promises a large-scale impact because news media are still best at reaching large audiences, an asset that is further reinforced by their cross-national collaboration. In other words, journalists are not trying to replace WikiLeaks, but to contrast it with their own version of leaking that builds on journalistic traditions and maintains their professional autonomy. While advocates of a more radical transparency will keep questioning journalists’ authority to decide what is in the public interest, this type of boundary work and professionalism among journalists does have positive effects for the public: it helps to strengthen journalists’ collective identity, lends autonomy and authority against the influence of governments or corporations, and emphasises journalism’s role as a public service over commercial interests (Lewis, 2012, p. 844).

Both ‘genres’ of leaking (the radical transparency model that promises maximum disclosure, and the journalistic one that maintains journalism’s gatekeeper role and promises a more considerate publication and large-scale impact) are likely to coexist in the future and to compete for the trust of whistleblowers.

What kind of journalism has been created?

The fact that leaking has been normalised doesn’t mean it didn’t change anything. Quite the contrary: normalisation is change, but not the radical and ‘disruptive’ type of change that is popular when it comes to new technologies. The changes caused by normalising potentially threatening practices are more subtle, less obvious and relatively slow. The Panama Papers have exemplified how leaking supported two developments that increasingly shape investigative journalism: advances in data journalism and automation, and a culture of collaboration and sharing.

Automation, collaboration and the identity of journalism

As I’ve argued elsewhere (Baack, 2013), the coverage of WikiLeaks’ materials was an important push to establish data journalism in newsrooms. Not only are data journalism techniques necessary to cope with the leaks in the first place, but the simultaneous release date of the coverage of the war logs and Cablegate also demonstrated internationally the advantages of utilising these techniques and establishing dedicated ‘data teams’ in newsrooms. As Simon Rogers, a data journalist at The Guardian at that time, commented: “Wikileaks didn’t invent data journalism. But it did give newsrooms a reason to adopt it” (2011). The Panama Papers also required new technological advances, not only because the leak was so huge, but because of the kind of data it contained. Data journalism usually deals with ‘structured data’, i.e. quantitative spreadsheet data that can be analysed using statistical tools and methods. But only a tiny fraction of the Panama Papers was structured. The vast majority was unstructured data in the form of emails or scanned documents that required a more in-depth, qualitative analysis.

To work with this type of data, it was key to be able to filter out those documents that contained relevant information. As Mar Cabra (2016), head of the Data & Research Unit at the ICIJ, explains, searching was key in many ways: being able to search through the data at all, being able to use more complex search queries, and being able to search systematically and in batches. For example, journalists could create a spreadsheet containing the names of all the politicians in Germany, upload it, and the platform provided by the ICIJ would return the results. Moreover, collaboration and sharing among newsrooms was essential and took place on a much larger scale than ever before, with around 400 journalists involved. Even with advanced search capabilities, a single newsroom would only manage to examine a tiny fraction of the leak, and a newspaper in Germany would search and cover different topics than newspapers in other parts of the world. The ICIJ therefore built a customised social network for investigative journalists, with a news feed similar to that of Facebook, where journalists from around the world could share their findings with others, supporting a collaborative spirit.

In asking what kind of journalism has been ‘created’ by leaking, I suggest that what matters more than the technical details is how these practices represent a change in the mentalities and everyday working routines of journalists. Here, I want to point out two aspects that have at least in part been evolving due to big data leaks.
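To make the batch search described above more concrete, here is a minimal sketch in Python. It is not the ICIJ's platform, whose internals are not described here: the function names (load_names, batch_search), the sample names and the document ids are all invented for illustration, and plain case-insensitive substring matching stands in for whatever search technology was actually used.

```python
import csv
import io


def load_names(csv_text, column="name"):
    """Read one search term per row from CSV text; empty cells are skipped."""
    reader = csv.DictReader(io.StringIO(csv_text))
    names = []
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:
            names.append(value)
    return names


def batch_search(names, documents):
    """Map each name to the sorted ids of the documents that mention it.

    Assumption: a simple case-insensitive substring test. It misses
    spelling variants and can also match inside longer words.
    """
    lowered = {doc_id: text.lower() for doc_id, text in documents.items()}
    hits = {}
    for name in names:
        needle = name.lower()
        matching = [doc_id for doc_id in sorted(lowered) if needle in lowered[doc_id]]
        if matching:
            hits[name] = matching
    return hits


# Invented sample data, for illustration only.
NAMES_CSV = "name\nJane Example\nJohn Placeholder\nNobody Here\n"
DOCUMENTS = {
    "email-001": "Meeting notes: JANE EXAMPLE asked about the shell company.",
    "scan-017": "Signed by John Placeholder and Jane Example.",
    "email-042": "Invoice for office supplies.",
}

results = batch_search(load_names(NAMES_CSV), DOCUMENTS)
for name, doc_ids in results.items():
    print(name, "->", ", ".join(doc_ids))
```

Names without any match are simply left out of the result, so a journalist uploading a long list would only get back the entries worth a closer, qualitative look.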

First, the collaborative spirit and the culture of sharing are a clear break from the traditional mentality of investigative journalists: ‘lone wolves’ competing for scoops and unwilling to share with others. This is at least partly due to the interconnections between leaking and data journalism: while leaking helped to establish data journalism in newsrooms, data journalism helped to bring a mentality of sharing and collaboration to investigative journalism (Royal, 2010; Lewis and Usher, 2013; Parasie and Dagiral, 2013). This mentality is taken from open source culture and increasingly shapes not just exceptionally large and significant leaks but also everyday reporting. Here again, these practices have been adopted in ways that maintain journalists’ traditional gatekeeper role, which is to say that they are mainly used to facilitate collaboration and exchange among fellow journalists. However, there are signs of change that indicate “that journalism’s ideological commitment to control, rooted in an institutional instinct toward protecting legitimacy and boundaries, may be giving way to a hybrid logic of adaptability and openness: a willingness to see audiences on a more peer level, to appreciate their contributions, and to find normative purpose in transparency and participation” (Lewis, 2012, p. 851).

Second, journalists are getting better at dealing with unstructured documents and are using automation on a larger scale. The basic idea of using computers and statistical methods to support journalism is not new: Philip Meyer (2002) first articulated it in the 1970s, long before these practices were called ‘data journalism’ (Anderson, 2015). However, the expansion of the internet has greatly increased the scale at which these techniques can be applied. The investigations around the Panama Papers demonstrate that there is much potential in using automation in newsrooms, even though the vision of Adrian Holovaty (2006) has still not been fully realised. He argued that news media should systematically collect, analyse and re-purpose their data to “supplement, routinize, or algorithmically expand the scope” of their traditional journalistic practices (Anderson, 2013, p. 1008). This is significant not only because it could expand the agency of journalists. Automation and computational technologies also increasingly “become objects of discourse through which organized fields such as journalism reflexively make sense of their particular capacities and place in the world” (Bucher, 2016, p. 13). Ascribing meaning to the computational is also “about ascribing meaning to journalism by way of talking about what computation can and cannot do” (Bucher, 2016, p. 13), raising questions “striking at the core of how journalism should be understood” (Carlson, 2015, p. 429). Here again, we can see patterns of normalisation, as journalists tend either to point out that the essence of journalism (like human instinct) cannot be automated, or to rationalise computation as journalism, arguing that the design and development of computational tools should uphold traditional values like objectivity or impartiality (Stavelin, 2013).

This brief look at how leaking changed journalism should make us skeptical of grand narratives proclaiming the ‘disruption’ of journalism. In the foreseeable future, journalism will in many ways look very similar to what it looked like in the past. It will operate on larger scales and journalists will be forced to re-articulate their professional identity and role, but if the history of normalisation can teach us a lesson, they will do so in ways that preserve traditional journalistic values, practices, ethics and the role as a gatekeeper of publicly relevant information.

References

Anderson, C. W. 2013. “Towards a Sociology of Computational and Algorithmic Journalism.” New Media & Society, 15(7): 1005–21. doi:10.1177/1461444812465137.

Anderson, C. W. 2015. “Between the Unique and the Pattern.” Digital Journalism, 3(3): 349–63. doi:10.1080/21670811.2014.976407.

Baack, Stefan. 2013. “A New Style of News Reporting. Wikileaks and Data-Driven Journalism.” In Cyborg Subjects: Discourses on Digital Culture, edited by Bonni Rambatan and Jacob Johanssen, 113–22. Shoestring Anthologies. [S.l.]: CreateSpace Independent Publishing Platform. http://nbn-resolving.de/urn:nbn:de:0168-ssoar-400253.

Beckett, Charlie. 2011. “WikiLeaks As Journalism.” Polis. http://blogs.lse.ac.uk/polis/2011/06/25/wikileaks-as-journalism-2/.

Bodó, Balázs. 2014. “Hacktivism 1–2–3: How Privacy Enhancing Technologies Change the Face of Anonymous Hacktivism.” doi:10.14763/2014.4.340.

Bucher, Taina. 2016. “‘Machines Don’t Have Instincts’: Articulating the Computational in Journalism.” New Media & Society, January. doi:10.1177/1461444815624182.

Cabra, Mar, and Erin Kissane. 2016. “The People and Tech Behind the Panama Papers.” Source. https://source.opennews.org/en-US/articles/people-and-tech-behind-panama-papers/.

Carlson, Matt. 2015. “The Robotic Reporter.” Digital Journalism, 3(3): 416–31. doi:10.1080/21670811.2014.976412.

Greenberg, Andy. 2016. “How Reporters Pulled Off the Panama Papers, the Biggest Leak in Whistleblower History.” WIRED. https://www.wired.com/2016/04/reporters-pulled-off-panama-papers-biggest-leak-whistleblower-history/.

Greenwald, Glenn. 2014. No Place to Hide: Edward Snowden, the NSA, and the U.S. Surveillance State. First Edition. New York, NY: Metropolitan Books/Henry Holt.

Holovaty, Adrian. 2006. “A Fundamental Way Newspaper Sites Need to Change.” http://www.holovaty.com/writing/fundamental-change/.

Lewis, Seth C. 2012. “The Tension Between Professional Control and Open Participation. Journalism and Its Boundaries.” Information, Communication & Society, 15(6): 836–66. doi:10.1080/1369118X.2012.674150.

Lewis, Seth C., and Nikki Usher. 2013. “Open Source and Journalism: Toward New Frameworks for Imagining News Innovation.” Media, Culture & Society, 35(5): 602–19. doi:10.1177/0163443713485494.

Meyer, Philip. 2002. Precision Journalism: A Reporter’s Introduction to Social Science Methods. Lanham, Md.: Rowman & Littlefield Publishers.

Parasie, Sylvain, and Eric Dagiral. 2013. “Data-Driven Journalism and the Public Good: ‘Computer-Assisted-Reporters’ and ‘Programmer-Journalists’ in Chicago.” New Media & Society, 15(6): 853–71. doi:10.1177/1461444812463345.

Rogers, Simon. 2011. “Wikileaks Data Journalism: How We Handled the Data.” The Guardian. http://www.theguardian.com/news/datablog/2011/jan/31/wikileaks-data-journalism.

Royal, Cindy. 2010. “The Journalist as Programmer: A Case Study of the New York Times Interactive News Technology Department.” In The International Symposium in Online Journalism. The University of Texas.

Singer, Jane B. 2005. “The Political J-Blogger: ‘Normalizing’ a New Media Form to Fit Old Norms and Practices.” Journalism, 6(2): 173–98. doi:10.1177/1464884905051009.

Stavelin, Eirik. 2013. “Computational Journalism. When Journalism Meets Programming.” Doctoral thesis, University of Bergen. http://hdl.handle.net/1956/7926.

Wahl-Jorgensen, Karin. 2014. “Is WikiLeaks Challenging the Paradigm of Journalism? Boundary Work and Beyond.” International Journal of Communication, 8: 2581–92. http://ijoc.org/index.php/ijoc/article/view/2771.