What makes a winning Data Story?

By Bahareh Heravi & Adegboyega Ojo
Nordic Data Journalism Conference, January 2017

Photo credit: clement127

Data storytelling is rapidly gaining prominence as a characteristic activity of digital journalism, with significant adoption by small and large media houses. In the past few years, there has been increased attention to the qualitative aspects of data stories, specifically to how such qualitative factors affect journalism and how data journalism could be improved. At the same time, there is a growing stock of knowledge on exemplars of exceptional and good data storytelling from a journalistic viewpoint. For instance, the Global Editors Network (GEN), through its Data Journalism Awards, has been identifying exceptional data journalism practices since 2012.

While a few past studies have examined single aspects of data storytelling, such as narratives, visualisation or analysis (Segel and Heer 2010; Lee et al. 2015; Alexander and Vetere 2011; Stikeleather 2013), we believe there is a lack of systematic research on the characteristics of good data stories, as well as on the technologies and tools employed in such stories.

A literature review in the data journalism domain by Ausserhofer et al. (2016) shows that few studies in this area focus on theory or methodology. We are yet to see a systematic effort to gain better insight into the characteristics of good data stories, how such stories are created, and what skills are required to create them.

This paper aims to address this gap in systematic research and practice by studying the winners in the Global Editors Network’s annual Data Journalism Awards, and providing a framework to characterise successful data storytelling. Through analysing the winning stories, this paper provides a systematic insight into the combination of tools and techniques which enable excellence in data journalism.

The framework developed in this paper provides a systematic analysis of the practical aspects of data journalism, studying all data storytelling cases recognised as outstanding by the Global Editors Network (GEN 2016), in other words the winners of the Data Journalism Awards from 2013 through 2016. Using a multi-case approach (Baxter and Jack 2008), this study uniformly characterises each of the 44 winning cases and proceeds to determine the purpose, genre and representation styles of these stories, and the nature of the technologies employed in creating them.

The resulting knowledge base is then analysed using Formal Concept Analysis to determine the major genres of data storytelling, and further to identify the types and combinations of technological tools required to develop the award-winning stories.
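To give a sense of what Formal Concept Analysis does, the following is a minimal sketch rather than the procedure used in the study. It assumes a tiny, hypothetical table of stories and their attributes (the story names and attribute assignments are invented for illustration) and enumerates every formal concept, that is, every pair of a story set and an attribute set in which the stories are exactly those sharing the attributes, and the attributes are exactly those shared by the stories.

```python
from itertools import combinations

# Hypothetical toy context: story -> attributes. NOT the study's data.
context = {
    "story_a": {"inform", "interactive", "d3"},
    "story_b": {"inform", "persuade", "interactive"},
    "story_c": {"explain", "static"},
    "story_d": {"inform", "interactive", "d3", "map"},
}


def extent(attrs, ctx):
    """Return the objects that have every attribute in attrs."""
    return frozenset(o for o, a in ctx.items() if attrs <= a)


def intent(objs, ctx):
    """Return the attributes shared by every object in objs."""
    shared = set().union(*ctx.values())  # start from all attributes
    for o in objs:
        shared &= ctx[o]
    return frozenset(shared)


def formal_concepts(ctx):
    """Enumerate all (extent, intent) pairs by closing every object subset.

    Brute force: fine for a toy table, exponential in the number of objects.
    """
    concepts = set()
    objects = list(ctx)
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            i = intent(subset, ctx)
            concepts.add((extent(i, ctx), i))
    return concepts


concepts = formal_concepts(context)
for ext, itt in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(ext), sorted(itt))
```

In this toy table, one of the resulting concepts groups the three stories that are both informative and interactive. Groupings of this kind are what allow genres and tool combinations to be read off a larger knowledge base.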

Our findings refine the traditional typology of data stories from the journalistic perspective and also recommend technical competencies for the future data journalist and teams working in newsrooms.

Where are the winning cases coming from?

There were 44 winning cases between 2013 and 2016, originating from 14 different countries. The United States accounts for 46% of GEN Data Journalism awardees. There were five cases from the United Kingdom, four from France and Argentina and two from Peru. Other countries with at least one winning entry include Switzerland, Spain, Italy, India, Hong Kong, Germany, Denmark, Costa Rica and Canada. North and South America, Europe and Asia are represented on the GEN award map, while Africa, Australia and Russia are unrepresented. The case of Africa and conflict regions is interesting, as these regions are story-rich and some of the winning data stories are in fact set in them.

Purpose of Story

The goal of a story could be to inform, to explain, to persuade or to entertain. Many of the winning cases had more than one goal; for instance, a story may aim to inform the public while simultaneously persuading or entertaining (Slaney 2012). Specifically, about 73% of the cases had as part of their goals to “inform” the target audience (e.g. that linking metadata about citizens’ call records to email, bank data, etc. is sufficient to reveal the thought and living patterns of subjects), while about 41% of the stories also aimed at “persuading” the audience to adopt some position (e.g. persuading parents who are doubtful about vaccination that vaccination programmes work). About 39% of cases tried “explaining” some phenomenon to the public, for instance how millions of voters, disproportionately from minorities, could inadvertently be prevented from voting by the rules used in a computer program designed to identify irregularities during elections. Because a single story can have several goals, these shares add up to more than 100%. Some of the reviewed cases (18%) were collections of different works and consequently had a combination of different goals. The overall picture shows that even when the agenda of a data story is to persuade or explain, good practice may require informing the audience about the context and background of the subject matter.
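The way such shares are computed when a story can carry several goals can be sketched as follows. This is an illustration only: the five coded stories below are hypothetical and are not the study's data, and the function name is our own.

```python
from collections import Counter

# Hypothetical coding of five stories, each with one or more goals.
# NOT the study's data.
story_goals = [
    {"inform"},
    {"inform", "persuade"},
    {"explain"},
    {"inform", "explain"},
    {"persuade"},
]


def goal_shares(stories):
    """Return, for each goal, the fraction of stories that include it."""
    counts = Counter(goal for goals in stories for goal in goals)
    total = len(stories)
    return {goal: count / total for goal, count in counts.items()}


shares = goal_shares(story_goals)
for goal, share in sorted(shares.items()):
    print(f"{goal}: {share:.0%}")
```

Each story is counted once per goal it pursues, so the shares are not mutually exclusive and their sum can exceed 100%.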

Interactivity and Representation style

The level of interactivity employed in telling a data story directly affects the story experience. Our analysis shows that 59% of the reviewed data story cases were “interactive”, while 27% of the stories provided features for searching, filtering and selection. Only 7% of the cases employed map-based interactivity. Static images and graphics were used in 14% of the cases.

Most (about 77%) of the interactive features were rendered through annotated graphics and maps. Between 10% and 18% of the cases used videos, web applications or games as media for interactive data storytelling. The set of representation styles used here draws on a variety of sources, with additional categories added by the authors based on the analysis itself. The backbone of these categories comes from the seven genres of narrative visualisation by Segel and Heer (2010).

Tools and Technologies

Looking into the categories of tools and technologies employed in these winning stories, we found certain types to be prominent among the winners. Technologies for producing data visualisation appear to be the most prominent of all. This category excludes map visualisations; if the two were merged, they would be by far the most popular set of tools and technologies among winning data stories. The tools and technologies categorised under data visualisation include Tableau Public, D3.js, Highcharts and JavaScript (when specifically noted as a means of data visualisation). Data visualisation is closely followed by tools and technologies falling under our web development and publishing category. These include web-based programming languages such as JavaScript, HTML, CSS, PHP and Python (when used for web development).

Data analysis tools and techniques form the third most common category of tools and technologies used in the winning data stories. Examples include Excel, SPSS, R and Pandas.

Map visualisation and databases come after the first three, while other categories such as data wrangling, data scraping and general programming rank lower in the list.
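A category ranking of this kind can be produced by mapping each tool to a category and counting how many stories use at least one tool from each category. The sketch below assumes that counting rule, which may differ from the one used in the study. The tool-to-category mapping uses only examples named in the text, while the three story tool lists are hypothetical. JavaScript and Python are left out of the mapping because their category depends on how they were used in a given story.

```python
from collections import Counter

# Tool-to-category mapping, using examples named in the text.
TOOL_CATEGORY = {
    "Tableau Public": "data visualisation",
    "D3.js": "data visualisation",
    "Highcharts": "data visualisation",
    "HTML": "web development and publishing",
    "CSS": "web development and publishing",
    "PHP": "web development and publishing",
    "Excel": "data analysis",
    "SPSS": "data analysis",
    "R": "data analysis",
    "Pandas": "data analysis",
}

# Hypothetical tool lists for three stories. NOT the study's data.
stories = [
    ["D3.js", "HTML", "Excel"],
    ["Tableau Public", "Excel"],
    ["D3.js", "Highcharts", "PHP", "R"],
]


def rank_categories(stories, mapping):
    """Count, per category, the stories using at least one of its tools."""
    counts = Counter()
    for tools in stories:
        # A set ensures a story counts once per category,
        # even if it uses several tools from that category.
        categories = {mapping[t] for t in tools if t in mapping}
        counts.update(categories)
    return counts.most_common()


ranking = rank_categories(stories, TOOL_CATEGORY)
for category, n in ranking:
    print(f"{category}: {n} of {len(stories)} stories")
```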

In terms of specific tools and technologies, newsrooms show a notably versatile and varied use of tools, techniques and programming languages.

Various Google tools prove popular across a range of categories in the production of the winning data stories, from Google Spreadsheet, Docs and Drive to Fusion Tables and the Google Currency Convertor API. JavaScript and HTML appear most frequently as web development tools, and Microsoft Excel and Python are the most frequently used tools in the data analysis and analytics category. D3.js is the most common data visualisation framework employed across the cases, while MySQL stands out as the most popular database tool. Other notably popular tools include jQuery (scripting library), OpenRefine (data preparation and refinement) and Adobe Illustrator (graphics publishing). In total, over 130 different tools and frameworks were employed across the 44 winning cases.

Conclusion

Newsrooms across the world and the journalism community have seen a tremendous shift in the ways in which data and algorithms are used in journalistic practice. From simple representation of information to complicated data-driven investigations and newsroom tool development, we have seen ever-growing use of data, algorithms and computational tools in newsrooms in the past few years.

The wide range of tools and technologies used in data journalism and data storytelling, particularly in high-quality work such as the GEN Data Journalism Awards winners, shows that there is no lack of tools to choose from. Some of these, such as programming languages, require more technical expertise, but many of the tools used in these winning data stories do not need a high degree of technical and computational skill and could be used by many journalists.

Data has opened up many opportunities for newsrooms, and computational methods and tools have made it possible for newsrooms to take advantage of these new sources and produce journalistic work of tremendously high quality. The road to this point has been an exciting learning process for the community. The road ahead is even more exciting.

If you are interested in contributing to further study of this field, consider taking part in the 2017 Global Data Journalism survey. The results of this study will be made available to the community, and the more participants the study gets, the better we can understand best practices in the field and its future needs and opportunities.

References

Alexander, S., and Vetere, C. 2011. “Telling the data story the right way,” Healthcare Financial Management (October), pp. 104–110.

Baxter, P., and Jack, S. 2008. “Qualitative Case Study Methodology: Study Design and Implementation for Novice Researchers,” The Qualitative Report (13:4), pp. 544–559 (doi: 10.2174/1874434600802010058).

Ausserhofer, J., Gutounig, R., Oppermann, M., Matiasek, S., and Goldgruber, E. 2016. “Research on data journalism: What is there to investigate? Insights from a structured literature review,” Nordic Data Journalism Conference (NODA16) Academic Pre-Conference, Finland.

GEN. 2016. “Global Editors Network — About US,” GEN Homepage (available at http://www.globaleditorsnetwork.org/about-us/; retrieved May 7, 2016).

Lee, B., Riche, N. H., Isenberg, P., and Carpendale, S. 2015. “More Than Telling a Story: Transforming Data into Visually Shared Stories,” IEEE Computer Graphics and Applications (35:5), pp. 84–90 (doi: 10.1109/MCG.2015.99).

Segel, E., and Heer, J. 2010. “Narrative visualization: Telling stories with data,” IEEE Transactions on Visualization and Computer Graphics (16:6), pp. 1139–1148 (doi: 10.1109/TVCG.2010.179).

Slaney, M. 2012. “Tell Me a Story,” IEEE Computer, pp. 4–6.

Stikeleather, J. 2013. “The Three Elements of Successful Data Visualizations,” Harvard Business Review (available at https://hbr.org/2013/04/the-three-elements-of-successf/; retrieved May 6, 2016).

Bahareh Heravi is an Assistant Professor at the School of Information and Communication Studies, University College Dublin.
Adegboyega Ojo is a Senior Research Fellow & E-Government Unit Leader at Insight Centre for Data Analytics, National University of Ireland Galway.