What is Data Journalism?

Ekapon Thienthaworn
Oct 9, 2019

Definition

Put simply, Data Journalism is telling a story with data. A more specific definition is harder to give, because there is no unanimous consensus on what the term means.

However, to facilitate the development of knowledge and to promote the use of Data Journalism, I would like to use a definition synthesized from several indicators, as follows:

Data Journalism is the process of reporting facts using structured data as the core of storytelling and managing it objectively.

Three key characteristics can be used to determine whether a piece of work counts as Data Journalism:

(1) It presents facts on the basis of objectivity

Data Journalism is one type of journalistic process, and journalism is a matter of reporting facts. Data Journalism must therefore convey a story as it really is, on the basis of objectivity: the belief that truth is a universal law that exists by itself and does not depend on external context. In this view, the knower and the truth are separate and independent of each other, so any interaction between the knower and the truth makes the truth impure.

(2) It uses data as the core of storytelling

Because it rests on objectivity, Data Journalism requires data as its core, free from the assumptions, feelings, or opinions of human sources or of the journalists themselves. This does not mean that Data Journalism strictly forbids drawing on human sources or on the journalists' own views, but such content must not be the core. It may be used to support what the data processing shows.

(3) It uses a structured data management method

As mentioned, Data Journalism requires data as the core of storytelling, and the data here specifically means structured data. The raw data used as a news source may be structured from the beginning, such as data in tables, or it may be unstructured, such as photographs, text, or video clips. In the latter case, Data Science methods can be used to process it into a structured data set.
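To make the step from unstructured to structured data concrete, here is a minimal sketch in Python. It assumes the raw material is a handful of free-text notes that happen to follow a similar sentence pattern. The notes, the school names, the figures, and the function name `to_structured` are all invented for illustration and do not come from any real report.

```python
import re

# Hypothetical free-text notes, invented for illustration (not real reporting).
RAW_NOTES = [
    "School A enrolled 120 students at an annual cost of 300 pounds.",
    "School B enrolled 85 students at an annual cost of 150 pounds.",
    "No figures were available for School C.",
]

# Assumed sentence pattern; real text would rarely be this regular.
PATTERN = re.compile(
    r"(?P<school>School \w+) enrolled (?P<students>\d+) students "
    r"at an annual cost of (?P<cost>\d+) pounds"
)


def to_structured(notes):
    """Turn free-text notes into a list of uniform records (rows)."""
    rows = []
    for note in notes:
        match = PATTERN.search(note)
        if match is None:
            continue  # skip notes that carry no extractable figures
        rows.append(
            {
                "school": match["school"],
                "students": int(match["students"]),
                "cost": int(match["cost"]),
            }
        )
    return rows


records = to_structured(RAW_NOTES)
for row in records:
    print(row)
```

Each resulting record has the same fields, so the set can be sorted, filtered, or summarized like a table. Photographs and video clips would need very different tools, but the goal is the same: rows with consistent columns.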

Development and Levels of Data Journalism

Although there is a standard definition as given above, Data Journalism in practice still varies along several overlapping dimensions. For the sake of clarity, I therefore use the starting period, the technology, and the related skills as criteria for classifying Data Journalism into levels 1.0 to 3.0, as follows.

(1) Data Journalism 1.0

How long Data Journalism has been around is often debated. Under the broadest reading of the standard definition (a process of reporting facts that uses structured data as its core), Data Journalism began in the early 1800s, when structured data was introduced into news coverage. One example is the coverage of students’ educational expenses in the cities of Manchester and Salford, published in the Manchester Guardian.

The skills needed for this kind of reporting are the use of databases and statistics. Data Journalism 1.0 is still in use today. In Thailand, for example, Isranews Agency reported on ‘the registration of a limited partnership by Pathompol Chan-o-cha’ in a military camp, whose financial statements listed tools worth only 700,000 THB. Isranews Agency used data management methods to lay out and compare the partnership's current assets with the value of the construction contract until it found irregularities, without using complex calculations or Data Science.

(2) Data Journalism 2.0

The next level of Data Journalism is more specific: computer-assisted reporting for news that uses data as its core. It began around 1950, with the emergence of computer technology that enhanced people's capacity to analyze data. A milestone for this era is “Riot in Detroit” by Philip Meyer, which used quantitative research to collect data via questionnaires and then analyzed a large amount of statistical data using a computer.

The skills required for this type of reporting are social research methodology and the use of a computer. In Thailand, for example, “Ghosts of Thailand in ‘The Shock: Explore Thai Ghosts through Thriller Stories’” by The Matter used social science research methods to gather content data from The Shock program broadcast from 21–26 August 2018, analyzed the data with basic statistics, and turned the results into infographics.

(3) Data Journalism 3.0

The final level of Data Journalism is the most specific, and it is the definition of Data Journalism most widely used in other countries today: the use of Data Science to help report news that uses data as its core. It began around 2000, with the emergence of Data Science, a process of managing and extracting insights from data. A work that can be counted as the starting point of this level is the “Afghan War Logs” by The Guardian, which used Data Science to gather, organize, analyze, and present hidden stories in military war records from Afghanistan, drawing on more than ninety thousand records, and which received attention from around the world.

The skills required for this type of reporting are programming and the use of information technology to interact with recipients. In Thailand, for example, “Lottery: who gets rich?” by Thai Publica and Boonmee Lab used Data Science skills both to analyze large, uncategorized data and to create interactive visualizations that allow recipients to interact with the content.

All three levels are evidently Data Journalism under the standard definition. They differ, however, in their characteristics.

Types of Data Journalism

In addition to the definition and the levels, classifying Data Journalism into types makes it easier to understand. The key variables used for classification in this study are as follows.

(1) Classification by function

This classification aims to resolve the confusion between Data Journalism and investigative news reporting. The two are related in their development and work processes, and Data Journalism can make investigative news reporting more effective, but investigative news reporting does not require Data Journalism. Using the nature of the work as the criterion, Data Journalism can be classified into two types.

(1.1) Investigative Data Journalism

This type of Data Journalism applies Data Journalism to investigative news reporting. The aim is to expose irregularities that affect the public interest. It is often used to search for hidden data, and it often takes advanced techniques to analyze complex data. It also requires more people and more working hours than general news. For example, the Panama Papers by The International Consortium of Investigative Journalists (ICIJ) disclosed more than 11.5 million secret documents about property holdings and the concealment of the financial trails of individuals and organizations from many countries around the world, with the cooperation of over 370 journalists from about 80 countries over a period of one year.

(1.2) General Data Journalism

This type of Data Journalism applies Data Journalism to general news coverage. Most such work does not need advanced techniques or much time. An example is the work on “Thai ghosts” in the show “The Shock” by The Matter, which described the attributes of Thai ghosts in horror stories, reflecting thoughts and beliefs in Thai society.

(2) Classification by the goal of extracting meaning from the data

This kind of classification focuses on the interpretation of data, an important step at the heart of Data Journalism. It can be divided into two types.

(2.1) Data Journalism that describes characteristics

This type of Data Journalism aims to describe basic characteristics of the data using statistics or algorithms that are not very complex. An example is “America’s Broken Healthcare System” by The Guardian, which compared public health spending with the average life expectancy of the population and found an anomaly: the US has a high public health budget, yet its average life expectancy is clearly lower than that of other developed countries.

(2.2) Data Journalism that analyses data relationships

This type of Data Journalism aims to uncover links within the data. It needs statistics or algorithms for finding relationships in data, which are more complex than those used in Data Journalism that describes characteristics. An example is “The Rhymes Behind Hamilton” by The Wall Street Journal, which used clustering to analyze complex phonetic structures and to categorize the prosodic relationships behind the melody and lyrics of hip-hop music.
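Clustering means grouping items so that similar items end up together. The sketch below is a very small k-means implementation in Python, given only to show the general idea. This article does not describe how The Wall Street Journal implemented its analysis, so the choice of algorithm, the two-dimensional points, and the function name `k_means` are all assumptions made for illustration.

```python
from math import dist

# Hypothetical 2-D feature vectors, invented for illustration.
POINTS = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
    (5.0, 5.0), (5.2, 4.8), (4.9, 5.1),
]


def k_means(points, k, iterations=10):
    """Group points into k clusters with a basic k-means loop."""
    # Deterministic start: pick initial centroids spread across the list.
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


labels, centroids = k_means(POINTS, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In practice a tested library implementation would normally be used, and most of the effort goes into choosing features that capture the relationship of interest, such as the sound of syllables in a lyric.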

(3) Classification by presentation format

This approach focuses on how the data is processed for presentation. Presentation is the last step in Data Journalism, and it helps recipients understand the data quickly and conveniently. It can be divided into two types.

(3.1) Data Journalism in the format of traditional presentation

This type of Data Journalism uses written communication, storytelling, and infographics that are still pictures or non-interactive animations. It therefore needs no programming skills and takes less time than Data Journalism that uses interactive presentations. For example, “How to Reduce Mass Shooting Deaths?” by The New York Times presented data through a matrix graph. Although the graph was not interactive, it communicated simply and clearly.

Matrix Graph in “How to Reduce Mass Shooting Deaths?”

(3.2) Data Journalism in the format of interactives

This type of Data Journalism communicates and tells stories through applications with which recipients can interact, so programming skills are needed to create the interactive system. An example is “Lottery: who gets rich?” by Thai Publica and Boonmee Lab, which used an interactive game in which recipients pretend to buy a lottery ticket, along with interactive charts that allowed recipients to explore the data that interested them.

Interactive Game Presentation in “Lottery: who gets rich?”

Ekapon Thienthaworn

Lecturer & Researcher. Interested in Data Journalism, Media Literacy, Football, Travel, Cat, Food and Coffee.