Key trends for data journalism in 2019: machine learning, collaborations, and code art
We recently gathered experts from around the world to talk about the state of data journalism and what challenges they see data teams facing in 2019. This article rounds up their insights.
2018 has been a great year in terms of innovation and large-scale collaborations for data journalists worldwide.
Machine learning, sensors, automation and new data sources are becoming more popular. We’ve gauged the state of the field, thanks to our experts:
- Simon Rogers (Google, US),
- Reginald Chua (Reuters, US),
- Yudivian Almeida Cruz (Postdata.club, Cuba),
- Kuek Ser Kuang Keng (Data Journalism Awards competition officer, Malaysia),
- Cheryl Phillips (Stanford University, US),
- Giannina Segnini (Columbia University, US).
All agreed that data teams around the world have outdone themselves this past year. “There’s been loads of interesting work, from the Paradise Papers to Cambridge Analytica to some nice satellite imagery work on Myanmar, and so on,” said Reginald Chua (Reuters, US).
Data journalism is a growing field
The first conclusion our experts came to was that the field of data journalism has progressed. Simon Rogers, data editor at Google and director of the Data Journalism Awards competition (deadline: 7 April 2019), argues we’ve definitely come a long way:
“It’s like we’re not playing around anymore. There was a phase where there were a lot of ‘let’s answer this unimportant question but WITH DATA’. I think that’s over now — there’s just too much going on in the world. I feel like it’s finally become mainstream and widespread across the globe.”
The top innovation for this year? It definitely was the use of machine learning techniques, which has become more frequent in many countries.
Other innovative storytelling techniques have emerged this year. Automation, or the art of using robots to facilitate large-scale projects, is one of them. Giannina Segnini (Columbia University, US) sees automation and new data sources becoming more popular.
Here is a great example by Bayerischer Rundfunk and SPIEGEL in Germany:
Hanna and Ismail are new to the city and have a lot in common: They're both in their late twenties, single, working in…www.hanna-und-ismail.de
To prove the existence of discrimination in the German rental housing market, data journalists sent more than 20,000 applications to approximately 7,000 apartment advertisements in an automated process and evaluated the received responses. The result makes an impressive piece of data-driven journalism and you can read in detail how they constructed their investigation and calculated the results in this article.
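The evaluation step of an experiment like this can be sketched in a few lines of Python. This is only an illustration, not the method used by Bayerischer Rundfunk and SPIEGEL: the records, the field names, the `profile_a` and `profile_b` labels and the `response_rates` helper are all hypothetical, and the numbers are made up.

```python
from collections import defaultdict

# Hypothetical, made-up records: one row per application sent.
# "invited" marks whether the reply was an invitation to a viewing.
applications = [
    {"ad_id": 1, "applicant": "profile_a", "invited": True},
    {"ad_id": 1, "applicant": "profile_b", "invited": False},
    {"ad_id": 2, "applicant": "profile_a", "invited": True},
    {"ad_id": 2, "applicant": "profile_b", "invited": True},
    {"ad_id": 3, "applicant": "profile_a", "invited": False},
    {"ad_id": 3, "applicant": "profile_b", "invited": False},
    {"ad_id": 4, "applicant": "profile_a", "invited": True},
    {"ad_id": 4, "applicant": "profile_b", "invited": False},
]


def response_rates(rows):
    """Return the share of positive replies for each applicant profile."""
    sent = defaultdict(int)
    positive = defaultdict(int)
    for row in rows:
        sent[row["applicant"]] += 1
        if row["invited"]:
            positive[row["applicant"]] += 1
    return {name: positive[name] / sent[name] for name in sent}


rates = response_rates(applications)
# Difference in invitation rates between the two profiles.
gap = rates["profile_a"] - rates["profile_b"]
print(rates, gap)
```

Because both profiles apply to the same advertisements, a gap in invitation rates is harder to explain by differences between the apartments themselves. A real analysis would also need to test whether the gap is statistically significant.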
Collaborative data projects are on the rise
The second thing our experts observed was that there are more and more large-scale collaborations going on. This doesn’t just apply to Western countries; it also involves teams in Asia, South America, and Africa.
Of course there was the Implant Files, by the ICIJ, in partnership with 250 journalists in 36 countries, which investigated the harm caused by medical devices that have been tested inadequately or not at all.
There was also West Africa Leaks, investigating how Africa’s elite hide billions offshore.
The Organized Crime and Corruption Reporting Project (OCCRP) carries on doing incredible work across Europe, Africa, Asia, the Middle East and Latin America.
Also in the US, the Big Local News project, part of the Stanford Journalism and Democracy Initiative, aims to collect, process and share governmental data that’s difficult to obtain and analyze. The initiative, a great example of collaboration within the data journalism industry, will partner with local and national newsrooms to use this data to examine a wide range of issues including criminal justice, housing, health and education for accountability journalism.
If you’re keen on sharing ideas about collaborative data journalism projects and discussing them with experts on the topic, come and take part in our free Slack discussion on 25 January 2019 at 9AM Pacific Time.
Data journalism, a field still growing worldwide. The examples of Taiwan and Cuba.
For those who still think that data journalism is a thing of the West, here is something to prove you wrong.
Kuek Ser Kuang Keng (Data Journalism Awards competition officer, Malaysia) gave a shout-out to journalists in Taiwan, where data was used almost systematically during the recent midterm elections. Almost all online news websites used election data and maps to enhance their reporting and analysis.
This article compiles several data-driven reports from different outlets (it’s in Chinese, but you can use the Google Translate Chrome extension to turn it into English): How do the Taiwanese media play the 2018 local elections? by Hacks/Hackers Taipei
The use of data by news teams is also growing in Cuba, Yudivian Almeida Cruz (Postdata.club, Cuba) taught us. “Data liberation is on the way,” he said.
“It’s interesting [because] we are working on a new constitution and many different media used a data-driven approach to cover this process.
We are more or less [experiencing] data liberation, people are more interested in data, and have better access to the internet. People from the government are having more presence in social networks.”
Challenges for data journalists in 2019
We’ve asked our experts to name three challenges they think data journalists worldwide will have to face in 2019. Here is what they said…
Turning unstructured information into structured data is still a problem
With the growing amount of data readily available these days (though unfortunately not always in the right format), and the great effort from journalists to collect large sets of data, comes the need to distinguish between structured and unstructured information.
If you find it hard to differentiate the two, here is a great explainer by Brandon Wolfe:
“Structured data is easily searchable by basic algorithms. Examples include spreadsheets and data from machine sensors.
Unstructured data is more like human language. It doesn’t fit nicely into relational databases like SQL, and searching it based on the old algorithms ranges from difficult to completely impossible.”
“Collecting and turning unstructured information into structured data is a big challenge,” Cheryl Phillips (Stanford University, US) said. “That’s because the tools are not readily available in the newsroom yet.”
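To make the distinction concrete, here is a minimal Python sketch of turning free text into structured rows. It is an assumption-laden illustration, not a tool mentioned by the experts: the sample sentences are made up, the `extract` helper and its column names are hypothetical, and simple regular expressions like these only work when the text follows predictable patterns.

```python
import csv
import io
import re

# Made-up free-text snippets standing in for unstructured source material.
notes = [
    "On 2018-03-14 the city council approved a budget of $1,200,000 for road repairs.",
    "Inspectors visited the site. No budget was mentioned.",
    "A grant of $45,500 was awarded on 2018-11-02.",
]

DATE = re.compile(r"\d{4}-\d{2}-\d{2}")   # ISO-style dates only
AMOUNT = re.compile(r"\$([\d,]+)")        # dollar amounts such as $1,200,000


def extract(text):
    """Pull a date and a dollar amount out of one snippet, if present."""
    date = DATE.search(text)
    amount = AMOUNT.search(text)
    return {
        "date": date.group(0) if date else None,
        "amount_usd": int(amount.group(1).replace(",", "")) if amount else None,
        "source_text": text,
    }


rows = [extract(note) for note in notes]

# Write the structured rows as CSV so they can be opened in a spreadsheet.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["date", "amount_usd", "source_text"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Keeping the original sentence in a `source_text` column makes it possible to check each extracted value against its source, which matters when the patterns miss or misread something.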
Hopefully, 2019 will be the year this problem gets fixed. In the meantime, Phillips encourages newsrooms to continue their hard work collecting and normalizing disparate data, with help from collaborative initiatives such as the Big Local News project we mentioned above.
Yudivian Almeida Cruz (Postdata.club, Cuba) argued that deep learning will have an important role to play in this challenge. It could be used to help data analysis and get insights more easily out of the mess that is unstructured information.
If you’re into machine learning and looking for deep learning frameworks to play with, go check out this article by James Le: The 5 Deep Learning Frameworks Every Serious Machine Learner Should Be Familiar With
Access to government data in some countries is still limited
The second challenge our experts identified is a struggle in many countries, “even those with supposedly strong open records law,” Cheryl Phillips (Stanford University, US) said.
Access to government data was a problem in 2018, and will still be one in 2019.
“In Southeast Asia, journalists in some countries are having a hard time, especially the Philippines and Myanmar,” said Kuek Ser Kuang Keng (Data Journalism Awards competition officer, Malaysia).
“Access to government information and assessing the integrity of the information (fake or misleading data and information) are still a challenge there. But we also see huge progress in Malaysia where the new government is drafting its FOI law and reviewing its open data policy (for the better), so data journalism has a huge opportunity to grow there.”
In China, instead of getting data from the government, journalists turn to tech companies: “Some of the big tech companies are willing to share,” Kuek Ser Kuang Keng said.
“For example, instead of getting traffic data from the government, [journalists] got similar data from Didi, the equivalent of Uber in China (check out the GAIA Initiative). Waze (a popular traffic navigation mobile app owned by Google) is also sharing traffic data with media in some countries here.”
Local data journalism doesn’t develop equally in different parts of the world
We’ve seen the great expansion of the Bureau Local initiative in the UK, a collaborative, investigative network launched by the Bureau of Investigative Journalism, which comprises 833 members and has resulted in 293 stories so far.
Local data journalism in China is also a thing, Kuek Ser Kuang Keng explained: “Local data in China is sometimes easier to obtain compared to national data. In some big highly urbanised cities like Shanghai, the local authority has higher willingness to work with journalists on data sharing. Of course sensitive issues are still behind the line.”
But in other places, like Cuba, it’s still a challenge. Yudivian Almeida Cruz (Postdata.club, Cuba) said journalists find it hard “to cover the local news based on data, because it’s most common to have [national] or states [data]. It’s difficult to get data for local places.”
What technologies to use in 2019
Finally, we asked our experts what new ways of telling stories with data they were keen on playing with this year.
While Google recently announced its Fusion Tables will soon be gone (I invite you to read Simon Rogers’ Twitter thread about this), new tools will come to newsrooms in 2019.
Here are the top three technologies for this year, highlighted by Simon Rogers from Google:
“I just think it’s time for some new approaches to visual storytelling, which can be a hard issue,” he said. “We worked with Datavized on something of an experiment [called Morph].”
Generative art, also called “code art”, is any art that is built using code. You can get an introduction to it in this article by Ali Spittel.
There are endless examples on CodePen — for example CSS art.
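For a taste of what data-driven code art can look like, here is a small Python sketch that turns a list of numbers into an SVG image. It is not how Nadieh Bremer or Datavized work: the `spiral_svg` function, the spiral layout, the colors and the sample values are all assumptions chosen for illustration.

```python
import math


def spiral_svg(values, size=400):
    """Return an SVG string with one circle per value, placed on a spiral."""
    center = size / 2
    largest = max(values)
    golden_angle = math.pi * (3 - math.sqrt(5))  # spreads points evenly
    circles = []
    for i, value in enumerate(values):
        angle = i * golden_angle
        # Distance from the center grows with the position in the list.
        distance = (center - 20) * math.sqrt((i + 1) / len(values))
        x = center + distance * math.cos(angle)
        y = center + distance * math.sin(angle)
        radius = 2 + 10 * value / largest  # bigger value, bigger circle
        hue = int(360 * i / len(values))   # color shifts along the list
        circles.append(
            f'<circle cx="{x:.1f}" cy="{y:.1f}" r="{radius:.1f}" '
            f'fill="hsl({hue}, 70%, 55%)" fill-opacity="0.7" />'
        )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">'
        + "".join(circles)
        + "</svg>"
    )


values = [3, 8, 1, 5, 9, 2, 7, 4, 6, 10, 5, 3]  # made-up numbers
svg = spiral_svg(values)
print(svg[:80])
```

Saving the returned string to a file with an `.svg` extension and opening it in a browser shows the result. Swapping in a real dataset changes the picture, which is the whole idea of data-driven generative art.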
“Already some great storytellers like Nadieh Bremer are doing it,” Simon Rogers added. “I just want to see if there’s a way to make it accessible for everyone.”
To end this post on a beautiful note, here is an example of data-driven generative art by Bremer:
You can find the entire discussion this article is based on via the Data Journalism Awards Slack team.
The Data Journalism Awards 2019 competition is organised by the Global Editors Network, supported by the Google News Initiative, the John S. and James L. Knight Foundation, and Microsoft. Today, it’s the biggest international competition recognising outstanding work in the field of data journalism worldwide. Entries are now open and data journalists worldwide have until 7 April 2019 to apply.