In 10 years, data journalism has gone from a niche reporting exercise to a key part of newsrooms all over the world. To find out how data journalism has changed in the last decade, we talked with Simon Rogers, the founder of the Guardian Datablog, which published its first dataset in 2009. This is what he told us about his journey from London to Silicon Valley, where he is now data editor at Google.
How did you become involved with data journalism?
When I decided I wanted to be a journalist, somewhere between the first and second years of primary school, it never occurred to me that it would involve data. Now, working with data every day, I realise how lucky I was. It certainly was not the result of carefully calibrated career plans. I was just in the right place at the right time.
The way it happened says a lot about the state of data journalism in 2009. I believe it also tells us a lot about data journalism in 2019.
Adrian Holovaty, a developer from Chicago who had worked at the Washington Post and started Everyblock, came to give a talk to the newsroom in the then Guardian education centre on Farringdon Road in London. At that time I was a news editor at the print paper (then the centre of gravity), having worked online and edited a science section.
The more Holovaty spoke about using data to both tell stories and help people understand the world, the more something triggered in me. Not only could I be doing this, but it actually reflected what I was doing more and more. Maybe I could be a journalist who worked with data. A “Data Journalist”.
What was your first data journalism project?
Working as a news editor with the graphics desk, I had accumulated a lot of numbers: Matt McAlister, who was launching the Guardian’s open API, described it as “the motherlode”. We had GDP data, carbon emissions, government spending data and much more cleaned up, all saved as Google spreadsheets and ready for use the next time we needed it.
What if we just published this data in an open data format? No PDFs, just interesting, accessible data, ready to use by anyone. And that’s what we did with the Guardian’s Datablog, at first with 200 distinct datasets: crime rates, economic indicators, war zone details and even fashion week and Doctor Who villains. We started to realise that data could be applied to everything.
It was still a weird thing to be doing. “Data editor” was hardly a widespread job — very few newsrooms had any kind of data team at all.
In fact, just using the word ‘data’ in a news meeting would elicit sniggers. This wasn’t “proper” journalism, right?
And when did things start to change?
2009 was the start of the open data revolution: US government data hub data.gov had been launched in May of that year with just 47 datasets. Open data portals were being launched by countries and cities all over the world, and campaigners were demanding access to ever more.
Within a year, we had our readers helping to crowdsource thousands of MPs’ expenses, and the UK government had released its ultimate spending dataset: COINS, the Combined Online Information System. The Guardian team had built an interactive explorer to encourage readers to help explore both.
A year later, the Guardian’s investigative team was grappling with a massive release of US military records from Iraq and Afghanistan, and data journalism was well-established as a field in the newsroom. And by the end of 2011 — the year before the Data Journalism Handbook was first published — the Reading the Riots project had applied the computer-assisted reporting techniques of Phil Meyer in the 1960s to an outbreak of violence across England.
What do these projects tell us about the evolution of data journalism?
The point is not to list projects but to highlight what was going on in those few years, not just at The Guardian but in newsrooms around the world. The New York Times, the LA Times, La Nación in Argentina… everywhere, journalists were discovering a new way to work by telling data-led stories in innovative ways. This was the background to the first edition of the Data Journalism Handbook.
Data journalism went from being the province of a few loners to an established part of every newsroom.
But one trend became clear even then: whenever a new technique was introduced in reporting, data would not only be a key part of it, but data journalists would be right there in the middle of it.
In less than three years, journalists found data and published datasets; crowdsourcing became an established newsroom tool; and journalists used databases to manage huge document dumps and applied data-driven social science techniques to complex news stories.
This should not be seen as an isolated development within the field of journalism. These were just the effects of huge developments in international transparency beyond the setting up of open data portals.
Can you name some examples of these developments?
These included campaigns such as those run by the Open Knowledge Foundation to increase the pressure on the UK government to open up new datasets for public use and provide APIs for anyone to explore. They also included increased access to powerful free data visualisation and cleaning tools, such as OpenRefine, Google Fusion Tables, Many Eyes (from IBM), Datawrapper, Tableau Public and more.
Those free tools, combined with access to a lot of free public data, made it easier to produce more and more public-facing visualisations and data analysis, and that work proved popular. Newsrooms such as the Texas Tribune and ProPublica started to build operations around this data.
Can you see how this works? A virtuous circle of data, easy processing, data visualisation, more data, and so on. The more data is out there, the more work is done with the data, and the greater the pressure there is for more data to be released.
When I wrote the piece “Data Journalism is the New Punk”, I was making that point: we were at a place where creativity could really run free, but also where the work would eventually become mainstream.
What does it mean for data journalism to become mainstream?
When I got the chance to move to California and join Twitter as its first data editor, it was clear that data had entered the vocabulary of mainstream publishing. A number of data journalism sites sprouted within weeks of each other, such as the New York Times’ Upshot and Nate Silver’s FiveThirtyEight.
Audiences out there in the world were becoming more and more visually literate and could understand sophisticated visualisations of complex topics much more readily than before. You will ask what evidence I have that the world is comfortable with data visualisations? I don’t have a lot beyond my experience that producing a visual which garners a big reaction online is harder than it used to be. Where we all used to react with “oohs and ahs” to visuals, now it’s harder to get beyond a shrug.
By the time I joined the Google News Lab to work on data journalism, it had become clear that the field had access to more and larger datasets than ever before.
Every day, there are billions of searches, for instance, a significant proportion of which have never been seen before. And increasingly reporters are taking that data, and analysing it, along with tweets and Facebook likes. This is the exhaust of modern life, turned around and given back to us as insights about the way we live today.
It is also more widespread than it has ever been. In 2016, the Data Journalism Awards received a record 471 entries. But the 2018 awards received nearer 700, over half from small newsrooms, and many from across the world.
And those entries are becoming more and more innovative. Artificial Intelligence, or machine learning, has become a tool for data journalism, as evidenced by Peter Aldhous’ work at BuzzFeed. Meanwhile, access to new technologies like Virtual and Augmented Reality can let designers showcase data in new and more interesting ways by making a story real and visceral to the public.
As someone whose job is to imagine how data journalism could change, and what we can do to support it, that means that now, as much as working to tell data stories, we’re creating projects to make using those new technologies easier for more reporters. For instance, we worked recently with design studio Datavized to build TwoTone, a visual tool to turn data into sound.
What are the challenges you foresee in the future?
The challenges are great, for all of us. We all consume information in increasingly mobile ways, which brings its own challenges. The days of fullscreen complex visualisations have crashed against the fact that more than half of us now read the news on our phones or other mobile devices (a third of us read the news on the toilet, according to the Reuters news consumption study 2017). That means that increasingly newsroom designers have to design for tiny screens and dwindling attention spans.
We also have a new problem that can stop us learning from the past. Code dies, libraries rot and eventually much of the most ambitious work in journalism just dies. MPs Expenses, Everyblock and other projects have all succumbed to a vanishing institutional memory.
And we face a wider and increasingly alarming issue: Trust. Data analysis has always been subject to interpretation and disagreement, but good data journalism can overcome that. At a time when belief in the news and a shared set of facts are in doubt every day, data journalism can light the way for us, by bringing facts and evidence to light in an accessible way.
So, despite all the change, some things are constant in this field. Data journalism has a long history, but in 2009, data journalism seemed a way to get at a common truth, something we could all get behind. Now that need is greater than ever before.
Recently we launched a new platform for data journalism enthusiasts: DataJournalism.com. How do you see it fitting into the 2019 data journalism scene?
It’s very easy for data journalists to feel isolated and, frankly, lonely in what they do every day. Often they are working on their own, trying to deal with complicated issues that no-one else in the newsroom understands. This site will help those just starting out and everyone from the sole practitioner to the member of a huge team feel like they have some support.
Just having a place to see what’s gone before, and how and when it should be emulated, will make a huge difference to the field. It’s easy to forget that you’re not the first person to try something, and that there’s a whole network out there just waiting to help. That’s what this site does: it gives us a home.
This article anticipates some of the themes that will be mentioned in Simon’s chapter for the Data Journalism Handbook (due in 2019).