I have been doing interview rounds lately (hit me up @SophieWarnes if you have any interesting work) and people are always confused when they come across the term “data journalist”. I think I’ve finally nailed it in a way that explains the breadth of work it might cover, and without overwhelming people with technical jargon. Here goes!
A rough guide to the history of data journalism
Before I explain it, I should probably tell you that data journalism is not as new as you might think. It’s actually got a really interesting history in something known as Computer Assisted Reporting (CAR). CAR is fairly self-explanatory: it’s journalism that’s carried out using various computing tools. It started in the 50s, and became more popular in the 80s as technology improved. Really, CAR was about analysing data using database software and spotting patterns. It helped reporters in the US expose things like discrepancies in crime rates, among all sorts of other things.
Data journalism — more accurately, data-driven journalism (DDJ) — is kind of a CAR 2.0 or 3.0, taking it to a new level. But it’s a phrase that was only coined in 2009, so it’s no wonder people keep asking me what it means!
Data journalism as a process/workflow
Data journalism, in my mind, can be split into several sub-sections. Actually, it might be a lot easier to think of it as a process or series of steps:
- Gathering the data and putting it into a format which is usable
It might be that you get handed a clean spreadsheet as the basis for your work, but you’d be very lucky if this is the case. Usually, you need to find the data online somewhere and scrape it so you can use it. Then you need to clean it. In some cases, the data is scattered and not in a digital format at all: paper receipts, for instance. That might mean typing up and organising every physical receipt…
- Analysing the data and finding the story
Once you have everything you need, you need to find out what the story in the data is. Has something gone up or down drastically, so that you need to look into it in more detail? Is there something standing out that begs for another look? Does it point you in the direction of something else? This step will either give you the story or more ideas to look into.
- Visualising the data you’ve found, to explain the story
This just means creating a chart or some sort of visualisation to explain what’s going on. It can be published alongside the report, or it can be published as a standalone. For instance, a lot of people working with data now just publish the charts on Twitter as a tease for the whole story they’ve written. When people have so little time, it makes sense to break the story down into something visual that can grab them and explain what’s going on in a few seconds.
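If it helps to see those three steps in miniature, here is a rough sketch in Python. Everything in it is made up for illustration: the area names, the numbers, and the function names (`clean_rows`, `percent_changes`, `text_chart`) are my own assumptions, not anyone's real workflow or real figures. It cleans a messy table, looks for the biggest change, and draws a very crude text chart.

```python
import csv
import io

# Invented sample data standing in for a messy scraped table.
RAW = """area,2012,2013
 Northside ,"1,200","1,260"
Riverside,800,1000
,,
Old Town, 950 ,n/a
Dockside,400,320
"""


def to_int(value):
    """Turn a string like ' 1,200 ' into 1200; raise ValueError if it is not a number."""
    return int((value or "").replace(",", "").strip())


def clean_rows(raw_text):
    """Step 1: put the data into a usable format, dropping rows we cannot use."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        area = (record["area"] or "").strip()
        try:
            before = to_int(record["2012"])
            after = to_int(record["2013"])
        except ValueError:
            continue  # a count is missing or is not a number
        if area:
            rows.append({"area": area, "before": before, "after": after})
    return rows


def percent_changes(rows):
    """Step 2: find the story, here the biggest year-on-year percentage change."""
    changes = [
        (r["area"], 100 * (r["after"] - r["before"]) / r["before"])
        for r in rows
        if r["before"] != 0  # avoid dividing by zero
    ]
    # Largest movement first, whether it went up or down.
    return sorted(changes, key=lambda pair: abs(pair[1]), reverse=True)


def text_chart(changes):
    """Step 3: visualise it, using one '#' per percentage point of change."""
    lines = []
    for area, change in changes:
        bar = "#" * round(abs(change))
        lines.append(f"{area:<10} {change:+6.1f}% {bar}")
    return "\n".join(lines)


rows = clean_rows(RAW)
changes = percent_changes(rows)
print(text_chart(changes))
```

In real life each step would be far messier, and you would probably reach for a spreadsheet or a proper charting tool rather than rows of `#` characters, but the shape of the process is the same.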
Thing is, it’s not necessary for all of these steps to be taken. You could be given a clean spreadsheet and get a brilliant story out of it. There’s no need for visualisation; this is just personal preference. And equally you could just publish a visualisation that’s open for people to explore, rather than publishing a written story. This is why data journalism is actually much more flexible than it might sound at first!
Another thing to note is that usually* people will be really good at one bit of the process and stick to that. In a collaborative data journalism team, you might have a programmer who is good at extracting information, someone who writes up the story, and someone who is a graphic designer or visualisation journalist who will create relevant charts to go with it.
*Not all the time, though. In my experience, it’s been more a case of having ownership over one story and doing every step myself, or perhaps one or two bits in conjunction with others. I’m not sure whether this is better or worse than creating a team; on the one hand it means I have experience in all three steps, but on the other it means I’d personally probably want to follow each stage, and I haven’t had the opportunity to see how this would work!