Five years of the Trinity Mirror Data Unit

David Ottewell, head of data journalism at Trinity Mirror

Trinity Mirror’s Data Unit has been an integral part of the company’s move to being digital first, creating content now used on more than 50 websites a week. It’s headed up by David Ottewell, a strong advocate of the difference local journalism can make in a digital world. Here, he takes us behind the scenes of his team, which began as just two journalists back in 2013:

“If you’re a digital-first team, why do you always start presentations with images of your stories on the front pages of newspapers?

It was a fair question, and I’m not sure I had a good answer for it when it was put to me last week. I think I do now.

It isn’t just that newspapers still matter a great deal (although they do) or that a great front page can still change the world (although it can). Front pages are also the best visual shorthand I have for explaining what the data unit aims to achieve.

To put a story on the front page is for an editor to say: I think readers will want to read this, and I think readers should want to read this. This is the most important story, in this region, on this day.

I don’t read every single Trinity title every day of every week, so I don’t know how many splashes we’ve had over the years. I know I have a collection of around 500 from the daily titles alone, so that’s at least 100 a year.

That’s 100 times a year an editor looks at our story and thinks: that is the most important story, in this region, on this day. And when they think that, the fact it is a “data journalism story” is neither here nor there.

Some of the TM Data Unit’s splashes from 2017

The Trinity Mirror Data Unit started in 2013. There were just two of us — me, based in Manchester, and Claire Miller, based in Cardiff. Our remit was to do interesting things with data; to build a case for local data journalism.

Both of us were journalists first and data journalists second. I moved over from the newsdesk of the Manchester Evening News, where I’d previously been political editor and chief reporter. Strictly speaking, I wasn’t a “data journalist” at all; but I was someone who believed in the power of local data journalism to find stories, build interactives, and help connect readers to important information about their lives.

Over the years — thanks to a hugely supportive parent company, brilliant and dedicated additions, and the body of work we’ve built up together — we’ve grown to a team of 12 including coders, graphic designers, and a videographer. All work full-time for the data unit: writing stories, producing graphical pages, creating interactives and tools, and pursuing data investigations and projects.

We set our own news agenda, which requires our journalists to have a constant stream of story ideas to pitch at our daily news conferences. We focus on producing exclusives, and write bespoke copy for any and every title where a strong line emerges from the data. We contextualise the story, speak to experts, get comments and reaction, find case studies. We get a lot of our data from Freedom of Information requests, scraping, or combining existing datasets in new, and newsworthy, ways.

We monitor the swathe of spreadsheets that are churned out by the government, the ONS and other public bodies every week, planning ahead to look at those (and only those) which contain local data for us to get our teeth into, and which would otherwise go largely unnoticed.

There is no shortage of data being published in the UK. The problem isn’t that information is being deliberately withheld from the public, but that there are (still) too few journalists and members of the public with the skills to analyse the huge amount of data that is available. The result? Public interest stories — scandals, often — hiding in plain sight.

Sample of a daily “planned output” email sent to people on the TM mailing list

Once we’ve written our stories, and built interactive and video content around them if appropriate, we send them out using an intelligent mailing list. This means the 500 or so members of the company signed up for our content only receive stuff that is tailored to their geographic area and topics of interest (education, health, crime, and so on). They also get a “weekly review”, with the intro of every story we’ve written for them that week, and a link back to the full version. The idea is that we work in collaboration with local titles: in some cases, newsdesks will simply use our content “as is”; in others, they will want to build it up using local contacts and knowledge.

Example of a “weekly review” email sent to recipients of our news (not sport) output in Manchester
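To make the idea of tailoring concrete, here is a minimal sketch of how a mailing list might match stories to recipients by area and topic. It is an illustration only, not the unit's actual system: the function name, the field names, the email address and the headlines are all invented for the example.

```python
def stories_for(subscriber, stories):
    """Return only the stories matching the subscriber's areas and topics."""
    return [
        story for story in stories
        if story["area"] in subscriber["areas"]
        and story["topic"] in subscriber["topics"]
    ]


# Invented sample data, purely for illustration.
stories = [
    {"headline": "Placeholder health story", "area": "Manchester", "topic": "health"},
    {"headline": "Placeholder crime story", "area": "Cardiff", "topic": "crime"},
    {"headline": "Placeholder education story", "area": "Manchester", "topic": "education"},
]
reader = {
    "email": "newsdesk@example.com",
    "areas": {"Manchester"},
    "topics": {"health", "crime"},
}

for story in stories_for(reader, stories):
    print(story["headline"])
```

A weekly review could reuse the same filter over every story written in the past seven days, rather than over a single day's output.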

Five years is a long time in journalism. Five years ago, “data journalism” was the buzzword at innovation conferences and in newsrooms across the country, across the world. Everyone was planning to do more data journalism. It was even said by some in the industry that “soon, all journalism will be data journalism in one form or another” — a statement I, the head of the country’s biggest data journalism team, find meaningless and terrifying in equal measure.

Whatever the problems facing journalism were, the answer was data journalism. Even if no one was quite sure what that meant.

Well, no. Data journalism is a wonderful thing, a great tool for finding stories, presenting stories, and allowing readers to explore information about their lives and the places where they live. But it isn’t now, and wasn’t then, a panacea.

It’s just as hard to find a great story through data journalism as it is through any other form of journalism. Harder, in many cases: it’s much easier for a reader to connect with a human story than it is for them to connect with numbers. Doing data journalism well requires great news judgement; ruthlessness in selecting what is and isn’t worth analysing; and sufficient storytelling skill to turn complex numbers into a compelling intro or visualisation. (For the record, that’s a lot of storytelling skill.)

Doing data journalism well means putting the requirements of the readers you actually have first, rather than impressing your data journalism peers. It means caring about your readers’ interests, and presenting things in ways which they will respond to, and share.

It also means accepting the same demands as every other journalist, in terms of the quality and quantity of your output.

There are numbers some data journalists are reluctant to talk about. How many stories do you actually write, compared to non-data journalists? How well do they do, in terms of unique users and engagement? How long do they take to produce? Is that a worthwhile investment of time, of resources, of money? And if not, what precisely is the argument for data journalism, other than that “it’s the future”?

Five years is a long time in journalism. Data journalism is not an innovation any more. It is not a child, to be nurtured and spoilt. It needs to be delivering what the industry — which basically means the readers — truly want. And it needs to be doing it day after day after day.

So what does the Trinity Mirror Data Unit do that is any different? Well, in terms of numbers, we turn an average of just over 100 datasets a month into around 1,500 fully-formed local stories. And not just any stories — exclusive stories, stories that wouldn’t be told without us. We don’t look at datasets which we know our titles will be analysing anyway. We aim to give them something different, something new, something local.

“Who Am I” interactive

Then there are the visualisations, the explainer videos. There are the long-form, deep-dive investigations into issues like gender and the appalling life chances of children in care.

Video explainer as part of our Children in Care project (Liverpool version)
Sample front page from Children in Care investigation

There are the interactive gadgets we build to tell you everything from the kind of neighbourhood you live in (that one led the MirrorOnline most-read list for a good while), to levels of deprivation in your local area (so did that), to how good your local doctor’s surgery is (that led lots of titles’ most-read lists), to whether your football team is going to get relegated (yep, that one too).

There are the tools we make to provide automated live results across the group on election (or referendum) night, or to allow our titles to make mobile-friendly “pick the team” gadgets like this in a matter of minutes, complete with all the right colours and badges and Twitter-sharing capability.

Example of “pick the team” gadget generator

Then there are the set piece projects, like the World War One commemoration we did in partnership with the Commonwealth War Graves Commission, and which (through a mammoth act of scraping and data cleaning) allowed us to not only name the war dead from every town in Britain, but find the youngest, oldest, first to die, last to die, and the days when most soldiers died. At the last time of counting, the search gadget we made as part of the project had been used by more than three million people.

WW1 search gadget
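Once the records are scraped and cleaned, the kind of summary described for the World War One project is simple to compute. The sketch below assumes each cleaned record has a name, an age and a date of death. Those field names and the four sample records are invented placeholders, not real data from the project.

```python
from collections import Counter
from datetime import date


def summarise(records):
    """Find the youngest, oldest, first and last to die, and the worst day."""
    deaths_per_day = Counter(r["died"] for r in records)
    return {
        "youngest": min(records, key=lambda r: r["age"]),
        "oldest": max(records, key=lambda r: r["age"]),
        "first_to_die": min(records, key=lambda r: r["died"]),
        "last_to_die": max(records, key=lambda r: r["died"]),
        # most_common(1) returns a list holding one (day, count) pair.
        "deadliest_day": deaths_per_day.most_common(1)[0][0],
    }


# Invented placeholder records for one town.
records = [
    {"name": "Soldier A", "age": 19, "died": date(1916, 7, 1)},
    {"name": "Soldier B", "age": 17, "died": date(1916, 7, 1)},
    {"name": "Soldier C", "age": 42, "died": date(1914, 8, 23)},
    {"name": "Soldier D", "age": 25, "died": date(1918, 11, 10)},
]

summary = summarise(records)
print(summary["youngest"]["name"], summary["deadliest_day"])
```

Running the same function once per town would give every title its own local version of the figures.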

Our latest innovation has been daily graphic-led print pages that are published in titles across the group, each one exploring or investigating a different issue and aiming to get readers to look at it in a different light.

Example print page: the missing children

Finally, we provide an automated half-page service for weekly titles. Editors tell us which postcodes, local authorities, health trusts, and so on that they cover. We use this information to extract the latest data on a range of topics including house prices, vehicle crime, local hospital performance and child health. We feed this through a graphical generator created by a designer and coder, and the result is a bespoke, print-ready half-page for any and every title that wants one.

Example automated print page: house prices in Croydon
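The extraction step behind an automated page like this can be sketched in a few lines. The example assumes the data arrives as rows tagged with an area and a "YYYY-MM" period, and that each title supplies the set of areas it covers. The function, the field names and the figures are hypothetical placeholders, and the graphical generator itself is not shown.

```python
def latest_for_title(rows, covered_areas):
    """For each area a title covers, keep the row with the most recent period."""
    latest = {}
    for row in rows:
        area = row["area"]
        if area not in covered_areas:
            continue
        # "YYYY-MM" strings sort chronologically, so plain comparison works.
        if area not in latest or row["period"] > latest[area]["period"]:
            latest[area] = row
    return latest


# Placeholder figures, not real house prices.
rows = [
    {"area": "Croydon", "period": "2017-11", "avg_price": 100},
    {"area": "Croydon", "period": "2017-12", "avg_price": 110},
    {"area": "Sutton", "period": "2017-12", "avg_price": 120},
]

page_data = latest_for_title(rows, covered_areas={"Croydon"})
print(page_data["Croydon"])
```

The resulting dictionary is the sort of input a page template could then turn into a print-ready graphic.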

You might consider all this to be bragging, and in a sense it is: I’m incredibly proud of every single one of the team. But really, we are only pulling our weight. I look at local journalism in 2018 and see fantastically talented people working bloody hard to create brilliant content that lots of people read. It is they that set the bar so high.

As data journalists, we need to clear the same bar. We shouldn’t demand a different bar “because innovation”.

And we certainly shouldn’t talk down regional journalism, or any other journalism for that matter, to make others believe the bar is lower than it is. The bar in regional journalism remains high, and data journalism should aim to push it higher.


The Real Schools Guide (2013-)

I have to include this: it was our first major project, and has been going strong ever since. At the time we first did it, we were still a tiny team hacking things together using freeware and Google Sheets. It gave us, and everyone around us, a sense of what we might achieve.

The idea behind the guide (which seems commonplace now) is that traditional league tables don’t give a comprehensive picture of how good a school is, from a parent’s point of view. And there is a tonne of other useful data available: on value-added scores, on truancy rates, on pupil-teacher ratios, and so on. The idea was to work with academics and teachers to put a weighting on these factors, and be very open about the methodology, in order to “score” every state-funded school in the country in different categories. Just as importantly, we wanted to create a convenient online resource where parents could browse all this data themselves.

From the outset, the guide won huge praise, including from the government; it sparked imitations, and has influenced the way people think about rating schools with data. We now publish the guide twice a year, once for primaries and once for secondaries, and each one drives well over a million page views a year. The print supplements we create for daily titles have contributed to a significant sales uplift, too.

Sample page from the Real Schools Guide 2017
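For readers wondering what putting a weighting on these factors might look like in practice, here is a minimal sketch of a weighted score. It is not the guide's published methodology: the weights, the choice of min-max scaling and the two sample schools are all assumptions made for illustration.

```python
# Hypothetical weights, plus whether a higher raw value is better.
INDICATORS = {
    "value_added": (0.5, True),
    "truancy_rate": (0.3, False),
    "pupil_teacher_ratio": (0.2, False),
}


def normalise(values, higher_is_better=True):
    """Scale values to the 0-1 range, flipping them if lower is better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)  # no spread, so every school gets the midpoint
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1 - s for s in scaled]


def score_schools(schools):
    """Return a weighted 0-1 score for every school."""
    names = list(schools)
    totals = {name: 0.0 for name in names}
    for indicator, (weight, higher_is_better) in INDICATORS.items():
        values = [schools[name][indicator] for name in names]
        for name, scaled in zip(names, normalise(values, higher_is_better)):
            totals[name] += weight * scaled
    return totals


# Invented sample schools.
schools = {
    "School A": {"value_added": 1.0, "truancy_rate": 2.0, "pupil_teacher_ratio": 15.0},
    "School B": {"value_added": 0.0, "truancy_rate": 4.0, "pupil_teacher_ratio": 20.0},
}

scores = score_schools(schools)
print(scores)
```

Keeping the weights in one visible table is what makes it possible to be open about the methodology: anyone can see how much each factor counts, and challenge it.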

Unidentified bodies

It’s almost impossible to pick out specific news stories we’ve found and written, because there are so many highlights. (If you want to get a feel for the range and quality of stories we break, just browse my ‘weekly highlights of the data unit’ posts on Medium.) That said, I feel duty bound to mention one in particular: a simple data-scrape of cases on the Missing Persons website.

I think this is still a (joint) record for the most splashes we’ve got from a single dataset. It did phenomenal traffic online, too. More importantly, it taught us a valuable lesson which still shapes how we work. For this story, we didn’t just analyse the numbers, but presented full details of each case to local reporting teams. There were some really sad, really compelling individual cases. Obviously that was what made the piece. And that was the lesson: data journalism is as much about having skills to find stories as it is about an ability to do complex maths. In this case, as in so many others, the idea to do the scrape was a “data journalism” idea; the scraping was a “data journalism” technique. I’m sure some people would say “Ah, but this isn’t a story about numbers, so it’s not really data journalism.” In which case, fine, but I really don’t care. You worry about the definition; all I really care about is the quality of the content.

Investigation: Racism in the UK

I think it’s really important that as well as the day-to-day news, we also find plenty of time for in-depth investigations. Our racism project was a great example. It came about after a spike in hate crimes against Eastern Europeans following the Brexit referendum. That was a worrying and legitimate story. But it occurred to us that there was a strange contrast between the outcry over that (highly-visible) racism, and the lack of discussion of the deeper, structural racism that still exists, and which makes a huge difference to people’s lives. We gathered cradle-to-grave statistics for different ethnicities in every part of the country, and a clear picture emerged almost everywhere: black and Asian pupils outperform white pupils academically, but in every other respect their outcomes are worse. That’s the very definition of institutional racism, only in this case the “institution” is the UK. In order to allow readers to explore the topic, we created a postcode-search gadget visualising the data, gathered case-study interviews (you should really read those: the Manchester version is here), and provided a video explainer.

Screenshot from our interactive

The project had real bite, with the head of the Equality and Human Rights Commission describing our findings as “shocking” and a number of really worrying cases emerging as a result. There was nothing “new” about the data, insofar as it is all routinely published and freely available. But by bringing it together into a coherent (and troubling) narrative, and giving people the means to explore the issues themselves, data journalism can not just describe the world clearly, but hopefully play a role in bringing about positive change.
