What do journalists do with text?

Published in Text Data Stories, Jan 29, 2018

(Asking for your help to find out)

Over the last few months, I’ve been talking to journalists about their trials and tribulations with textual sources, trying to get as detailed a picture as possible of their processes, namely:

how and in what format they obtain the text,
how they find newsworthy information in the documents,
using what tools,
for what kinds of stories,

…among other details.

This inquiry is part of my John S. Knight Journalism Fellowship project at Stanford University, where I’m working on designing text processing solutions for journalists.

What I’ve found so far is fascinating: from tech-savvy reporters who write their own code when they need to analyze a text collection, to old-school investigative journalists convinced that printing and highlighting are the most reliable and effective options — and many shades of approaches in between.

What’s your experience?

If you’ve ever dug a story out of a pile of text, please let me know using this questionnaire. It doesn’t matter if you’ve used more or less sophisticated tools to do it.

Here are a few reasons and incentives to contribute:

1. Help create a public database of text-data-driven stories

Pieces based on the analysis of text collections are not as common as their structured-data-driven counterparts, and are harder to find all in one place. One of the goals of this survey is to create a database of examples. As with the rest of my work at Stanford University, the data will be publicly available for anyone to use.

Concretely, it will include information about the story:

Title
Media outlet
Date of publication
Author(s)
Link

And details about the production and sources:

Type of document(s) used
Source(s)
Number of sources
Size of the text collection
Type of elements considered
Type of analysis
Tools/methods used
Time needed for production
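For readers who think in code, the fields listed above could be sketched as a single record type. This is only an illustration of the shape one entry might take, not the actual format of the database: the class name `TextStoryRecord`, the field names, the types, and the choice of which fields are optional are all my assumptions here.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class TextStoryRecord:
    """Hypothetical shape of one database entry (all names are assumptions)."""

    # Information about the story
    title: str
    media_outlet: str
    publication_date: str  # assumed ISO format, e.g. "2018-01-29"
    authors: List[str]
    link: str

    # Details about the production and sources (optional, since they may be unknown)
    document_types: List[str] = field(default_factory=list)
    sources: List[str] = field(default_factory=list)
    number_of_sources: Optional[int] = None
    collection_size: Optional[str] = None   # free text, e.g. a page or file count
    element_types: List[str] = field(default_factory=list)
    analysis_types: List[str] = field(default_factory=list)
    tools_methods: List[str] = field(default_factory=list)
    production_time: Optional[str] = None   # free text, e.g. "two weeks"


def to_row(record: TextStoryRecord) -> dict:
    """Flatten a record into a plain dict, e.g. for writing to CSV or JSON."""
    return asdict(record)
```

Keeping the production details optional reflects the fact that many stories are published without a description of how they were made.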

2. Make sure your work gets included

I regularly check news websites looking for examples to bookmark, but it’s not always obvious whether a story involved text analysis, and stories often don’t come with an accompanying “how we did it” blog post. It can also be hard to find work from many years ago, or stories that are no longer available online. Please help me find yours.

Check out my Pinterest account to see my (visual) collection of text-data-related products

3. Share your process (and learn from other people’s)

The details about the production of these stories, especially the software and approaches used, could be a valuable reference for beginners as to which skills and tools are most relevant, or for more experienced journalists to compare notes with fellow text-data enthusiasts.

4. Help lower the barrier to complicated text processing

One of the goals of my project is to make solutions accessible to reporters who don’t have the time (or desire) to become “text-miner-journalists.” As I wrote in a previous post, the list of skills for this area of expertise is long, and the training time-consuming. Ultimately, my interest is to find ways to bring the benefits of these techniques to more reporters.

5. No text-driven stories? No problem

Finally, although the questionnaire is only useful if you have an example to share, I’m also interested in hearing about less successful cases. Did you have to deal with a group of documents that was too complicated to process? Was there a text or file format that became your nightmare? I want to hear all about it. Email me, and maybe we can put together a list of interesting challenges.