Data is (or are, if you insist) the darling of the news business these days. Visualizations of data — “data viz” in the parlance — are popping up on news sites everywhere. It’s so widespread that a presenter at an MIT Media Lab event in 2012 showed a slide filled with data visualizations that looked like supernovas, carnival spin art, or ripe dandelions and added the plea: “Make it stop!” Information that could be expressed in a paragraph is being stretched to fill so-called infographics that take up yards of screen space, leading to a new cottage industry in tools to create them. (I fear this will do to the science of information what PowerPoint has done to the arts of narrative and discourse.) News organizations, journalism schools, and governments are busy holding hackathons to force hacks (journalists) and hackers (programmers) into arranged marriages so they can take a data set and give birth to a tool around it (everything from bus schedules for our phone to a map app built on a census of the trees that grow in Brooklyn). It’s hard to have a conversation about the future of news these days without someone (often me) citing The Texas Tribune’s success with data, which generates two-thirds of its traffic from users searching numbers about government salaries, prisoners, school comparisons, and much more.
What we’re seeing is the healthy first phase of fascination and infatuation with a shiny new tool. It’s a good thing. Once journalists and users overdose on cool vizes and huge infographics and highly specialized apps, they’ll be left with a new appreciation of data as a source and form of news and, I’m sure, a continued eagerness to explore new opportunities there. Data is an attitude. It is one tool that can help realize the larger ideal of openness in government, business, journalism, and society. Acquiring data and making it available to the public so anyone can investigate its meaning is an act on behalf of transparency. Before getting their hands dirty with tables, charts, and code, journalists need to lobby on behalf of open information. News organizations should be demanding that government at every level open up and release data in standard and manipulable digital form — not paper, not PDFs — to allow anyone to share and analyze it. If they don’t, we must open up government by force. I’d like to see every news organization, large and small, newspaper and blog, sponsor FOIA clubs in their communities to get scores, hundreds, thousands of citizens helping to open up data. I’d also like to see us train government in the value of sharing information so that transparency is used not just as a means to get the bastards but also as a way to work together. Done right, opening data is an act of collaboration, of building platforms for shared information.
The first step in working with data, obviously, is gathering it. This means not just getting government data or other data sets that already exist. It means helping to create new data. See the section above about networks of people and sensors producing endless bits of information. See the growing if cultish quantified-self movement of people trying to measure and record everything about their lives to learn lessons from it. Tools like Ushahidi and SeeClickFix — I wish I had more examples — ask the public to gather and share their own data. A news organization collaborating with the public could pool data on local infrastructure (the proverbial pothole report) or how well-equipped schools are or how many people use parks or — here’s the perennial favorite — how much gasoline costs near you. Companies also have no end of data that would be useful if made public: Mobile phone providers know how fast phones are traveling from cell to cell along highways; they could deliver traffic reports more accurate than anything on radio. But then Waze beat them to it. Hospitals and insurance companies know about clusters of illnesses; properly anonymized, this data could save lives. Supermarkets could compare how healthy one neighborhood’s diets are versus another’s. Google knows what topics interest us (and lets us compare them). Where’s the news in all that data? Who knows unless we can get to it.
The next step is analyzing the data. Here expertise is needed, in technology and in statistics. Some argue that every journalist should become a data jockey (and a programmer, too). I don’t agree. Journalists need to collaborate with experts, knowing what’s possible and being able to express their goals. They need to be technology-literate and data-eloquent. They also need competence in numeracy (a skill too many journalists pridefully lack). A bunch of numbers in a grid is pretty much useless until a viewer can dig in to identify trends, patterns, correlations, and anomalies. Sometimes, data sets will yield their secrets when joined with others — when, say, the incidence of breast cancer in an area is put atop data on pollutants and a correlation emerges. Of course, one must be cautious in reading too much into that correlation; it is not proof of cause. One must also be cautious of fetishizing data and thinking that any observation with a number at its core is worthwhile (I am reminded of Vox’s observation that the Netherlands lost more citizens proportionally in the shooting down of Malaysia Airlines Flight 17 than the United States did in 9/11 — what does that signify?).
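To make the idea of joining data sets concrete, here is a minimal sketch in Python. It assumes two small tables keyed by area name; the area names and every figure are invented for illustration and are not real statistics. The helper names (join_on_key, pearson) are hypothetical, and a real analysis would lean on a statistician and proper tools rather than a few lines of code.

```python
from math import sqrt

# Hypothetical, made-up figures keyed by area name. These are NOT real data.
cancer_rate = {"Northside": 12.0, "Riverside": 18.0, "Hilltop": 9.0, "Downtown": 15.0}
pollutant_level = {"Northside": 30.0, "Riverside": 55.0, "Hilltop": 20.0, "Harbor": 70.0}


def join_on_key(left, right):
    """Keep only the areas present in both tables, sorted by area name."""
    shared = sorted(left.keys() & right.keys())
    return [(area, left[area], right[area]) for area in shared]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


rows = join_on_key(cancer_rate, pollutant_level)
r = pearson([row[1] for row in rows], [row[2] for row in rows])
print(f"{len(rows)} areas in both data sets, correlation r = {r:.2f}")
```

Note that the join silently drops areas that appear in only one table (Downtown and Harbor here), which is itself something a reporter would want to know about. And a high r from three invented points proves nothing; as the paragraph above says, correlation is not proof of cause.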
The next task in data journalism is presentation. This, the front-end, is the fun and flashy part. Journalists have a sense for presentation, so it’s no surprise that they’ve taken to visualization, creating big and splashy charts, timelines, and interactive or animated graphics that try to tell a story (without necessarily writing a story). But sometimes the best presentation of data is still in text. Narrative Science uses algorithms, rather than editors, to turn structured data, such as financial reports or sports scores, into readable text articles using sets of rules (if one team scores N points over the other, then use the verb “trounce”). In a sense, text is just another form of data visualization, for we readers have been trained to garner some kinds of information more readily from a narrative than from a statistical table. A more ambitious presentation of data is an application that allows a user to explore and query the information herself, asking for facts about an address or a date or a name. That allows members of the public to ask their own questions, find their own uses and stories, and reach their own conclusions.
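The rule described above (a winning margin of N points selects the verb "trounce") can be sketched in a few lines of Python. This is only an illustration of rule-based text generation in general, not Narrative Science's actual system; the thresholds, the verbs other than "trounce," the function names, and the team names are all assumptions made up for the example.

```python
def pick_verb(margin: int) -> str:
    """Choose a verb from the winning margin. Thresholds are illustrative."""
    if margin >= 20:
        return "trounced"
    if margin >= 8:
        return "beat"
    return "edged"


def game_recap(home: str, home_pts: int, away: str, away_pts: int) -> str:
    """Turn one row of structured score data into a one-sentence recap."""
    if home_pts == away_pts:
        return f"{home} and {away} tied {home_pts}-{away_pts}."
    if home_pts > away_pts:
        winner, loser, high, low = home, away, home_pts, away_pts
    else:
        winner, loser, high, low = away, home, away_pts, home_pts
    return f"{winner} {pick_verb(high - low)} {loser} {high}-{low}."


# Hypothetical teams and scores, for illustration only.
print(game_recap("Tigers", 45, "Lions", 20))
print(game_recap("Tigers", 20, "Lions", 23))
```

The point of the sketch is how little is needed once the data is structured: the "writing" is a lookup from numbers to words, which is why sports scores and financial reports are natural first candidates.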
News organizations themselves need to be good data citizens, opening up the information they create in forms that can be shared and analyzed by others. That means giving access to archives, because once the flow of news passes into the past, it becomes data. It means adding metadata to our information — for local news organizations, there’s no excuse not to have every location in a story geocoded. It means tagging stories with topics, making it possible for readers to subscribe to updates on those topics. I’d like to subscribe to notifications of corrections on stories I’ve read or linked to. We should also open up information about usage of our content: most popular and most emailed (which are common these days); most recommended and most commented on; and perhaps most impactful.
Data is a critical new opportunity for news organizations. What journalists have to ask — as with the flow of news — is how they add value to data by helping to gather it (with effort, clout, tools, and the ability to convene a community), analyze it (by calling upon or hiring experts who bring context and questions or by writing algorithms), and present it (contributing, most importantly, context and explanation). Witness how WikiLeaks discovered it needed news organizations — the Guardian, Der Spiegel, The New York Times — to add value to its data (which heretofore hadn’t made enough of a splash) with editing and redaction, explanation, additional reporting, and — most important to WikiLeaks founder Julian Assange — distribution and publicity.
Data needs to become a mindset and a skill set in news organizations. Journalists should receive training to become literate in the opportunities and requirements of using data. Journalists also have to work with specialists who can analyze, interpret, and present data, and who can create tools allowing both reporters and the public to work with it. From a business perspective, data should be seen as an asset worth investing in, one that can yield news and new engagement often at a low cost. Data is/are a step past the article.