The Journalist-Engineer

Lately, some of the best articles in the NY Times and Bloomberg are 99% code. The end product is predominantly software, not prose.

Here’s an example: the NY Times’ mapping of migration in the US.

Mapping migration in the US. An elegant depiction of a complex topic.

Several years ago, this article might have been a few thousand words. There’d be tables and charts. They’d reference academic studies and correlate the data with something like unemployment.

This example is different. It’s a well-designed data dump. It’s raw numbers without any abstractions. There’s no attachment to the news cycle. There’s no traditional thesis. It cannot be made in Photoshop or Illustrator. You must write software.

It represents the present-day revolution within news organizations. Some call it data journalism. Or explorable explanations. Or interactive storytelling. Whatever the label, it’s a huge shift from ledes and infographics.

Here’s another example: a graphic from the NY Times on yield curve data:

Via the NY Times’ “A 3-D View of a Chart That Predicts the Economic Future: The Yield Curve”

The story is the code. It depicts the yield curve, an incredibly complex system, in all of its glory. It’s an amazing piece of software (I bet financial companies would even buy it).

Here’s another example: The Parable of the Polygons, an explanation of an academic paper about segregation.

Parable of the Polygons, by Vi Hart and Nicky Case.

It’s a very elegant presentation of a system using code. For reference, here’s how the original author conveyed the idea, back in 1991.

The author’s original depiction of the system of segregation.

No need to hate on the design — it worked just fine, pre-Internet. But today, code makes the possibilities so much richer.
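To make “a system using code” concrete, here is a minimal sketch of the kind of model the Parable animates, a version of economist Thomas Schelling’s segregation model. It assumes the classic simple rules: two kinds of agents on a square grid, an agent is unhappy if fewer than a threshold share of its occupied neighbors match its type, and unhappy agents move to random empty cells. The grid size, empty fraction, threshold, and every function name here are hypothetical choices for illustration, not the Parable’s actual code.

```python
import random

# Hypothetical cell encoding for this sketch.
EMPTY, TRIANGLE, SQUARE = 0, 1, 2


def make_grid(size, empty_frac=0.2, seed=0):
    """Fill a size x size grid with empty cells and two agent types at random."""
    rng = random.Random(seed)
    cells = [EMPTY if rng.random() < empty_frac else rng.choice([TRIANGLE, SQUARE])
             for _ in range(size * size)]
    return [cells[r * size:(r + 1) * size] for r in range(size)]


def neighbors(grid, r, c):
    """Yield the occupied cells among the 8 surrounding positions (no wraparound)."""
    size = len(grid)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == dc == 0:
                continue
            rr, cc = r + dr, c + dc
            if 0 <= rr < size and 0 <= cc < size and grid[rr][cc] != EMPTY:
                yield grid[rr][cc]


def is_unhappy(grid, r, c, threshold):
    """An agent is unhappy if fewer than `threshold` of its neighbors share its type."""
    occupied = list(neighbors(grid, r, c))
    if not occupied:
        return False  # assumption: an agent with no neighbors is content
    same = sum(1 for n in occupied if n == grid[r][c])
    return same / len(occupied) < threshold


def step(grid, threshold, rng):
    """Move each agent that was unhappy at the start of the step; return the move count."""
    size = len(grid)
    unhappy = [(r, c) for r in range(size) for c in range(size)
               if grid[r][c] != EMPTY and is_unhappy(grid, r, c, threshold)]
    empties = [(r, c) for r in range(size) for c in range(size) if grid[r][c] == EMPTY]
    rng.shuffle(empties)
    moved = 0
    for r, c in unhappy:
        if not empties:
            break
        er, ec = empties.pop()
        grid[er][ec], grid[r][c] = grid[r][c], EMPTY
        empties.insert(0, (r, c))  # vacated cells are reused only after the others
        moved += 1
    return moved


def similarity(grid):
    """Average share of same-type neighbors across all agents that have neighbors."""
    ratios = []
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if cell == EMPTY:
                continue
            occupied = list(neighbors(grid, r, c))
            if occupied:
                ratios.append(sum(n == cell for n in occupied) / len(occupied))
    return sum(ratios) / len(ratios) if ratios else 0.0


if __name__ == "__main__":
    rng = random.Random(1)
    grid = make_grid(20, seed=1)
    print(f"start similarity: {similarity(grid):.2f}")
    for _ in range(50):
        if step(grid, threshold=1 / 3, rng=rng) == 0:
            break  # everyone is content
    print(f"end similarity:   {similarity(grid):.2f}")
```

The Parable’s point is that even a mild preference, like the one-third threshold assumed here, tends to push the similarity score up step after step; running the sketch and comparing the two printed numbers shows the effect in miniature. A prose paper can describe that dynamic, but code lets the reader poke at it.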

Here are two more excellent depictions of complex ideas using code, one on OPEC prices and another on machine learning:

The NY Times’ visual of OPEC prices.
An Introduction to Machine Learning, done expertly in D3.

Note that all of these examples include some prose, and brilliantly so. An expert presents her beliefs about the data; that prose guides the reader through it and launches them into their own discovery process.

What Happened?

Creative coders turned their sights from media art to journalism. They’re writing software about ideas that have eluded traditional news organizations, either because the ideas were too complex to explain in prose or because they were trapped in a spreadsheet or an academic paper.

And that’s what I’m doing with Polygraph…liberating those ideas.


A couple of months ago, I published an article comparing the historic and present-day popularity of older music. I used two huge datasets: 50,000 Billboard songs and 1.4M tracks on Spotify.

If I were writing an academic paper, I’d do a ton of analysis, regression, and modeling to figure out why certain songs have become more popular over time.

Or I could just make some sick visualizations…

My article on music timelessness, employing a ton of code to visualize the nuances in data for 50K songs.

Instead of reporting on my “theory,” I wagered that readers would get more out of an elegant presentation of the data than out of an analysis of it. It’s a completely different approach to storytelling.

Here’s that same approach on another project: rappers and the size of their vocabularies. The process: depict the system (the vocabularies of rappers, Shakespeare, and Melville) rather than argue a thesis.

My article on rappers’ vocabularies, which used an unconventional chart design to convey the data more quickly.

I didn’t try to prove that one rapper was better than another. Readers are really good at absorbing the data, and they’d much rather form their own judgements.
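The measurement underneath a vocabulary comparison like this one can be sketched in a few lines. This sketch assumes one simple definition: count the distinct words among the first N words of each artist’s text, so that every artist (or author) is compared on an equal-sized sample. The naive tokenizer, the sample size, and the function names are hypothetical choices for illustration, not the exact method behind the article.

```python
import re

# Naive tokenizer assumption: lowercase letters and apostrophes form words.
WORD_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase the text and split it into words."""
    return WORD_RE.findall(text.lower())


def vocabulary_size(text, sample_size):
    """Count unique words among the first `sample_size` words of `text`.

    Returns None if the text is shorter than the sample, because a shorter
    corpus would not be comparable with the others.
    """
    words = tokenize(text)
    if len(words) < sample_size:
        return None
    return len(set(words[:sample_size]))


def rank_by_vocabulary(corpora, sample_size):
    """Return (name, unique_word_count) pairs, largest vocabulary first."""
    scores = {}
    for name, text in corpora.items():
        size = vocabulary_size(text, sample_size)
        if size is not None:
            scores[name] = size
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    corpora = {  # toy stand-ins for full lyric catalogs
        "artist_a": "money cash money cash money cash",
        "artist_b": "verse rhyme flow beat verse metaphor",
    }
    print(rank_by_vocabulary(corpora, sample_size=6))
    # prints [('artist_b', 5), ('artist_a', 2)]
```

Truncating every corpus to the same length is the key design choice: unique-word counts keep growing as a text gets longer, so comparing Melville’s full novel against a single album without a fixed sample would measure word count more than vocabulary.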

TL;DR

A few years ago, Bret Victor wrote about the notion of passive and active readers:

“An active reader asks questions, considers alternatives, questions assumptions, and even questions the trustworthiness of the author…An active reader doesn’t passively sponge up information, but uses the author’s argument as a springboard for critical thought and deep understanding.”

In theory, this sounds great…but kinda crazy. Imagine it: the Internet engaging in intense discourse over data.

It would represent a big shift in journalistic voice and place an enormous burden on the reader: “you find the story. You’re the data analyst.” It’s the opposite of traditional media, which assumes the role of informer: “we have the knowledge; you don’t. We’re an authoritative source. Read, listen to, watch this thing we researched.”

But it’s happening: there are active readers. I’ve been shocked by readers’ response to the handful of projects I’ve worked on. Readers feel powerful. They don’t know what to call it; it feels foreign.

I believe it’s a response to “too long, didn’t read.” I open a 10,000-word article, and I anxiously wonder whether the time investment will pay off. Maybe the author’s point will suck.

An experience for active readers doesn’t create anxiety. They don’t feel the burden of time — it’s at their own pace. Give readers the right depiction, avoid abstractions, add a narrative to guide them through the experience, and they’ll data science the shit out of a story.

Why Now?

There are a few reasons why things are so different in 2015.

News organizations had to accept that code-driven content wouldn’t have a viable print version. The NY Times launched a data-led blog, The Upshot, to address this tension. Michael Bloomberg is subsidizing all of Bloomberg Business’s engineers in the Editorial Department. No one else is even close to making that kind of head-count commitment.

We needed a pretty unique skill set: people who could design, write, and code. The talent pool has arrived: all of the coders who were creating apps, dashboards, and analytics tools could shift their design sense from users to readers. Like traditional journalists, these engineers already had plenty of empathy for how people consume information.

D3 came along. Visualizing millions of data points on the Internet used to be impossible. And browsers are now robust enough to render our creations.

D3 also made it easier to be creative with design. Pre-D3, we were taking screenshots of charts from Excel. Now, you can create something that best expresses the data, instead of limiting yourself to traditional, pre-Internet design patterns (e.g., bar charts, scatter plots, pie charts).

Here’s an example of that process: the evolution of the bespoke designs Mike Bostock explored for a visualization of corporate tax rates.

The evolution of Mike Bostock’s Corporate Taxes chart for the NY Times

Note that very few of these designs would be possible in statistics packages or design programs. Long live D3.

The Future

There are 4.3 million people subscribed to Reddit’s /r/dataisbeautiful, a top-50 subreddit. The Internet has grossly undervalued our intrinsic interest in visualization. I expect the market for this sort of content to explode.

I’m psyched for the next wave of software that makes journalism easier to code. Someone will write a framework for Oculus Rift. Someone will figure out D3 for mobile. Even on desktop, scroll-based events are still in their infancy.

In the meantime, I’ll be busy coding.

Matt Daniels is the founder of Polygraph, a publication that explores popular culture with visual storytelling. Here’s my backlog of projects. Help me create them!