How We Turned the WSJ/THE College Rankings Into a Tool for Readers

How the Wall Street Journal is using Natural Language Generation to produce new content and engage its members

Kevin McAllister
WSJ Digital Experience & Strategy
4 min read · Oct 11, 2019


The Wall Street Journal’s college rankings are used by incoming undergraduate students to decide where to apply and attend school. Last year we took a closer look at the project and asked ourselves these questions:

  • “What are students doing with this data?”
  • “What can we do better to help them take those actions?”
  • “And how can we leverage the evergreen nature of these rankings?”

Starting with data analytics, we found readers were engaging disproportionately with certain parts of the rankings package. Through interviews, we learned from our audience that there are two moments when college rankings matter to them: in the fall, when they decide where to apply, and the following spring, when they decide where to attend. In particular, once they had narrowed their focus to their top few schools, they needed a tool to directly compare the schools that accepted them.

Enter: our college comparison tool. Readers can put in the top two schools that accepted them, and we generate a story that directly compares those two schools.

But How?

It all comes down to using our information as structured data. Through Natural Language Generation and a tool called Automated Insights, we created a template to turn data into text with a series of if-then statements and synonyms.

Basically, we turned structured rankings data into a template, and then into a finished story.

Each of these stories, while unique, followed a similar format:

  • Begin with the overall ranking and associated costs.
  • Highlight which of our four primary categories (Student Outcomes, Academic Resources, Student Engagement or Environment) a particular school scored best in.
  • Call out the category the school scored worst in.
  • Finish with data from the survey about whether students on campus thought they’d attend again if given the chance.
  • Include synonyms throughout to ensure the generated sentences would read differently.
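The steps above can be sketched as a small data-to-text template. This is a hypothetical illustration of the general technique, not the Journal's actual Automated Insights template; the field names, synonym lists, and sample data are invented for the example.

```python
import random

# Hypothetical structured data for one school (invented for illustration).
school = {
    "name": "Example University",
    "overall_rank": 42,
    "tuition": 28500,
    "categories": {
        "Student Outcomes": 88,
        "Academic Resources": 75,
        "Student Engagement": 81,
        "Environment": 69,
    },
    "would_attend_again_pct": 87,
}

# Synonym lists keep repeated sentences from reading identically.
RANK_VERBS = ["ranks", "places", "comes in"]
STRONG_WORDS = ["strongest", "best", "highest-scoring"]
WEAK_WORDS = ["weakest", "lowest-scoring"]

def generate_story(s, rng=random):
    best = max(s["categories"], key=s["categories"].get)
    worst = min(s["categories"], key=s["categories"].get)
    sentences = [
        # 1. Begin with the overall ranking and associated costs.
        f"{s['name']} {rng.choice(RANK_VERBS)} No. {s['overall_rank']} overall, "
        f"with tuition of ${s['tuition']:,}.",
        # 2. Highlight the best of the four primary categories.
        f"Its {rng.choice(STRONG_WORDS)} category is {best}.",
        # 3. Call out the worst category.
        f"Its {rng.choice(WEAK_WORDS)} category is {worst}.",
        # 4. Finish with the survey question on attending again.
        f"{s['would_attend_again_pct']}% of surveyed students said they would "
        f"choose the school again.",
    ]
    return " ".join(sentences)

print(generate_story(school))
```

Each run varies the verb choices while keeping the factual skeleton fixed, which is what makes hundreds of generated stories read as distinct articles rather than mail-merge output.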

We also created a new process to ensure these articles would meet the standards of The Wall Street Journal. We asked ourselves:

  1. How could we avoid false equivalences that would make two schools on opposite ends of the rankings sound too similar?
  2. How could we replicate WSJ tone and style while maintaining an appropriate level of variability in the way the stories were worded?
  3. How would we make clear to our readers that these stories weren’t written solely by a human?

In a process that parallels the WSJ’s normal editing structure, we tackled false equivalences using branches with specific language for certain conditions. Then, editors reviewed a statistically significant sample of our generated articles. Finally, our standards department reviewed the project in its entirety, developing the human-machine dual byline that would clarify the nature of the project to readers.
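One way to implement that kind of conditional branching, sketched here with invented thresholds and wording rather than the WSJ’s actual logic, is to select comparison language based on how far apart two schools rank:

```python
def compare_rank_language(rank_a, rank_b, name_a, name_b):
    """Pick comparison wording based on the gap between two schools' ranks.

    A large gap gets unambiguous language, so schools at opposite ends of
    the rankings never sound interchangeable; a small gap gets softer
    wording. (Thresholds and phrasing are invented for this sketch.)
    """
    gap = abs(rank_a - rank_b)
    leader, trailer = (name_a, name_b) if rank_a < rank_b else (name_b, name_a)
    if gap == 0:
        return f"{name_a} and {name_b} are tied in the overall rankings."
    if gap >= 100:
        return f"{leader} ranks far above {trailer} overall."
    if gap >= 20:
        return f"{leader} ranks well ahead of {trailer}."
    return f"{leader} edges out {trailer} in the overall rankings."

print(compare_rank_language(5, 310, "Alpha College", "Beta University"))
```

Because each condition has its own language, a No. 5 school compared with a No. 310 school can never come out sounding like a near-tie.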

The end result: stories for each school, making a long data table more readable and useful to people at just the right moment.

Keeping the Robot Reporter Around

Using this methodology we created 968 stories and published nearly 250,000 words on wsj.com — a volume that would’ve been untenable within the timeframe had the stories been compiled by humans alone.

Looking at reach, engagement, and the ability to drive habit, we saw audience engagement that beat our expectations. More than 60% of visits came from new users. These visitors also over-indexed in WSJ’s internally calculated engagement metrics and generated subscriptions at a rate that outpaced similar projects. Among those who interacted with the automated stories, the average tool user expanded nearly five college descriptions.

What we’re really excited about, though, is the creation of a new methodology that can be scaled and replicated across other data projects.

By taking a holistic approach to the project and joining the quantitative data with the feedback of the intended audience, we were able to create a piece of service journalism that helps our readers make decisions. Replicating this process of listening to the audience and creating new formats that make data and rankings more accessible, useful, and personalized is the primary goal of my role as Rankings Data Reporter. It’s one part of the broader ecosystem of innovation being fostered at The Journal right now, which we’ll be writing about here on Medium. Stay tuned!

Experience the comparison tool yourself here.

Kevin McAllister is Rankings Data Reporter at The Wall Street Journal. He can be reached at kevin.mcallister@wsj.com or on Twitter.
