Journalism in the Age of AI

How technology is upending how we produce and consume the news

Ashok Giri
PageMajik
6 min read · May 21, 2018

--

An Olympic Achievement

The Washington Post’s coverage of the 2018 Winter Games in PyeongChang was somewhat unusual. Glancing through the Post’s social media feed, you might not have found the articles and updates particularly different. But that was precisely what was unusual: they were not composed by a human reporter.

The Washington Post Olympics Bot (@WPOlyBot) generated constant updates on Twitter during the Games, letting viewers stay on top of the latest developments. The updates included announcements of events that were about to begin, lines about the winners of events and any notable achievements, and even periodic tallies of the cumulative medals won by each country.

These updates were based on data from sports data companies, ensuring comprehensive coverage without depending on human speed or reporting accuracy. While this certainly eased the burden on the Post’s human journalists, it was not meant as a total replacement for them. Rather, it was intended to “free up Post reporters and editors to add analysis, color from the scene and real insight to stories in ways only they can.”
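
The Post has not published how its bot is built, but a data-to-text system of this kind can be sketched, under the assumption that it fills fixed templates from a structured results feed. Everything below is illustrative: the event, the result fields, and the medal counts are placeholders, not real feed data.

```python
# A minimal sketch of template-based update generation from structured
# results data. The dictionaries, field names, and templates below are
# hypothetical placeholders, not the Post's actual templates or data.

EVENT_RESULT = {
    "event": "women's giant slalom",          # illustrative event
    "gold": ("Jane Doe", "Norway"),            # illustrative athlete/country
    "note": "a new Olympic record",            # optional extra detail
}

MEDAL_TABLE = {"Norway": 14, "Germany": 12, "Canada": 10}  # illustrative counts


def result_tweet(result: dict) -> str:
    """Turn one structured result into a short, plain update."""
    athlete, country = result["gold"]
    tweet = f"{athlete} of {country} wins gold in the {result['event']}"
    if result.get("note"):
        tweet += f", setting {result['note']}"
    return tweet + "."


def medal_tweet(table: dict) -> str:
    """Summarize the current medal standings from a country->count table."""
    ranked = sorted(table.items(), key=lambda kv: kv[1], reverse=True)
    leaders = ", ".join(f"{country} ({count})" for country, count in ranked[:3])
    return f"Medal count so far: {leaders}."


print(result_tweet(EVENT_RESULT))
print(medal_tweet(MEDAL_TABLE))
```

The design point is the obvious one: once the results arrive as structured data, composing the update is just string filling, which is why such bots can post within seconds of an event ending.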

While the benefits of the Olympics Bot are very real, its limitations are easy to spot. Its data was taken from other sites, which meant it was still dependent on human activity somewhere in the chain. Moreover, as you scroll through the Twitter feed, you notice that the tweets themselves are somewhat plain, trading style for clarity. Human journalists, then, are still quite essential to journalism.

A Dowsing Rod for Information

Although it has to be conceded that AI cannot simply replace human journalists, we can still ask whether it can help us cope with the avalanche of online content produced every day. One interesting proposal comes from a recent paper by Google’s Yinfei Yang and UPenn’s Ani Nenkova, who propose testing for “content density”.

According to the authors, “content density” is a measure of how much information a piece of writing actually contains. It is a way of separating genuinely informative articles from mere fluff, ensuring that readers can spend their finite time and attention on actual content.

To get a sense of the difference between informative and non-informative content, consider an example the authors provide:

Informative:

The European Union’s chief trade negotiator, Peter Mandelson, urged the United States on Monday to reduce subsidies to its farmers and to address unsolved issues on the trade in services to avert a breakdown in global trade talks.

Ahead of a meeting with President Bush on Tuesday, Mr. Mandelson said the latest round of trade talks, begun in Doha, Qatar, in 2001, are at a crucial stage. He warned of a “serious potential breakdown” if rapid progress is not made in the coming months.

Non-informative:

“ART consists of limitation,” G. K. Chesterton said. “The most beautiful part of every picture is the frame.” Well put, although the buyer of the latest multimillion-dollar Picasso may not agree.

But there are pictures — whether sketches on paper or oils on canvas — that may look like nothing but scratch marks or listless piles of paint when you bring them home from the auction house or dealer. But with the addition of the perfect frame, these works of art may glow or gleam or rustle or whatever their makers intended them to do.

Assuming that journalistic conventions will remain more or less the same, the authors designed a machine learning classifier to differentiate between informative and non-informative text using lexical features (e.g., words and their associated average age of acquisition, imagery, and concreteness) and syntactic features (e.g., the flow between sentences in terms of discourse relations and entity mentions).
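
To make that concrete, here is a minimal sketch, not the authors’ implementation, of how such lexical features might feed a classifier. The psycholinguistic norm values, the tiny training set, and the choice of scikit-learn’s logistic regression are all assumptions for illustration; real work would use published norm lexicons, a labeled news corpus, and the full feature set described in the paper.

```python
# A minimal sketch of a lexical-feature "content density" classifier.
# All norm values, texts, and labels below are hypothetical placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-word psycholinguistic norms (concreteness, age of acquisition).
CONCRETENESS = {"subsidies": 2.1, "farmers": 4.8, "frame": 4.9, "picture": 4.6}
AGE_OF_ACQUISITION = {"subsidies": 12.0, "farmers": 5.5, "frame": 6.0, "picture": 4.0}


def lexical_features(text: str) -> np.ndarray:
    """Average concreteness and age of acquisition over known words,
    plus mean sentence length as a crude stand-in for syntactic features."""
    words = [w.strip(".,\"'").lower() for w in text.split()]
    conc = [CONCRETENESS[w] for w in words if w in CONCRETENESS]
    aoa = [AGE_OF_ACQUISITION[w] for w in words if w in AGE_OF_ACQUISITION]
    mean_sentence_len = len(words) / max(text.count("."), 1)
    return np.array([
        np.mean(conc) if conc else 0.0,
        np.mean(aoa) if aoa else 0.0,
        mean_sentence_len,
    ])


# Hypothetical training set: 1 = informative (content-dense), 0 = fluff.
texts = [
    "The EU urged the United States to reduce subsidies to its farmers.",
    "The most beautiful part of every picture is the frame.",
]
labels = [1, 0]

X = np.vstack([lexical_features(t) for t in texts])
clf = LogisticRegression().fit(X, labels)

new_article = "Negotiators warned that farm subsidies could derail the talks."
print(clf.predict(lexical_features(new_article).reshape(1, -1)))
```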

The classifier then categorized a test set of articles from different domains with 67–75% accuracy. Admittedly this is well short of perfect, and the assumption of stable journalistic conventions means it cannot be applied broadly just yet. Still, by showing that a model can select for content density better than chance, Yinfei Yang and Ani Nenkova open up the possibility of cutting down the time we lose wading through the ubiquitous fluff we seem to be awash in.

As impressive as content density is as a measure of newsworthiness, one problem it cannot address is political bias. After all, there is no dearth of sites that produce article after article stuffed to the brim with deeply partisan content, so detecting content density alone is not going to be enough.

Knowhere Else to Go

A startup that tries to deal with precisely this is Knowhere News, which boasts of offering “the world’s most unbiased news”.

The way it works is quite straightforward: the site’s AI engine looks for whatever topic is popular at a given time, scours multiple articles on that topic, and then generates an unbiased version of the news. Since no original reporting is required, the actual writing can take as little as 60 seconds.

To work around the fact that not all news sources are equally reliable, human input is still required: trustworthiness scores are pre-set by people so that reliable sources are weighted more heavily than fringe views.
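
Knowhere has not published its algorithm, so the following is only a rough sketch, under the assumption that trust-weighted aggregation works like a weighted vote over claims: a claim survives into the “impartial” version only if the combined trust of the sources reporting it crosses a threshold. The source names, trust scores, and claims are all hypothetical.

```python
# A minimal sketch of trust-weighted claim aggregation. The trust scores,
# sources, and claims below are hypothetical placeholders, not Knowhere's data.
from collections import defaultdict

SOURCE_TRUST = {            # human-curated trust scores in [0, 1]
    "wire_service": 0.9,
    "national_paper": 0.8,
    "partisan_blog": 0.3,
}


def aggregate(claims, threshold=1.0):
    """claims: list of (source, claim_text) pairs.
    Returns claims whose cumulative trust-weighted support meets the threshold."""
    support = defaultdict(float)
    for source, claim in claims:
        support[claim] += SOURCE_TRUST.get(source, 0.1)  # unknown sources count for little
    return [claim for claim, score in support.items() if score >= threshold]


claims = [
    ("wire_service", "Records relating to the payments are missing."),
    ("national_paper", "Records relating to the payments are missing."),
    ("partisan_blog", "The leak was politically motivated."),
]
print(aggregate(claims))  # only the well-supported claim survives
```

Even this toy version makes the dependency obvious: the quality of the output rests entirely on the human-assigned trust scores and on the human-written articles being aggregated.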

For political stories, Knowhere News produces two additional articles, one for the left and one for the right, alongside the impartial version. For example, the headlines of a recent topic were:

Impartial: Whistleblower on Trump lawyer finances says records are missing

Left: Whistleblower on Trump lawyer finances fears cover-up

Right: Person who leaked Cohen’s financial information questioned

This example emphasizes how valuable such a tool can be in our era of hyper-partisan politics. But its limitations are also clear: for one, it too depends on the work of human journalists to create the mass of articles it draws on.

More importantly, an assumption animating this project is that the “impartial” view from the center is the most appropriate one to take. While this may well be true in many cases, there is a risk of legitimizing extremist views if we are always willing to meet in the middle. For example, if one political party started moving towards fascism while the other remained moderate, the impartial view generated by Knowhere News would be a moderated version of fascist claims rather than a repudiation of them. It is important to recognize that while moderation and conciliation are valuable ideals, they can be taken too far.

A Future of Robot Journalism?

Admittedly, the examples examined here don’t exactly mean pink slips for journalists just yet. These systems rely on machine learning algorithms that rapidly sift through existing data to provide updates and create bias-free versions, but those algorithms still need human hands to create the material they draw on.

But let’s not get complacent about AI just yet; there are still many paths through which it can make inroads into original journalism. As more academics and cultural influencers become active on social media, it is conceivable that an AI system might engage them directly in journalistic activity. Interviews conducted over email don’t necessarily need a human asking the questions, putting a new spin on the Turing test. And with cameras and audio devices like phones present in virtually every community, there might one day be enough raw data for field reporting by AI.

Granted, the technology required for these advances doesn’t look close to materializing just yet. But given the speed at which technology has been overturning entrenched assumptions, it would be hubris to be too confident about its limitations.
