An ethical checklist for robot journalism

Updated March 2016

News organizations are increasingly experimenting with robot journalism. They’re using computer programs to transform data into news stories, and news stories into multimedia presentations.

As these experiments go forward, we need to think about the ethics of robot journalism. When editors consider using automated newswriting, what issues of accuracy, quality and transparency arise?

Most uses of robot journalism have been for fairly formulaic situations: company earnings reports, stock market summaries, earthquake alerts and youth sports stories. The Associated Press transmits automatically written company earnings reports generated by Automated Insights.

The Russian search engine Yandex is using automation to produce weather and traffic reports. Le Monde worked with Syllabs to provide thousands of instant, localized result stories during the 2015 French elections.

But inevitably, news companies will be testing automatic newswriting on more challenging subjects. Here’s my checklist of what editors should ask:

How accurate is the underlying data? Does the data consist of publicly announced numbers from companies, a stock exchange or government? If so, it’s probably safe for automatic crunching — with regular checks to make sure the data is being properly transmitted. However, not all data comes from such authoritative sources. If scores are being sent in by dads from their kids’ soccer games, how will you ensure the data is reliable? Your readers will hold your organization responsible for the data, whatever its source. Its accuracy affects the credibility of your whole organization.
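For less authoritative feeds, basic sanity checks can catch bad records before they reach a story template. A minimal sketch of the idea, with hypothetical field names for a youth-soccer score feed:

```python
def validate_score(record):
    """Sanity-check a submitted game record before it feeds a story.

    Field names are hypothetical; adapt them to your actual feed.
    Returns a list of problems; an empty list means the record passed.
    """
    problems = []
    for side in ("home_score", "away_score"):
        score = record.get(side)
        if not isinstance(score, int) or score < 0:
            problems.append(f"{side} missing or not a non-negative integer")
        elif score > 30:  # implausibly high for youth soccer; route to a human
            problems.append(f"{side} of {score} looks implausible")
    if record.get("home_team") == record.get("away_team"):
        problems.append("home and away teams are identical")
    return problems

# Records that fail the checks go to an editor's queue instead of auto-publishing.
bad = validate_score({"home_team": "Hawks", "away_team": "Hawks",
                      "home_score": 3, "away_score": -1})
```

The specific thresholds are editorial judgments, not technical ones — which is the point: someone in the newsroom has to decide what counts as implausible.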

Do you have the rights to the data? Does the data provider have the legal right to send it to you? Do you have the further right to process and publish it, and if so, on what platforms?

Is the subject matter appropriate for automation? It’s usually an easy decision to automate completely factual information. But factual information can be molded by the provider to its liking. Imagine if political campaigns began to offer data feeds of candidate speeches — location of speech, size of crowd, main points, key quote, etc. Even if a news company’s algorithm added background information on the candidate, poll numbers, etc., would we feel comfortable basing a news story on what the campaign considered the most significant things the candidate said? How would a story like this be different from a press release?

How does your automation organize the data? In a stock market story, what stocks and indexes will you lead with? Will you compare the latest numbers to the start of the year or to five years ago? Substantial advance thinking needs to go into what data the algorithm will highlight. For some stories, you may need to switch off the automation entirely — say, for the financial results of a company in turmoil where full context is necessary from the very start. You also need staff standing by to take stories off autopilot in the event of a sudden development — for instance, in sports, if a player is seriously injured or fights break out during a game.
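One way to implement the take-it-off-autopilot rule is a small gate that holds a story for a human whenever flagged conditions appear in the data. A sketch, with hypothetical flag names standing in for whatever your feed actually carries:

```python
# Hypothetical event flags a data feed might attach to a record.
MANUAL_REVIEW_TRIGGERS = {"player_injury", "game_abandoned", "earnings_restatement"}

def route_story(data_flags):
    """Return 'auto' to publish automatically, 'human' to hold for an editor.

    data_flags is the set of event flags attached to this data record.
    """
    if data_flags & MANUAL_REVIEW_TRIGGERS:
        return "human"
    return "auto"
```

The hard part is not the code but the trigger list: it encodes, in advance, every situation the newsroom has decided is too sensitive for automation.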

Will you disclose what you’re doing? Some organizations believe they must tell readers that a story was automatically produced. The AP does that, and provides a link that describes how the process works. Some other companies have experimented with software designed to encourage children’s sports by highlighting the young players’ successes and playing down mistakes. Would you intentionally distort your coverage like that, and if so would you disclose it?

Does the style of the automated reports match your style? Spellings, general writing style and capitalization should match the rest of your content. Readers will be suspicious of content that doesn’t feel like the rest of your journalism. Have someone not involved in the automation project read the automatically written stories for style and flow.

Can you defend how the story was “written”? If people question the facts in a story or how they were juxtaposed, can you give an explanation (or get a quick answer from your data and automation providers)? “The computer did it” isn’t much of an explanation. Are your automatic writing processes so well documented that even as your staff turns over, you will be able to thoroughly explain how every story came to be?

Who’s watching the machine? Errors with underlying data or automation software can quickly metastasize, potentially creating thousands of erroneous stories. Test the automated product thoroughly before anything is published. Even when publication begins, have a human editor check every story before it goes out. Once the product proves itself, stories can go out automatically with spot-checking by human editors.
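Once a product has earned automatic publication, spot-checking can be as simple as randomly sampling a fraction of outgoing stories for human review. A sketch, assuming a 10 percent sample rate (the rate itself is an editorial choice):

```python
import random

def select_for_spot_check(story_ids, sample_rate=0.10, seed=None):
    """Randomly pick a fraction of published stories for human review.

    Always selects at least one story, so no publishing run goes unchecked.
    """
    rng = random.Random(seed)
    k = max(1, round(len(story_ids) * sample_rate))
    return rng.sample(story_ids, k)
```

Sampling catches drift over time; it does not replace the full pre-launch review of every story that the paragraph above recommends.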

What about long-term maintenance? Even error-free automatic processes can’t simply be set up and left to run on their own forever. Background material needed by the algorithm (team names, company headquarters locations) may change. A data source may suddenly become less reliable. The either-or choices that worked in the algorithm last year may no longer be appropriate for a new situation. Responsible news organizations will have people constantly maintaining their data and reviewing the choices that algorithms make.

Are you considering automation that creates multimedia presentations? Some automated systems create video or photo displays to accompany text stories. If so, can you ensure that the system accesses only imagery that you have a legal right to use? How will you make sure it doesn’t grab imagery that’s satirical, hateful or not in line with your standards of taste? Can you be certain it’s selecting photos and video of the event itself, as opposed to simulations or hoax versions?

Are you using software that reduces long articles to bullet points? Test it extensively to make sure it’s truly finding the important points. And find out if the software you’re considering requires the original article to be in a certain format — say, the typical inverted pyramid style used in news stories. Text written in other ways may yield poor results. (At AP, we tried dropping the Book of Genesis into an automatic summary program we were testing; the bullet points created by the program left out the Garden of Eden.)
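The Genesis anecdote illustrates why format matters: many extractive summarizers lean heavily on sentence position, which works for the inverted pyramid and fails for narrative. A toy illustration of that assumption — not AP’s actual software, just the positional heuristic at its crudest:

```python
def lead_based_summary(text, n=2):
    """Toy extractive summarizer: keep the first n sentences, on the
    inverted-pyramid assumption that the most important facts come first.
    Narrative text, which builds to its point, breaks this assumption.
    """
    sentences = [s.strip() for s in text.replace("\n", " ").split(". ") if s.strip()]
    return sentences[:n]
```

Run this on a news brief and the output is serviceable; run it on a story told chronologically and the summary keeps the setup while discarding the payoff.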

Are you ready for the next frontier? As automated journalism pushes its way into more sophisticated stories, it will become ever more controversial. A politician may demand to know why her name doesn’t appear more often; it’s not hard to imagine political activists — or parties to a legal case — demanding the source code behind automated coverage. Will you be willing to reveal exactly how your software works? Or do you consider it proprietary information?

The best protection as you move into robot newswriting is a constant focus on planning, testing and your own editorial standards.

Plus, it’s good to recognize that some things are still best done by humans.

___

Parts of this checklist are based on presentations to the Global Editors Network in Barcelona, the Collège des Bernardins in Paris and the Tow Center for Digital Journalism at Columbia University.