Building a Robot Journalist 🤖

Bakken & Bæck teamed up with news agency NTB to create Norway’s first robot journalist. The result was a digital football reporter that writes articles like a human.

Automation and algorithms already play a big role in the media industry. Programmatic advertising, algorithm-operated front pages and machine translation are some of the ways automation has infiltrated the news desk.

So-called “robot journalists” are another flourishing example of automation in the newsroom. The name is a bit misleading, since it’s neither a robot nor a journalist, but an algorithm. Using Natural Language Generation (NLG), the algorithm turns a pool of data into text according to a set of predefined rules and templates. Done right, it is a great tool for journalists and editors, and it can free up time for more valuable work.
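To make the idea of rules and templates concrete, here is a minimal Python sketch, not NTB’s system. A rule inspects the data to choose a template, and the template’s slots are then filled with values from the data. The dict keys, team names and template wording are hypothetical.

```python
# Minimal template-based NLG sketch. The dict keys and template wording are
# hypothetical illustrations, not NTB's real data format or templates.

TEMPLATES = {
    "home_win": "{home} beat {away} {home_goals}–{away_goals}.",
    "away_win": "{away} won {away_goals}–{home_goals} away to {home}.",
    "draw": "{home} and {away} drew {home_goals}–{away_goals}.",
}


def choose_template(match: dict) -> str:
    """Apply a predefined rule to pick the template that fits the data."""
    if match["home_goals"] > match["away_goals"]:
        return TEMPLATES["home_win"]
    if match["home_goals"] < match["away_goals"]:
        return TEMPLATES["away_win"]
    return TEMPLATES["draw"]


def generate(match: dict) -> str:
    """Fill the chosen template's slots with values from the data."""
    return choose_template(match).format(**match)


if __name__ == "__main__":
    match = {"home": "Team A", "away": "Team B", "home_goals": 3, "away_goals": 1}
    print(generate(match))  # Team A beat Team B 3–1.
```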

Even though the field of Natural Language Generation has existed for more than 40 years, commercial applications of the technology have only been mainstream for the last six or seven years. As the connected world produces ever more data, the potential of NLG tools grows with it. Companies like Automated Insights, Arria and Narrative Science have taken the lead in bringing NLG technology to market.

Within journalism, NLG technology didn’t really have its breakthrough until 2014. That year the LA Times launched its QuakeBot, which extracted data about larger earthquakes and inserted it into pre-written templates. The QuakeBot’s main focus was speed: its objective was to get the report out to the public as fast as possible.

This was the backdrop when we teamed up with NTB to create our own robot. The ambitious goal was for the robot to write summaries that were free of blatant errors and could be distributed straight to NTB’s customers without going through an editor. By the end of the project, we had reached this milestone.

How it’s done

A good basis of data
In this project we focused on creating summaries of football matches in the Norwegian top division. NTB has reporters at every match who provide data through its live service. In addition, NTB has a database of various football-related statistics. By combining these sources, we could feed the robot enough good data to generate high-level articles.

Because the robot has no sense of “quality” (was it a nice goal, was the red card fair?), it needs context to find what is interesting. Combining live data with historical data gave us a good foundation for that. To surface interesting content, we then enriched the data graph with more links: the previous goal by this player, by this team or at this stadium, as well as a player’s or team’s first, last or third goal of the match or season, and so on.
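The sketch below shows, under stated assumptions, how a single live goal might be linked to earlier goals to produce candidate facts such as a player’s third goal of the season. It uses a flat Python list instead of a graph, treats a season as a calendar year, and assumes a team plays at most one match per day. The `Goal` record, the `enrich` function and all player, team and stadium names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Goal:
    player: str
    team: str
    stadium: str
    match_date: date
    minute: int


def ordinal(n: int) -> str:
    """Return 1 -> '1st', 2 -> '2nd', 3 -> '3rd', 11 -> '11th', and so on."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def enrich(goal: Goal, history: list[Goal]) -> list[str]:
    """Link a new (live) goal to earlier goals and return candidate facts.

    `history` holds goals from the historical database plus earlier live
    events. Seasons are assumed to follow the calendar year.
    """
    def when(g: Goal) -> tuple[date, int]:
        return (g.match_date, g.minute)

    earlier = [g for g in history if when(g) < when(goal)]
    by_player = [g for g in earlier if g.player == goal.player]
    this_season = [g for g in by_player if g.match_date.year == goal.match_date.year]

    facts = [f"{ordinal(len(this_season) + 1)} goal of the season for {goal.player}"]
    if by_player:
        previous = max(by_player, key=when)
        facts.append(f"previous goal by {goal.player}: {previous.match_date.isoformat()}")
    if not any(g.stadium == goal.stadium for g in by_player):
        facts.append(f"first goal by {goal.player} at {goal.stadium}")
    # Team-level link (assumes one match per team per day)
    team_in_match = [g for g in earlier if g.team == goal.team and g.match_date == goal.match_date]
    facts.append(f"{ordinal(len(team_in_match) + 1)} goal of the match for {goal.team}")
    return facts
```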

Building structure and coaching the robot

The overall structure of a match summary is quite standardized: a strong title, a lead paragraph with the most important points, a body describing the events as they unfolded, a postscript with additional interesting information from the match, and finally a conclusion. This let us define a basic skeleton that every article is built on.
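As a rough illustration, the fixed skeleton could be modelled as a small data class whose sections are filled independently and rendered in order. The `Article` and `build_article` names, the input dict layout and the section wording are hypothetical; a real generator would presumably fill each section with rule-selected templates rather than the fixed strings used here.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    """The fixed skeleton shared by every match summary."""
    title: str
    lead: str
    body: list[str] = field(default_factory=list)
    postscript: str = ""
    conclusion: str = ""

    def render(self) -> str:
        parts = [self.title, self.lead, *self.body, self.postscript, self.conclusion]
        # Leave out any section the generator had nothing to say about
        return "\n\n".join(part for part in parts if part)


def build_article(match: dict) -> Article:
    """Fill each section of the skeleton from (hypothetical) match data."""
    home, away = match["home"], match["away"]
    score = f"{match['home_goals']}–{match['away_goals']}"
    return Article(
        title=f"{home} {score} {away}",
        lead=f"{home} and {away} played a match that ended {score}.",
        # Body: events in the order they unfolded (sorted by minute)
        body=[f"{minute}': {text}" for minute, text in sorted(match["events"])],
        postscript=match.get("extra_fact", ""),
        conclusion=f"Final score: {home} {score} {away}.",
    )
```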

The overall structure of an article.

To generate natural sentences, the robot needs a set of words and expressions that can be weighted and selected according to a broad range of criteria. For instance, we may want to vary how we refer to the home team by using its local nickname, or we may need to provide specific terms like “hattrick” or “matchwinner”. The sports journalists at NTB created a vast set of templates for the robot, giving it the proper vocabulary and making the language as natural as possible.
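One way to implement weighted, criteria-based selection is sketched below, assuming each expression carries a weight and a condition on the match context. Expressions whose condition fails are discarded, and one of the rest is drawn with `random.Random.choices`. The phrase banks, weights and context keys are hypothetical stand-ins for the templates NTB’s journalists wrote.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Expression:
    text: str    # template with {slots}
    weight: float  # relative preference when several expressions apply
    applies: Callable[[dict], bool] = lambda ctx: True  # when it may be used


# Hypothetical phrase banks
TEAM_REFERENCES = [
    Expression("{team}", 3.0),
    Expression("{nickname}", 2.0, lambda ctx: ctx.get("nickname") is not None),
    Expression("the hosts", 1.0, lambda ctx: ctx["is_home"]),
    Expression("the visitors", 1.0, lambda ctx: not ctx["is_home"]),
]

SCORING_PHRASES = [
    Expression("{player} scored", 2.0),
    Expression("{player} found the net", 1.0),
    Expression("{player} completed a hattrick", 10.0, lambda ctx: ctx.get("player_goals") == 3),
]


def pick(options: list[Expression], ctx: dict, rng: random.Random) -> str:
    """Keep the expressions whose criteria hold, then draw one by weight."""
    candidates = [e for e in options if e.applies(ctx)]
    weights = [e.weight for e in candidates]
    chosen = rng.choices(candidates, weights=weights, k=1)[0]
    return chosen.text.format(**ctx)
```

Passing a seeded `random.Random` makes the output reproducible, which helps when testing generated articles.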

The example shows how references to a player vary with context: first the full name, then only the last name, and on the third mention an alias referring to his age, position or nationality.
Example of a rule: If a team has been trailing, then equalizes and scores the deciding goal after 85 minutes have passed, the headline will tell the story of how the winning team turned the game around and won.
Example of a rule: “Nordkvelle became the matchwinner when Odd beat Stabæk 3–1”. The rule specifies that if a team wins by one goal, the player scoring the deciding goal will be mentioned in the headline.
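The naming pattern and the two headline rules above might be encoded roughly as follows. This is a sketch under assumptions: `MentionTracker` alternates between last name and alias after the third mention, the comeback rule is checked before the one-goal rule, and the class names, fields and player names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Player:
    first_name: str
    last_name: str
    alias: str  # e.g. "the 23-year-old" or "the Norwegian midfielder"


class MentionTracker:
    """Vary how a player is named as an article mentions them repeatedly."""

    def __init__(self) -> None:
        self.counts: dict[Player, int] = {}

    def refer(self, player: Player) -> str:
        n = self.counts.get(player, 0) + 1
        self.counts[player] = n
        if n == 1:
            return f"{player.first_name} {player.last_name}"  # full name first
        if n == 2:
            return player.last_name  # then last name only
        # Third mention uses the alias; alternating afterwards is an assumption
        return player.alias if n % 2 == 1 else player.last_name


@dataclass(frozen=True)
class Result:
    winner: str
    loser: str
    winner_goals: int
    loser_goals: int
    deciding_scorer: str  # last name of the player who scored the deciding goal
    deciding_minute: int
    winner_trailed: bool  # was the winning team behind at some point?


def headline(r: Result) -> str:
    score = f"{r.winner_goals}–{r.loser_goals}"
    # Comeback rule: trailed, then scored the deciding goal after 85 minutes
    if r.winner_trailed and r.deciding_minute > 85:
        return f"{r.winner} turned the game around and beat {r.loser} {score}"
    # One-goal rule: the scorer of the deciding goal goes in the headline
    if r.winner_goals - r.loser_goals == 1:
        return f"{r.deciding_scorer} became the matchwinner when {r.winner} beat {r.loser} {score}"
    return f"{r.winner} beat {r.loser} {score}"
```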

Through rigorous testing and iteration, the engine was refined to avoid odd mistakes and to increase its linguistic variation. One spectacular mistake was celebrating the poor player who scored a late own goal as the hero of the game.
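A guard against the own-goal mistake could look like the sketch below, assuming each goal records both the team it counts for and the team the scorer plays for. It takes the deciding goal to be the winner’s goal that took them one past the loser’s final tally (the usual game-winning-goal convention, assumed here) and refuses to name a hero when that goal was an own goal. `GoalEvent`, `deciding_goal` and `match_hero` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GoalEvent:
    minute: int
    scorer: str
    credited_to: str  # team the goal counts for
    scorer_team: str  # team the scorer plays for

    @property
    def own_goal(self) -> bool:
        return self.credited_to != self.scorer_team


def deciding_goal(goals: list[GoalEvent], winner: str, loser_goals: int) -> GoalEvent:
    """The winner's goal that took them one past the loser's final tally.

    Assumes `winner` really won, i.e. scored more than `loser_goals`.
    """
    winners_goals = sorted((g for g in goals if g.credited_to == winner), key=lambda g: g.minute)
    return winners_goals[loser_goals]


def match_hero(goals: list[GoalEvent], winner: str, loser_goals: int) -> Optional[str]:
    goal = deciding_goal(goals, winner, loser_goals)
    # Guard: an own goal must never turn its unlucky scorer into the hero
    if goal.own_goal:
        return None
    return goal.scorer
```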

The possibilities

The football robot could be developed further to produce better and more varied articles. For instance, the local paper of the home team would tell a different story than the local paper of the visiting team, and competing media outlets would focus on different things in their stories.
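A first step towards perspective-dependent stories might be as simple as framing the same result from the side of the team a local paper follows. The function below is a hypothetical sketch of that idea, with invented phrasing and team names, not a feature of the robot described here.

```python
def local_headline(home: str, away: str, home_goals: int, away_goals: int, followed: str) -> str:
    """Frame one result from the point of view of the team a local paper follows."""
    if followed not in (home, away):
        raise ValueError(f"{followed!r} did not play in this match")
    opponent = away if followed == home else home
    # Reorder the score so the followed team's goals come first
    own, other = (home_goals, away_goals) if followed == home else (away_goals, home_goals)
    if own > other:
        return f"{followed} beat {opponent} {own}–{other}"
    if own < other:
        return f"{followed} lost {own}–{other} to {opponent}"
    return f"{followed} held {opponent} to a {own}–{other} draw"
```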

The stories could also be tailored to the individual reader, with personalized articles based on location or interests. Moreover, the coverage could be scaled almost without limit by expanding to more divisions, different age groups or other sports.

Robot journalists are not going to replace real journalists any time soon, but they have the potential to free up vast amounts of resources if used correctly. The world needs more quality journalism, and robots can be a great tool for assisting journalists with data collection and structuring. Done right, a robot journalist can produce a large number of varied articles in seconds and free up big chunks of time for more valuable work.