Machines Arguing with Humans and IBM’s Project Debater

Christopher Brennan
Deepnews.ai
Mar 31, 2021
Project Debater and debate champion Harish Natarajan.

Editor’s note: This is one of our occasional posts in which we speak to someone with something to say on a topic we find interesting. If you are seeing this and are not yet signed up to Deepnews, sign up here to start receiving our blog posts every week, and a Digest of quality news on an important subject every Friday.

At Deepnews we love a good argument. Part of looking for quality writing and content is looking for quality arguments, implicit or explicit, made with solid supporting evidence.

Our model, of course, uses machine learning to search through thousands of sources to find the articles we highlight, though others are using advances in the technology not to *find* excellent debate but to *create* it.

The team at IBM is doing exactly that: a couple of weeks ago it published a paper in Nature presenting the results of Project Debater, a system capable of debating an expert human in a formal debate setting. You may have caught a glimpse of the work, led by Noam Slonim, when it debuted back in 2019, though the recent paper explains its system architecture and shows that the machine’s statements were rated more highly than those written by non-expert humans (but less highly than those of expert debaters).

Graphic taken from the Nature article on Project Debater

It is all very impressive, but what exactly is Project Debater doing that is new? I spoke to Prof. Chris Reed, the director of the Centre for Argument Technology at University of Dundee, who wrote a commentary piece in Nature alongside the IBM paper and is an expert in the field of argument mining.

“I think that the thing that’s actually rather underplayed, even by them, is the feat of engineering of getting all the pieces to work together,” Reed said.

“Being able to go from the audio of listening to somebody give an opening speech. And then trawling through 400 million news articles to identify snippets of text that are appropriately relevant. And then editing and splicing effectively those bits of text together to render grammatically correct sentences. And then organizing those grammatically correct sentences into something that bears some kind of resemblance to a coherent narrative flow. That is astonishingly difficult.”

Prof. Chris Reed
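The retrieve-and-compose steps Reed describes can be illustrated with a toy sketch. Everything below, the corpus, the function names, and the keyword-overlap retrieval, is a simplified stand-in of my own; Project Debater’s real components are far more sophisticated.

```python
# A toy sketch of "trawl for relevant snippets, then splice them together".
# The corpus and the overlap-based retrieval are illustrative stand-ins only.

CORPUS = [
    "Subsidies for preschool raise long-term earnings.",
    "Critics argue subsidies distort local budgets.",
    "Weather patterns shifted across the region last spring.",
]

def retrieve(transcript, corpus):
    """Keep only snippets sharing at least one word with the transcript."""
    words = set(transcript.lower().split())
    return [s for s in corpus if words & set(s.lower().rstrip(".").split())]

def compose(snippets):
    """Stitch the retrieved snippets into a single statement."""
    return " ".join(snippets)

# An opponent argues about preschool subsidies; we gather on-topic material
# and drop the off-topic weather snippet.
opening = compose(retrieve("we should fund preschool subsidies", CORPUS))
```

The hard part, as Reed notes, is not any one of these steps but making them all work together at the scale of 400 million articles and render fluent, coherent speech.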

Part of the core of this work is argument mining, in which the machine finds relevant arguments and counterarguments within its massive corpus of text. A textbook example of an argument to be found would be something like: “Socrates is a man. All men are mortal. Therefore Socrates is mortal.”

“Except that’s not what arguments look like. They look like: ‘Let’s go and get a beer now,’ ‘No, I’m a bit tired this evening.’ ‘Can we do it tomorrow?’ ‘Okay, let’s do it tomorrow because they’re going to have a special,’” Reed says.
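Reed’s point, that real arguments hide in everyday talk, is part of what makes mining them hard. One common starting heuristic is to look for discourse markers such as “because” or “since” that signal one statement supporting another. A minimal sketch, with marker lists and labels of my own invention (real argument mining uses trained classifiers, not rules like these):

```python
# A toy argument-mining heuristic: split on a premise marker and label
# the two sides. Rule-based and illustrative only.

PREMISE_MARKERS = ("because", "since")

def label_segments(sentence):
    """Label conclusion/premise if a premise marker is present."""
    lower = sentence.lower()
    for marker in PREMISE_MARKERS:
        token = f" {marker} "
        if token in lower:
            conclusion, premise = lower.split(token, 1)
            return {"conclusion": conclusion.strip(), "premise": premise.strip()}
    return {"statement": lower.strip()}

label_segments("Let's do it tomorrow because they're going to have a special")
# → {"conclusion": "let's do it tomorrow",
#    "premise": "they're going to have a special"}
```

Even Reed’s beer example yields a structure this crude rule can catch, though most conversational arguments, like “No, I’m a bit tired this evening,” carry no explicit marker at all.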

So what does it mean that Project Debater can parse through text for arguments and then use them? What does being able to debate, a classic example of reasoning in action, show? On this blog we previously talked about reason when discussing the work of Dr. Henry Shevlin, who has worked on comparing machines to intelligences from animals to humans.

Reed says that Project Debater is still nowhere close to human cognition. One example of this is Project Debater’s “rebuttal” of the arguments put forward by its debate opponent. It takes the opponent’s speech, transcribes it into text, and then compares that text to arguments it has pulled from its corpus, the knowledge base it has been fed, and a dedicated database of debate topics.

“It’s absolutely not creating a structure of the opponent’s argument, and then looking at that structure and reasoning about where the weak points are in it to formulate an argument. There is none of that,” Reed says.
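The distinction Reed draws, matching rather than reasoning, can be made concrete with a toy version of the rebuttal step: score the opponent’s transcribed claim against stored counterarguments by simple word overlap and return the best match. The data, the Jaccard scoring, and the function names are all my own simplifications, not IBM’s actual method.

```python
# A toy rebuttal-by-matching step: no argument structure, no reasoning
# about weak points, just similarity lookup against stored material.

COUNTERARGUMENTS = {
    "subsidies are too expensive": "Long-term savings outweigh the upfront cost.",
    "parents should pay themselves": "Many families cannot afford full tuition.",
}

def overlap(a, b):
    """Jaccard similarity of the two texts' word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def rebut(opponent_claim):
    """Return the stored rebuttal whose trigger claim best matches the input."""
    best = max(COUNTERARGUMENTS, key=lambda k: overlap(k, opponent_claim))
    return COUNTERARGUMENTS[best]
```

However well such a lookup performs in practice, it never builds or inspects the structure of the opponent’s argument, which is exactly Reed’s point.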

In the same way that we need to avoid thinking of GPT-3 as intelligent, it is easier to understand machines like Project Debater when we focus on the work they are doing; the word “robot” itself derives from the Czech robota, meaning forced labor. GPT-3 can generate passable text, and Project Debater accomplishes the impressive task of debating by bringing together different systems. Both tasks involve language, which is how intelligence is often expressed and where AI has seen many of its recent advances.

By focusing on the broader task of parsing through language, Project Debater is doing something different from its IBM ancestor Deep Blue, which played Garry Kasparov in chess with the narrow goal of reaching checkmate. Slonim’s team points out that debate is harder to judge, with more subjectivity, and that pushing a machine to use more advanced forms of human language, even if just pulling from material it has been fed, pushes artificial intelligence outside of its “comfort zone.”

The question then is not about intelligence but about how machines working with more advanced human language can be useful, particularly as humans are confronted with the flood of text on the internet.

One way a focus on argument may be useful is that it lets us understand what is inside a text, rather than just counting the clicks it gets, much faster and on a much larger scale. That is something Deepnews is working on with our quality metric, and our technology shows the benefit of letting a machine help categorize content.

Being able to engage with individual arguments within an article, at scale, would be another step forward. A couple of years ago Reed’s team worked with the BBC to look at the challenge of fake news and to use argument mining to aid students’ reasoning. Applied broadly, this could help raise the quality of debate.

“What we’re starting to think about is ways in which AI systems are able to understand the arguments of humans and contribute to the debates as a whole. Teams can be expanded to have some human members, and some AI members, and the team as a whole can then produce better quality decisions,” Reed said.
