Picture from IBM press release published October 2020.

Watson NLP Key Point Analysis May Exacerbate Social Inequality If Used Wrongly for Social Analysis

A quick look at the split of IBM into two separate public companies and the recent press release for Watson NLP Key Point Analysis

Alex Moltzau
Published in Digital Diplomacy
9 min read · Oct 12, 2020


A lot seems to be happening at IBM lately. The announcement that it will split into two separate public companies is one interesting move. This article takes a short look at that split, followed by a longer, critical look at the commercialisation of Watson NLP.

Just to be very clear:

In this article I do not state that Watson NLP Key Point Analysis cannot be used for social analysis. I am simply looking at their recent press release with a very critical eye as a social scientist. I think programming can be very useful when used critically to discuss social data.

Splitting IBM & NewCo

Splitting the digital infrastructure operations into NewCo (a temporary name), while the remaining operations centred on software stay with IBM, may put extra pressure on the software division to deliver results.

In April this year, Business Insider wrote that Amazon’s cloud generated over $10 billion in net quarterly sales for the first time ever, up 33% from a year earlier.

Wedbush Securities analyst Moshe Katri commented:

“IBM is essentially getting rid of a shrinking, low-margin operation given the cannibalizing impact of automation and cloud, masking stronger growth for the rest of the operation.”

In Forbes, Peter Bendor-Samuel said:

“The open source AI operating model and the organization it takes to drive and succeed in that business is a different kind of organization than is necessary for managing and modernizing legacy environments.”

What will this entail? Does it put more pressure on the marketing of software?

One way to look at it is through the recent commercialisation of software products, for example with IBM Watson.

This time around, I am interested in a recent press release stating that IBM plans to commercialise Key Point Analysis inside Watson NLP products, including Watson Discovery.

The IBM Press Release is Highly Debatable

The press release was titled:

“IBM Watson Demonstrates New Natural Language Processing Advancement in Premiere of ‘That’s Debatable’”

“That’s Debatable” is a new, limited series presented by Bloomberg Media and Intelligence Squared U.S., sponsored exclusively by IBM.

Screenshot from Research.IBM.com

It was supposed to provide insight into global public opinion on the motion:

“It’s time to redistribute the wealth.”

It features industry leaders, economists, policy makers and public intellectuals debating some of today’s most pressing issues.

What do the people think, or what does the science say?

“It is debatable” can be said about a lot of topics.

Why now?

Is IBM attempting to sell this technology to the U.S. market for political campaigning?

What do they want to gain through this promotional campaign?

One of the largest companies in the world sponsoring a debate about redistribution of wealth… Well, I am not sure!

Is aggregating opinion data the best way to decide whether redistribution of wealth is a good idea or not?

As a side note, Salesforce wanted to use AI to create the best tax policy.

IBM is not the first to attempt social commentary by aggregating data, whether on opinions or from artificial decision-making environments.

There is a quote by historian Jill Lepore in Nature that I find fitting:

“Ignorance of history is a badge of honour in Silicon Valley.”

She looks back at propaganda, and at how a field with similar methods was renamed mass communications in the U.S., alongside the people working on Kennedy’s campaign.

Another fitting sentence is the title of an article by Pratyusha Kalluri:

“Don’t ask if artificial intelligence is good or fair, ask how it shifts power.”

On that note, enter “That’s Debatable.”

One of the taglines from the IBM press release was:

“Bringing More Global Voices To The Debate.”

Does it?

Where are these voices from?

There are no numbers clearly presented to back up such an ambition.

The show used Key Point Analysis, NLP technology from IBM Research, to determine the main points from the text of 3,500 submissions sent in by people prior to the debate.

“Of the 3,500 submissions, there were 1,600 usable arguments and 20 key points identified.”

These key points then prompted an exchange between the debaters.

On another side note, it is possible to contribute to the next debate via a link in the press release, though that link may of course disappear once the project is over.

In the first debate mentioned above (on wealth redistribution), the technology identified that 56% of the analysed arguments were for redistributing wealth and 44% were against.

Here is an excerpt from the press release:

  • “56 percent of arguments analyzed were for redistributing wealth, with approximately 20 percent of analyzed submissions arguing that there is currently too much wealth inequality in the world. One argument was that income inequality has increased dramatically over the past few decades, causing excessive suffering to large populations, and that if wealth is not redistributed, far greater will suffer.
  • The remaining 44 percent of analyzed arguments were against the motion, with 15 percent of those arguing that redistributing wealth would discourage some people from working hard. One example argument in support of this is that redistributing the wealth discourages individual initiative, entrepreneurship, and accountability for choices.”

I find it somewhat troubling that, when some of the most acclaimed large language models in the world cannot differentiate racism very well, AI based on text analysis is supposed to be helpful in arguing about wealth redistribution.

Before looking at the pipeline by IBM, consider what has been reported about GPT-3 in this regard.

Now, consider the pipeline presented by IBM, which I go through in detail below.

In terms of the virtual audience (separate from the model), the percentage against wealth redistribution changed after the debate: the audience voted more against wealth redistribution than before.

That is, the show featured a live debate based on the key points.

They clearly managed to get big names into this debate.

Yanis Varoufakis, former Finance Minister of Greece, debated Allison Schrager, Senior Fellow at the Manhattan Institute.

Yanis for — Allison against.

Robert Reich, former U.S. Secretary of Labor, was up against Lawrence Summers, former Secretary of the Treasury.

Robert for – Lawrence against.

Schrager and Summers won (according to polls) with an increase of 17 percentage points. According to IBM:

“To determine the winner of the debate, the virtual debate audience was polled on the motion prior to the start — 57 percent of the virtual audience was for, 20 percent against and 23 percent undecided. Following the debate, the audience voted again with 59 percent for and 37 percent against, declaring Schrager and Summers the winners with an increase of 17 percentage points.”
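
To make the scoring explicit: the winner appears to be decided by which side gains the most vote share between the two polls. Here is a minimal check of the arithmetic from the quoted numbers; the “larger gain wins” rule is my inference from the press release, not a published IBM method.

```python
# Vote shares quoted in the press release, before and after the debate.
before = {"for": 57, "against": 20, "undecided": 23}
after = {"for": 59, "against": 37}

gain_for = after["for"] - before["for"]              # 59 - 57 = +2 points
gain_against = after["against"] - before["against"]  # 37 - 20 = +17 points

# Inferred rule: the side whose share grows most wins the debate.
winner = "against (Schrager and Summers)" if gain_against > gain_for else "for"
print(gain_for, gain_against, winner)  # -> 2 17 against (Schrager and Summers)
```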

We live in a time of stark and rising economic inequality.

Can we simply aggregate opinion data and use it as arguments for or against?

Then again, what opinions are taken into consideration?

What data goes into the analysis?

The result means more people were against the motion after the debate:

“It’s time to redistribute the wealth.”

What can we say about the process of analysis?

Unequal distribution of data on redistribution of wealth

IBM attempts to summarise and represent opinions.

To generate key points, the technology does the following, with my comments marked with → (a rough code sketch of the whole pipeline follows after the list):

  1. Classify Arguments: Every submission is analyzed using a deep neural network to determine if the content is for or against the position statement, and submissions deemed irrelevant or neutral are removed.
    → How do we know what counts as a neutral or irrelevant argument?
    → Where is this presented in an explainable format?
  2. Identify Key Points: From 3,508 arguments submitted on the first motion, 1,600 were deemed usable. The technology evaluates the quality of each argument and identifies potential key points by grading and filtering high-quality arguments. It disregards potential key points that are too long, too emotional in tone, are incoherent or include redundancies. From 1,600 usable arguments, 20 key points were identified.
    → It disregards potential key points that are too long, too emotional in tone, are incoherent or include redundancies.
    → From 1,600 usable arguments, 20 key points were identified. It is not inconceivable that the algorithm completely disregarded people with less education and possibly worse grammar, although grammar and education may not be related.
    → In attempting to analyse redistribution, the analysis skews the social distribution within the very data foundation it attempts to analyse.
    → Come on IBM! You can do better. Is this even IBM?
    → Then again, a tool like this can simply legitimise further inequalities. Were these tools operated by those running the show, by IBM scientists, or by social scientists? It is hard to know, because the press release does not state it clearly; even the identity of those performing the analysis is unclear.
    → Can you even call this a proper analysis?
    → I understand that there are limitations in NLP; however, this is not communicated well at all in the Key Point Analysis software that IBM wants to commercialise. This is important to amend if the tool is to be used indiscriminately to justify decisions based on incomplete data, such as presented in ‘That’s Debatable’.
  3. Match Arguments to Key Points: It identifies how many arguments support each of the potential key points. It then selects a small set of key points that are diverse and cover the majority of arguments submitted — giving a percentage of the prevalence of each.
    → How can the key points be diverse? How is diversity calculated? Who programmed this diversity?
  4. Generate the Narrative: The technology selects the key points cited most often in the submissions, and a small subset of the strongest arguments that support each key point are used to create salient narratives arguing the pro and con sides of the debate.
    → Is frequency a precursor of salient narratives? How is a compelling argument measured and aggregated?
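
As promised above, here is a rough sketch of what a pipeline like this could look like. To be clear: this is a minimal toy reconstruction based only on the four steps in the press release. The word lists, thresholds and function names are my own assumptions for illustration; IBM uses deep neural networks whose criteria are not published.

```python
from collections import Counter

# Toy stand-ins for the models IBM does not disclose.
PRO_WORDS = {"inequality", "unfair", "suffering", "redistribute"}
CON_WORDS = {"initiative", "entrepreneurship", "discourage", "taxes"}

def classify_stance(argument: str) -> str:
    """Step 1 (toy heuristic): label a submission 'pro', 'con' or 'neutral'."""
    words = set(argument.lower().split())
    pro, con = len(words & PRO_WORDS), len(words & CON_WORDS)
    if pro > con:
        return "pro"
    if con > pro:
        return "con"
    return "neutral"  # neutral/irrelevant submissions are dropped entirely

def quality_score(argument: str) -> float:
    """Step 2 (toy heuristic): grade 'argument quality' by length alone.
    This is exactly the kind of filter that can silently exclude writers
    whose style, length or tone does not fit the model's expectations."""
    n = len(argument.split())
    return 1.0 if 8 <= n <= 60 else 0.0

def supports(argument: str, key_point: str) -> bool:
    """Step 3 (toy heuristic): crude lexical overlap as 'support'."""
    shared = set(argument.lower().split()) & set(key_point.lower().split())
    return len(shared) >= 2

def analyse(submissions, candidate_key_points, top_k=20):
    # Step 1: stance classification, dropping anything deemed neutral.
    usable = [s for s in submissions if classify_stance(s) != "neutral"]
    # Step 2: keep only arguments that pass the quality filter.
    usable = [s for s in usable if quality_score(s) > 0.5]
    # Step 3: count how many usable arguments support each candidate key point.
    coverage = Counter()
    for s in usable:
        for kp in candidate_key_points:
            if supports(s, kp):
                coverage[kp] += 1
    # Step 4: the most-cited key points form the narrative. Prevalence is a
    # share of the *usable* arguments; everyone filtered out in steps 1-2
    # has already vanished from the denominator.
    total = max(len(usable), 1)
    return [(kp, count / total) for kp, count in coverage.most_common(top_k)]
```

Even in this toy version the concern is visible: every threshold in steps 1 and 2 changes who gets counted, and none of those thresholds shows up in the reported percentages.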

According to the release:

“IBM plans to commercialize Key Point Analysis inside Watson NLP products including Watson Discovery.”

They argue that businesses can gain a clearer view of relevant points and considerations.

A few thoughts off the top of my head:

  • What happens if this tool is used in a government without critical people involved in the process who can spot how flawed the data input actually is?
  • How can this process be better explained or communicated to the people using the tool?

The current animation video simply shows frames being split and sorted:

Screenshot of marketing video from IBM press release

What if unequal wealth distribution is optimised?

This is also communicated by IBM in the same press release as:

“…helping states get critical voting information to citizens.”

From what I have seen there is a lot that needs to be considered and communicated for that to be the case.

That is why I conclude:

Watson NLP Key Point Analysis may simply exacerbate social inequality if used wrongly for social analysis, as seems to be the case on “That’s Debatable”.

That is what I currently think.

However, I would love to hear other arguments, or perhaps better understand how this keypoint analysis works.

What do you think?

This is #500daysofAI and you are reading article 496. I am writing one new article about or related to artificial intelligence every day for 500 days.
