MIT’s Automatic Data-Driven Media Bias Measurement Method Achieves Human-Level Results

Published in SyncedReview, Sep 13, 2021

Today more than ever, people are voicing concerns regarding biases in news media. Especially in the political arena, there are accusations of favouritism or disfavour in reporting, often expressed through the emphasizing or ignoring of certain political actors, policies, events, or topics. Many regard this as a corruption of the fourth estate and a rising threat to democracy.

Is it possible to develop objective and transparent data-driven methods to identify such biases, rather than relying on subjective human judgements? MIT researchers Samantha D’Alonzo and Max Tegmark say “yes,” and have proposed an automated method for measuring media bias.

In the new paper Machine-Learning Media Bias, the team analyses roughly a million articles from about one hundred newspapers for bias across various news topics, mapping the newspapers onto a two-dimensional media bias landscape. The proposed data-driven approach produces results that are in close accordance with human-judgement classifications on left-right and establishment biases.

The researchers start by describing how they automatically map both phrases (monograms, bigrams and trigrams) and newspapers into a bias space using phrase statistics alone. The proposed method employs a generalization of principal component analysis tailored for phrase frequency modelling. Given a set of articles as input, it counts how often each phrase occurs in each newspaper and arranges these counts into a matrix, then models this matrix in terms of identifiable biases that link phrases and newspapers.
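The counting step can be illustrated with a minimal sketch. This is not the authors' code: the function names `ngrams` and `phrase_count_matrix`, the whitespace tokenisation, and the toy newspapers are all assumptions made for illustration, and the sketch covers only the construction of the count matrix, not the generalized principal component analysis applied to it.

```python
from collections import Counter


def ngrams(text, max_n=3):
    """Yield every 1-, 2- and 3-word phrase in a text (lowercased, split on whitespace)."""
    words = text.lower().split()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def phrase_count_matrix(articles_by_paper, max_n=3):
    """Build a phrase-by-newspaper count matrix.

    articles_by_paper: dict mapping a newspaper name to a list of article texts.
    Returns (phrases, papers, matrix), where matrix[i][j] is the number of
    times phrases[i] occurs in the articles of papers[j].
    """
    counts = {paper: Counter() for paper in articles_by_paper}
    for paper, articles in articles_by_paper.items():
        for article in articles:
            counts[paper].update(ngrams(article, max_n))
    phrases = sorted(set().union(*counts.values()))
    papers = sorted(counts)
    matrix = [[counts[paper][phrase] for paper in papers] for phrase in phrases]
    return phrases, papers, matrix


# Toy input (hypothetical newspapers and texts)
articles = {
    "Paper A": ["tax relief now", "tax relief"],
    "Paper B": ["tax cuts now"],
}
phrases, papers, matrix = phrase_count_matrix(articles)
```

Each row of `matrix` then describes how one phrase is distributed across newspapers, which is the kind of table a dimensionality-reduction step would take as input.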

The researchers scraped and downloaded a total of 3,078,624 articles published between January 2019 and December 2020 from 100 media sources spanning a broad range of political stances. In the data-preprocessing stage, they auto-classified the articles by topic using the open-source MITNewsClassify package. They purged phrases to avoid duplication, and removed any phrase for which more than 90 percent of all occurrences came from a single newspaper. This filtering pared the list down to the 1,000 candidate phrases with the highest information scores for further analysis.
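The single-newspaper filter is simple enough to sketch. The snippet below is an illustration under stated assumptions, not the paper's implementation: the function name `drop_single_source_phrases`, the nested-dictionary input format, and the example counts are hypothetical, and the information-score ranking is left out because the article does not define it.

```python
def drop_single_source_phrases(counts, max_share=0.9):
    """Remove phrases dominated by one newspaper.

    counts: dict mapping phrase -> dict mapping newspaper -> occurrence count.
    A phrase is dropped when more than `max_share` of all its occurrences
    come from a single newspaper.
    """
    kept = {}
    for phrase, by_paper in counts.items():
        total = sum(by_paper.values())
        if total == 0:
            continue  # phrase never occurs, nothing to analyse
        if max(by_paper.values()) / total > max_share:
            continue  # almost exclusive to one outlet
        kept[phrase] = by_paper
    return kept


# Hypothetical counts for two phrases across two newspapers
counts = {
    "tax relief": {"Paper A": 5, "Paper B": 5},
    "our columnist": {"Paper A": 19, "Paper B": 1},  # 95 percent from Paper A
}
filtered = drop_single_source_phrases(counts)
```

A filter like this plausibly removes outlet-specific boilerplate, such as bylines or section names, which would otherwise separate newspapers for reasons unrelated to bias.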

They then mapped media bias onto a two-dimensional landscape — traditional left-right bias and establishment bias — based on the use-frequency of the discriminative phrases.
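To give a feel for how phrase frequencies can place newspapers on a two-dimensional map, here is a rough sketch that uses ordinary principal component analysis as a stand-in. The paper uses its own generalization of PCA, which this does not reproduce. The function name `project_2d` and the toy frequencies are assumptions, and the two axes found this way are simply the directions of greatest variance, not labelled left-right or establishment axes.

```python
import math
import random


def project_2d(rows, iters=100, seed=0):
    """Project each row onto the top two principal components.

    rows: one list of phrase frequencies per newspaper.
    Components are found by power iteration with deflation (plain PCA).
    Returns one [x, y] coordinate pair per newspaper.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            # w = X^T (X v): multiply by the unnormalised covariance matrix
            scores = [sum(x * vj for x, vj in zip(row, v)) for row in centered]
            w = [sum(s * row[j] for s, row in zip(scores, centered)) for j in range(d)]
            # Deflate: remove anything along components already found
            for c in components:
                dot = sum(wj * cj for wj, cj in zip(w, c))
                w = [wj - dot * cj for wj, cj in zip(w, c)]
            norm = math.sqrt(sum(wj * wj for wj in w))
            if norm < 1e-12:
                v = [0.0] * d  # no variance left in any remaining direction
                break
            v = [wj / norm for wj in w]
        components.append(v)
    return [[sum(x * cj for x, cj in zip(row, c)) for c in components] for row in centered]


# Hypothetical frequencies of three phrases in four newspapers
rows = [
    [10, 0, 5],  # newspaper 0
    [9, 1, 1],   # newspaper 1
    [0, 10, 5],  # newspaper 2
    [1, 9, 1],   # newspaper 3
]
coords = project_2d(rows)
```

In this toy data, newspapers 0 and 1 favour the first phrase while newspapers 2 and 3 favour the second, so the two pairs land on opposite sides of the first axis. The sign of each axis is arbitrary, as it is in any PCA.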

The results show that the proposed automatic media bias classification method agrees well with previous bias classifications based on human judgement. The team hopes their study can make discussions on media bias less politicized: “Although news bias is inherently political, its measurement need not be.”

The paper Machine-Learning Media Bias is on arXiv.

Author: Hecate He | Editor: Michael Sarazen, Chain Zhang

Written by Synced