The artificial intelligence solution
As trust in news plummets, cable news outlets CNN, Fox News and MSNBC are being criticized for operating through the prism of bias. If this is the case, how are biases in cable news programs contributing to the rise in polarization in the U.S.? Can artificial intelligence (AI) be used to monitor and analyze bias?
The Financial Times has developed a bot that automatically flags whether their articles quote too many men. The BBC has started measuring how many female experts get on the air. These two initiatives aim to balance gender representation in the news by training journalists to consciously craft more inclusive stories. Are there also ways to encourage less biased reporting in cable news?
To begin to answer this question, we explored whether AI can do three things: identify coverage patterns, analyze bias efficiently across large data sources, and scale across a variety of topics and parameters. The list of possible editorial biases to examine is long, so for this project we focused on a single topic. (Note: the research team members are listed at the end of this article.)
Phase I: Identify a Topic and Reliable Data
The first stage was to identify a single topic and then develop an AI tool that can identify instances of coverage of that topic on cable news.
To pick a topic we used polling studies from the Pew Research Center, a nonpartisan fact tank, to see which issues were top concerns for voters affiliated with both American political parties during the 2016 and 2018 national elections. Immigration was one of the topics that dominated headlines in both elections.
Top Election Topics 2016: economy, terrorism, foreign policy, health care, gun policy, immigration, Social Security, education, Supreme Court appointments, and treatment of racial and ethnic minorities.
Top Election Topics 2018: Supreme Court appointments, health care, economy, gun policy, Medicare, Social Security, taxes, immigration, treatment of racial and ethnic minorities, and environment.
Once the topic was selected, we created a large data set from cable news video (Fox News, MSNBC and CNN) and the corresponding transcripts in the TV News Archive, covering the four months leading up to the 2016 and 2018 elections.
Phase II: Building a Lexicon and Verification
Our next task was to identify the instances in video clips and audio transcripts where the cable news channels covered immigration. This required writing a topic modeling algorithm to create a lexicon, and then testing it through code refinement and manual verification.
A topic lexicon is a list of words that frequently co-occur with the keyword of interest, in this case “immigration.” We can classify a transcript segment as being about immigration when enough of these meaning-laden “signal words” appear in it. For example, the words “reform” and “illegals” mentioned in close proximity can indicate that immigration is being discussed, even if the word “immigration” is never said. Note that although terms such as “illegal immigrant,” “illegal alien,” and “illegals” appear in cable news transcripts, they are inaccurate by legal and journalistic standards.
To generate a lexicon for a given topic, we tokenized and lemmatized our transcripts so that we could look at words individually. Tokenization is the task of chopping text up into pieces, called tokens, often while discarding characters such as commas, hyphens and periods that carry no meaning relevant to the question. Lemmatization removes inflectional endings and returns each word to its base form, known as the lemma; this let us count “reform” and “reforms,” for example, as the same word. We then used spaCy, the natural language processing (NLP) library, to identify the words that most frequently co-occurred with our topic word in the lemmatized transcripts, and inserted them into a ranked lexicon.
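spaCy performs both steps with a trained language model. To show what each step does without a model download, the sketch below swaps in a regular-expression tokenizer and a tiny hand-written lemma table; `LEMMAS`, `tokenize`, and `lemmatize` are hypothetical names for illustration, and a real pipeline would read spaCy's `token.lemma_` instead.

```python
import re

# Hypothetical lemma table standing in for spaCy's model-based lemmatizer.
LEMMAS = {
    "reforms": "reform",
    "immigrants": "immigrant",
    "borders": "border",
    "deported": "deport",
    "deporting": "deport",
}

# A token is a run of letters, optionally with one internal apostrophe ("don't").
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return TOKEN_RE.findall(text.lower())


def lemmatize(tokens: list[str]) -> list[str]:
    """Map each token to its base form; tokens missing from the table are unchanged."""
    return [LEMMAS.get(tok, tok) for tok in tokens]


print(lemmatize(tokenize("Immigration reforms, border security -- and more reform!")))
# ['immigration', 'reform', 'border', 'security', 'and', 'more', 'reform']
```

Because “reforms” and “reform” now share one lemma, a plain count over the output treats them as the same word.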
Words were ranked by how strongly they co-occurred with the topic word, using a “mutual information” score. A co-occurrence was defined as a word appearing within a certain number of words of an instance of the topic word. The score was computed as: (number of co-occurrences of the word with the topic word) / (total occurrences of the word in the transcript × total occurrences of the topic word in the transcript). Because the denominator grows with a word's overall frequency, very common words such as “and” received low scores and were filtered out of our lexicon.
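A minimal sketch of that score, assuming the transcript is already a list of lemmas. The exact window size is not given, so the `window` parameter (five words on either side by default) is a hypothetical choice, as are the function names.

```python
from collections import Counter


def cooccurrence_scores(lemmas: list[str], topic: str, window: int = 5) -> dict[str, float]:
    """Score each word: co-occurrences with `topic` / (count(word) * count(topic)).

    A co-occurrence is counted each time the word appears within `window`
    positions of an occurrence of the topic word.
    """
    counts = Counter(lemmas)
    topic_count = counts[topic]
    if topic_count == 0:
        return {}

    co = Counter()
    for i, lemma in enumerate(lemmas):
        if lemma != topic:
            continue
        # Look `window` positions to the left and right of this topic word.
        for j in range(max(0, i - window), min(len(lemmas), i + window + 1)):
            if lemmas[j] != topic:
                co[lemmas[j]] += 1

    return {w: c / (counts[w] * topic_count) for w, c in co.items()}


def rank_lexicon(scores: dict[str, float], size: int = 50) -> list[str]:
    """Return the `size` highest-scoring words, best first (ties broken alphabetically)."""
    return sorted(scores, key=lambda w: (-scores[w], w))[:size]


lemmas = ["and", "reform", "immigration", "border", "and", "tax", "and", "budget", "and"]
scores = cooccurrence_scores(lemmas, "immigration", window=2)
print(rank_lexicon(scores))  # ['border', 'reform', 'and']
```

In this toy input, “and” co-occurs with “immigration” twice as often as any other word yet ranks last, because dividing by its overall frequency penalizes it. Within one transcript the score differs from the ratio inside pointwise mutual information (PMI) only by a constant factor shared by every word, so it ranks words in the same order PMI computed from the same counts would.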
The second part of topic identification is a scoring technique that assigns an excerpt of text a score based on the lexicon words that appear in it. A word ranked higher in the lexicon contributes more to the score than a word ranked lower. To apply this technique, we broke each transcript into small overlapping segments, iterated through them, and assigned each segment a score. If the score exceeded a certain threshold, the segment was classified as being about our topic. By combining overlapping segments, we compiled a comprehensive list of transcript excerpts of various lengths that addressed our target topic.
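The sketch below follows those steps: overlapping fixed-size segments, rank-weighted lexicon hits, a threshold, and merging of overlapping hits into excerpts. The weighting (a word at rank r in a lexicon of n words earns n - r points), the segment length, the step, and the threshold are all hypothetical placeholders; the project tuned its real values by hand.

```python
def lexicon_weights(lexicon: list[str]) -> dict[str, int]:
    """Higher-ranked words earn more points: the top word gets len(lexicon), the last gets 1."""
    n = len(lexicon)
    return {word: n - rank for rank, word in enumerate(lexicon)}


def score_segment(tokens: list[str], weights: dict[str, int]) -> int:
    """Sum the points of every lexicon word occurring in the segment."""
    return sum(weights.get(tok, 0) for tok in tokens)


def topic_excerpts(
    tokens: list[str],
    lexicon: list[str],
    seg_len: int = 40,
    step: int = 20,
    threshold: int = 10,
) -> list[tuple[int, int]]:
    """Return merged (start, end) token spans whose segments score above `threshold`."""
    weights = lexicon_weights(lexicon)

    # Slide an overlapping window (step < seg_len) across the transcript.
    hits = []
    for start in range(0, len(tokens), step):
        end = min(start + seg_len, len(tokens))
        if score_segment(tokens[start:end], weights) > threshold:
            hits.append((start, end))
        if end == len(tokens):
            break

    # Merge overlapping or touching segments into longer excerpts.
    merged: list[tuple[int, int]] = []
    for start, end in hits:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


tokens = ["x"] * 10 + ["border", "deport"] + ["x"] * 10 + ["weather"] * 10
print(topic_excerpts(tokens, ["border", "deport", "visa"], seg_len=6, step=3, threshold=4))
# [(6, 15)]
```

Two overlapping segments, (6, 12) and (9, 15), both clear the threshold and merge into one excerpt; this is how the method yields excerpts of varying length.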
This was an iterative process of code refinement and manual verification, especially when choosing how many points each lexicon word should contribute and the corresponding score threshold for classifying an excerpt as “about immigration.” We compared manual readings of transcripts with the computer-generated output until we found values that precisely identified the relevant transcript portions.
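One way to make that comparison systematic is to hand-label a sample of segments and measure, for each candidate threshold, how many flagged segments were truly about immigration (precision) and how many true segments were flagged (recall). This is an illustrative sketch, not the project's documented procedure; it assumes segment scores are already computed, and the function names and the use of F1 to pick a threshold are assumptions.

```python
def precision_recall(scores: list[float], labels: list[bool], threshold: float) -> tuple[float, float]:
    """Precision and recall of the rule "score > threshold" against manual labels."""
    predicted = [s > threshold for s in scores]
    tp = sum(pred and truth for pred, truth in zip(predicted, labels))
    fp = sum(pred and not truth for pred, truth in zip(predicted, labels))
    fn = sum(truth and not pred for pred, truth in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_threshold(scores: list[float], labels: list[bool], candidates: list[float]) -> float:
    """Pick the candidate with the highest F1 score; ties go to the lower threshold."""

    def f1(t: float) -> float:
        p, r = precision_recall(scores, labels, t)
        return 2 * p * r / (p + r) if p + r else 0.0

    return max(candidates, key=lambda t: (f1(t), -t))


# Toy segment scores paired with a reader's judgment of "about immigration".
scores = [2, 5, 8, 12, 15, 3]
labels = [False, False, True, True, True, False]
print(best_threshold(scores, labels, [1, 4, 6, 10]))  # 6
```

A threshold that is too low lets off-topic segments through (lower precision); one that is too high misses real coverage (lower recall).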
To test our methodology at a larger scale, we started by looking at when immigration coverage spiked on the three cable news networks in 2016.
Were there notable differences among the three networks' coverage of immigration in 2016? No, the pattern of coverage was relatively consistent across the channels. More importantly for the purposes of this study, the computational findings match the events of the day.
What about 2018? When did immigration receive the most coverage? Here the results also demonstrate that the computational findings match the events of the day.
The close agreement between the coverage patterns and contemporaneous events indicates that this methodology works. To test it further, we ran the same analysis on other election topics; those results were also accurate.
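Finding when coverage spiked reduces to totaling the classified excerpts by channel and day and looking for the peaks. The sketch below assumes each excerpt has already been tagged with its channel, ISO-format air date, and duration in seconds; that record layout and the `coverage_by_day` and `peak_day` names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Optional


def coverage_by_day(excerpts: Iterable[tuple[str, str, float]]) -> dict[tuple[str, str], float]:
    """Total on-topic airtime in seconds per (channel, day).

    Each excerpt is a (channel, iso_date, seconds) tuple.
    """
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for channel, day, seconds in excerpts:
        totals[(channel, day)] += seconds
    return dict(totals)


def peak_day(totals: dict[tuple[str, str], float], channel: str) -> Optional[str]:
    """Return the day with the most on-topic coverage for `channel`, or None if it has none."""
    days = {day: secs for (ch, day), secs in totals.items() if ch == channel}
    return max(days, key=days.get, default=None)


# Toy values for illustration, not measurements from the project.
totals = coverage_by_day([("A", "2016-01-01", 30), ("A", "2016-01-02", 90), ("A", "2016-01-01", 20)])
print(peak_day(totals, "A"))  # 2016-01-02
```

Plotting the per-day totals for each channel on one time axis gives the kind of coverage curves that can then be checked against the news events of each day.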
Phase III: Analyze and Compare Coverage of Immigration
At this stage, we looked at news coverage in a manner that was more nuanced than simply computing the frequency of coverage of a particular topic. Here, we sought to identify how the topic was being covered. In other words, we examined the way that language was used to convey emotion and editorial bias.
To do this, we used sentiment analysis, which measures how positive or negative the words used during discussions of a topic (in this case, immigration) are, and how strongly. The goal of this algorithm is to gain a better understanding of the attitudes and emotions expressed. Sentiment analysis assigns each word a value reflecting its tone. In the charts below, positive language received a positive sentiment score and negative language received a negative sentiment score.
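The article does not name the sentiment tool behind its scores, so the following is only a generic lexicon-based sketch: each word carries a polarity value, and a passage's score is the sum over its words. The `POLARITY` table and its values are invented for illustration.

```python
import re

# Hypothetical polarity values; real sentiment tools use large, curated lexicons.
POLARITY = {
    "good": 1.0,
    "great": 1.5,
    "happy": 1.0,
    "bad": -1.0,
    "terrible": -1.5,
    "sad": -1.0,
}


def sentiment_score(text: str) -> float:
    """Sum the polarity of each known word; unknown words contribute 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(POLARITY.get(word, 0.0) for word in words)


print(sentiment_score("A great, happy day!"))  # 2.5
print(sentiment_score("Not a good day"))       # 1.0: the negation is ignored
```

The second call shows why word-level scoring can mislead: it ignores context such as negation, so a passage's total can point the opposite way from its meaning.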
Sentiment scores peaked on August 31 and September 1, 2016. These peaks occurred the day before and the day of campaign speeches on immigration by presidential candidate Donald Trump. Because Fox is a partisan, demonstrably right-leaning cable news network, its positive sentiment scores were predictable. The results for MSNBC and CNN were more surprising: these two networks also had positive scores, even though their reporting on the topic had a negative tone on these dates.
Here is an example of a transcript selection from that two-day period.
“UNTIL TRUMP ENDS THE EARLY TRANSITION OF AMERICA FROM THE GREATEST NATION IN HISTORY INTO SOME PATHETIC THIRD RATER ALSO-RAN, MULTICULTURAL MESS, UNTIL BLEEDING HAS STOPPED, THERE IS NOTHING TRUMP CAN DO THAT WON’T BE FORGIVEN. EXCEPT CHANGE HIS IMMIGRATION POLICIES. AT LEAST ONE PERSON FOUND THE TIMING OF YOUR BOOK RELEASE SO FUNNY. I’VE WATCHED THIS 20 TIMES. YOU HAVE TO WATCH WITH IT ME NOW.
WHO KNEW THAT IT WOULD BE DONALD TRUMP TO CONVERT THE GOP BASE TO SUPPORTING AMNESTY ON THE SAME WEEK ANN COULTER’S BOOK —
HE EITHER FEELS SORRY FOR YOU OR HE’S LAUGHING AT YOU.”
Sentiment score: 1.1568
In this clip, there are mixed linguistic messages. The tone and message are negative, but five positive words (highlighted in bold) resulted in an overall high positive sentiment score. We repeatedly found that positive sentiment scores correlate with high-emotion moments, but they are less effective as predictors of whether those moments have a positive or negative tone or meaning.
Measuring High-Inference Language
Our next objective was to find a more nuanced way to detect and quantify bias for purposes of analysis. Bias, like other subtle semantic qualities, is notoriously difficult for computers to recognize, so the challenge is to account for linguistic nuance. In the next phase of our analysis, we attempted to quantify subtle and not-so-subtle biases reflected in tone and meaning.
We developed a high-inference analysis algorithm that iterates through the sections on a given topic and computes a bias score based on how many biased (partial) words appear in them. This method for detecting bias also used a system of lexicons. First, we manually created a lexicon of words drawn from print news and national organizations. This lexicon had limited success in identifying relevant sections of the transcripts, because written language differs from spoken language: many words that appear in print are rarely used on air. To capture the tone and attitude of language used on cable news specifically, we also needed to identify the high-inference words that CNN, Fox News and MSNBC used most frequently with immigration.
We wrote an additional algorithm to see which words appear exclusively in the lexicon of each channel, to get a better understanding of the differences in the language used between channels.
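Words that appear exclusively in one channel's lexicon are a set difference: that channel's words minus the union of every other channel's words. A minimal sketch; the function name is hypothetical, and the channel labels and words in the example are placeholders, not the project's data.

```python
def exclusive_words(lexicons: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each channel, return the words that appear in no other channel's lexicon."""
    result = {}
    for channel, words in lexicons.items():
        others = set().union(*(w for ch, w in lexicons.items() if ch != channel))
        result[channel] = words - others
    return result


# Placeholder lexicons for illustration only.
lexicons = {
    "A": {"wall", "border", "reform"},
    "B": {"border", "dreamer"},
    "C": {"reform", "asylum"},
}
print(exclusive_words(lexicons))  # {'A': {'wall'}, 'B': {'dreamer'}, 'C': {'asylum'}}
```

Shared words such as “border” and “reform” drop out, leaving only the vocabulary that distinguishes each channel.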
Television Cable News High-Inference-Language Lexicon Word Clouds
These word clouds reflect some of the most frequent high-inference words used with immigration exclusively by each cable news network in 2016.
We combined these two lexicons (from the sentiment analysis and the high-inference analysis) to generate a left-biased and a right-biased lexicon for measuring bias in reporting on immigration.
At this point, it is important to note that we wrote a bias classifier just for the topic of immigration. A bias classifier for a different topic would require human input to generate new lexicons. This bias analysis was also limited to political bias.
Bias Analysis Using Two High-Inference-Language Lexicons
We created two lexicons to identify coded and biased language. As noted previously, our first lexicon relied on written sources, and stark differences between written and spoken language limited its usefulness for analyzing transcripts. We addressed this by combining manually identified high-bias words from news articles about immigration with the words that CNN, Fox News and MSNBC used most frequently near the word “immigration.” Here are samples from the two lists, showing words that start with the letter “A.”
Left-biased words when used with immigration:
Right-biased words when used with immigration:
A bias score was generated by measuring the usage of these high-inference-language words. An instance of a “left-leaning” biased word would push the score in one direction, and an instance of a “right-leaning” biased word would push the score in the opposite direction.
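In code, the score is a signed tally. This sketch assumes each left-biased word counts -1 and each right-biased word +1, with the total divided by the excerpt's length so long and short excerpts are comparable, and averages excerpt scores per channel; the sign convention, equal weights, normalization, and function names are all assumptions, and the word lists are placeholders rather than the project's lexicons.

```python
import re


def bias_score(text: str, left: set[str], right: set[str]) -> float:
    """Signed bias per word: below 0 leans left, above 0 leans right, 0 is balanced.

    Each left-lexicon word contributes -1 and each right-lexicon word +1; the
    tally is divided by the excerpt's word count. Only single-word entries are
    matched; multi-word phrases would need n-gram matching.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    tally = sum((word in right) - (word in left) for word in words)
    return tally / len(words)


def channel_bias(excerpts: dict[str, list[str]], left: set[str], right: set[str]) -> dict[str, float]:
    """Average excerpt bias score for each channel that has at least one excerpt."""
    return {
        channel: sum(bias_score(t, left, right) for t in texts) / len(texts)
        for channel, texts in excerpts.items()
        if texts
    }


# Placeholder lexicon entries, not the project's word lists.
LEFT = {"leftword"}
RIGHT = {"rightword"}
print(bias_score("rightword rightword leftword other", LEFT, RIGHT))  # 0.25
```

Comparing the per-channel averages is one way to produce the kind of channel-level left and right scores reported for the three networks.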
Here are two examples of the results of identifying bias based on the high-inference-language lexicon:
Right-biased wording when used with immigration
THE PRIORITIES IS TO GET RID OF CRIMINAL FELONS, ILLEGAL ALIENS WHO HAVE COMMITTED CRIME INSIDE THE COUNTRY. NUMBER TWO ENFORCE THE IMMIGRATION LAW, AND NUMBER THREE TO PROTECT THE BORDER. THAT’S CONSISTENT.
Fox News, “Shepard Smith Reporting” 08/26/2016
Left-biased wording when used with immigration
NOW THE MAN WHO LAUNCHED HIS CAMPAIGN, SLURRING MEXICAN IMMIGRANTS WHO CALLED FOR A WALL ON OUR SOUTHERN BORDER, WHICH MEXICO WILL PAY FOR, AND DEPICTS UNDOCUMENTED IMMIGRANTS IN THIS COUNTRY AS A LAWLESS HOARD, TERRORIZING CITIZENS, HE APPEARS TO EMBRACE ELITE REPUBLICAN ORTHODOXY AND OBAMA POLICY ADMINISTRATION, WHETHER HE KNOWS IT OR NOT.
MSNBC, “All In With Chris Hayes” 08/24/2016
The results of running our scoring algorithm on an extended database of transcripts indicate that the bias lexicons work.
Here are right- and left-bias scores for each channel using two high-inference-language lexicons:
Fox was correctly identified as largely right-leaning, while CNN and MSNBC were accurately identified as more left-leaning. These results suggest the tool can be a useful aid for identifying bias. Detecting language bias is nonetheless a hard problem: the task is subjective, linguistic cues are often subtle, and bias often can only be determined through context. There are many opportunities for further analysis and expansion in this realm.
What We Learned
What worked:
- Extracting information and identifying topic segments based on word frequencies.
- Identifying high-emotion moments using sentiment analysis.
- Using bias analysis to determine trends in political leanings.
- Computer-aided comparisons of coverage patterns of different topics.
Challenges:
- Sentiment analysis is not an effective indicator of positive or negative tone, due to limitations in contextual analysis.
- Manually generated lexicons are necessary for more accurate bias analysis results, but they can be influenced by the biases of the human programmer.
- Bias lexicons are difficult to create because people with opposing perspectives often use the same words.
- Lexicons need to be constantly updated with new words and phrases as news events evolve.
- Lemmatization makes the initial process of lexicon generation slow.
Next steps:
- Analyze other topic biases, including climate change, abortion, voter suppression and criminal justice.
- Analyze more coded biases, including gender, race and ethnicity.
- Improve the classification methodology by developing additional automated ways to parse the syntactic and semantic characteristics of potentially biased language.
- Develop specific measurements to quantify partiality or impartiality.
- Develop a user-friendly interface that easily shares results along with transcripts and video clips.
The supervised classification approach used in this methodology relies on a combination of manual and automated identification of bias words. The lexicon we generated is a data set of biased words that can be used in further research on detecting language bias. We show that our approach, trained with more explicitly biased content, is effective both with language known to be clearly biased and with language whose biases are subtle.
The goal of this particular project was to conduct an exploration of whether an AI tool could be used to effectively analyze cable television news for patterns and trends in content, bias and coverage. These preliminary results indicate that the perceived ideological biases of American television cable news networks can indeed be identified and quantified.
I believe that we can gain a lot from cross-discipline collaborations and the establishment of best practices. Do you have recommendations on how we might use AI to monitor and analyze bias, so that ultimately we improve our editorial judgment and foster a culture of inclusive reporting? I want to hear from you. DM me on Twitter @geraldinemoriba or send an email to email@example.com.
Acknowledgements: This research came about as a result of Stanford University’s Exploring Computational Journalism course. Maneesh Agrawala, the Director of the Brown Institute for Media Innovation, and Will Crichton, a computer science Ph.D. student, were our mentors. Our team consisted of two computer science students (Charlie Jarvis and Theodora Boulouta) and two John S. Knight Fellows (Flor Coelho and me, Geraldine Moriba).