Analyzing right-wing YouTube channels: hate, violence and discrimination

HMPig, DCC, UFMG
May 30, 2018

Raphael Ottoni (@raphaottoni), Pedro Bernadina (@pdbernardina12), Evandro Cunha (@Cunha_et_al)

Reading and sharing news on Facebook, tweeting, watching cute cat videos on YouTube. All of these actions are ordinary parts of many people’s everyday lives, so mundane that most of us don’t even notice them. We’re living in a time in which information is produced at incredible rates and people are more connected than ever. For instance, YouTube, the world’s most famous video-sharing website, has more than one billion users, who upload 400 hours of video every minute. Some users, however, are making use of these technologies to spread misinformation and hate. Largely due to the sheer volume of content generated on the platform, identifying this kind of misbehaviour is proving to be a challenge.

In addition, political polarization seems to be expanding in many parts of the world: in the United States of America, for example, disagreement between Republicans and Democrats has risen in recent years, as suggested by a report from the Pew Research Center; European politics has arguably never been so polarized; and similar situations can be observed in developing countries, including Brazil.

As a consequence, we observe an increasing wave of right-wing activity, including far-right and alt-right extremism. According to the non-governmental organization Anti-Defamation League (ADL), “Internet has provided the far-right fringe with formerly inconceivable opportunities”. Videos such as the one entitled “Islam is NOT a Religion of Peace”, published by Paul Joseph Watson, are exactly what ADL is concerned about: opportunities for extremists to reach a much larger audience than ever before and easily portray themselves as legitimate.

Worried about this social phenomenon, our research group conducted an investigation on YouTube to evaluate and detect signs of hate, violence and discrimination in a set of right-wing channels. The paper that resulted from this study was presented at the 10th ACM Conference on Web Science (WebSci’18), held in Amsterdam, and received the conference’s Best Student Paper Award. It can be freely accessed here.

Why YouTube, and what does it have to do with the right wing?

YouTube is the major online video-sharing website and enables people to upload videos that can be seen by wide audiences. It is also one of the online services hosting a large number of right-wing voices and, according to The Wall Street Journal, it pushes extreme and misleading videos to its users. These facts make the YouTube platform a fertile ground for extremists to promote their agenda.

In this study, we collected information from a set of right-wing channels to be analyzed. To select these channels, we used the right-wing news website InfoWars as a seed. This website links to its founder Alex Jones' YouTube channel, which had more than 2 million subscribers as of October 2017. At the moment of our data collection, Alex Jones expressed support for 12 other channels in his public YouTube profile. We visited these channels and confirmed that, according to our understanding, all of them published mainly right-wing content. Then, we collected (a) the video captions (written versions of the speech in the videos, manually created by the video hosts or automatically generated by YouTube’s speech-to-text engine), representing the content of the videos themselves; and (b) the comments (including replies to comments) posted to these videos.
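
The paper does not publish its crawling scripts, but comments (including replies) can be retrieved programmatically through the YouTube Data API v3. The sketch below is a minimal illustration of that step, assuming the google-api-python-client package and a hypothetical API key; captions require separately authorized access (or YouTube’s automatic transcripts), so they are omitted here.

```python
from googleapiclient.discovery import build

# "YOUR_API_KEY" is a placeholder; a real key is issued in the Google Cloud Console.
youtube = build("youtube", "v3", developerKey="YOUR_API_KEY")

def fetch_comments(video_id):
    """Fetch all top-level comments and their replies for a single video."""
    comments, page_token = [], None
    while True:
        params = dict(part="snippet,replies", videoId=video_id,
                      maxResults=100, textFormat="plainText")
        if page_token:
            params["pageToken"] = page_token
        response = youtube.commentThreads().list(**params).execute()
        for thread in response["items"]:
            # Top-level comment text
            comments.append(thread["snippet"]["topLevelComment"]["snippet"]["textDisplay"])
            # Replies, when present
            for reply in thread.get("replies", {}).get("comments", []):
                comments.append(reply["snippet"]["textDisplay"])
        page_token = response.get("nextPageToken")
        if not page_token:
            return comments

# Example (hypothetical video id from one of the collected channels):
# comments = fetch_comments("VIDEO_ID")
```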

In order to compare the results for these right-wing channels to a baseline representing more general YouTube behavior, we also collected videos posted to the ten most popular channels in the news and politics category according to the analytics tracking site Social Blade.

And how were hate, violence and discrimination measured?

We developed a three-layered approach to investigate this content from three distinct fronts: we performed a (a) lexical analysis, a (b) topic analysis and an (c) implicit bias analysis. These techniques are widely used in the scientific literature, usually applied on their own. Since they are essentially different, combining the three makes it possible to answer more complex questions, with each technique complementing the others. Combined, they allow us to answer the following research questions:

1: is the presence of hateful vocabulary, violent content and discriminatory biases more, less or equally accentuated in right-wing channels?

2: are, in general, commentators more, less or equally exacerbated than video hosts in an effort to express hate and discrimination?

First method: a lexical analysis

Lexical analysis, that is, the investigation of the vocabulary, reveals how society perceives reality and indicates the main concerns and interests of particular communities of speakers. To perform an analysis on the vocabulary used in our videos and comments, we used Empath, a tool for analyzing text across lexical categories. Words were classified among the following 15 categories related to hate, violence, discrimination and negative feelings and 5 categories related to positive matters in general:

  • negative: aggression, anger, disgust, dominant personality, hate, kill, negative emotion, nervousness, pain, rage, sadness, suffering, swearing terms, terrorism, violence
  • positive: joy, love, optimism, politeness, positive emotion
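
Empath is distributed as a Python package, so this categorization step can be sketched roughly as follows (the category names are the ones in Empath’s built-in lexicon, and the caption text is just a made-up placeholder):

```python
from empath import Empath

# Empath categories roughly matching the lists above
# (Empath uses underscores, e.g. "swearing_terms", "positive_emotion").
NEGATIVE = ["aggression", "anger", "disgust", "dominant_personality", "hate",
            "kill", "negative_emotion", "nervousness", "pain", "rage",
            "sadness", "suffering", "swearing_terms", "terrorism", "violence"]
POSITIVE = ["joy", "love", "optimism", "politeness", "positive_emotion"]

lexicon = Empath()

# Placeholder text; in the study, this would be a full video caption
# or the concatenated comments of a video.
caption = "they want to kill us all, this is a war against our people"

# normalize=True divides each category count by the number of tokens,
# giving the fraction of words that fall into each category.
scores = lexicon.analyze(caption, categories=NEGATIVE + POSITIVE, normalize=True)

for category, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    if value > 0:
        print(f"{category}: {value:.3f}")
```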

Second method: a topic analysis

Topic modelling is a type of statistical model for discovering the underlying semantic topics (sets of words conveying a theme, e.g. the words sand, sun and ocean conveying beach as a motif) that occur in texts. It helps us to discover hidden topical patterns across the documents (in our case, video captions and comments) and to annotate each video with these topics, allowing us to better understand, analyze and organize them. The algorithm for topic modelling used in our work is called latent Dirichlet allocation (LDA). One of its drawbacks is that it leaves to the user the task of interpreting the meaning of each topic.
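
The paper does not detail its preprocessing pipeline or LDA implementation, but a minimal sketch of the technique, assuming gensim (a common choice) and made-up placeholder documents, would look something like this:

```python
from gensim import corpora
from gensim.models import LdaModel
from gensim.parsing.preprocessing import preprocess_string

# Placeholder documents; in the study, each document is a video caption
# or the set of comments posted to a video.
docs = [
    "the war on terror and the bombing campaigns in the middle east",
    "wikileaks released documents about the elections",
    "my cat does the funniest things when watching videos",
]

# Basic preprocessing: lowercase, strip punctuation, remove stop words, stem.
tokenized = [preprocess_string(doc) for doc in docs]

# Map each word to an integer id and build a bag-of-words corpus.
dictionary = corpora.Dictionary(tokenized)
corpus = [dictionary.doc2bow(doc) for doc in tokenized]

# Fit LDA; the number of topics is a hyperparameter (2 here only for the sketch).
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2,
               passes=10, random_state=42)

# Each topic is a ranked list of (word, weight) pairs that the researcher
# still has to interpret and label manually.
for topic_id, words in lda.show_topics(num_topics=2, num_words=10, formatted=False):
    print(topic_id, [word for word, _ in words])
```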

Third method: an implicit bias analysis

The Implicit Association Test (IAT) is a test designed to measure a person’s automatic association between concepts in memory. Its core idea is to measure the strength of associations between two target concepts (e.g. flowers and insects) and two attributes (e.g. pleasant and unpleasant) based on the reaction time needed to match (a) items that correspond to the target concepts with (b) items that correspond to the attributes (in this case, flowers + pleasant, insects + pleasant, flowers + unpleasant, insects + unpleasant). The test’s authors found that individuals performed better when they had to match implicitly associated categories, such as flowers + pleasant and insects + unpleasant. Currently, there are online versions of several implicit association tests designed by researchers from Project Implicit. If you are curious, we strongly encourage you to take one of them to see how they work.

Words that compose each class and set of attributes in our Word Embedding Association Tests (WEATs)

More recently, an article published in Science proposed applying the idea behind the IAT to vector spaces generated by a technique called word embedding, in which words that share common contexts are located close to one another. By replicating a wide spectrum of biases previously assessed by implicit association tests, its authors show that cosine similarity between words in a vector space generated by word embeddings is also able to capture implicit biases. They named this technique the Word Embedding Association Test (WEAT), and we used it to design three tests focused on harmful biases towards the following minorities and/or groups likely to suffer discrimination in North America and Western Europe: immigrants, LGBT people and Muslims.
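
To make the idea concrete, here is a minimal sketch of the WEAT effect size: the difference between the mean associations of two target word sets with two attribute word sets, normalized by the standard deviation over all target words. The vector file path and the word lists below are hypothetical placeholders; the actual lists are the ones shown in the figure above, and the embeddings would be trained separately on each channel’s captions or comments.

```python
import numpy as np
from gensim.models import KeyedVectors

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B, vec):
    """s(w, A, B): mean cosine similarity of w with attribute set A
    minus its mean similarity with attribute set B."""
    return (np.mean([cosine(vec[w], vec[a]) for a in A])
            - np.mean([cosine(vec[w], vec[b]) for b in B]))

def weat_effect_size(X, Y, A, B, vec):
    """Difference of the mean associations of the two target sets,
    normalized by the (sample) standard deviation over all targets."""
    s_X = [association(x, A, B, vec) for x in X]
    s_Y = [association(y, A, B, vec) for y in Y]
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y, ddof=1)

# "vectors.kv" is a placeholder path for embeddings trained on one channel.
vec = KeyedVectors.load("vectors.kv")

# Hypothetical, shortened word lists (the real ones are in the paper):
X = ["muslim", "islam"]             # target concept 1
Y = ["christian", "christianity"]   # target concept 2
A = ["terrorist", "radical"]        # attribute set 1 (unpleasant)
B = ["peaceful", "honest"]          # attribute set 2 (pleasant)

print(weat_effect_size(X, Y, A, B, vec))
```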

What did you find with this three-layered methodology?

First, regarding our lexical analysis, we highlight here the following findings:

  1. video captions, in general, contain more words from the categories rage, nervousness and violence than comments;
  2. on the other hand, comments tend to include more words from the categories hate and swearing terms;
  3. right-wing video captions, when compared with our baseline set of channels, incorporate higher percentages of words conveying categories like aggression, disgust, hate, kill, rage and terrorism;
  4. baseline channels hold a higher percentage of positive semantic fields such as joy and optimism.

Normalized percentage of words in each Empath category. The bottom and top of each box are the first and third quartiles, the band inside the box is the median, the whiskers represent the minimum and maximum values, and the dots are outliers.

To show the results of our topic analysis, we display here the two most frequent topics and the 20 top-ranked words produced by the LDA for right-wing and baseline video captions and comments.

Top 2 topics for each document (right-wing and baseline captions and comments). Inside each topic, 20 words are presented in order of importance according to the LDA output.

Among the top ranked topics for the right-wing captions, we observe a relevant frequency of words related to war and terrorism, including nato, torture and bombing, and a relevant frequency of words related to espionage and information war, like assange, wikileaks, document and, possibly, morgan (due to the actor Morgan Freeman’s popular video in which he accuses Russia of attacking the United States’ democracy during the 2016 elections). Regarding the top ranked topics for the right-wing comments, it is possible to recognize many words probably related to biological and chemical warfare, such as rays, ebola, gamma, radiation and virus. It is also interesting to observe the presence of the word palestinian in the highest ranked topic: it might indicate that commentators are responding to the word israeli, present in the top ranked topic of the captions.

As expected, the words in the top ranked topics of the baseline channels seem to cover a wider range of subjects. The terms in the top ranked topics of the baseline captions include words regarding celebrities, TV shows and general news, while the ones in the baseline comments are very much related to Internet celebrities such as RiceGum and PewDiePie, and computer games, like Minecraft.

Distribution of WEAT biases for the three topics analyzed. Dashed lines indicate the reference value calculated from the Wikipedia corpus.

Finally, we compared the channels’ implicit biases to the ones calculated for a general corpus collected from Wikipedia, which is often considered, in this context, a good representation of contemporary English. When contrasting the reference Wikipedia bias with the biases calculated for the channels we collected, we observe different trends depending on the topic. For instance, the bias against Muslims was almost always amplified in comparison with the reference, especially in video captions. On the other hand, the bias against LGBT people was weakened in most of the observed channels, even in the right-wing ones. Concerning the bias against immigrants, the values appear close to the reference.

Comparing biases in captions with biases in comments, it is interesting to notice that, for immigrants and Muslims, captions hold higher biases than comments in 75% of the right-wing channels, considering the statistically significant cases. For LGBT people, however, comments hold higher discriminatory bias in right-wing channels. Comparing biases between right-wing and baseline channels, we observe that, concerning Muslims, the captions of right-wing channels present higher biases, while for the other topics the differences were not very pronounced.

Summarizing, our most interesting findings concerning the implicit bias analysis are:

  1. the YouTube community seems to amplify a discriminatory bias against Muslims and to weaken the bias against LGBT people;
  2. there are no differences between right-wing and baseline captions regarding immigrants and LGBT people, but there are regarding Muslims;
  3. regarding biases against immigrants and Muslims, in 75% of the right-wing channels the comments show less bias than the captions;
  4. on the other hand, the bias against LGBT people is greater in right-wing comments than in right-wing captions.

Analyzing all layers together

Combining the results of each analysis, we can finally answer our research questions:

1: is the presence of hateful vocabulary, violent content and discriminatory biases more, less or equally accentuated in right-wing channels?

Our lexical analysis shows that right-wing channels, when compared with baseline channels, incorporate higher percentages of words conveying semantic fields like aggression, kill, rage and violence, while baseline channels hold a higher percentage of positive semantic fields such as joy and optimism. Even though the most frequent LDA topics do not show strong evidence of hate, they do indicate that discussions in right-wing channels revolve more around subjects like war and terrorism, which might corroborate the lexical analysis. Also, the implicit bias analysis shows that, independently of the channel type (right-wing or baseline), the YouTube community seems to amplify a discriminatory bias against Muslims, depicted as assassins, radicals and terrorists, and to weaken the association of LGBT people with being immoral, promiscuous and sinners when compared to the Wikipedia reference. We might conclude, then, that hateful vocabulary and violent content seem to be more accentuated in right-wing channels than in our set of baseline channels, and also that a discriminatory bias against Muslims is more present in right-wing videos.

2: are, in general, commentators more, less or equally exacerbated than video hosts in an effort to express hate and discrimination?

The lexical analysis reports that comments generally have more words from the semantic fields disgust, hate and swearing terms, while captions express more aggression, rage and violence. Regarding biases against immigrants and Muslims, in 75% of the right-wing channels the comments show less bias than the captions. On the other hand, although the implicit bias against LGBT people on YouTube is generally lower than in the Wikipedia reference, it is greater in right-wing comments than in right-wing captions. Our conclusion is that, in general, YouTube commentators are more exacerbated than video hosts in the context of hate and discrimination, even though several exceptions may apply.

What is next?

It would be great to also analyze left-wing channels and then compare the results with the ones presented in our paper. Even though left-wing voices seem to be much less prominent on YouTube (there is no substantial number of active YouTube channels with a good number of subscribers and views aligned with the left), we plan to investigate similarities and differences in the YouTube behavior of supporters from different parts of the political spectrum.

The other authors of the paper published in the Proceedings of the 10th ACM Conference on Web Science (WebSci’18) are Gabriel Magno (@gabrielmagno), Wagner Meira Jr. (@wagnermeirajr) and Virgilio Almeida (@virgilioalmeida).

Nikki Bourassa, Ryan Budish, Amar Ashar and Robert Faris, from the Berkman Klein Center for Internet & Society at Harvard University, played an important role discussing our methodology and findings.

Written by

HMPig, DCC, UFMG

Hate, Misinformation and Polarization Interest Group @ DCC, UFMG, Brazil. We do quantitative research for a better web.
