Uncovering YouTube: a New Way to Track Radical Communities

Erik van Zummeren
Published in RadiTube · Oct 30, 2020 · 6 min read
Illustration by Slava Kuchinka

In 1872, Karl Marx held a meeting with the International Workingmen’s Association. It took place in a shabby dance hall in a working-class district of The Hague. Many of the attendees weren’t actually members of the working class, but undercover police agents and concerned politicians who feared the damage that his so-called “gang of bandits” could inflict on the country.

If Marx had lived in 2020, he might have just started a YouTube channel or been a guest on the Joe Rogan Experience. That is how Jordan Peterson and Ben Shapiro gained their massive followings and managed to polarize opinions all over the world. However, unlike in Marx’s case, these new public debates, especially the more marginal hate speech and white supremacy conversations happening on the edges of YouTube, are much harder to keep track of. I am not saying that police should monitor them like they did in the 19th century, but these discussions should be as accessible to members of civil society as possible.

RadiTube: a search engine for extreme content on YouTube

Say you want to find all the videos that mention QAnon-related conspiracy theories, compare their context, and understand how these concepts evolve over time. The current YouTube search functionality is limited, and a lot of these videos only reach users through the recommendation engine. So, if you are not already deeply involved in researching these communities, the algorithms will simply not show you all the relevant results.

Also, the number and length of the videos, which can run up to 3 hours, make it impossible for one person to watch and analyze them all. In addition, these very lengthy and poorly edited videos are often low in information density.

That’s why I created a search engine called RadiTube that scrapes the (automated) subtitles of these more radical and polarizing YouTube channels.

It currently contains around 170,000 videos from 350 channels, and every hour it checks these channels for new videos. It allows you to search for specific words or combinations of words and links each result to the matching video, letting you jump immediately to the exact moment in the video.
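To illustrate how a search hit can point at an exact moment, here is a minimal sketch that turns a subtitle cue into a timestamped YouTube link. The hit structure, its field names and the video ID are hypothetical; only the `t=<seconds>s` URL parameter is standard YouTube behaviour, and this is not necessarily how RadiTube builds its links.

```python
from urllib.parse import urlencode


def clip_url(video_id: str, start_seconds: float) -> str:
    """Build a YouTube watch URL that starts playback at the cue's offset."""
    # YouTube accepts a whole number of seconds in the `t` parameter.
    params = {"v": video_id, "t": f"{int(start_seconds)}s"}
    return "https://www.youtube.com/watch?" + urlencode(params)


# Hypothetical search hit: one subtitle cue with its video ID and start offset.
hit = {"video_id": "abc123XYZ_0", "start": 754.2, "text": "the reptilians are"}
print(clip_url(hit["video_id"], hit["start"]))
```

Storing the start offset next to every subtitle fragment is what makes this kind of deep link possible, regardless of how the fragments are indexed.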

Example of searching for the word “reptilians”. The 30-second clip below shows the user interface listing all the clips that contain the word.

You can also try it yourself by searching for phrases such as “biblical times”, “vaccines”, “biden is a”, “q says” or any other word(s) you like.

Use cases

The aim of this tool is to give journalists and researchers a better qualitative and quantitative understanding of the conversations and debates that are happening on YouTube.

  • It shows what content got deleted. Over the last few months, YouTube has been removing a lot of channels. However, its moderation policy is opaque and ad hoc. This tool can create a better understanding of how that policy currently works.
  • These removed videos had a big influence on our society, yet any trace of them has been removed. Future historians who want to better understand the turbulent times we live in and examine the actual source material have little to work with. This tool also serves an archival function and keeps a transcript of this removed content.
  • It can show you how the discourse within these communities is changing over time, how certain concepts evolve, and how the alternative health channels, for instance, influence conspiracy channels and vice versa.
  • Related to that, the tool can also show upcoming trends quantitatively. With the help of an n-gram chart you can figure out in which time periods words come into fashion (see the image below). You can then watch the video clips that mention the search query in that specific time period.
N-gram chart for words related to the NESARA/GESARA conspiracy theory. The term has been mentioned more often from March 2020 onwards.
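The trend view described above boils down to counting mentions per time period. A minimal sketch, assuming each clip is available as a (publication date, subtitle text) pair; the sample data is made up for illustration and the function name is hypothetical:

```python
from collections import Counter
from datetime import date


def monthly_mentions(clips, term):
    """Count clips per month (YYYY-MM) whose text contains the term."""
    term = term.lower()
    counts = Counter()
    for published, text in clips:
        if term in text.lower():
            counts[published.strftime("%Y-%m")] += 1
    # Sort by month so the result reads as a timeline.
    return dict(sorted(counts.items()))


# Made-up sample clips, not real RadiTube data.
clips = [
    (date(2020, 1, 14), "nothing relevant here"),
    (date(2020, 3, 2), "nesara is coming soon"),
    (date(2020, 3, 20), "they call it NESARA"),
    (date(2020, 4, 5), "nesara and the reset"),
]
print(monthly_mentions(clips, "nesara"))
```

A real chart would usually normalise these counts by the total number of clips per month, so that a busy month does not look like a trend on its own.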

The search engine is built on top of Elasticsearch, which means that the search capabilities are quite refined and flexible. You can search different communities, search within specific time ranges, or look for words in videos that have been removed by YouTube.
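As a rough idea of what such a combined search could look like at the Elasticsearch level, the sketch below builds a request body with a `bool` query: a `query_string` clause for the text and `range` and `term` filters for the time window and removed videos. The field names (`subtitle_text`, `published_at`, `removed`) are assumptions for illustration; the actual RadiTube index mapping is not documented here.

```python
import json


def build_query(text, start, end, removed_only=False):
    """Build an Elasticsearch request body for a filtered subtitle search."""
    # Full-text clause; query_string understands operators such as "..." and ~.
    must = [{"query_string": {"query": text, "default_field": "subtitle_text"}}]
    # Filters narrow the result set without affecting relevance scoring.
    filters = [{"range": {"published_at": {"gte": start, "lte": end}}}]
    if removed_only:
        filters.append({"term": {"removed": True}})
    return {"query": {"bool": {"must": must, "filter": filters}}}


body = build_query('"5g technology"', "2020-03-01", "2020-06-30", removed_only=True)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the body of a search request with any Elasticsearch client.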

Using search operators, you can define very flexible search queries:

Supported search operators

"5g technology"
–– searches for clips that contain the exact phrase 5g technology.

"global order"~3
–– searches for clips that contain up to 3 other words between global and order. For instance, global masonic order.

gates -bill -foundation
–– searches for clips that contain the word gates, but not the words bill or foundation.

d*gs
–– the asterisk serves as a wildcard. It will show clips containing words such as dogs, drugs and dirtbags. Especially useful if you are looking for words that aren't recognized very well by the speech recognition software (see current limitations).

covid~
–– a fuzzy search. Like the wildcard, this one is useful for words that are poorly recognized. In this case it will return results containing the word covid, but also words such as kovid, convid and clovid.

epstein user: "Angelo John Gage"
–– searches for all the clips in which the channel of Angelo John Gage mentions Epstein. It's important to know that the channel name is case-sensitive.
Video showing additional search functionalities
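To make the proximity operator more concrete, here is a simplified sketch of what a query like "global order"~3 checks: whether the second word follows the first with at most three other words in between. This is an illustration only; the function name is hypothetical, and the real matching done by Elasticsearch is more permissive (for example, it can also allow the words in a different order).

```python
def within_gap(text, first, second, max_gap):
    """True if `second` follows `first` with at most `max_gap` words between."""
    words = text.lower().split()
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    # A gap of N words in between means the positions differ by N + 1.
    return any(
        0 < j - i <= max_gap + 1
        for i in first_positions
        for j in second_positions
    )


print(within_gap("the global masonic order is here", "global", "order", 3))
print(within_gap("global this that and the other order", "global", "order", 3))
```

The first call matches because only one word sits between global and order; the second does not, because five words do.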

Current limitations

Currently the list of indexed channels is curated based on academic papers, aggregation websites and YouTube channels that are related to each other. For many researchers it will be important to curate these channels themselves. If you are interested in such features, feel free to reach out to evz213@nyu.edu.

Another limitation lies in the automated subtitles that are generated by YouTube. Even though the quality of the subtitles is very high, the speech recognition sometimes stumbles on specific words that it doesn’t recognize. For instance, in the case of the earlier mentioned NESARA/GESARA conspiracy theory (see n-gram), the word ‘GESARA’ can be written as ‘josara’, ‘jasara’ and ‘jussara’. There are some ways to work around this, such as writing an asterisk before the ‘sara’ part (*sara), or doing a fuzzy search by placing a ~ after the word (gesara~).
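The two workarounds behave differently, which a small sketch can show. It checks each misrecognized spelling against the wildcard pattern *sara using Python's `fnmatch`, and computes its edit distance to gesara with a plain Levenshtein implementation. Both are stand-ins for illustration, not the matching code Elasticsearch itself runs.

```python
from fnmatch import fnmatchcase


def edit_distance(a, b):
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            # Cheapest of: deletion, insertion, substitution (or match).
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


variants = ["gesara", "josara", "jasara", "jussara"]
for word in variants:
    print(word, fnmatchcase(word, "*sara"), edit_distance("gesara", word))
```

All four spellings match the wildcard. Their edit distances to gesara are 0, 2, 2 and 3, so if the fuzzy search allows at most two edits (the upper limit of Elasticsearch's fuzziness setting), jussara would likely be missed by gesara~ while *sara still finds it.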

Future

The current release of RadiTube should be considered an early version. Because of the upcoming US presidential elections, and the value a tool like this could have in making sense of the surrounding noise, I decided that an early version would do for now. The software still contains plenty of bugs, and it will be interesting to see how the infrastructure holds up in the days around the release.

There are a couple of important features that are still in the making, such as a way to track comments and to get notified when a video contains a specific phrase (similar to Google Alerts). Cameron Ballard is also working on a feature that shows you which videos have on- and offsite monetization.

You can follow the project on Twitter to get notified of upcoming updates. If you have any specific questions or suggestions, feel free to reach out to me at evz213@nyu.edu.

Acknowledgements

I’m super grateful to my friend Elena, who helped me out big time on the editing of this article, to Cameron and Robert-Jan who made important contributions to the project, and to all my other wonderful friends who have been so supportive on this project. Also a special thanks to the fine people at NYU ITP, in particular the Tech, Media & Democracy class (specifically Justin Hendrix, Irwin Chen and Mor Naaman), and the Project Development Studio by Danny Rozin.

I’m also very thankful to the SIDN Fonds, Scrapinghub, and DigitalOcean for supporting this project. And to NYU ITP, the Netherland-America Foundation and the Fulbright program who have been very supportive of my stay here in New York.
