Islamic State Extremists Are Using the Internet Archive to Deliver Propaganda

The Internet Archive (IA) is best known for the Wayback Machine, a massive web archive. In fact, the scope of IA is much broader and also includes television broadcasts, ebooks, and even vintage video games. Less well known is that anyone can upload anything to the Internet Archive.

So, for example, George Washington (GW) Libraries has uploaded digitized copies of The Cherry Tree, GW’s yearbook. The Internet Archive serves the uploaded files. While intended for archival purposes, in essence, IA serves as a web delivery platform.

ISIS sympathizers are using the Internet Archive as a web delivery platform for extremist propaganda, posing a threat to the archival mission of IA.

Let me back up …

GW’s Program on Extremism (POE) studies violent and non-violent extremism. One area of focus for the POE is the Islamic State of Iraq and Syria (ISIS). To this end, the POE has collaborated with GW Libraries’ Social Feed Manager team to collect the tweets of ISIS sympathizers. POE researchers also collect the postings of ISIS sympathizers on Telegram, but using a different methodology.

Last October, the POE released “Digital Decay: Tracing Change Over Time Among English-Language Islamic State Sympathizers on Twitter”, a report authored by Audrey Alexander that summarizes their findings from the dataset of tweets by ISIS sympathizers. The report noted that the Internet Archive was one of the top “file sharing services” referenced in links within the tweets. In presenting the results, it was also noted that the Internet Archive was showing up towards the top based on initial research with Telegram posts. I tweeted about this:

At the time I merely assumed that some ISIS-related content was being swept up in web captures and then being referenced in the tweets of ISIS sympathizers.

More recently, Bennett Clifford, one of the POE researchers, tweeted about updated findings from Telegram. Here he noted that IA had become the top file sharing site.

[For more information, see this overview of the Telegram Tracker.]

This time my interest was piqued. Bennett shared some screen shots from Telegram, which contained some links to IA. Here’s one:

Bennett provided some additional context on this post. It “contains a video release from the al-Hayat Media Center, one of the principal ISIS media outlets, usually responsible for releasing video content. When major releases are disseminated through Telegram, they are shared using a number of platforms and links so when one link is removed for violating content restrictions, another one is available. IA is one of the sites used frequently to host major media releases, which may explain why it shows up so often on Telegram.”

The screen shots showed my earlier assumption was entirely incorrect: ISIS sympathizers were linking to uploaded content, not web archives.

For additional evidence (and with the permission of the POE), I returned to the Twitter data collected with Social Feed Manager that was used to create the “Digital Decay” report. The dataset is “845,646 tweets produced by 1,782 English-language pro-IS accounts from February 15, 2016 to May 1, 2017.” I unshortened all of the URLs (using Twarc’s unshrtn.py script), extracted only the URLs for “archive.org”, and normalized the URLs (for example, https://ia800207.us.archive.org/27/items/naba20/naba20.pdf was normalized to https://archive.org/details/naba20). After normalization, there were 513 unique IA URLs. The top URLs and the number of tweets that referenced them are:

IA URLs constitute 1,654 of 129,042 non-unique, non-Twitter URLs in the dataset.
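The normalization step described above can be sketched roughly as follows. This is not the script used for the analysis. The function name `normalize_ia_url` is hypothetical, and the path patterns it looks for (`/items/`, `/details/`, `/download/`) are assumptions inferred from the single example given. It also skips `web.archive.org`, on the assumption that Wayback Machine captures should not be counted as uploaded items.

```python
from collections import Counter
from urllib.parse import urlparse


def normalize_ia_url(url):
    """Map an Internet Archive item URL to its item page, or return None.

    Assumption: the item identifier is the path segment that follows
    "items", "details", or "download". The real analysis may have used
    different rules.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host != "archive.org" and not host.endswith(".archive.org"):
        return None
    # Assumption: Wayback Machine captures are web archives, not uploads.
    if host == "web.archive.org":
        return None
    segments = [s for s in parsed.path.split("/") if s]
    for marker in ("items", "details", "download"):
        if marker in segments:
            idx = segments.index(marker)
            if idx + 1 < len(segments):
                return "https://archive.org/details/" + segments[idx + 1]
    return None


def count_ia_urls(urls):
    """Count how often each normalized IA item URL appears in a list of URLs."""
    normalized = (normalize_ia_url(u) for u in urls)
    return Counter(n for n in normalized if n is not None)


def share(part, total):
    """Return part / total as a fraction (0.0 if total is zero)."""
    return part / total if total else 0.0
```

With the example from the text, `normalize_ia_url("https://ia800207.us.archive.org/27/items/naba20/naba20.pdf")` returns `https://archive.org/details/naba20`. Using the counts reported above, `share(1654, 129042)` comes to roughly 1.3% of the non-unique, non-Twitter URLs.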

A few important considerations for contextualizing this claim:

  • I have not evaluated the content itself to determine if it is extremist in nature. Much of the content is in Arabic and I have no subject matter expertise.
  • I don’t know anything about who is uploading content to IA and whether they have any sort of a relationship with the authors of the tweets.
  • Not all content that is used as propaganda by ISIS sympathizers is extremist in nature. In fact, many of the other URLs in the dataset link to news stories in mainstream Western media that happen to support the extremist narrative.
  • IA’s website does provide statistics on the number of views of each item. However, I haven’t evaluated the view numbers and I’m unclear how the number of views relates to downloads.
  • IA does take down uploaded files due to content issues and, as shown below, some content is geo-restricted and/or requires login. I don’t know whether there is an official policy, how the content review / takedown process currently operates, or how often content is removed. (I couldn’t find any information on their website, but might have missed it.)

However, regardless of the above, it is deeply concerning that (1) ISIS propaganda is referencing content that is being served by the Internet Archive and (2) the role of IA as a web delivery platform for extremist propaganda seems to be increasing. As explained by Samantha Weirman and Audrey Alexander in the forthcoming “Hyperlinked Sympathizers: URLS and the Islamic State,” the use of file sharing services such as IA allows extremists to “separate the content producers, disseminators, and consumers from the material itself.” Further, it allows extremists to continue to spread propaganda as other publishing platforms are blocked. Clearly, the role of IA in providing a web delivery platform for extremist propaganda demands further consideration, investigation, and discussion.

Full disclosure: I am one of the Internet Archive’s biggest advocates. If this issue is not properly addressed by IA, I see it as a potential existential threat. The recent experiences of Facebook, Twitter, Reddit, and the like have made it clear that a web delivery platform cannot absolve itself of responsibility for what is published on it. The misuse of IA for spreading extremism is only likely to increase, especially as these other platforms clamp down. As evidence of how seriously this is being taken, Google, Facebook, and Twitter were hauled before the Senate Commerce Committee on January 17 to address extremist content on their sites.

These are all murky, uncomfortable waters for an archive; clearly IA has taken some measures to perform appraisal on the content it collects. And there may be legitimate reasons for archiving this content that counterbalance the harm from its availability as propaganda. However, I would argue that this demands further debate, that IA’s efforts should be more public, and that these efforts should be scaled to address this growing threat.

Note: This post was shared with the Internet Archive in advance of publication.