Watch six decade-long disinformation operations unfold in six minutes

Published in The Startup · 11 min read · Jan 26, 2020

Here’s a bird’s eye view of six state-backed information operations on Twitter, and how they evolved over the last decade. This research was funded by the Mozilla Foundation through a Mozilla Open Source Support Award.

I remember the moment when I found out that Twitter had released millions of tweets that were tied to a Russian disinformation campaign, which may have affected the outcome of the 2016 US Presidential election. I was in a bar, in a backroom full of disinformation experts, and the datasets have shaped my research ever since. The multiple bombshells of Twitter data which followed were the most fruitful datasets a social data scientist with an interest in disinformation could ask for.

Fast forward a year, and Twitter has released 10 GB of tweets from 11 countries (and counting) which they have attributed to “state-backed information operations” on their platform. Researchers have analyzed the datasets individually, and I myself conducted a temporal network analysis on the first Russian datasets with Dr. Charles Kriel for the NATO Defense Strategic Communications Journal. Each in-depth analysis has served an important role in understanding how each country has individually curated processes for manipulating public opinion and information. But, as I dove further in, I couldn’t help but wonder how these individually evolving networks of information manipulation would look side-by-side and what this could tell me.

The Question

In an effort to see the forest for the trees, in this article I will present high-level visualizations of six state-backed disinformation operations on Twitter. I will periodically dive into parts I find interesting, but will leave the granular analysis for other projects and researchers. Hopefully my visualizations will spark further ideas about the strategies a country, or suite of countries, may have deployed. Maybe you will see something in my visualizations that confirms or contradicts your own knowledge. If so, I’d love to explore it with you further.

Work like this takes time and resources, and I would like to thank the Mozilla Foundation for generously granting me a Mozilla Open Source Support Award to conduct this project.

I’ve boiled down a year of curiosity and exploratory analysis into the following question: what are the similarities and differences between the evolving structures of six state-backed information operations on Twitter? It’s a strange question that can’t be answered with just any data science method, or even suite of methods.

The (method to the) answer: temporal network analysis. The process of mapping out relationships within a dataset is called network visualization, and has been used by social scientists and criminologists to understand human interactions from friend networks to crime networks. Patterns in human interaction are inherently dynamic, as they change as time passes. Factoring in that element of time unfurls a static network into a temporal one. I have chosen to conduct a temporal network analysis on six countries because I believe that it is the best way to synthesize the rich, nuanced, and complicated patterns in the data. For background on temporal network visualization itself and how it was used in this project, I’ve created this short video explainer:

What I Found

All six datasets began their activities around the turn of the last decade, and have shifted languages, structures, and hashtags. Some countries mostly stayed focused on the languages of their own countries (Egypt & UAE, Ecuador), while the rest (Russian IRA, Venezuela, Iran, China) pushed beyond their own country’s main language to tweeting in others, such as English and Indonesian. Most datasets began with steady, low levels of tweeting, and graduated to deploying multiple bursts of hashtag use (when a large number of hashtags are used at once for a period of time). Often, operations began by tweeting innocuous hashtags in order to build their presence, such as #followme, #felizmartes (‘happy Tuesday’), #noticias (‘news’), or #news, before graduating to more political hashtags in loud bursts of tweets such as #deleteisrael, #blacklivesmatter, #maga, #HongKong, and #إردوغان (‘Erdogan’).

Venezuela, Ecuador, and Iran deployed newer accounts in their hashtag bursts, while Russia, China, and Egypt & UAE deployed old ‘sleeper’ inauthentic accounts (which may have been purchased) alongside newer accounts. The countries that chose to create new accounts for their hashtag bursts may have been saving their older, more established accounts for future opportunities in their disinformation operations.

The Datasets

The data I have crunched was all open-source and downloaded from Twitter’s Election Integrity Hub. According to Twitter, the data originated in China (three datasets), Venezuela (three datasets), Ecuador (one dataset), Russia (the Internet Research Agency, one dataset), Egypt & UAE (one dataset), and Iran (one dataset analyzed in this study).

Each of the datasets underwent a standardized process towards the final visualizations. Per country, I:

Combined separate sets into one dataset
Selected only tweets which contained a hashtag and extracted metadata
Visualized a network of inauthentic account-to-hashtag relationships in Gephi (a random sample of 300,000 relationships was used due to software restrictions)
Unfurled the network to play over time, and recorded it as a video
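
As a rough illustration of steps 2 to 4, here is a minimal Python sketch. It is not the code used for this project, and the column names (`userid`, `tweet_time`, `hashtags`), the bracketed hashtag format, and the tiny demo rows are all assumptions made for the example. It keeps only tweets with hashtags, turns each one into account-to-hashtag relationships, draws a random sample, and shows how a timestamp on each relationship lets the network be sliced by time.

```python
import csv
import io
import random

# Hypothetical column names; the header of a real export may differ.
USER_COL, TIME_COL, TAGS_COL = "userid", "tweet_time", "hashtags"


def parse_hashtags(raw):
    """Turn a cell such as "[news, travel]" into ['news', 'travel']."""
    cleaned = raw.strip().strip("[]")
    tags = [t.strip().strip("'\"").lower() for t in cleaned.split(",")]
    return [t for t in tags if t]


def account_hashtag_edges(rows):
    """Yield one (account, hashtag, tweet_time) edge per hashtag used.

    Tweets without hashtags produce no edges, which is the filtering step.
    """
    for row in rows:
        for tag in parse_hashtags(row.get(TAGS_COL) or ""):
            yield (row[USER_COL], "#" + tag, row[TIME_COL])


def sample_edges(edges, k, seed=0):
    """Return at most k edges chosen uniformly at random."""
    edges = list(edges)
    if len(edges) <= k:
        return edges
    return random.Random(seed).sample(edges, k)


def edges_until(edges, cutoff):
    """Edges that exist by `cutoff`; ISO-style timestamps sort as strings."""
    return [e for e in edges if e[2] <= cutoff]


# Tiny made-up input, for illustration only.
demo_csv = """userid,tweet_time,hashtags
a1,2012-03-01 10:00,"[followme]"
a1,2019-08-01 09:30,"[HongKong, news]"
b2,2019-08-02 11:15,[]
"""
rows = csv.DictReader(io.StringIO(demo_csv))
edges = sample_edges(account_hashtag_edges(rows), k=300_000)

# One "frame" of the unfurled network: everything up to the end of 2012.
frame_2012 = edges_until(edges, "2012-12-31 23:59")
```

Writing the sampled edges to a CSV with `Source` and `Target` columns should give a file that Gephi’s spreadsheet importer can read as an edge list, although the exact import settings are worth checking against the Gephi version in use.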

I chose to colour the lines which connected inauthentic users to hashtags by the age of the account in order to visually display the foresight that may or may not have gone into the hashtag bursts throughout the operations. In my previous research into the Russian IRA dataset, my co-author and I found that the year an account was created was the greatest indicator of how it behaved over time. This showed the high degree of organization and foresight the IRA exhibited, which I hope to compare in depth to other countries’ disinformation operations in future research.
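
The colouring idea can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the project’s actual code: the year bands, the colour names, and the `created` lookup are hypothetical, and the real legend differs from video to video.

```python
from datetime import date

# Hypothetical year bands: (last creation year in band, colour).
AGE_BANDS = [
    (2013, "green"),  # oldest accounts
    (2016, "blue"),
    (2017, "pink"),
]
NEWEST_COLOUR = "grey"  # anything created after the last band


def creation_year(account_creation_date):
    """Extract the year from an ISO date string such as '2013-05-02'."""
    return date.fromisoformat(account_creation_date).year


def colour_for_year(year, bands=AGE_BANDS, newest=NEWEST_COLOUR):
    """Map an account's creation year to the colour of its edges."""
    for last_year, colour in bands:
        if year <= last_year:
            return colour
    return newest


def colour_edges(edges, created):
    """Attach a colour to each (account, hashtag) edge.

    `created` maps an account id to its creation date (ISO string).
    """
    return [
        (account, tag, colour_for_year(creation_year(created[account])))
        for account, tag in edges
    ]


# Made-up accounts, for illustration only.
created = {"a1": "2013-05-02", "b2": "2019-06-30"}
coloured = colour_edges([("a1", "#travel"), ("b2", "#HongKong")], created)
```

Because the colour depends only on the account, every line leaving an account shares one colour, which is what makes an old account joining a new burst stand out visually.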

The following section will outline each of the six disinformation operations as temporal network visualizations. I like to think of the networks as a bird’s eye view of state-backed employees hiding behind the curtains, masquerading as honest people or groups who simply have something to say, as I believe this perspective breathes life into the datasets.

The Temporal Network Visualizations

China

Total number of tweets: 10,241,545 (16% contained hashtags)

Number of languages of tweets: 53 (top three were English, Chinese, and Indonesian)

In August 2019, Twitter specially released information on a Chinese disinformation operation which included an attack on Hong Kong protesters. The accounts released “were deliberately and specifically attempting to sow political discord in Hong Kong, including undermining the legitimacy and political positions of the protest movement on the ground”, and engaging in spammy activities.

In the video of the network, it can be observed that Chinese inauthentic accounts had been engaging in seemingly innocuous behaviour since their creation as early as 2009. While the oldest (green) accounts were active and building their presence by tweeting hashtags #follow and #travel for a decade, the humans who were running this operation chose to deploy new (grey) accounts to tweet the Hong Kong protest-related hashtags in 2019 alongside the older accounts. The old (green) ‘sleeper’ accounts they did not deploy during the protests were potentially being groomed for another use, but the burst of #HongKong-related activity ultimately brought the entire network down.

Venezuela

Total number of tweets: 12,070,658 (40% contained hashtags)

Number of languages of tweets: Undisclosed by Twitter (top-used hashtags were in Spanish, and few in English)

Twitter released three separate datasets which it deemed to be “targeting audiences within Venezuela and abroad”. Spanish language hashtag use was prevalent in the network during the entire period (2010–2019), while English language hashtag use was prevalent in 2017 and appeared to be about Trump and the 2016 US election.

Throughout the evolution of the Venezuelan disinformation operation, accounts were created and deployed, and were not resurrected after going quiet (they did not use ‘sleeper’ accounts). The operation consisted of two regions of mostly Spanish-language hashtags, and a region of English-language, US- and Trump-related hashtags which were used in 2017. The (pink) US-related hashtags were used a year after the 2016 US election, possibly in an attempt to influence post-election narratives. These accounts were created in 2017 (pink) and immediately deployed, while their older (green) counterpart accounts continued to tweet.

A dense region made up of accounts which were created between 2014 and 2017 (blue, pink) tweeted about the Bolivarian National Armed Forces between 2015 and 2017. Further research should be conducted into the narratives embedded in this tweeting region.

Ecuador

Total number of tweets: 700,240 (31% contained hashtags)

Number of languages of tweets: 51 (top three were Spanish, English and Indonesian)

According to Twitter, the Ecuadorian tweets in this network were tied to the PAIS Alliance political party, and were “primarily engaged in spreading content about President Moreno’s administration”.

The first six years of the Ecuadorian disinformation operation were quiet, with few inauthentic accounts using hashtags in their tweets. Then, between 2016 and 2019, the network engaged in multiple bursts of activity around various groups of hashtags, including those with legal and sporting themes.

The bulk of activity in this network occurred in 2018, with accounts which were created that year (grey) suddenly tweeting unique hashtags and popular ones, such as #atencion (attention), #telediarioec (TV news Ecuador), and #siguemeytesigo (follow back).

Russia (Internet Research Agency)

Total number of tweets: 8,768,633 (28% contained hashtags)

Number of languages of tweets: 58 (top three were Russian, English, and German)

Tweets from the Russian IRA were among the first to be released by Twitter, in October 2018. At this time, investigations into Russian meddling in the 2016 US election were underway, and my paper on this dataset was published eight months later.

The IRA information operation consisted of multiple languages, each taking up specific regions in the network. First, inauthentic accounts tweeted in Russian, with a spike in activity in July, 2014 when passenger airplane MH17 was shot down over Ukraine by a Russian missile. During this time, a Russian-language hashtag which translates to “Kyiv shot down the Boeing” was popular.

In 2015, activities shifted towards English language tweeting on the left side of the network. Regions of accounts using different hashtags such as #blacklivesmatter and #maga are active in the network visualization through to the US election in November 2016. The IRA also tweeted in German, Italian, and Arabic between 2015 and 2017. Some inauthentic accounts also tweeted in multiple languages. This can be observed by the bridging blue and green lines between the Russian and English regions in the center of the network visualization.

Interestingly, the key region of specific US election-related hashtag use was executed by older (green) accounts which were created in 2013, but deployed between 2015 and 2017. This pattern of deploying ‘sleeper’ accounts was unique to the IRA and the Egyptian & UAE datasets analyzed in this article.

Egypt & UAE

Total number of tweets: 214,898 (82% contained hashtags)

Number of languages of tweets: 33 (top two were Arabic and English)

Twitter found evidence that this dataset was connected to a private technology company which operates in Egypt & the UAE. They stated that the inauthentic accounts were “primarily targeting Qatar, and other countries such as Iran”.

Only a handful of inauthentic accounts tweeted hashtags before 2016. Activity increased between 2016 and 2019, and then a stepped increase in hashtag use was observed in 2019. The stepped bursts utilized accounts which were created in different years: some were created before 2013 (green), and others as late as 2019 (black).

The attribution of these inauthentic accounts to a private technology company leads to the question of whether the older (green) accounts were purchased from another party at some point in time. The key difference between these older (green) ‘sleeper’ accounts, which suddenly became active in 2019, and the ‘sleeper’ accounts in the Russian IRA dataset is that the latter were active and building their presence through hashtag use before being deployed in a hashtag burst, making them appear less likely to have been purchased from elsewhere.

Iran

Total number of tweets: 4,447,056 (46% contained hashtags)

Number of languages of tweets: Undisclosed by Twitter (main alphabets used in hashtags were Arabic and Latin)

The dataset used in the above network visualization was released by Twitter in October 2018 and was found to have “potentially originated in Iran”.

The disinformation operation was gaining momentum across the network until its peak in 2017, when all regions of the network were highly active with accounts which were created across the last decade. These regions included ones with popular hashtags ranging from as innocent as #nature and #art, to ones as political as #deleteisrael and #freepalestine.

It is interesting to note that in this disinformation operation, the oldest (green) accounts were tweeting hashtags in multiple regions in the network, possibly tweeting in different languages. This tactic was also present in the Russian IRA and Chinese information operations.

To the Future

It is worth noting the limitations of this study before looking forward to the future of this work. First, Twitter has not released information about their attribution methodology, and it cannot be guaranteed that any inauthentic account which has been attributed to one country has not been purchased from another. Second, the choice to visualize inauthentic account relationships with hashtags was a subjective one, and visualizing account relationships with other Twitter users produces similarly interesting results. Third, high-level representations can lead to assumptions, so I have done my best to translate my deeper conclusions into future research ideas. Finally, soaking in these networks requires parallel examination of two static graphics and one moving one, which can be difficult to understand. I am curious about how these networks can be communicated better, and hope to start a PhD researching just that this Fall.

This project has built on my research with NATO Defense Strategic Communications and on my MSc Data Science dissertation at City, University of London, which was supervised by Professor Jason Dykes. If you’ve made it this far and have any questions or curiosities you’d like to explore, please direct message me on Twitter. I’m always looking for new opportunities to use temporal network analysis to understand disinformation and human interaction. I believe that temporal network analysis has the power to unlock the nuanced intricacies of complex interactions in a way that no other individual method can, and the bird’s eye view it affords deserves further effort to make it understandable to others.

Thank you!