Welcome to the Party: A Data Analysis of Chinese Information Operations 🇨🇳
Twitter recently released a 1.6 GB archive of tweets that it determined had been spread by the government of the People’s Republic of China (PRC) as part of an information operations (IO) campaign to attack and discredit ongoing protests in Hong Kong. IFTF’s Digital Intelligence Lab analyzed the data to examine key influencers, common messaging themes, strategies, and oddities in the data. This analysis was also featured in the New York Times.
Main Takeaways — the TLDR:
- We found 44 accounts with a high likelihood of belonging to the Chinese disinformation network that was removed. These accounts had remained active on Twitter and Facebook after official takedowns: Accounts posting identical tweets simultaneously from custom bot software used by Chinese state actors remained live on Twitter after the official takedown. Many of these tweets were word-for-word matches with attributed tweets in the disinformation archive. Thirty-one Twitter accounts were found in all. In one case, a user was still tweeting about the Hong Kong protests. On Facebook, we found thirteen suspicious accounts that posted criticisms of Chinese dissidents that were also word-for-word matches with tweets from Twitter’s attributed dataset*.
- Nearly all accounts we detected were removed from the platforms after they were notified of our findings: 17/17 Twitter accounts were promptly suspended and 12/13 Facebook accounts were removed after these companies were notified of the research. All removed posts are archived here.
- China is emulating divisive disinformation tactics seen in other disinformation campaigns from Russia and Iran: The content in this archive shares noteworthy similarities to information operations carried out by Russia and Iran in past years. The usage of high-volume bot accounts, repurposing of spam infrastructure for political messaging, amplification of divisive topics, and narrative content all bear resemblance to previous known campaigns. Three favorite narratives of the Kremlin in particular are highly present in the data: impugning human rights as a Western power ploy, discrediting protesters as paid agents of the CIA or Western NGOs, and propagating the narrative that the pro-democracy protests known as the color revolutions were illegitimate protests for the same reasons. General suspicion of foreigners and xenophobic content is frequent in these tweets.
- Several posts lend insight into China’s intentions in Taiwan and the United States: For the first time, we observed that China is not only willing to target foreign countries’ domestic affairs with IO, but that it already has in the cases of Taiwan and the US. While the main focus of political tweets in this dataset is Hong Kong, Chinese and English-language tweets also target politics in both Taiwan and the US. In Taiwan, China promotes content that supports the unification of Taiwan with the People’s Republic of China. In the US, one of China’s fake accounts, @LibertyLionNews, spread politically divisive messages attacking antifa and fact-checking organizations during protests in Portland last June.
- Significant effort was made to localize content for the Hong Kong protests: The Chinese notably made use of Cantonese and Hong Kong’s local form of writing (traditional characters) in this messaging campaign. While spam and some political messaging occurred in simplified Chinese — the form of written Chinese most commonly used in mainland China outside of Hong Kong — most political messaging directed at Hong Kong was carried out in local language conventions. Twitter also does not appear to have internal language support for Cantonese, as both Mandarin and Cantonese tweets are classified with the same language code — “zh”.
- China repurposed spam infrastructure, including bots and fake accounts, for political messaging when the campaign took off: This dataset contained some surprises — notably a great linguistic diversity and high citation rate of URLs promoting spam in the form of digital marketing and Chinese dating/escort services. The most frequent hashtags and phrases in both English and Chinese reveal that many of these accounts were mostly focused on promoting spam and began to spread political disinformation after the protests in Hong Kong had intensified.
- Guo Wengui is China’s favorite topic on Twitter: The eccentric, billionaire dissident-in-exile Guo Wengui has been a bugbear of China’s for years. Guo’s name (郭文贵) was one of the most frequent phrases in Chinese tweets¹ in the dataset, with mentions in over 40,000 posts, many of which were identical, simultaneous posts from batches of centrally controlled accounts. Other Chinese dissidents were also mentioned in Chinese tweets — Yang Jianli was also mentioned in over 2,000 posts. Liu Xiaobo, Yuan Hongbing, and Tang Baiqiao also figured among the top 5 most mentioned Chinese dissidents². In tweets about Hong Kong, Joshua Wong was mentioned more often than other Demosisto leaders, such as Nathan Law and Agnes Chow.
A Deeper Dive: Looking into the Data
Below, we explore these themes in more detail by diving into the data and content.
Retweet Network Visualization
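Node size in the retweet network reflects betweenness centrality, as the methodology notes at the end of this piece explain. For readers who want to see what that metric measures, here is a minimal pure-Python sketch of Brandes' algorithm for an unweighted, directed graph. The visualization itself was produced in Gephi; the `betweenness` function, the edge direction, and the five-account toy graph are assumptions made purely for illustration.

```python
from collections import deque

def betweenness(graph):
    """Brandes' algorithm for an unweighted directed graph.

    `graph` maps each node to a list of neighbours (no duplicate edges).
    Returns unnormalized betweenness scores.
    """
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        preds = {v: [] for v in nodes}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, []):
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Hypothetical toy network: information flows a -> hub -> c, and so on.
toy = {"a": ["hub"], "b": ["hub"], "hub": ["c", "d"]}
scores = betweenness(toy)
print(scores["hub"], scores["a"])
```

In this toy graph the hub scores 4.0 because it sits on all four shortest paths between the two source accounts and the two destination accounts, while every other node scores 0.0. Removing the hub would cut every path, which is the sense in which high-betweenness accounts matter to information flow.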
Content Themes: Discrediting Protests as Operations of the CIA, Western NGOs and the “American Empire”
One of the most common themes in this dataset is that Hong Kong’s current protests have been staged and funded by the CIA and Western NGOs and activist groups. The US in particular is seen as a chief funder and manipulator behind the scenes. Much of the disinformation spread in this regard takes on a xenophobic flair, using derogatory terms such as Western dog (洋犬) or Western ghost (洋鬼子) to describe foreigners⁴. Several of these tweets came from @HKPoliticalNew, one of the most influential accounts in the dataset.
RT @HKpoliticalnew: 圖中外國人Larry於香港多次暴亂中提供資助及訓練和幕後指導和策劃行動，呢就系黃屍一直隱瞞及唔願提及嘅外國勢力。#顏色革命 #外國勢力 #Larry #暴動策劃者
RT @HKpoliticalnew: The foreigner in the picture, Larry, has provided funding and training in several Hong Kong riots, as well as plotting and directing the moves from behind the scenes. These are yellow corpses⁵, always concealing (their intent) and never willing to mention foreign powers. #ColorRevolutions #ForeignPowers #Larry #RiotPlotter
Brian is a retired professor at the international school. In 1990, after spending time in the mainland getting students to skip school and protest, he went to study in America. Afterwards, in many countries he taught “human rights” courses and disseminated color revolution thought, and brought students to a pro-Tibetan independence village in India. He’s lived in Hong Kong for 11 years now, teaching skills like petitions and protests, “brainwashing” students to be anti-Chinese. Now this retired teacher has stepped out from behind the scenes and taken center stage, playing the role of war advisor for the opposition, attacking the rule of law. #WesternDog
記者調查發現：反對派背後有一股強大的勢力在組織操控！反修例團體「香港人權監察」長期收美國國家民主基金會（NED）撥款共1500萬港元。NED創辦人Allen Weinstein直認NED大部分工作與美國中央情報局的秘密工作無異，只是以「人權」等公開模式進行，掩飾背後的隱蔽工作。#香港 #NED資助 #顏色革命
Investigative reporters found out: a large force is controlling the opposition. The Hong Kong Human Rights Monitor, a group opposing the amendment, has received funding totaling $15 million HKD ($1.9 million USD) from America’s National Endowment for Democracy (NED). NED’s founder Allen Weinstein admitted that the majority of NED’s work is no different from the secret work of the CIA; it simply uses “human rights” as a framework to do it. #HongKong #NEDFunding #ColorRevolution
People did not do this (protest) purely out of their hearts, it’s because there are interests involved. Signs of foreign spies are piling up! #ColorRevolution #HongKong.
外國勢力策反香港政府嘅12個步驟【2】The 12 steps taken by foreign forces to overthrow the Hong Kong government【2】#香港 #顏色革命 #外國勢力 #香港時政直擊 https://t.co/U1e4aeuxpm
The allegation that the protests are paid or part of a centralized “color revolution” scheme is a de-legitimizing tactic that has been frequent in Russian disinformation campaigns. In his new book, This is Not Propaganda: Adventures in the War against Reality, expert Peter Pomerantsev describes the goal of these narratives:
“When the Kremlin crawls inside protest movements online, the very notion of genuine protest starts to be eroded, making it easier for the Kremlin to argue that all protests everywhere are just covert foreign influence operations. This reinforces the larger narrative the Kremlin (and Iranian and Chinese) media are trying to reinforce, that movements such as the color revolutions and the Arab Spring are not genuine but US-engineered regime-change plots, that there is no such thing as truly bottom-up, people-powered protest.”
Other tweets in the set more directly accuse the “American empire” (美帝) or the CIA itself of fomenting the protests.
RT @HKpoliticalnew2: 今天美帝指令要實行民主的地區是香港,明日的香港便如….. #香港 #顏色革命
RT @HKpoliticalnew2: Today the American empire orders that Hong Kong become a democratic territory, and tomorrow it’s like this…. #HongKong #ColorRevolution
(This tweet was presumably accompanied by a photo before being removed. Non-URL media is not currently publicly available.)
Hashtags re-framing the protests as a question of support for the Hong Kong police were also frequent. #撐警行動 (#SupportPoliceAction) was the 25th most common hashtag in tweets occurring since the beginning of the protests in March 2019. #香港警察 (#HongKongPolice) and #警察 (#police) also figured among the most frequent hashtags in this set.
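A ranking like the one above can be reproduced by counting hashtags in tweets posted on or after a cutoff date. The sketch below is a minimal illustration in Python, not the team's actual pipeline: the `(date, text)` row layout, the `hashtag_ranks` helper, the cutoff value, and the sample rows are all assumptions made for the example.

```python
import re
from collections import Counter

# \w matches Unicode word characters, so Chinese hashtags are captured too.
HASHTAG = re.compile(r"#(\w+)")

def hashtag_ranks(rows, since):
    """Rank hashtags by frequency among rows dated on or after `since`.

    `rows` is an iterable of (ISO date string, tweet text) pairs.
    Returns {hashtag: rank}, where rank 1 is the most frequent.
    """
    counts = Counter(
        tag
        for date, text in rows
        if date >= since  # ISO dates compare correctly as strings
        for tag in HASHTAG.findall(text)
    )
    return {tag: rank for rank, (tag, _) in enumerate(counts.most_common(), start=1)}

# Hypothetical sample rows, including one spam tweet before the cutoff.
rows = [
    ("2018-12-30", "#廣告 #廣告"),
    ("2019-06-12", "支持 #香港警察 #撐警行動"),
    ("2019-06-13", "#香港警察 加油"),
    ("2019-07-01", "#HongKong #香港警察"),
]
ranks = hashtag_ranks(rows, since="2019-03-01")
print(ranks)
```

Filtering by date before counting matters here because the archive mixes older spam with later political content, so an all-time ranking and a protest-period ranking can look quite different.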
Chinese Communist Bots — Dawn of the CCB
The usage of automated accounts to spread disinformation is a salient feature of this dataset. Several patterns in this data are tell-tale signs of bot activity — the usage of custom Tweet clients, identical original content posted simultaneously by multiple accounts, and high posting volume of several users in the set. While this is a well-documented phenomenon, the particulars of bot usage in this archive highlight the porous border between commercial spambots and political disinformation bots. This problem is particularly grave given the ease of buying fake accounts online, even on Facebook itself.
One account, @ranvijaysowle, created in April 2016, waited one year before making a post, and then averaged 274 posts per day until it was suspended. This account posted over 177,000 tweets in two years. Nine users in the set waited over 10 years before posting their first tweet. As DFRLab has noted, there is also a high linguistic diversity in the set, and several dates on which batch creation of fake accounts took place. Fourteen accounts in this set averaged over 100 tweets per day.
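The two signals described here, a long dormant period followed by a very high posting rate, are simple to compute from an account's creation date and its tweet timestamps. As a rough check on the figures above, 177,000 tweets at 274 per day works out to about 646 days of activity, a little under two years. The following Python sketch uses a hypothetical `account_activity` helper and made-up timestamps; it is an illustration, not the method used for this analysis.

```python
from datetime import datetime

def account_activity(created_at, tweet_times):
    """Return (days dormant before the first tweet, average tweets per active day)."""
    first, last = min(tweet_times), max(tweet_times)
    dormant_days = (first - created_at).days
    # Measure the active span in whole days, with a floor of one day
    # so that single-day accounts do not divide by zero.
    active_days = max((last - first).days, 1)
    return dormant_days, len(tweet_times) / active_days

# Hypothetical account: created 1 April 2016, silent for a year,
# then four tweets spread over two days.
created = datetime(2016, 4, 1)
tweets = [
    datetime(2017, 4, 1, 9, 0),
    datetime(2017, 4, 1, 9, 5),
    datetime(2017, 4, 2, 14, 0),
    datetime(2017, 4, 3, 9, 0),
]
dormant, rate = account_activity(created, tweets)
print(dormant, rate)
```

Sorting accounts by either value gives a quick shortlist for manual review; neither number is proof of automation on its own.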
Escort services and sex-oriented accounts also feature prominently among these bots. @thesexxxtweets, @gonewildvids, @adultflixx are accounts that promote porn and escort services. Another user, @ronetaper, promoted the Ferris Wheel Hookup Platform on WeChat (摩天轮约炮平台), which is referred to as “China’s first hookup platform” and advertised as being available in China, Europe, and California. Several accounts on Twitter are still actively promoting this platform. These users do not appear to have any tweets mentioning Hong Kong or politics.
FaWave — a Custom Client for Automating Disinformation Attacking Dissidents
Custom clients (third-party software that posts tweets for a user through Twitter’s API) were frequent in this dataset. While custom clients can be everyday tools (such as TweetDeck) that are popular with users who are not tech-savvy, they can also be used for implementing bots or malware on Twitter. In this archive, several custom clients were used to help amplify spam or disinformation.
In particular, a Google Chrome plugin called “FaWave” was frequently used to send messages out from multiple accounts simultaneously. Over half of the 10,552 tweets sent from FaWave mention at least one of five Chinese dissidents⁶.
Several tweets criticizing Guo Wengui, sent simultaneously from multiple bot accounts using this plugin, remained live on Twitter after the official takedown. What’s more, many of these tweets matched word-for-word with multiple posts in Twitter’s released and attributed dataset, which meant several accounts showing significant signs of belonging to the same Chinese disinformation network were still active on the platform after the official takedown.
One of these users, @shouyuliu, had the same profile photo as attributed user @katiushaegorov2. In another pair (1, 2), two bots criticize Guo’s “profiteering personality” for allegedly selling out and cooperating with foreign militaries. One of these users, @nick98486375, was only active for 19 days in October 2017. During this time the account tweeted nearly exclusively about Guo and produced over 1,200 tweets. Another triplet of centrally controlled bot accounts (1, 2, 3), all created within the same 70-second window on February 5th, 2018, posted identical messages about a Chinese court’s ruling on Guo’s alleged crimes in the exact same minute on October 11th last year. Yet another pair of bot accounts (1, 2) attack Guo and predict his demise at the hands of President Trump.
#GuoWengui Guo Wengui is a backstabber. He’s suspicious of all alliances and completely untrustworthy. There’s no doubt that President Trump, who used to be a businessman as well, will get rid of Guo — it’s just a matter of time. The fact that Guo Wengui is in cahoots with Falun Gong and supports Falun Gong has completely destroyed his credibility and turned him into an international clown and joke. Guo Wengui’s final ending will be being swept out of America and standing trial at home.
After official takedowns, accounts likely linked to the same Chinese disinformation network also remained active on Facebook — the message above can be found in messages posted by several suspicious Facebook accounts. For instance, a user named 范龍飛 posts the first part of the message above three separate times in 2017 — on September 28, October 1 and October 12.
Several other accounts posted identical messages during the same time frame.
Another Facebook account used multiple word-for-word matches from tweets in Twitter’s attributed archive in a longer post attacking Guo.
The Daily Beast noted the presence of bot swarms attacking Guo on Twitter during this same time frame — October 2017. Twitter formally announced changes to the API to prevent posting similar content from multiple accounts in early 2018. Several pairs of accounts in the released dataset continued to post simultaneous messages from multiple bot accounts from the FaWave plugin after this date, such as @gwalcki4 and @mauricerowleyx. Twitter’s changes do seem to have reduced tweets from more than two profiles. Identical messages emanating from more than 10 accounts before the 2018 announcement occur multiple times in the data — with as many as 18 accounts posting simultaneous, identical messages on several occasions before early 2018.
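One way to surface this kind of coordination is to group tweets by their exact text and the minute they were posted, then keep the groups shared by several distinct accounts. The Python sketch below assumes rows of `(account, minute, text)`; the `coordinated_batches` helper, the account names, and the sample messages are hypothetical, and a real analysis would likely also filter on the posting client.

```python
from collections import defaultdict

def coordinated_batches(tweets, min_accounts=2):
    """Find identical texts posted in the same minute by several accounts.

    `tweets` is an iterable of (account, minute, text) tuples, where
    `minute` is the posting time truncated to the minute.
    Returns {(text, minute): set of accounts} for qualifying groups.
    """
    groups = defaultdict(set)
    for account, minute, text in tweets:
        groups[(text, minute)].add(account)
    return {key: accounts for key, accounts in groups.items()
            if len(accounts) >= min_accounts}

# Hypothetical rows: three accounts post the same text in the same minute.
sample = [
    ("acct_a", "2017-10-11 08:15", "identical message"),
    ("acct_b", "2017-10-11 08:15", "identical message"),
    ("acct_c", "2017-10-11 08:15", "identical message"),
    ("acct_a", "2017-10-12 10:00", "something unrelated"),
]
batches = coordinated_batches(sample, min_accounts=3)
for (text, minute), accounts in batches.items():
    print(minute, sorted(accounts), text)
```

Raising `min_accounts` separates the large swarms seen before early 2018 from the pairs that persisted afterwards. Using a set per group means an account that repeats itself within the same minute is only counted once.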
While many of the tweets emanating from this client were political, not all were — this is, after all, the nation for which the term “cheerleading” propaganda was coined. For instance, nine users in this set used this custom client to promote a famous poem by Tang dynasty poet Li Bai about the nostalgic beauty of moonlight.
Messaging in Taiwan
While the focus of political messaging in this dataset was Hong Kong, some tweets in the set target foreign countries — specifically Taiwan and the United States. One Tweet from @lingmoms, a user claiming to be based in Las Vegas, Nevada, alleges that Taiwan has an unequal enforcement of freedom of speech.
Taiwan only has freedom of speech for those who support Taiwanese independence — what about freedom of speech for supporting unification?
This post also links to a debate of the same name on YouTube from Chung T’ien Television (CTI/中天電視), a Taiwanese TV channel. Simplified Chinese comments supporting the host’s pro-China stance dominate the comment sections of both the YouTube channel and Facebook page for the show.
This TV channel is part of the China Times Media Group (中時集團). In 2008, this group was acquired by a food company in Taiwan, the Wang Wang Group (旺旺集團), whose owner Tsai Eng-meng (蔡衍明) is an outspoken advocate for unification with the mainland. In Taiwan, concern has been growing about China’s influence on local media. Reuters recently revealed that Beijing is even purchasing positive coverage of the mainland in Taiwanese media. Protesters in Taiwan demonstrated in June against China’s influence on local media.
Other posts from @lingmoms link to additional content promoting the idea of unification — such as a speech from CTI TV host Huang Zhixian (黃智賢) supporting “returning Taiwan to our ancestors’ country” — or highlight America’s lack of commitment to Taiwan’s future. While all of these sources are local Taiwanese ones, China has shown a clear pattern in promoting content that idealizes unification of Taiwan and China.
Messaging in the United States
One account, @LibertyLionNews, also live-tweeted protests that ended in violence in Portland, Oregon on June 30th. This account described itself as “Conservative News from the USA and Abroad. #Catholic Defender of the Constitution of the United States. #Qanon #MAGA #BUILDTHEWALL #TRUMP #2A #1A ❌❌❌” and had garnered over 180,000 followers at the time of its suspension. Messages alleging that mainstream journalists supported antifa and decrying fact-checking organizations as biased featured among these tweets.
Hmmm I wonder what the “Fact Checker” Snopes has to say about the #Antifa assault on Journalist @MrAndyNgo All of the “Fact Checkers” are of a far left wing bias. https://t.co/t917wRoPSr
#Antifa are the dregs of society. A loose coalition of outcasts embodying everything that is wrong with the world today. Disgusting people. https://t.co/pOLgtf32pI
China has made its debut as a confirmed information operations actor with this campaign targeting Hong Kong’s 2019 protests. Twitter’s archive of data lent several key insights about the operation. It showed that the country made extensive use of bots, especially repurposed commercial spam bots and custom tweet clients, to disseminate disinformation and smear dissidents. China also put some amount of care into well-crafted messaging in such campaigns — using local linguistic conventions and linking to local sources that promote viewpoints advantageous to the PRC’s goals was a deliberate strategy. This data also showed that China is not only willing to message on foreign policy or foreign countries’ domestic affairs, but that it already has in the cases of Taiwan and the US.
Most importantly, this public dataset illustrated why more public data releases of known state-sponsored disinformation operations are so important. When external experts can work with this data, it maximizes the public’s ability to stay informed about bad actors online — and to catch what has been missed within the platforms. Looking ahead, with high-stakes elections in both Taiwan and the United States in 2020, we can use insights gleaned from this research to inform disinformation monitoring efforts in both countries and around the world.
Footnotes and Methodology:
[*] A Note on Open-Source Attribution: Open-source attribution refers to attributing cyber operations such as disinformation or hacking to a specific actor using only publicly available data. In the case of this research, this notably excludes many server-side signals that are only available to social media platforms themselves: IP addresses, granular account activity statistics, direct messages, and associated emails, just to name a few. Academics, researchers, and even governments do not, in normal conditions, have access to these signals. To an extent, this is as it should be, as the trade-offs here are user privacy and data safety.
Our team was conservative with its confidence assessments, not considering accounts to be associated with the PRC until multiple overlapping signs occurred in each case.
On Twitter, these signs included simultaneous bot tweets from bot clients heavily used by the PRC (FaWave, Twittbot.net), original tweets that were word-for-word matches with tweets in Twitter’s own attributed PRC archive, tradecraft/context (attacking the same targets with the same narratives), timing (same messages spread during the same time frame), and appearance (same profile photo as attributed accounts). On Facebook, where less metadata is publicly available, these signs included word-for-word matches with attributed tweets from Twitter’s PRC archive, content, signs of automation, and timing.
While we are confident in our assessments and left lower-confidence accounts out of this report, it is ultimately only attribution teams at Twitter and Facebook who are able to determine who is behind these accounts. Based on the data above, especially the overlap between our discovered accounts and signals provided in Twitter’s own dataset attributed to the PRC, our team finds it highly likely that the actor behind these accounts is the Chinese government or entities working with/for the Chinese government (privately contracted firms, individuals, etc.).
Critically, attribution, while important, may also be secondary in this case: what is clear is that, even after public takedowns, batches of inauthentic, automated and coordinated accounts remained live on Twitter and Facebook promoting disinformation narratives that the Chinese government itself had also promoted.
[1] This was ascertained by analyzing bigrams, or pairs of co-occurring words, in Chinese tweets. Following the convention of this type of natural language processing (NLP) analysis, our team removed stopwords from these tweets and extracted tokens using the StanfordNLP Chinese language package. Guo’s Chinese name (郭文贵) was the single most frequent bigram in all Chinese tweets. As a hashtag, it’s the 25th most frequent in the archive.
[2] Our team compiled a list of 48 well-known Chinese dissidents and searched the archive for mentions.
[3] This statement is based on the nodes’ betweenness centrality, a metric that is used to measure influence within a messaging network. Removal of nodes with high betweenness centrality from a messaging network results in the greatest disruption of flow of information within that network. A node’s size within the Gephi retweet network above represents the node’s betweenness centrality: the greater the influence, the larger the node. For more information on betweenness centrality, see Nonnecke et al.’s study on the influence of bots in spreading politically divisive messages about women’s reproductive health during the 2018 US midterms.
[4] While xenophobic tweets often mention alleged meddlers by a first name (Larry, Brian, Rebecca, etc.), they do not link to external sources, and photos have not been included in the public data provided by Twitter. It is therefore not possible to ascertain whether they are referencing real people.
[5] The term “yellow corpse” (黃屍) is meant to be an insulting pun for pro-democracy protesters, who are frequently referred to as Yellow Ribbons (黃絲), a similar sounding phrase. Pro-democracy protesters began wearing yellow ribbons in 2014 as part of the Umbrella Movement protests in Hong Kong.
[6] Those dissidents, in order from most mentions to fewest, are Guo Wengui, Yang Jianli (杨建利), Yuan Hongbing (袁红冰), Tang Baiqiao (唐柏桥), Wang Dan (王丹), and Wu Gan (吴淦, also known as 超级低俗屠夫).
[7] The tweetids for those tweets are: 1019880234772922368, 920245733827080193, 995452526857109504, 918021156166254592, 918016383862022144, 918016800696053760, 918020873327493121, 932526143139299328.
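The bigram counting described in the methodology notes above can be sketched in a few lines of Python. This example assumes the tweets have already been segmented into tokens (the team used the StanfordNLP Chinese package for that step, which is not reproduced here) and that the segmenter splits Guo's name into two tokens, which is what would make it a bigram. The stopword list, the `top_bigrams` helper, and the sample token lists are illustrative assumptions, not the team's data.

```python
from collections import Counter

# Illustrative stopwords only; a real analysis would use a fuller list.
STOPWORDS = {"的", "是", "了"}

def top_bigrams(tokenized_tweets, n=3):
    """Count adjacent token pairs after stopword removal; return the n most common."""
    counts = Counter()
    for tokens in tokenized_tweets:
        kept = [t for t in tokens if t not in STOPWORDS]
        # zip pairs each kept token with the one that follows it.
        counts.update(zip(kept, kept[1:]))
    return counts.most_common(n)

# Hypothetical pre-segmented tweets.
tokenized = [
    ["郭", "文贵", "是", "新闻"],
    ["郭", "文贵", "的", "消息"],
    ["关于", "郭", "文贵"],
]
top = top_bigrams(tokenized, n=1)
print(top)
```

Because stopwords are removed before pairing, two content words separated only by a stopword are counted as adjacent, which is a common convention but one worth stating when reporting results.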