Why have many suspicious tweets been going viral lately? A data approach.

mersen
7 min read · Jul 16, 2022

Recently, I embarked on a personal project to investigate the proliferation of questionable tweets from suspicious accounts in the Persian community. These tweets often go viral and are frequently liked and retweeted, despite being fake or sourced from anonymous accounts. As part of the project, I collected and analyzed data to better understand this phenomenon. To my surprise, upon sharing the results of my research on Twitter, the thread garnered significant attention and engagement from a diverse audience including members of the public, journalists, and activists.

Judging by that level of engagement and response, the community seems to have found the project and its results interesting.

I used the Twitter API, Python, Excel, and an online graph generator.

How did it start?

There has been ongoing concern about the presence of government-affiliated accounts on social media platforms in Iran that spread propaganda and gather information. In the past, tweets that gained widespread popularity in the Farsi-speaking community were typically informative, well-thought-out, and sometimes critical of the government. They were often produced by citizen journalists. However, this trend has changed in recent years, with some accounts with large followings openly promoting the government or certain organizations. It appears that significant resources have been invested in Twitter, which remains one of the few free media outlets in Iran.

Gathering data

For this project I collected 100 tweets: 50 from suspicious accounts and 50 from seemingly safe accounts. Using Python and the Twitter API, I then extracted 1,000 likes from each tweet, which gave me a list of 100,000 accounts that liked these tweets (not necessarily unique, since one account can like more than one of the tweets).
It is important to note that this research should be considered a personal curiosity rather than a scientific study. In identifying suspicious accounts, I relied on my own judgment and paid attention to both the content of the tweets and the accounts themselves. For instance, a tweet that simply asks users to post a photo of their hand could be seen as a red flag, as it may be gathering fingerprints for nefarious purposes.

Result

In the following graphs, yellow is for “suspicious accounts” and white is for “non-suspicious accounts”.

When did the hundred thousand accounts that liked these tweets join Twitter?

Upon examining the join dates of the individuals who liked the hundred tweets in my research, I observed that the majority of these users joined Twitter in 2017 or later. The platform saw an increase in popularity in 2020, but it appears that the dynamics have shifted significantly in 2021.

Same question in percentage…

Looking at the same data as percentages, the mix of accounts that liked these tweets has clearly changed over time, peaking in the months of 2022. It is likely that these trends will continue to develop in the future.

What percentage of these tweets had some kind of sexual content?

Based on the data, approximately one third of the tweets from suspicious accounts contained sexual or NSFW content. This is a significant difference compared to the tweets from seemingly safe accounts.

What percentage of tweets had photos?

The use of photos in tweets from suspicious accounts was similar to that of tweets from seemingly safe accounts. This was surprising to me, as I had expected that suspicious accounts would make more use of photos in order to attract attention and increase their visibility. However, the data showed that this was not the case.

Joining year of accounts that had viral tweets

Upon examining the join dates of the individuals who posted the viral tweets in my research, I found that while the join dates of those associated with seemingly safe accounts were relatively dispersed, the join dates of those associated with suspicious accounts were more concentrated in the past year and a half. This suggests that there has been a recent increase in the activity of suspicious accounts on the platform.

The number of tweets of these accounts in a recent week

Furthermore, upon analyzing the activity of these accounts over the past week, I found that the average number of tweets from suspicious accounts was approximately five times higher than that of a typical account. This is noteworthy, as it indicates that these suspicious accounts are significantly more active than the average user on the platform. It is important to note that this calculation includes both original tweets and replies.

One account, for example, posted 2800 tweets in a single week, which I found to be highly unusual. Upon further investigation, I discovered that this account had tweeted 12 times in the last 10 minutes alone.

The average number of followers and followings

In addition to analyzing the tweets and activity levels of the accounts in question, I also examined the number of followers and followings of 100 accounts. I found that while the number of followers of suspicious accounts was approximately 1.5 times higher than that of seemingly safe accounts, the number of accounts followed by suspicious accounts was 14 times higher. This is not surprising, as accounts with a large number of followers typically have a higher probability of going viral. However, the significantly higher number of followings among suspicious accounts suggests that they may be more proactive in seeking out and following other accounts, potentially as part of a strategy to amplify their content.

I talked to one of the suspicious accounts!

In order to gain a deeper understanding of the motivations and strategies of suspicious accounts on Twitter, I reached out to one of the most followed suspicious accounts and asked if they would be willing to answer some questions. They agreed and shared some insights with me. They mentioned that they, along with many of their friends, have only been on Twitter for the past year or two, and that they feel pressure to follow back everyone who follows them in order to be a part of the Twitter culture. They also mentioned that there are secret advertising companies that contract with accounts to distribute ads based on their number of followers and impressions. These accounts are expected to maintain a high level of activity in order to keep their impressions high and secure more advertising opportunities. The individual I spoke with also mentioned that they personally earn money from their personal account and through administering other accounts, and therefore spend a significant amount of time online and tweeting.

Conclusion

I cannot confirm or refute the information provided by the individual I spoke with, as it is just one perspective from a specific group. It is possible that different accounts have different goals and motivations. However, it is clear that many users of Twitter in the Farsi-speaking community have noticed a change in the atmosphere on the platform. Regardless of whether accounts are deemed safe or suspicious, and regardless of their age, background, or purpose (whether it be advertising or gathering information), it is evident that the content on Persian Twitter has evolved over the past few years and will likely continue to change in the future. It is worth noting that this shift may not necessarily be for the better, in my opinion.

This graph illustrates how the mix of accounts has changed over time. Among accounts that joined in 2018, the white (non-suspicious) group was dominant, but by 2020 the two groups were more evenly distributed. In 2021, the white group was clearly in the minority, and this trend has continued into 2022. It is worth noting that the data analyzed here only reflects the first wave of 2021 accounts, and a second wave of accounts from 2022 is on the horizon. This suggests that the changes we are currently seeing on the platform may be just the beginning of a larger shift.

Code

Collect tweets:

The tweets were selected manually. A simple Twitter search query finds Persian tweets with more than three thousand likes:

lang:fa min_faves:3000

From the search results I made two lists, one for suspicious accounts and one for regular accounts. Each list contains 50 tweets.

Collect likers of each tweet to CSV (1)

Then I used the Tweepy library to collect 1,000 likes for each tweet. I was getting a lot of errors, so I had to collect the data one tweet at a time and save each result to a separate CSV. I ended up with 100 CSVs for 100 tweets.
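The step above can be sketched roughly as follows. This is a minimal sketch, not the exact code I ran: it assumes `client` behaves like a `tweepy.Client` (whose `get_liking_users` returns a response with `.data` and `.meta`), and the function name `collect_likers` and the CSV columns are made up for illustration.

```python
import csv


def collect_likers(client, tweet_id, out_path, limit=1000):
    """Write up to `limit` accounts that liked `tweet_id` to a CSV file.

    Assumption: `client` behaves like tweepy.Client, i.e.
    get_liking_users() returns an object with `.data` (a list of users,
    or None) and `.meta` (a dict that may contain "next_token").
    """
    rows = []
    token = None
    while len(rows) < limit:
        resp = client.get_liking_users(
            tweet_id,
            max_results=100,  # one page holds at most 100 users
            pagination_token=token,
            user_fields=["created_at", "public_metrics"],
        )
        for user in resp.data or []:
            rows.append([tweet_id, user.id, user.username, user.created_at])
        token = (resp.meta or {}).get("next_token")
        if token is None:  # no further pages for this tweet
            break

    rows = rows[:limit]
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["tweet_id", "user_id", "username", "created_at"])
        writer.writerows(rows)
    return len(rows)
```

With a real client this would be called once per tweet, for example `collect_likers(tweepy.Client(bearer_token=...), tweet_id, "tweet_01.csv")`, which matches the one-CSV-per-tweet approach and makes it easy to retry a single tweet after an error.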

Fetch CSVs (2)

Then I had to combine the 50 CSVs of each list into one file so I could clean and analyze the data.
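Combining the files can be done with the standard library alone. The sketch below assumes that all CSVs in a list share the same header row; the function name `combine_csvs` and the file pattern in the usage note are hypothetical.

```python
import csv
import glob
import os


def combine_csvs(pattern, out_path):
    """Concatenate every CSV matching `pattern` into one file.

    Assumes all input files share the same header row, which is written
    once at the top of the output. Returns the number of data rows.
    """
    header = None
    total = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in sorted(glob.glob(pattern)):
            # Skip the output file itself if it matches the pattern.
            if os.path.abspath(path) == os.path.abspath(out_path):
                continue
            with open(path, newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # empty file, nothing to copy
                if header is None:
                    header = file_header
                    writer.writerow(header)
                for row in reader:
                    writer.writerow(row)
                    total += 1
    return total
```

For example, `combine_csvs("suspicious/*.csv", "suspicious_all.csv")` would produce one file per list, ready to load into Excel or a notebook.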

Get information for each user (3)
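A rough sketch of this step, assuming again that `client` behaves like a `tweepy.Client` and that user lookups are requested in batches of at most 100 IDs. The function name `fetch_user_info` and the output keys are illustrative, not the ones from my notebook.

```python
def fetch_user_info(client, user_ids, batch_size=100):
    """Look up join date and follower/following counts for each user ID.

    Assumption: `client.get_users(ids=..., user_fields=...)` returns an
    object whose `.data` is a list of users (or None), each with `id`,
    `username`, `created_at`, and a `public_metrics` dict.
    """
    info = []
    for start in range(0, len(user_ids), batch_size):
        batch = user_ids[start:start + batch_size]
        resp = client.get_users(
            ids=batch,
            user_fields=["created_at", "public_metrics"],
        )
        for user in resp.data or []:
            metrics = user.public_metrics or {}
            info.append({
                "user_id": user.id,
                "username": user.username,
                "joined": user.created_at,
                "followers": metrics.get("followers_count"),
                "following": metrics.get("following_count"),
                "tweets": metrics.get("tweet_count"),
            })
    return info
```

The `joined`, `followers`, and `following` fields are the ones the join-year and follower graphs above rely on.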

Get number of tweets in the last week (4)
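One way to count an account's tweets over the last seven days is to page through its timeline starting from a cutoff time. This is a sketch under the assumption that `client` behaves like a `tweepy.Client`; the function name `count_recent_tweets` is hypothetical. Replies and retweets are not excluded here, so the count covers more than original tweets.

```python
from datetime import datetime, timedelta, timezone


def count_recent_tweets(client, user_id, days=7, now=None):
    """Count tweets posted by `user_id` in the last `days` days.

    Assumption: `client.get_users_tweets()` returns an object with
    `.data` (a list of tweets, or None) and `.meta` (a dict that may
    contain "next_token"), as tweepy.Client does.
    """
    now = now or datetime.now(timezone.utc)
    start_time = now - timedelta(days=days)
    count = 0
    token = None
    while True:
        resp = client.get_users_tweets(
            user_id,
            start_time=start_time,  # only tweets newer than the cutoff
            max_results=100,
            pagination_token=token,
        )
        count += len(resp.data or [])
        token = (resp.meta or {}).get("next_token")
        if token is None:  # reached the last page
            break
    return count
```

Keep in mind that the user timeline only reaches back about 3,200 tweets, so an extremely active account could hit that ceiling before the week is fully covered.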

Analyze

You can find some of the code I used for analyzing the data here: Colab page
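As an illustration of the kind of analysis behind the join-year graphs, here is a minimal sketch that computes, for each join year, what percentage of liking accounts came from the suspicious list. The function name `join_year_shares` and the numbers in the usage example are toy values for illustration, not my actual data.

```python
from collections import Counter


def join_year_shares(suspicious_years, regular_years):
    """Return {year: percent of likers that belong to the suspicious list}.

    Each argument is a list of join years, one entry per liking account.
    """
    suspicious = Counter(suspicious_years)
    regular = Counter(regular_years)
    shares = {}
    for year in sorted(set(suspicious) | set(regular)):
        total = suspicious[year] + regular[year]
        shares[year] = round(100 * suspicious[year] / total, 1)
    return shares


# Toy input: three suspicious-list likers joined in 2021 and one in 2018;
# three regular-list likers joined in 2018 and one in 2021.
print(join_year_shares([2021, 2021, 2021, 2018], [2018, 2018, 2018, 2021]))
```

A share above 50 for a given year means that accounts created that year mostly liked tweets from the suspicious list.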

P.S. You can find the code on my GitHub.
