Twitter Bots, A Lot More Nuanced Than You Think

Using Benford’s Law to Identify Malicious Accounts on Twitter

Abraham Setiawan
CodeX
9 min read · Oct 10, 2022


Photo by Waldemar Brandt on Unsplash

Social media has changed how people interact with each other, both for the better and the worse. With social media, it’s very easy to catch up with an old friend who lives on the other side of the planet. It also enables more people to get their voices heard, not just politicians and celebrities. However, it has some disadvantages as well. Some people have the perception that their interactions on social media don’t matter in real life, which is one of the reasons for the proliferation of cyberbullying. Furthermore, bot accounts are being created all the time. These bot accounts can be malicious and might cause harm in various ways, such as spreading fake news and misinformation.

I am not the only one who thinks that bot accounts can be malicious. After announcing that he would acquire Twitter, Elon Musk began having doubts about the acquisition due to the sheer number of bot accounts on Twitter. Although he likely has reasons beyond the bot issue to pull out of the acquisition (bot researchers do not support Musk's argument), it's still undisputed that Twitter bots exist.

Photo by Claudio Schwarz on Unsplash

Based on this discourse, I attempted to develop a program to determine whether a Twitter account is a bot. The inspiration for this project came from a Netflix documentary called Connected. In the episode about numbers, science journalist Latif Nasser talked about a universal law called Benford’s Law. Data that occurs naturally tends to follow Benford’s Law; this includes tax statements, election results, rainfall rates, city populations, etc. Therefore, it’s possible to detect fraud or anomalous behavior when something doesn’t follow the law, e.g. fraudulent tax statements, rigged elections, and social media bot accounts. Benford’s Law works by looking at the first digit of each number (e.g. 100 starts with a 1; 82309 starts with an 8) and analyzing the distribution of those digits. While people might expect first digits to be uniformly distributed, Benford’s Law says otherwise: first digits follow a logarithmic distribution, where the probability of a number starting with digit d is log₁₀(1 + 1/d). Numbers that start with one occur 30.1% of the time, numbers that start with two occur 17.6% of the time, and so on, as in the chart below that Rob Gonsalvez posted.

First Digit Frequency According to Benford’s Law (Source: Towards Data Science)
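The frequencies in the chart come straight from that formula. A quick Python check (my own illustration, not from the documentary or the chart's source):

```python
import math

# Benford's Law: the probability that a number's first digit is d
# is log10(1 + 1/d), for d = 1..9
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d, p in benford.items():
    print(f"{d}: {p:.1%}")
# 1: 30.1%, 2: 17.6%, ..., 9: 4.6%
```

The nine probabilities telescope to exactly 1, since log₁₀(2/1) + log₁₀(3/2) + … + log₁₀(10/9) = log₁₀(10).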

In the ‘Connected’ episode, Latif met with Professor Jen Golbeck of the University of Maryland, who talked about applying Benford’s Law on Twitter and finding accounts that don’t follow the law. She claimed those accounts were bots with potentially malicious intent. By taking an account’s followers list and fetching the first digit of each follower’s own follower count, Golbeck was able to analyze the data using Benford’s Law. She discovered a couple dozen suspicious accounts that she suspected to be Russian bots standing by for future use.

Professor Jen Golbeck and Latif Nasser (Source: Netflix via IndieWire)

After watching the documentary, I dug deeper into some papers related to Benford’s Law, including Professor Golbeck’s paper on her experiment. She wrote that she ran a chi-square test on a large set of Twitter accounts, ignored the statistical significance level, and focused on the p-value alone. The higher the p-value, the more closely the dataset follows Benford’s Law, meaning the account behaves naturally; the lower the p-value, the more likely the account is suspicious. As before, the analysis goes through an account’s followers list and takes the first digits of each follower’s own follower count.
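Her exact code isn't in the paper, but the test itself is standard. A minimal stdlib-only sketch (benford_p_value is my name for it; with 9 digit bins there are 8 degrees of freedom, for which the chi-square survival function has a closed form, so no SciPy is needed):

```python
import math

# Benford's expected first-digit probabilities, P(d) = log10(1 + 1/d)
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def benford_p_value(numbers):
    """Chi-square goodness-of-fit p-value of the first-digit distribution
    of `numbers` (positive integer counts) against Benford's Law.
    Higher p-value = closer to Benford's curve."""
    digits = [int(str(n)[0]) for n in numbers if n > 0]
    n = len(digits)
    stat = sum(
        (digits.count(d) - n * BENFORD[d]) ** 2 / (n * BENFORD[d])
        for d in range(1, 10)
    )
    # Chi-square survival function with 8 dof (k = 2m, m = 4):
    # P(X > x) = exp(-x/2) * sum_{i=0}^{3} (x/2)^i / i!
    half = stat / 2
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(4))
```

In practice one would use scipy.stats.chisquare or the benfordslaw library instead; the point is only that a higher p-value means the observed digits sit closer to Benford's expected frequencies.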

Having gained a better understanding, I made sure I had the required tools. I used Python to write the code, and since the data was pulled from Twitter, I needed the Twitter API. Python has libraries for the Twitter API, such as Tweepy, which I had prior experience with, so I thought that was a good start for fetching the data for exploratory data analysis. I learned that Twitter exposes the follower count in a field called public_metrics. However, Tweepy let me down by not being able to pull public_metrics despite having a parameter for it.

Tweepy not showing public_metrics (Image by author)

Fortunately, I found an alternative library that can fetch public_metrics: Twarc. While its syntax differs from Tweepy’s, it could do everything I needed for this project, and it didn’t take long to adjust.

from twarc.client2 import Twarc2

bearer_token = '<can be retrieved from Twitter developer portal>'
t = Twarc2(bearer_token=bearer_token)
Twarc easily fetches user information including public_metrics (Image by author)

Finally, I found a Python library for Benford’s Law that saved me from writing one from scratch, with a very self-explanatory name: benfordslaw.

benfordslaw Python library, showing how random data (uniformly distributed) doesn’t follow Benford’s Law (logarithmically distributed) (Image by author)
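The contrast in the figure is easy to reproduce without the library: uniform data gives each first digit roughly an 11% share, while data spread evenly across several orders of magnitude lands on Benford's 30.1% for a leading 1. A stdlib-only illustration (my own sketch, not benfordslaw's internals):

```python
import random

def leading_one_share(numbers):
    """Fraction of values whose first digit is 1."""
    return sum(str(int(n))[0] == "1" for n in numbers) / len(numbers)

random.seed(0)
# Uniform data: every first digit is roughly equally likely (~11% each)
uniform = [random.uniform(1, 1_000_000) for _ in range(100_000)]
# Log-uniform data spans six orders of magnitude and follows Benford
log_uniform = [10 ** random.uniform(0, 6) for _ in range(100_000)]

print(leading_one_share(uniform))      # ~0.11, far from Benford
print(leading_one_share(log_uniform))  # ~0.30, matches Benford's 30.1%
```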

With the Twarc and benfordslaw libraries in hand, I fetched some test data from Twitter and mapped it against Benford’s Law.

# Fetch the followers (Twarc yields pages) and flatten the first page
fol2 = list(t.followers(get_userid('abrahamspartner')))
df_partner = pd.json_normalize(fol2[0]['data'])
Fetching data from my partner’s Twitter account (Image by author)
My partner’s Twitter account reflecting Benford’s Law (Image by author)
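The get_userid helper in the snippet above isn't shown in the article; presumably it resolves a @username to the numeric id the API expects. A hypothetical version, assuming Twarc2.user_lookup's paged response shape (the client is passed in explicitly here, while the article's version likely uses the global t):

```python
def get_userid(username, client):
    """Hypothetical helper: resolve a @username to its numeric user id.
    Twarc2.user_lookup yields response pages shaped like
    {"data": [{"id": ..., "username": ...}, ...]}."""
    page = next(client.user_lookup([username], usernames=True))
    return page["data"][0]["id"]
```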

Then I did the same thing for my own profile and for a bot profile I found on Twitter. Here’s how they look.

Distribution of my profile and a bot profile (Image by author)

I was surprised to see that my account reflects Benford’s Law less than an actual bot account does. For a second, I thought, “I am a bot but the bot is not a bot? Am I more bot than a bot?!” Furthermore, the benfordslaw library automatically prints Anomaly detected! if the p-value is lower than the predetermined alpha, which makes it even more bizarre. But then I remembered Professor Golbeck’s method: the alpha (statistically significant or not) doesn’t matter, only the p-value does. Still, this doesn’t explain why the bot’s p-value is higher than mine.

Thinking it over further, I realized that the way I use Twitter is not typical. I mostly post on Twitter when I post images on Instagram. This probably affects how people interact with my Twitter profile, and who follows it.

How I post on Twitter (Image by author)

On the other hand, the bot account I chose is completely open about being a bot, and what it does is share relevant artworks from a particular museum. This has entertainment value, and it makes sense that people would follow it.

Entertainment bot account (Screenshot from Twitter)

This brings a lot more nuance to the quest of identifying bots on Twitter, since entertainment bots register as natural under Benford’s Law. However, we could narrow the target to suspicious accounts rather than all bots. To do this, we need much more data from Twitter. I decided to fetch my account and all my followers, plus the entertainment bot account and all its followers. I wrote a function to automate the process.
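The get_twitter_benford_loop function isn't reproduced in the text; my reading is that it gathers, for every follower of a given account, that follower's own followers_count, then fits those values with benfordslaw to return the p-value and empirical percentages. A hedged sketch of the data-gathering half (follower_counts is my name, and the client is injected so the sketch works against any object with a Twarc-like followers method):

```python
def follower_counts(user_id, client):
    """Collect the followers_count of each follower of `user_id`.
    These are the values whose first digits get fitted against
    Benford's Law (e.g. with benfordslaw's bl.fit)."""
    counts = []
    for page in client.followers(user_id):  # Twarc2.followers yields pages
        for user in page["data"]:
            counts.append(user["public_metrics"]["followers_count"])
    return counts
```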

Then I called the function for each row of a dataframe containing the followers' information.

p_val = []
percentage_emp = []
for username in tqdm(df_bot_loop['id'].values):
    pv, perc = get_twitter_benford_loop(username)
    p_val.append(pv)
    percentage_emp.append(perc)

df_bot_benford = df_bot_loop.copy()
df_bot_benford['p_value'] = p_val
df_bot_benford['percentage_emp'] = percentage_emp

However, it turns out the Twitter API limits how many requests you can execute in a row. After a few requests, it force-sleeps for about 15 minutes, and it did this many times. This is unrelated to your project's allocated quota; it's a rate limit imposed by the Twitter API itself. It increased the runtime from an anticipated few hours to more than a week.

Force-sleep by Twitter API (Image by author)
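The slowdown is easy to ballpark. Assuming the followers endpoint allows roughly 15 requests per 15-minute window (about one page of followers per minute) and a few pages per account on average (both numbers are my assumptions, not from the article), the arithmetic quickly reaches "more than a week":

```python
# Back-of-the-envelope runtime under an assumed rate limit of
# 15 requests per 15-minute window (~1 request per minute)
accounts = 4000                          # follower lists to fetch
pages_per_account = 3                    # assumed average pagination depth
minutes = accounts * pages_per_account   # one request-minute per page
print(f"~{minutes / 60 / 24:.1f} days")  # ~8.3 days
```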

After waiting patiently, I finally had data from more than 4,000 accounts. I checked the user with the lowest p-value (5.7742e-16) and looked at their profile. At first glance, the account seems legitimate.

Twitter account with the lowest p-value (Screenshot from Twitter)

And even though it has a small p-value, the distribution still somewhat resembles Benford’s Law.

Distribution of Twitter account with the lowest p-value (Image by author)

More questions came to mind after getting this result: “How can a seemingly legit account have such a low p-value? Is this a dead end? How low is actually suspicious?” These questions prompted me to research further, to see if I could get insights from Professor Golbeck’s experiment.

And then I saw the light. I found Professor Golbeck’s dataset on GitHub. I downloaded the dataset and started with Benford’s Law analysis. This didn’t take long since I didn’t need to fetch the data from Twitter.

p_val = []
percentage_emp = []
for user_id in tqdm(df_golbeck['UserID']):
    bl = benfordslaw(verbose=1)
    try:
        _df = pd.read_csv(f'jgolbeck/anonymizedTwitter/{user_id}', delimiter='\t')
        X = _df['Follower_Count'].values
        bl.fit(X)
        p_val.append(bl.results['P'])
        percentage_emp.append(bl.results['percentage_emp'])
    except FileNotFoundError:
        p_val.append(0)
        percentage_emp.append(0)

df_golbeck['p_value'] = p_val
df_golbeck['percentage_emp'] = percentage_emp

And then I sorted the dataframe to show the top 5 smallest p-values from the dataset.
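Sorting for the smallest p-values is one line with pandas. A toy illustration (the frame here is made up, not Golbeck's data; nsmallest sorts ascending and keeps the k smallest values, so the most anomalous rows come first):

```python
import pandas as pd

# Toy stand-in for df_golbeck with illustrative p-values
df = pd.DataFrame({
    "UserID": ["a", "b", "c", "d"],
    "p_value": [0.42, 3.1e-303, 5.7e-16, 0.97],
})
top = df.nsmallest(2, "p_value")
print(top["UserID"].tolist())  # ['b', 'c']
```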

Result of Benford’s Law analysis on Professor Golbeck’s dataset (Image by author)

This surprised me a lot. It turns out my lowest p-value of 5.7742e-16 (15 zeros after the decimal point) is nothing compared to these. Look at the lowest one: it’s on the order of e-303, more than 300 zeros after the decimal point! That is about 10²⁸⁷ times lower than the lowest one I had. Here’s how the lowest three distributions look when visualized. They look nothing like Benford’s Law.

Distribution of the lowest three from Professor Golbeck’s dataset (Image by author)

Furthermore, in her paper, Professor Golbeck provided some screenshots of suspicious activities on some of these accounts. Since the dataset is anonymized, it’s not possible to pinpoint which tweets belong to which account, but it is explained that these belong to some of the accounts with the lowest p-values.

Suspicious activities from potentially malicious accounts (Screenshots from Professor Golbeck’s paper on ResearchGate)

Compared to retweeting relevant artworks or other entertaining activity on Twitter, these posts look random and spammy. In hindsight, I see a big difference between entertainment bots and potentially malicious bots. It makes sense that entertainment bots register as natural under Benford’s Law (high p-values): they are an integral part of the Twitter community, something people interact with and enjoy, in contrast to the potentially malicious accounts.

I set out on this project with a goal in mind: to detect bots on Twitter. However, I encountered challenges and insights that changed my perspective. Twitter bots are a lot more nuanced than I thought.

While this analysis is already fairly involved, I would argue that further analyses, such as qualitative analysis of the tweets, followings, and followers, would uncover even more nuance and possibly detect more malicious accounts with a higher degree of certainty.

I hope you enjoyed reading this article and learned something from it. Have a great one!


Data Analyst student at Hyper Island with experience in product and innovation. I write about my journey in the data world. Website: abrahamsetiawan.com