A tool to identify bots, fake accounts, and false information on Twitter

Arpit Maheshwari
5 min read · Jan 19, 2020

Recently, there was a lot of unrest on Indian Twitter (and in India itself) over the Citizenship Amendment Act (CAA) and the National Register of Citizens (NRC). On top of that, a series of incidents at several universities across the country led to nationwide student protests.

Many tweets were posted about these incidents, and from the looks of it, a lot of false information was being spread by suspicious accounts. Photos and videos were morphed, misleading messages were conveyed, slogans at the protests were quite extreme, and at times even a few popular Twitter handles (with a verified blue tick) were found spreading wrong information.

I started going through the tweets under these trending hashtags and saw all kinds of content, including hate speech and violent, abusive messages. Looking at the profile pages of a few Twitter handles, I could see that these accounts had been created very recently (on average 4 days earlier). There were also a few provocative tweets that had been posted verbatim by numerous Twitter accounts.

Capturing tweets to analyze

That’s when I started using the Twitter APIs to capture all the tweets for a particular hashtag that was trending at the moment. I started with a few trending hashtags such as #JNUViolence and #JNUattack, and later added #CAA_NRC_Protest, #CAA_NRC, and #CAAProtest.

After generating a few reports, it was not very surprising to see that a lot of the accounts had been created just a few days earlier. In fact, the largest share of the accounts that took part in these conversations had been created in December 2019 and January 2020 (until 12th Jan). The number was (relatively) so large that no other period in the past few years came close to it. It seemed evident that many fake accounts had been created to spread false information and provoke unrest.

The maximum number of accounts were created in the year 2019.

As you can see in the image below, the largest number of accounts were created in December 2019 and January 2020 (until 12th Jan), i.e. just a few days before the incidents or while they were already taking place.

The maximum number of accounts were created in the month of Dec’19 and Jan’20
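
A monthly breakdown like the one described above could be computed roughly as follows. This is a minimal sketch, not the tool's actual code: it assumes each captured user is a dict with hypothetical `screen_name` and `created_at` keys, and that `created_at` uses the timestamp format returned by Twitter's v1.1 API.

```python
from collections import Counter
from datetime import datetime

# Timestamp format used by Twitter's v1.1 API,
# e.g. "Sun Jan 05 10:15:00 +0000 2020".
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def accounts_created_per_month(users):
    """Count unique accounts by the (year, month) in which they were created.

    `users` is an iterable of dicts with hypothetical keys
    'screen_name' and 'created_at' (assumed shape, not the tool's schema).
    """
    seen = set()
    counts = Counter()
    for user in users:
        # The same account can show up once per captured tweet; count it once.
        if user["screen_name"] in seen:
            continue
        seen.add(user["screen_name"])
        created = datetime.strptime(user["created_at"], TWITTER_DATE_FORMAT)
        counts[(created.year, created.month)] += 1
    return counts
```

Calling `accounts_created_per_month(users).most_common(3)` would then list the three months with the most newly created accounts, which is the kind of spike the chart highlights.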

Figuring out suspicious accounts

Next, it was time to look for accounts that were recently created but had been very active since then and had got a lot of retweets on their tweets. Basically, the aim was to single out accounts that were highly active but had little or no interaction from others. Again, it was not surprising to see how many suspicious accounts made it to this list. Upon visiting the Twitter profiles of a few of them, it was clear that these accounts had been created only for these events (CAA and NRC), because they had no tweets on any other topic. Some of these accounts also had no profile picture, and their usernames looked random, with a lot of digits and just a few letters. For a few, even the letters were completely random gibberish. Anybody could figure out that these accounts had been created for a single purpose.

Twitter handles that had unusual activity
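
The signals described above (account age, activity level, digit-heavy usernames, missing profile picture) can be turned into simple flags. The sketch below is only an illustration under stated assumptions: the dict keys, the function names, and every threshold (30 days, 20 tweets per day, 40% digits) are hypothetical choices, not the rules the tool actually applies.

```python
from datetime import datetime, timezone


def looks_random(screen_name, digit_ratio=0.4):
    """Heuristic: a handle made up largely of digits looks auto-generated."""
    if not screen_name:
        return False
    digits = sum(ch.isdigit() for ch in screen_name)
    return digits / len(screen_name) >= digit_ratio


def suspicion_flags(account, now, max_age_days=30, min_tweets_per_day=20):
    """Return the list of heuristic flags raised by one account.

    `account` is a dict with hypothetical keys: 'screen_name',
    'created_at' (timezone-aware datetime), 'tweet_count', and
    'default_profile_image' (True when no picture was uploaded).
    All thresholds are illustrative assumptions.
    """
    # Treat accounts younger than one day as one day old to avoid dividing by zero.
    age_days = max((now - account["created_at"]).days, 1)
    flags = []
    if age_days <= max_age_days:
        flags.append("recently_created")
    if account["tweet_count"] / age_days >= min_tweets_per_day:
        flags.append("high_activity")
    if looks_random(account["screen_name"]):
        flags.append("random_username")
    if account.get("default_profile_image"):
        flags.append("no_profile_picture")
    return flags
```

An account that raises several flags at once is a candidate for manual review, not proof of a fake account; the profile visit described above is still the deciding step.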

Sentiments behind these tweets

I also ran Latent Dirichlet Allocation (LDA) and sentiment analysis on the tweets. LDA is a topic-modeling technique that groups co-occurring words into topics, which helps surface the words that dominate a given set of text messages, while sentiment analysis helps us understand the nature/emotion of a given text. I used both on these tweets to see how extreme they were.

Sentiments behind the captured tweets
Words that have been used the most in these tweets
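
To give a feel for what these two charts summarize, here is a deliberately simplified stand-in that uses only the Python standard library. It is not LDA and not a trained sentiment model: it counts word frequencies and scores text against a tiny hand-made word list. The stopword list, the `POSITIVE` and `NEGATIVE` lexicons, and the function names are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Tiny illustrative word lists (assumptions, not a real lexicon).
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "on", "for", "we", "rt"}
POSITIVE = {"peace", "support", "safe", "hope"}
NEGATIVE = {"attack", "violence", "hate", "shame"}


def tokenize(text):
    """Lowercase the text, drop URLs, and keep alphabetic words of 2+ letters."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return [w for w in re.findall(r"[a-z]{2,}", text) if w not in STOPWORDS]


def top_words(texts, n=10):
    """Return the n most frequent non-stopword tokens across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts.most_common(n)


def sentiment_label(text):
    """Label text by comparing counts of positive and negative lexicon hits."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A real analysis would swap `top_words` for a proper LDA implementation and `sentiment_label` for a trained sentiment model; the sketch only shows the shape of the inputs and outputs.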

Original tweets vs. Retweets

Next, I wanted to find the tweets that were doing the rounds the most, either because many people had tweeted them or because many people had retweeted them. This helps identify the tweets that were shared the most, and it also helps in understanding who started the chain. I then separated original tweets from retweets to see how big the difference between the two numbers is.

Original vs. Retweets
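
The separation described above might look something like the following sketch. It assumes each captured tweet is a dict with hypothetical `screen_name` and `text` keys; retweets are detected either by a `retweeted_status` field (which Twitter's v1.1 API attaches to retweets) or, as a fallback, by the conventional `RT @` prefix. The function names are illustrative, not the tool's.

```python
from collections import defaultdict


def is_retweet(tweet):
    """Treat a tweet as a retweet if it has 'retweeted_status' or starts with 'RT @'."""
    return "retweeted_status" in tweet or tweet.get("text", "").startswith("RT @")


def split_originals_and_retweets(tweets):
    """Return (originals, retweets) as two lists."""
    originals, retweets = [], []
    for tweet in tweets:
        (retweets if is_retweet(tweet) else originals).append(tweet)
    return originals, retweets


def retweet_share(tweets):
    """Fraction of captured tweets that are retweets (0.0 for an empty capture)."""
    if not tweets:
        return 0.0
    _, retweets = split_originals_and_retweets(tweets)
    return len(retweets) / len(tweets)


def texts_posted_by_many_accounts(tweets, min_accounts=2):
    """Find original tweets posted verbatim by several distinct accounts.

    Returns (text, number_of_accounts) pairs, most widely copied first.
    Whitespace is collapsed so trivial spacing differences do not hide copies.
    """
    authors_by_text = defaultdict(set)
    for tweet in tweets:
        if is_retweet(tweet):
            continue
        normalized = " ".join(tweet["text"].split())
        authors_by_text[normalized].add(tweet["screen_name"])
    repeated = [
        (text, len(authors))
        for text, authors in authors_by_text.items()
        if len(authors) >= min_accounts
    ]
    return sorted(repeated, key=lambda item: item[1], reverse=True)
```

Counting distinct accounts per text, instead of raw occurrences, is a design choice: it highlights copy-pasted messages spread across many handles, which is the pattern noted earlier in the article.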

Below are the tweets that have been tweeted the most.

You can view this entire report at the BoffinBot report on #JNUVIOLENCE, #JNUATTACK. Moreover, you can do your own analysis at BoffinBot.

Also, view a similar BoffinBot report on #CAA_NRC_Protest, #CAA_NRC, #CAAProtest.

NOTE: The tool captures only ~5,000 tweets per keyword/hashtag because Twitter places restrictions on its API, so we try to use the API wisely. Each of these reports is generated from a set of close to 5,000 tweets in total.

Also, the data in the graphical representations might change as we stop capturing new tweets after a certain threshold. These representations are purely based on the limited set of data captured.
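
A capture cap like the one in the note can be enforced lazily, so that no tweets are requested beyond the threshold. This is a minimal sketch under assumptions: `tweet_stream` stands for any iterator that yields tweets (for example, a hypothetical generator wrapping paginated API calls), and the constant name is illustrative.

```python
from itertools import islice

# Approximate per-keyword cap mentioned in the note above.
MAX_TWEETS_PER_KEYWORD = 5000


def capture_capped(tweet_stream, limit=MAX_TWEETS_PER_KEYWORD):
    """Collect at most `limit` tweets from an iterator, then stop pulling.

    islice stops consuming the underlying iterator once `limit` items
    have been yielded, so a generator that issues API requests on demand
    is not asked for more data than needed.
    """
    return list(islice(tweet_stream, limit))
```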

Arpit Maheshwari

Software Development Engineer. Worked for several startups & co-founded a startup. Eager to learn. Optimistic. Like watching movies & working on side-projects.