Become a bot hunter in 6 steps!

Valentin Guillot
Published in AppCívico
Apr 10, 2018

Like 3 billion other people, you are probably on at least one of the many existing social networks. But did you know that we humans are not the only ones there? Do not worry, we do not think aliens are using our Internet for now. However, a lot of bots are!

Twitter and other popular social networks have a lot of bot accounts. Their goals are many: influence, politics, fake news… Some have bad intentions, some do not. Either way, when you have doubts about a profile, it is important to know how to identify a bot.

Of course, it is not easy to be certain that a profile is a bot. If it were, Twitter would have banned them already. Yet by some estimates about 15% of Twitter users are bots.

However, we can all agree that it is entirely possible to say whether a profile looks more like a bot or more like a human.

This is what PegaBot does by giving you a bot probability as a percentage! How does it work?

Can only a tool like PegaBot do it, or can a human do it as well? Who is better?

This is what we are going to see!

The first thing to do is obviously to open the Twitter profile you want to check! All the information we use is public. For PegaBot, Twitter sends us all the data, but a human has to make a few clicks to find everything.

Have you already noticed all the features in a PegaBot result? One of them is “Usuário (User)”, so let’s start with this one!

1 - Usuário (User)

A bot generates its profile randomly, and each kind of bot has its own method for doing so. In general, the less information a profile has, the higher the probability that it is a bot. But that is not all: let’s see whether all the information here seems logical or not.

First, is the profile verified by Twitter?

Look for the blue badge next to the user name (in the profile block or on any tweet), and only there: if the profile shows it anywhere else, it is a fake badge! This badge proves that a Twitter agent verified the authenticity of the account, and it is unlikely that an expert agent would verify a bot.

Then have a look at the user name and the screen name. Do they mean anything? Or are they only letters and digits that seem to have been put there randomly? Are the name and the screen name similar?

If the profile name and the screen name are very different for no apparent reason, this is suspicious: maybe the two were randomly generated independently of each other?
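
For readers who like to automate, here is a minimal Python sketch of the name check. It is only an illustration, not PegaBot's actual code: the function names are hypothetical, and using difflib's similarity ratio and the share of digits as signals is an assumption.

```python
from difflib import SequenceMatcher


def name_similarity(name: str, screen_name: str) -> float:
    """Return a 0..1 similarity between the display name and the screen name."""
    # Keep only letters and digits so that spaces, underscores and case do not matter.
    a = "".join(ch for ch in name.lower() if ch.isalnum())
    b = "".join(ch for ch in screen_name.lower() if ch.isalnum())
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def digit_share(screen_name: str) -> float:
    """Return the share of characters in the screen name that are digits."""
    if not screen_name:
        return 0.0
    return sum(ch.isdigit() for ch in screen_name) / len(screen_name)
```

A low similarity combined with a screen name full of digits does not prove anything, but it is a reason to look closer.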

It is also important to check the profile picture. Does the profile have one? Is it a public photo that can easily be found on the Internet? Is it a photo stolen from another profile? Or does it simply not match the profile name (a photo of a woman on a profile with a male name, for example)?

Any human can verify this easily; for a computer algorithm, it is much more complicated. (Human: 1, PegaBot: 0)

How about the age of the profile? We can assume that the older the profile is, the higher the probability that it belongs to a human! Indeed, if a profile was created less than three months ago, during an electoral period, this is really suspicious!

When you know the age of a profile and the number of tweets posted, you can quickly calculate how many tweets are posted per day. If this value seems improbable (does a constant 100 tweets every day seem possible for a human?), this calculation can be decisive for the bot probability.
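
The calculation itself fits in a few lines of Python. This is a sketch with a hypothetical function name; what counts as "improbable" is left to your own judgment, since no threshold is given here.

```python
from datetime import date


def tweets_per_day(created_on: date, tweet_count: int, today: date) -> float:
    """Return the average number of tweets posted per day since the account was created."""
    # Count at least one day so that a brand new account does not divide by zero.
    age_days = max((today - created_on).days, 1)
    return tweet_count / age_days


# Example: an account created on Jan 1, 2018 with 10,000 tweets by Apr 11, 2018.
rate = tweets_per_day(date(2018, 1, 1), 10_000, date(2018, 4, 11))
print(f"{rate:.1f} tweets per day")
```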

The last thing to check is the number of followers, following and favorites. Some bots just follow a lot of random people and are not followed back. A normal human who is neither a public figure nor an organization usually has a followers/following ratio of about 1.
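
The ratio can be computed as follows. This is a minimal sketch with a hypothetical function name; how far from 1 the ratio must be before it becomes suspicious is an open choice.

```python
def follower_ratio(followers: int, following: int) -> float:
    """Return followers divided by following, guarding against division by zero."""
    if following == 0:
        # Follows nobody: the ratio is unbounded if anyone follows the account.
        return float("inf") if followers > 0 else 0.0
    return followers / following
```

An account that follows 2,000 people and has 10 followers gets a ratio of 0.005, far from the value of about 1 expected for an ordinary person.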

To be harder to detect, some bots are now actually “half-bots”. This means, for example, that a human created the profile before letting the bot do its job. Sometimes a bot can also be launched on an old hacked account. So everything we saw here is useful in the majority of cases, but it is not enough. That is why we need to check other features. Let’s see the next one!

2 - Amizades (Friends)

Let’s take a closer look at the “friends” of the suspected profile!

Unlike Usuário (User), this feature is easier for an algorithm than for a human (Human: 1, PegaBot: 1). Indeed, for this we have to open the profile of each follower and of each person the profile follows, which takes a very long time. To make the process faster, PegaBot does it with “only” 200 followers and 200 following.

What we need to analyze here is whether it makes sense for the suspected profile to follow these profiles and to be followed by them. A bot usually starts by following random profiles, so we can see a lot of suspicious things here!

Is it normal for a Portuguese-speaking profile to follow people who speak only Japanese, Russian or French? But different countries can speak the same language, so we can also look at the location of each profile.

As with language, it is suspicious when a profile’s friends are mostly from other timezones. The more the friends are spread across different timezones and languages, the more the bot probability increases.
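
One possible way to put a number on "how spread out" the friends are is Shannon entropy. The sketch below is an assumption chosen for illustration, not necessarily the measure PegaBot uses, and the function name is hypothetical.

```python
import math
from collections import Counter


def spread_in_bits(values: list[str]) -> float:
    """Return the Shannon entropy (in bits) of a list of labels such as languages or timezones."""
    counts = Counter(values)
    total = len(values)
    # No labels, or a single distinct label, means no spread at all.
    if total == 0 or len(counts) < 2:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Friends who all speak Portuguese give 0 bits, while friends split evenly across four languages give 2 bits. The same function works for timezones or locations.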

The age of each profile is less important, but can be checked as well. The wider the distribution, the higher the probability will be.

PegaBot also checks the number of friends of each friend and their number of tweets. These distributions are less important, but still count toward the feature score.

A bot can also belong to a bot network and be friends with the whole network, whose members are also bots with similar profiles. Once again, we cannot base our doubts on this feature alone, which is why we will now see another feature!

3 - Rede (Network)

Like Amizades, this feature takes a human a very long time to calculate (Human: 1, PegaBot: 2). The goal of this feature is to analyze who retweets, mentions or likes each tweet of the suspected profile, and to apply a logic similar to Amizades to find the distribution of the people the profile tries to reach. The denser and more diversified this group of people is, the more suspicious it is.

4 - Frequência (Frequency)

The goal of this feature is to extract and analyze the posting time of each tweet in the timeline. It is similar to, but more accurate than, what we did in the Usuário feature with tweets per day.

Start by getting the date and time of some tweets in the timeline of the suspected profile (it can be 10 tweets, 50, 100, 1 000… Human: 1, PegaBot: 3), then list them and analyze them. Does it look natural? For example, if two tweets were posted within a really short time, this is very suspicious, especially if it happens more than once. We can also ask: does it look scheduled? If the tweets are always posted at an interval of, for example, exactly one hour, that is very strange for a human.
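
Both checks, very short gaps and clockwork-like regularity, can be sketched in Python. The function names and the two thresholds (5 seconds for a burst, 60 seconds of tolerance for a schedule) are hypothetical values chosen for illustration.

```python
from datetime import datetime
from statistics import pstdev


def posting_gaps(timestamps: list[datetime]) -> list[float]:
    """Return the gaps in seconds between consecutive tweets, oldest first."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]


def burst_count(gaps: list[float], min_gap_seconds: float = 5.0) -> int:
    """Count how many times two tweets were posted less than min_gap_seconds apart."""
    return sum(1 for gap in gaps if gap < min_gap_seconds)


def looks_scheduled(gaps: list[float], tolerance_seconds: float = 60.0) -> bool:
    """Return True when the gaps are almost identical, as a scheduler would produce."""
    # With fewer than three gaps there is too little data to call it a pattern.
    if len(gaps) < 3:
        return False
    return pstdev(gaps) <= tolerance_seconds
```

Five tweets posted exactly one hour apart give four gaps of 3,600 seconds with zero deviation, so `looks_scheduled` returns True for them.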

5 - Publicações (Published)

Still in the tweet timeline, we now need to look at the published text. Does the suspected profile always post the same tweet, or tweets with a similar meaning?

Can we cut the text into parts and find the same part in a lot of tweets? This is easier for a human to detect. However, if a lot of tweets are posted, it can take very long or be impossible (Human: 2, PegaBot: 4)
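
A simple way for a program to "cut the text into parts" is to slide a window of a few words over each tweet and count in how many tweets each phrase appears. The sketch below makes that assumption; the function name and the default of three-word phrases are hypothetical.

```python
from collections import Counter


def repeated_phrases(tweets: list[str], n: int = 3, min_tweets: int = 2) -> dict[str, int]:
    """Return the n-word phrases that appear in at least min_tweets different tweets."""
    seen_in = Counter()
    for text in tweets:
        words = text.lower().split()
        # A set, so that each phrase is counted at most once per tweet.
        phrases = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        seen_in.update(phrases)
    return {phrase: count for phrase, count in seen_in.items() if count >= min_tweets}
```

If the same phrases keep coming back across a large share of the timeline, the profile is probably posting from a template.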

6 - Emocional (Emotion)

We carry out the same analysis as in Publicações, but this time we care about the emotion of the tweet, which is definitely easier for a human to detect than for a machine (Human: 4!, PegaBot: 5)

The more varied the emotions used in the tweets are (happiness, excitement, rage…), and the more different forms these emotions take, the more human the profile appears.

Detecting the emotions in a sentence (it can be the overall emotion of the sentence, beyond any single word) is very hard for an algorithm.
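
To see why, consider the most naive approach: matching words against a list. The tiny lexicon below is invented for this example and is far too small for real use, and the function names are hypothetical. It is a toy sketch, not how PegaBot measures emotion.

```python
# Hypothetical toy lexicon, for illustration only.
EMOTION_WORDS = {
    "happiness": {"happy", "love", "great"},
    "rage": {"hate", "angry", "furious"},
    "sadness": {"sad", "miss", "cry"},
}


def emotions_in(text: str) -> set[str]:
    """Return the emotions whose keywords appear in the text."""
    words = {word.strip(".,!?") for word in text.lower().split()}
    return {emotion for emotion, vocab in EMOTION_WORDS.items() if words & vocab}


def emotion_variety(tweets: list[str]) -> int:
    """Return how many distinct emotions show up across all the tweets."""
    found: set[str] = set()
    for text in tweets:
        found |= emotions_in(text)
    return len(found)
```

A sarcastic tweet such as "Great, just great. My flight is cancelled." is labelled as happiness by this sketch, because it only sees the word "great" and not the mood of the whole sentence. A human would not make that mistake.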

Conclusion

Basically, both a human and a computer algorithm can estimate the probability that a profile is a bot. The problem is that bot logic always evolves and becomes harder for algorithms to detect, whereas an algorithm is good at mass data analysis and mathematics. A human is better at sensing and forming an opinion about the context of a single profile. With training, a human can become better and better and learn the behavior of every bot on any social network.

Not all bots are bad or dangerous. Some try to change opinions and influence people by spreading fake news, for example, but some bots are also there to tweet lovely poems, or to detect other bots…

The most important thing is not to know who is a bot, but to be aware that social media and the Internet in general contain a lot of wrong information.

We do not have to stop using them, but we need to be more suspicious about what we see on the Internet. Do you trust just anyone on the street? Do you trust someone who wears a mask to hide their identity?

You have probably known since childhood that this is not a good idea! The Internet is no more reliable than the street.

The goal of PegaBot is not simply to let people find out how likely an account is to be a bot, but also to make people aware that bots exist.

This year we will have elections in Brazil. With the impact of digital influence on the political debate in mind, this project, conceived by ITS (Instituto de Tecnologia e Sociedade) and IT&E (Instituto Tecnologia & Equidade), aims to raise the question of which profiles may be a bot, a cyborg or a human.

If you have not tried PegaBot yet, you can find it here https://pegabot.com.br/

You can also follow us on Twitter https://twitter.com/pegabots

And follow the GitHub project at https://github.com/AppCivico/pegabot

PegaBot was built with the help of the Botometer Project https://botometer.iuni.iu.edu , which belongs to the Indiana University Network Science Institute (IUNI) and the Center for Complex Networks and Systems Research (CNetS)
