Hateful People or Hateful Bots? Detection and Characterization of Bots Spreading Religious Hatred in Arabic Social Media
This post summarizes a CSCW 2019 paper titled “Hateful People or Hateful Bots? Detection and Characterization of Bots Spreading Religious Hatred in Arabic Social Media” by Nuha Albadi, Maram Kurdi, and Shivakant Mishra.
In our previous work on religious hate speech (hatred based on religious affiliation or the lack thereof) on Arabic Twitter, we found that almost half of the Arabic tweets discussing religion incited hatred and/or violence against religious minorities in the Arab world, mostly targeting Jews, atheists, and Shia, the second-largest Islamic sect. For example, a tweet containing the Arabic equivalent of one of the words Jew, Jews, or Judaism had roughly a 60% chance of being hateful. Given such a large volume of hate speech, and knowing that ISIS and other radical organizations have used bots to push their extreme ideologies, we hypothesized that bots may be to blame for a significant amount of this widespread hatred.
We identified Twitter accounts disseminating hate speech in the data set constructed in our previous work. We then took a sample of these accounts, manually examined each one, and assigned a bot-likelihood score from 0 to 5, with 0 being "very unlikely" and 5 being "very likely," based on the extent to which an account exhibited suspicious bot-like behavior. Examples, with pictures, of some of this bot-like behavior are shown below.
We then used this labeled data set to train a machine-learning regression model that automatically scores new accounts from 0 to 5 according to the degree of bot-like behavior they exhibit. The model takes into account content and linguistic features (e.g., average word and tweet length; average number of emojis, hashtags, and URL links per tweet), tweet features (e.g., the proportions of original tweets, replies, and retweets), topic and sentiment features, and account features (e.g., number of followers and friends).
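As a rough illustration of how features like these can be assembled into a per-account vector before being fed to a regressor, here is a minimal sketch. The function name and the exact feature set are illustrative assumptions; the paper's actual pipeline is richer (it also includes topic and sentiment features):

```python
import re

def account_features(tweets, followers, friends):
    """Build an illustrative feature vector for one account.

    `tweets` is a list of (text, kind) pairs, where kind is one of
    "original", "reply", or "retweet". All names here are hypothetical,
    not the paper's exact implementation.
    """
    n = len(tweets)
    texts = [text for text, _ in tweets]
    words = [w for text in texts for w in text.split()]
    url_re = re.compile(r"https?://\S+")
    return {
        # Content and linguistic features
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
        "avg_tweet_len": sum(len(t) for t in texts) / max(n, 1),
        "hashtags_per_tweet": sum(t.count("#") for t in texts) / max(n, 1),
        "urls_per_tweet": sum(len(url_re.findall(t)) for t in texts) / max(n, 1),
        # Tweet features: proportion of each tweet type
        "prop_original": sum(k == "original" for _, k in tweets) / max(n, 1),
        "prop_reply": sum(k == "reply" for _, k in tweets) / max(n, 1),
        "prop_retweet": sum(k == "retweet" for _, k in tweets) / max(n, 1),
        # Account features
        "followers": followers,
        "friends": friends,
    }
```

A vector like this could then be passed to any off-the-shelf regressor trained on the manually assigned 0-to-5 labels.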
After applying the model to a wider set of accounts with hateful tweets, we found that bots do play a role in the spread of religious hate on Arabic Twitter: about 11% of accounts posting hateful tweets were likely bots. Our topic analysis showed that bots participate in highly controversial political discussions, such as those related to Israel/Palestine and Yemen. We also found that Arabic bots can live longer than English bots.
Our feature analysis showed that the bots in our data set exhibit distinct behaviors and features. Unlike humans, bots tend to be isolated and rarely engage in conversations with other accounts. Human accounts usually exhibit a mix of behaviors, such as replying, retweeting, and posting original tweets, whereas bots tend toward black-and-white behavior: some only retweet, while others only post original tweets. We also found a significant difference between the distributions of topics discussed by bots and humans; humans tend to discuss a wider range of topics.
We found linguistic features to be highly discriminative in detecting Arabic bots. Training the model on simple content and linguistic features outperformed Botometer, a bot-detection model that falls back on language-independent features when presented with an account that does not tweet in English. This result underscores the importance of language-specific features in bot-detection tasks. Informative linguistic features include the use of numerals and emojis: bots tend to include fewer emojis and more numbers in their tweets than humans do. Other informative linguistic features include average word length and the average number of punctuation marks.
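The emoji and numeral rates named above can be sketched as simple per-tweet counts. The emoji pattern below is a deliberately simplified assumption covering common Unicode emoji ranges; a production system would use the full Unicode emoji tables:

```python
import re

# Simplified emoji pattern (common ranges only); real emoji detection
# needs the full Unicode emoji data, so treat this as an assumption.
EMOJI_RE = re.compile(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]")
DIGIT_RE = re.compile(r"\d")

def emoji_and_digit_rates(texts):
    """Average number of emojis and digits per tweet (hypothetical helper)."""
    n = max(len(texts), 1)
    emojis = sum(len(EMOJI_RE.findall(t)) for t in texts)
    digits = sum(len(DIGIT_RE.findall(t)) for t in texts)
    return emojis / n, digits / n
```

Per the findings above, bot accounts tended to show lower emoji rates and higher digit rates than human accounts.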
The analysis of social media content to understand online human behavior is of great interest to many researchers. Our findings challenge the assumption, often made in such studies, that online social media content is always created by humans. We showed that the presence of bots can bias analysis results and disrupt people's online social experience. Platform designers should increase their efforts to combat malicious bots that compromise online democracy, and data scientists should account for bots in their studies. In particular, Arabic social media studies focused on understanding differences in behavior and language use between humans and bots can benefit greatly from our bot detection model.
Paper Citation: Nuha Albadi, Maram Kurdi, and Shivakant Mishra. 2019. Hateful People or Hateful Bots? Detection and Characterization of Bots Spreading Religious Hatred in Arabic Social Media. In Proceedings of the ACM on Human-Computer Interaction, Vol. 3, CSCW, Article 61 (November 2019), 25 pages. https://doi.org/10.1145/3359163