Using Machine Learning to Fight Cyber Trolls

Automatically flagging Cyber-Bullying/Cyber-Aggressive Comments in user-generated content

Shameless plug: We are a data annotation platform that makes it super easy for you to build ML datasets. Just upload data, invite your team and build datasets super quick. Check us out!

This blog describes a supervised machine learning approach to automatically flag trolls and cyber-aggressive comments by peers on a social media platform (Twitter).

The models use Doc2Vec, an unsupervised learning algorithm, for feature extraction: it generates “paragraph vectors” which are subsequently fed to machine learning algorithms. Experiments in this direction show that an SVM with an RBF kernel achieves 88.465% accuracy in labelling unseen tweets. Our framework is designed to extract the 200 most recent tweets of a user logged into Twitter and label them automatically. Such a framework could be used for the identification and moderation of trolls and aggressive content in tweets/comments on any social media platform.

Cyber-bullying refers to sending, posting, or sharing negative, harmful, false, or mean content about someone else through messaging, e-mail or other technical means.

One of the major challenges in identifying cyber-bullying or cyber-aggressive comments is detecting a sender’s tone in a particular text message, email or comment on social media, since what one person may consider a joke may be a hurtful insult to another.

Nevertheless, cyber-bullying may prove to be non-accidental in cases where a repeated pattern exists across the text of emails, messages and online posts.

A 2016 report from the Cyberbullying Research Center indicates that 33.8% of students between 12 and 17 years of age had been victims of cyber-bullying in their lifetime.
A study by McAfee found that 87% of teens have observed cyberbullying.

Cyber-bullying, if unreported, may lead to withdrawal, seclusion, avoidance of social relationships, poor academic performance, bullying others to feel in control and, in extreme cases, suicide.

ML Model

We use four files for training and testing purposes, described below:

  • Train_CyberBullying_Dataset.csv: 5317 Cyber Aggressive Comments as Training Data
  • Train_NonCyberBullying_Dataset.csv: 15328 Non Cyber Aggressive Comments as Training Data
  • Test_CyberBullying_Dataset.csv: 2505 Cyber Aggressive Comments as Test Data
  • Test_NonCyberBullying_Dataset.csv: 6312 Non Cyber Aggressive Comments as Test Data

The data contained in these files has been collected from Kaggle and other sources and segregated according to label. The total data was split into 20645 training examples and 8817 testing examples, which were further split into CyberBullying and Non-CyberBullying categories in their respective files.

Each of the files should be formatted as follows:

Nope. I know a little thai and some spanish. haha go figure.Im seriously the whitest korean chick you'd ever meet :]
Idk I think it swings both ways pretty strongly. The human race is never satisfied and I believe that both sexes have
the tendencey to cheat. At all ages and for all reasons.
Timmy. He held me down and tickled me
im not the one who called you a whore either haha. i play soccer and run track and did dance for eight years and did
What would a chair look like if your knees bent the other way?
formspringg haha boyfriend. Were always out staying busy.
Would you rather be rich or famous?
Haha youu need a phone !(:
you should ask her if shes still
r Has anybody ever told you that!

The sample above contains 10 non-cyber-aggressive comments, each one taking up one entire line. Yes, each comment should be on one line, separated by newlines. This is extremely important, because our parser depends on this to identify sentences.

The raw data is provided here.

Snapshot of the Dataset


Initial setup for training our model on the text data requires installing the gensim library for feature extraction and the scikit-learn (sklearn) module for the machine learning models.

The inability of machine learning algorithms to process raw text directly is a central issue in natural language processing. This creates the need for numerical representations of linguistic units, for which several standard feature extraction techniques such as Bag-of-Words and n-grams have been developed. Though these models have been shown to be considerably effective and remain standard approaches for generating vector representations of text, they do not take into account the order of words in a sentence, an important signal on which the detection of cyber-aggressive comments depends.

To obtain a more suitable feature matrix for the task, we use document embeddings, or paragraph vectors, generated with Doc2Vec: an unsupervised learning algorithm that produces semantic vector representations of comments and paragraphs. This fits our purpose well, since we deal with multi-line comments too, and it takes word order and context into account for effective learning.

The sentences are tokenized, and each set of tokens is associated with a paragraph ID or tag before training, indicating the dataset the sentence comes from, as follows. The tags are chosen for convenience and follow no specific convention.

Pre-processing of data before feature extraction

This aggregation of tagged, tokenized comments from the training and testing datasets is shuffled randomly, to improve training and eliminate any dependence on input order, and then fed to the Doc2Vec model for training and subsequent generation of feature vectors.

We extract the feature vectors for the training and testing data into separate arrays, with their corresponding labels in arrays of their own, and feed them to machine learning models for training and subsequent prediction.
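A sketch of assembling those arrays and handing them to the RBF-kernel SVM mentioned earlier, using scikit-learn; the helper name and label convention (1 = cyber-aggressive, 0 = non-aggressive) are assumptions for illustration:

```python
import numpy as np
from sklearn.svm import SVC

def vectors_and_labels(d2v_model, tagged_docs, label):
    """Infer a paragraph vector for each tagged comment; pair with its class label."""
    X = np.array([d2v_model.infer_vector(doc.words) for doc in tagged_docs])
    y = np.full(len(tagged_docs), label)
    return X, y

clf = SVC(kernel="rbf")  # RBF-kernel SVM, the best-performing model above
# After stacking the per-class arrays into X_train/y_train and X_test/y_test:
# clf.fit(X_train, y_train)
# accuracy = clf.score(X_test, y_test)
```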


The jupyter notebook file containing the fully documented code of the trained model is provided here.

The accuracy obtained on testing the models on the test data is as follows:

Results after evaluating models comparing the accuracy score and k-fold cross validation score

Further details of the evaluation of the models are provided below:

Comparison of models based on precision, recall, F-score and Area under ROC Curve

This shows that the SVM model performs best, though it takes longer to train, while the random forest model is faster with a slight trade-off in accuracy.

A glimpse of the Twitter app framework we built to analyse a user's tweets upon logging in to Twitter is shown below.

The 200 most recent tweets of a user are extracted upon login, and each tweet is automatically labelled by the model above as neutral or cyber-aggressive.
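This fetch-and-label step could look roughly like the sketch below, assuming the tweepy library's classic `user_timeline` API together with a trained Doc2Vec model and classifier; the function name, screen name handling and label strings are all illustrative:

```python
def label_recent_tweets(api, screen_name, d2v_model, clf, count=200):
    """Fetch a user's recent tweets and label each as neutral or cyber-aggressive."""
    tweets = api.user_timeline(screen_name=screen_name, count=count)
    results = []
    for tweet in tweets:
        vec = d2v_model.infer_vector(tweet.text.lower().split())
        label = int(clf.predict([vec])[0])
        results.append((tweet.text, "cyber-aggressive" if label else "neutral"))
    return results
```

Here `api` would be an authenticated `tweepy.API` instance; the labelled pairs can then be rendered on the user dashboard.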

Such a tool could be useful for parents to monitor whether their children are becoming bullies online, and for social media analysts to identify a potential bully.

Here is the user dashboard upon login:

Prediction results generated automatically by our trained models:

If you have any queries or suggestions, I would love to hear about it. Please write to me at

Kindly cite our Research Paper based on this project if you extend our work or refer to our research for any purpose. The Research Paper can be found here.


This project is part of research conducted at the Center for Data Sciences and Applied Machine Learning, PES University, Bangalore, by Abhishek Narayanan (intern at Dataturks), under the guidance of Dr. Shylaja S S (Chairperson, Dept of CSE, PES University; Prof & Head of the Dept of CS & IS, PESIT, Bangalore), along with two students, Abhijith Venugopal and Abhishek Prasad.
