Week 2 - Hate Speech Detection on Social Media

Gökhan Özeloğlu · Published in bbm406f19 · Dec 7, 2019

Team Members: Gökhan ÖZELOĞLU, Ege ÇINAR, Yiğit Barkın ÜNAL

Introduction

This week, we talk about some related work and the datasets that we will use in our project.

Related Work

We read three different articles about hate speech on social media. The first article is Detecting Hate Speech in Social Media by Zampieri et al. [1], in which supervised classification methods are applied to a recently released dataset. The system uses character n-grams, word n-grams, and word skip-grams, and obtains 78% accuracy in identifying posts across three classes. The data is annotated with three labels: (1) hate speech (HATE), (2) offensive language but no hate speech (OFFENSIVE), and (3) no offensive content (OK). Most previous studies model the task as binary classification with only one positive and one negative class. Moreover, the presence of profane content does not in itself signal hate speech: hate speech may denigrate or threaten an individual or a group of people without the use of any profanities. The main aim of the paper is to establish a lexical baseline for distinguishing between hate speech and profanity on this standard dataset, which contains 14,509 English tweets annotated by a minimum of three annotators.

Fig. 1: The data is annotated with three labels (HATE, OFFENSIVE, OK).
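To make the feature setup in [1] more concrete, below is a minimal sketch of a lexical n-gram baseline in scikit-learn: character and word n-grams feeding a linear SVM over the three classes. The example texts and labels are placeholders, and the word skip-gram features from the paper are omitted, so this is only an approximation of the authors' actual system.

```python
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Placeholder tweets and labels; in practice these come from the annotated dataset.
texts = ["example tweet one", "example tweet two", "example tweet three"]
labels = ["HATE", "OFFENSIVE", "OK"]

features = FeatureUnion([
    # character n-grams are robust to obfuscated spellings
    ("char_ngrams", TfidfVectorizer(analyzer="char", ngram_range=(2, 4))),
    # word unigrams and bigrams capture surface lexical patterns
    ("word_ngrams", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
])

model = Pipeline([("features", features), ("clf", LinearSVC())])
model.fit(texts, labels)
print(model.predict(["another example tweet"]))
```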

The second article is Anatomy of Online Hate: Developing a Taxonomy and Machine Learning Models for Identifying and Classifying Hate in Online News Media [2]. In this paper, the researchers created a multiclass, multilabel classifier that covers several categories of online hate. They collected data from major online news and media companies with an international audience: 137,098 comments from videos posted on YouTube and Facebook between July and October 2017. Of these, 79,439 (58%) comments come from Facebook and 57,659 (42%) from YouTube. They explored the hate in the dataset by building a simple dictionary based on (a) public sources of hateful words and (b) a qualitative analysis. Overall, the dictionary includes 200 hateful words that commonly appear in this online news media. Searching with that dictionary, they found that 22,514 comments (16.4%) contain these hateful wordings. To further explore the dataset, they ran a topic model based on LDA (Latent Dirichlet Allocation), as is commonly done in computational social science.
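Since the paper explores the comments with an LDA topic model, here is a minimal sketch of how such a topic model can be fit with scikit-learn. The comment texts, the number of topics, and the preprocessing are placeholders rather than the paper's actual configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Placeholder comments; the real input would be the 137,098 collected comments.
comments = [
    "placeholder news comment about politics",
    "placeholder comment about religion and society",
    "placeholder comment about a video and its author",
]

vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(comments)

# Fit a small LDA model; the number of topics here is arbitrary.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term)

# Show the top words of each topic.
terms = vectorizer.get_feature_names_out()
for idx, topic in enumerate(lda.components_):
    top_words = [terms[i] for i in topic.argsort()[-5:]]
    print(f"topic {idx}: {top_words}")
```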

The taxonomy was developed with the following guidelines in mind:

  1. Read through the comments and identify themes and sub-themes.
  2. While creating the categories, consider the hate target and the meaning of the comment.
  3. When appropriate, apply hierarchy by first labeling the main theme, then a subtheme.
  4. When classifying, include purposeful comments, i.e., intentionally hurtful ones.

The taxonomy has 13 main categories and 16 subcategories (29 in total).

Figures: the annotation guideline and the online hate taxonomy.

To achieve the research objective, they built two types of models: 1) binary classifiers that distinguish between hateful and non-hateful comments and 2) multiclass classifiers that provide granular information on hate targets and language.

The feature categories are n-gram features, semantic and syntactic features, and distributional semantic features. Before feature extraction, preprocessing was performed: stop words were removed from the comments and tokens were stripped of any trailing special characters or spaces. This preprocessing was not applied to the semantic features because special characters are essential for their computation.
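A rough sketch of that preprocessing step is shown below. The tokenizer and the stop-word list (NLTK's English list here) are assumptions, since the description above does not tie the step to a specific library.

```python
import re
from nltk.corpus import stopwords  # requires nltk.download("stopwords") beforehand

STOP_WORDS = set(stopwords.words("english"))

def preprocess(comment: str) -> list:
    """Lowercase, drop stop words, and strip trailing special characters/spaces."""
    cleaned = []
    for token in comment.lower().split():
        token = re.sub(r"\W+$", "", token).strip()  # strip trailing special chars
        if token and token not in STOP_WORDS:
            cleaned.append(token)
    return cleaned

print(preprocess("This is an example comment!!!  "))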

In this feature set, they used token n-grams ranging from 1 to 3 grams. For simplicity, they used the raw term frequency (TF) for the first set of n-gram features and term frequency-inverse document frequency (TF-IDF) for the second set.
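These two feature sets can be reproduced roughly with scikit-learn's vectorizers, as in the sketch below (the comments are placeholders).

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

comments = ["first placeholder comment", "second placeholder comment here"]

# First feature set: raw term frequencies over token 1-3 grams.
tf_features = CountVectorizer(ngram_range=(1, 3)).fit_transform(comments)
# Second feature set: TF-IDF weights over the same 1-3 gram range.
tfidf_features = TfidfVectorizer(ngram_range=(1, 3)).fit_transform(comments)

print(tf_features.shape, tfidf_features.shape)
```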

Binary Classification Results. Highest F1 Scores Bolded

In this experiment, they used the 5,143 labels annotated using their taxonomy. The dataset was split into training and test sets (33% for testing) for the classification models. Five different models were used: Logistic Regression, Decision Tree, Random Forest, AdaBoost, and Linear Support Vector Machine (SVM). For each model, they tuned the parameters using scikit-learn's grid search in Python, and they used pipelining to feed the features to the multilabel classifiers.
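The sketch below illustrates this kind of setup with scikit-learn: a 33% test split, a pipeline, and a grid search over a single hyperparameter. The feature set, the parameter grid, and the data are placeholders, not the paper's actual configuration.

```python
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Placeholder data; the real experiment uses the 5,143 annotated comments.
texts = ["placeholder hateful comment", "placeholder neutral comment"] * 20
labels = [1, 0] * 20

# 33% of the data is held out for testing, as in the paper.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.33, random_state=42)

# The pipeline feeds features into the classifier; grid search tunes its parameters.
pipe = Pipeline([("tfidf", TfidfVectorizer()), ("svm", LinearSVC())])
grid = GridSearchCV(pipe, param_grid={"svm__C": [0.1, 1, 10]}, cv=3)
grid.fit(X_train, y_train)

print(grid.best_params_, grid.score(X_test, y_test))
```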

The average precision of the best model, SVM, was 0.90.

Multilabel Classification Results. Highest F1 Scores Bolded.

The third article [3] is Detection of Hate Speech in Social Networks: A Survey on Multilingual Corpus. The survey covers several NLP approaches such as text categorization, role labeling, sentiment analysis, and latent topic modeling. For text categorization, it reports results for Naive Bayes, SVM with a linear kernel (SVM_linear), SVM with an RBF kernel (SVM_RBF), and Logistic Regression (equivalent to MaxEnt), each combined with unigrams (1g), unigrams+bigrams (1g2g), and POS-colored unigrams+bigrams (1g2gPOS).

They’ve found out that with the largest training set size (1500), the combination of SVM linear + 1g achieves an average accuracy 79.7%. SVM linear + 1g2g achieves 81.3%, which is significantly better (t-test, p = 4 x 106). It shows that including bigrams can significantly improve classification performance. SVM linear + 1g2gPOS achieves 81.6%, though the improvement is not statistically significant (p = 0.088), which indicates that POS coloring does not help too much on this task. SVM RBF gives similar performance, Logistic Regression is slightly worse and Naïve Bayes is much worse, for a large range of training set sizes. In summary, SVM linear + 1g2g is the preferred model because of its accuracy and simplicity. We also note that these accuracies are much better than the majority class baseline of 61%. On the held-out set, SVM linear + 1g2g achieves precision P=0.76, R=0.79, and F-measure 0.77.

For role labeling, the paper splits the task into two parts:

  1. Author roles
  2. Person-mention roles

In Author’s roles: The best combination is SVM linear + 1g2g with cross-validation accuracy 61%. Even though it is far from perfect, it is significantly better than the majority class (R) baseline of 43%. It shows that there is a signal in the text to infer the authors’ roles.

In Person’s mention roles: On the accuracy, precision, recall, and F-1 measure. Accuracy measures the percentage of tokens correctly assigned the ground truth labels, including N (not- a-person) tokens. Precision measures the fraction of correctly labeled person-mention tokens over all tokens that are not N according to the algorithm. Recall measures the fraction of correctly labeled person-mention tokens over all tokens that are not N according to the ground truth. F-1 is the harmonic mean of precision and recall. Linear CRF achieved an accuracy 0.87, which is higher than the baseline of the majority class predictor (N, 0.80) (t-test, p = 10^-10). However, the precision and recall are low potentially because the tweets are short and noisy. CRF outperforms SVM in all measures, showing the value of joint classification.

Dataset

We found several different datasets on the internet:

  1. https://github.com/t-davidson/hate-speech-and-offensive-language contains labeled Twitter data with 25,297 tweets, annotated with three labels: hate speech (0), offensive language (1), and neither (2).
  2. https://github.com/zeerakw/hatespeech includes 16,907 tweets labeled as racism, sexism, and none. This repository only contains tweet IDs and their labels, so we found the tweet texts in https://github.com/wvs2/data-hate/tree/master/wassen; we are going to combine the two in the preprocessing step.
  3. https://github.com/ENCASEH2020/hatespeech-twitter includes ~100K tweets labeled as abusive, hateful, normal, and spam.
  4. https://github.com/aitor-garcia-p/hate-speech-dataset contains posts from Stormfront, a white supremacist forum, labeled as hate speech or not, with 10,944 samples in total and a ready-made train-test split. We are going to combine the posts with their labels in the preprocessing step.

We may also extend our datasets with other languages such as German, Spanish, and Italian.
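As a first step towards combining these sources, a rough sketch of loading the first dataset and mapping its numeric labels to names is shown below. The file path and column names follow the t-davidson repository's labeled_data.csv, but they are assumptions that should be verified before use.

```python
import pandas as pd

# Path and column names follow the t-davidson repository layout; verify before use.
davidson = pd.read_csv("hate-speech-and-offensive-language/data/labeled_data.csv")

# Map the numeric classes to a common label scheme shared by all datasets.
label_map = {0: "hate", 1: "offensive", 2: "neither"}
davidson["label"] = davidson["class"].map(label_map)

combined = davidson[["tweet", "label"]]
print(combined["label"].value_counts())
```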

References

[1] Detecting Hate Speech in Social Media, arXiv:1712.06427v2 [cs.CL].

[2] Anatomy of Online Hate: Developing a Taxonomy and Machine Learning Models for Identifying and Classifying Hate in Online News Media, Joni Salminen, Hind Almerekhi, Milica Milenković, Soon-gyo Jung, Jisun An, Haewoon Kwak, and Bernard J. Jansen.

[3] Detection of Hate Speech in Social Networks: A Survey on Multilingual Corpus, Areej Al-Hassan and Hmood Al-Dossari, King Saud University, Saudi Arabia.

Previous weeks

Week 1 — Hate Speech Detection on Social Media
