Deep NLP for hate speech detection

Berardino Barile
Published in isiway-tech · 9 min read · Dec 20, 2018

Nowadays, as we all know, social media and social networks play a huge role in our society, regardless of the country we live in. The website below represents a clear example of their incredible growth.

More than 8,000 tweets are posted every second, which amounts to roughly 260 billion tweets per year. This astonishing mass of text holds an invaluable amount of information from which important insights can be extracted. The process of analyzing text belongs to the area of Natural Language Processing (NLP).

The goal of this post is to present a practical, easy-to-implement procedure for recognizing hateful and discriminatory tweets. Although there is no universal definition of hate speech, one of the most widely accepted is provided by Nockleby (2000):

any communication that disparages a target group of people based on some characteristic such as race, color, ethnicity, gender, sexual orientation, nationality, religion, or other characteristics

This article is organized as follows. First, a description of the dataset (obtained by combining multiple sources) is provided. Second, the fundamental pre-processing steps are laid out in order to explain, step by step, the processing required for the analysis. Last but not least, a simple Neural Network (NN) model is presented and explained.

Dataset of hate speech tweets

In order to develop a fairly intuitive, though not exhaustive, analysis of hate speech detection, different datasets have been combined. The first can be found at the following link. There, six different datasets are freely accessible and widely used for analysis purposes; together, those files amount to 55,000 records. Since we are interested in implementing a deep neural network, we need more data to train the model. For this reason, we make use of the additional dataset available at the link, described in an interesting paper titled ‘Hate Speech Dataset from a White Supremacy Forum’. The data come in ‘txt’ format, with each tweet stored as a single text file. In order to limit the code needed to append all the texts into a single csv file, we exploited an amazingly simple bash command:

Bash code for appending txt files in a unique csv document
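
The command in question might look like the following sketch (the output filename and delimiter follow the post; run it inside the folder containing the text files):

```shell
# Append every forum .txt file into one csv, separating entries with ';...;'
for i in *.txt; do
  cat "$i" >> all_tweets.csv   # quote "$i" so filenames with spaces still work
  echo ';...;' >> all_tweets.csv
done
```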

With this single command, we iterate through the files and ‘cat’ each of them into a final csv named ‘all_tweets’. The delimiter appended after each text is defined by the ‘echo’ command; here it is ‘;…;’ for illustration purposes. Note that it is safer to wrap ‘$i’ in double quotation marks to prevent errors when a filename contains white spaces. If you are curious about more useful bash commands, I suggest taking a look at the following link, which lists the most common mistakes bash programmers make along with the recommended solutions.

Moreover, I added the dataset published on Kaggle titled Twitter hate speech. This dataset ships as two csv files, the training and the testing set respectively, containing roughly 30,000 unique labeled tweets. In order to add useful information to our model, we append it to the datasets mentioned above.

Finally, also on Kaggle, there is the Toxic Comment Classification Challenge dataset, which assesses the threat of abuse and harassment on a large number of Wikipedia comments labeled by human raters for toxic behavior. Six types of toxic comments are labeled, but we collapse them into a single binary variable describing the presence or absence of abusive comments. Though this is not a Twitter dataset, it is included anyway for educational purposes.

Pre-processing… a step toward the cool part

We all know that before applying any machine learning (ML) model we need to make our dataset ready for analysis. This step is particularly relevant when we deal with text: most words, in fact, are not really helpful for classifying aggressive sentences, and we want to make life easier for our classifier in order to maximize its performance. Below are some simple functions that can be applied for a naive cleaning:

Useful methods to clean up texts in the pre-processing phase
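
Such helpers might look like the following sketch (these functions are illustrative, not the post's exact snippet):

```python
import re
import string

def remove_urls(text):
    """Strip http(s) links, which carry no sentiment."""
    return re.sub(r"https?://\S+", "", text)

def remove_mentions_and_hashtags(text):
    """Drop @user handles and the '#' symbol (keeping the hashtag word)."""
    text = re.sub(r"@\w+", "", text)
    return text.replace("#", "")

def remove_punctuation(text):
    """Remove punctuation characters."""
    return text.translate(str.maketrans("", "", string.punctuation))

def clean_tweet(text):
    """Apply all cleaning steps, lowercase, and collapse whitespace."""
    text = remove_urls(text)
    text = remove_mentions_and_hashtags(text)
    text = remove_punctuation(text)
    return " ".join(text.lower().split())

print(clean_tweet("Check this out: https://t.co/xyz @user #Great stuff!!"))
# -> check this out great stuff
```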

Moreover, phrases per se cannot be interpreted directly; we need to transform them into sequences of meaningful integers before our ML model can analyze them. Luckily, Keras has a handy class called ‘Tokenizer’ which allows us to perform this fundamental pre-processing step in a single line:

Instantiation of the Tokenizer object
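
A minimal sketch of that line, assuming the tf.keras namespace (the constant name `NB_WORDS` is my own):

```python
from tensorflow.keras.preprocessing.text import Tokenizer

NB_WORDS = 10000  # dictionary size: keep only the 10,000 most frequent words
tokenizer = Tokenizer(num_words=NB_WORDS)
```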

With this simple line of code, we create an instance of the Keras Tokenizer object, for which we only need to specify the size of our dictionary. In this case, I chose a dictionary of just 10,000 words to keep the analysis simple. The next step, as usual, is to fit and transform the data:

Fit and transform methods for replacing texts with sequences
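
A sketch of this step, with placeholder sentences standing in for the real training and test tweets:

```python
from tensorflow.keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer(num_words=10000)

train_texts = ["i love dogs", "i hate spam"]  # placeholders for the real tweets
test_texts = ["i love spam"]

# Fit the vocabulary on the training set only, then encode both sets
# with the same word-to-integer mapping
tokenizer.fit_on_texts(train_texts)
X_train = tokenizer.texts_to_sequences(train_texts)
X_test = tokenizer.texts_to_sequences(test_texts)

print(X_train)
print(X_test)
```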

Notice that we need to train-test split our dataset before applying these transformations, since the test set must be encoded consistently with the training set. Lastly, we need to make every tweet the same length; otherwise our NN does not know what input size to expect. This process is called ‘padding’ and it is easily implemented (thanks to Keras) with a single command:

Padding method for equal length sequences
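
A sketch of the padding step on toy tokenized tweets:

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

INPUT_LENGTH = 15  # maximum tweet length used in the post

sequences = [[5, 8, 3], [2, 7]]  # toy tokenized tweets
# By default pad_sequences prepends zeros to shorter sequences
padded = pad_sequences(sequences, maxlen=INPUT_LENGTH)
print(padded.shape)  # (2, 15)
```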

Notice that ‘pad_sequences’ takes as input a list of tweets, and we need to specify the maximum length to which each phrase is equalized. Shorter phrases are expanded by adding zeros to the beginning of the transformed vector. For this post, I chose an ‘INPUT_LENGTH’ of 15.

Almost done… just one last, critically important step called ‘pre-trained GloVe embedding’. Do you feel fuzzy about this concept? Here is how the Stanford group describes it:

GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.

Do you feel puzzled yet? Easy, I used to be too…
Word embedding is just a fancy way to describe the association of each word in your dictionary with a vector of length ‘n’ (an n-dimensional representation). In essence, each word, and thus each number associated with it, is linked to a vector. We use this multidimensional vector representation for two primary reasons:

Reason 1 (and the most important): this n-dimensional vector has been produced by a neural network (NN) trained on billions of words, whose goal is to place words with similar meanings next to each other, like, for instance, dog and cat or fruit and tree;

Reason 2: increasing the dimensionality of the features can make the classes easier to discriminate, an idea related to the kernel trick. In other words:

some data that are not separable in the original space can become separable in a higher-dimensional one

Here is the link to download the GloVe embedding dictionary. I made use of the ‘glove.twitter.27B.200d.txt’ file provided at the link. Finally, here is the code to map your data:

Mapping of the Glove Embedding dictionary to the training vocabulary
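
The mapping might be sketched as follows; `load_glove` and `build_embedding_matrix` are hypothetical helper names, and the word index is the one produced by the fitted tokenizer:

```python
import numpy as np

def load_glove(path):
    """Load GloVe vectors into a dict: word -> numpy array."""
    embeddings_index = {}
    with open(path, encoding="utf8") as f:
        for line in f:
            values = line.split()
            embeddings_index[values[0]] = np.asarray(values[1:], dtype="float32")
    return embeddings_index

def build_embedding_matrix(word_index, embeddings_index, nb_words, emb_dim):
    """Rows follow the tokenizer indices; words missing from GloVe stay all-zero."""
    matrix = np.zeros((nb_words, emb_dim))
    for word, i in word_index.items():
        if i < nb_words:
            vector = embeddings_index.get(word)
            if vector is not None:
                matrix[i] = vector
    return matrix
```

With the real files this would be called as, e.g., `build_embedding_matrix(tokenizer.word_index, load_glove('glove.twitter.27B.200d.txt'), 10000, 200)`.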

A simple Neural Network for hate speech detection

Now the cool part starts…
After the boring pre-processing, the final dataset is ready to be analyzed. Given the modest size of the dataset, a simple NN model with just one LSTM layer of 10 hidden units will suffice for the task:

Neural Network model for hate speech detection
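
A minimal sketch of such a model in the tf.keras API; the all-zero matrix below is a placeholder standing in for the GloVe ‘embedding_matrix’ built in the previous section:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense

NB_WORDS, EMB_DIM, INPUT_LENGTH = 10000, 200, 15
embedding_matrix = np.zeros((NB_WORDS, EMB_DIM))  # placeholder for the GloVe matrix

model = Sequential()
model.add(Embedding(NB_WORDS, EMB_DIM,
                    weights=[embedding_matrix],
                    input_length=INPUT_LENGTH,
                    trainable=False))             # keep the pre-trained vectors frozen
model.add(LSTM(10))                               # one LSTM layer with 10 hidden units
model.add(Dropout(0.5))                           # limit overfitting
model.add(Dense(1, activation="sigmoid"))         # abusive vs not-abusive

model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
```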

Notice that the embedding layer of the model is set to be non-trainable (trainable=False). This is because training an embedding layer is particularly costly, both computationally and in terms of accuracy, if you don’t have enough data to train your model with. GloVe, on the other hand, is trained on billions of tokens and reaches an astonishing final score. We therefore use this knowledge to initialize our embedding layer with the ‘embedding_matrix’ loaded in the previous section. Finally, a dropout layer is added in order to avoid (or at least limit) overfitting.
An interesting thing to notice is the ‘class_weight’ parameter in the fit method. Whenever we have an unbalanced dataset, we can either resample the minority class or exploit the built-in mechanism provided by Keras. For this analysis, I chose the second option, providing a dictionary of weights that defines how many times one class is undersized with respect to the other.
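
A sketch of how such a weight dictionary could be built (the labels here are toy values):

```python
import numpy as np

y_train = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])  # toy labels: 80/20 imbalance
counts = np.bincount(y_train)
# Weight the minority class inversely to its frequency
class_weight = {0: 1.0, 1: counts[0] / counts[1]}

print(class_weight)  # {0: 1.0, 1: 4.0}
# model.fit(X_train, y_train, class_weight=class_weight, ...)
```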
The result of the training phase is shown below:

Model performance: training vs testing loss comparison per epochs

From the graph above, we see that the model reaches a good performance. In addition, the table below shows the classification report for the analysis: a weighted average f1-score of 87% is reached, though the major contribution comes from the not-abusive class.

Classification Report: Abusive vs Not-Abusive accuracy

Moreover, in order to have a cool representation of the predicted output, we present the Python code together with the positive and negative word clouds with a mask super-imposed:

Python Code:

Code implementation of a WordCloud with a mask super-imposed

Positive predicted tweets

WordCloud for positive predicted tweets

Negative predicted tweets

WordCloud for negative predicted tweets

It is easy to notice that the positive thumbs-up contains words like welcome, may, thing, like, thank and nicer, which are all positive, or at least not offensive, ways to express an opinion. On the other hand, the negative thumbs-down picture embodies evidently offensive words.
As a final note, as far as Italian sentiment evaluation is concerned, there are no packages that properly deal with the Italian language yet. In order to solve this problem, we propose a simple, yet effective, way to evaluate Italian tweets. The procedure is fairly simple: we first translate the sentence into English and then apply the model to the translated text, as in the code below:

Italian tweet evaluation method

With ‘TextBlob’ we translate the Italian text into English and then map each word to its index in our saved dictionary:

Positive Italian tweet outcome
Negative Italian tweet outcome

Conclusion

In this article, a naive introduction to Deep Natural Language Processing with Keras is provided. First, we saw how to implement the fundamental pre-processing steps needed to clean texts. This part includes ‘parsing’ texts to remove useless words and characters, as well as ‘tokenizing’, ‘stemming’ and ‘padding’ them to obtain a dataset ready for analysis. Additionally, we saw the astonishingly useful ‘embedding’ procedure, which exploits the knowledge offered by GloVe. Its goal is to obtain a higher-dimensional vector space in which similar words are close to each other. This knowledge is transferred to the NN model through a non-trainable embedding layer, to which we simply pass the embedding matrix previously calculated. Finally, after the model has been trained, the training results are depicted and the classification report is presented. For an interesting and snappy representation of the results, a word cloud with a super-imposed mask is shown. From the images, the prevalence of positive and negative keywords in non-offensive and offensive tweets, respectively, is clear.
Hope you enjoyed this article… stay tuned for the next post… meanwhile, have fun with ML.
