A Data Science project: Toxic comment classification using the Naïve Bayes & Logistic Regression algorithms.

Jeremy Arancio
9 min read · Mar 24, 2022


Photo by Dan Meyers on Unsplash

With the development of social media and online interactions, fighting toxicity and harmful comments has become a goal for companies whose business is connecting people.

Video game studios like Ubisoft, streaming services like Twitch, and social media companies like Facebook or Twitter invest significant resources in automatically detecting toxic messages and content.

Detecting this kind of comment improves the user experience, so there is a clear economic interest for companies. Start-ups have also understood this business need and provide tailored solutions to filter toxic comments, such as Bodyguard.Ai (a French start-up).

For all of these reasons, I have decided to tackle this project and create a Machine Learning algorithm capable of detecting harmful comments, using Naïve Bayes combined with Logistic Regression. The model shows good results, with a 97.5% accuracy on average. We will discuss this result at the end of this article.

The data

To train this model, we need an already labeled dataset indicating whether each comment is toxic or not. For this, Kaggle is our best friend!

Toxic Comment Classification Challenge on Kaggle

Four years ago, a Kaggle competition was created by Jigsaw and Google (two entities of Alphabet) to improve their existing algorithm, with a $35,000 cash prize!

The competition provided a dataset of comments from Wikipedia’s talk page edits, which have been labeled by human raters for toxic behavior.

Each comment has been labeled according to 6 types of toxicity:

  • toxic
  • severe_toxic
  • obscene
  • threat
  • insult
  • identity_hate

As we can see in the samples, some comments are highly toxic. Our objective will be to detect this type of message with our model. We also spot some typos and non-standard words (“D’aww!”).

Text preprocessing

As in any NLP project, we have to clean the text before transforming it into word embedding vectors. We will talk about that later.

To obtain a text that the machine can easily process, we have decided to:

  • remove hyperlinks
  • remove words containing numbers
  • lowercase words
  • remove stopwords (at, not, and, it, …)
  • remove punctuation

The last choice is debatable. Indeed, in sentiment analysis, which is close to what we are doing here, punctuation can have a major influence; take the exclamation mark “!” for instance.

import re
import string
from nltk.corpus import stopwords   # requires nltk.download('stopwords')

stopwords_english = stopwords.words('english')

def preprocess(corpus):
    '''
    From a list of strings, make text lowercase, remove hyperlinks,
    punctuation, words containing numbers, and stopwords.
    Input : a list of strings
    Output : cleaned strings, returned one by one by a generator (yield)
    '''
    for text in corpus:
        # Lowercase
        text = text.lower()
        # Remove links
        text = re.sub(r'https?://[^\s\n\r]+', '', text)
        # Remove punctuation
        text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
        # Remove words containing numbers
        text = re.sub(r'\w*\d\w*', '', text)
        # Return a generator
        yield ' '.join([word for word in text.split(' ')
                        if word not in stopwords_english])

From a raw comment, we obtain a cleaned version!

# comment is a single raw string, so we wrap it in a list before calling preprocess
list(preprocess([comment]))
Before preprocessing
After preprocessing

Note also that we removed some meaning by withdrawing stopwords (“not”, for instance). Keeping punctuation like “!” or some stopwords could be a good alternative to try out.
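As a rough sketch of what such a variant could look like (not part of the original pipeline; preprocess_keep_negation is a hypothetical name, and stopwords_english is assumed to be the NLTK stopword list loaded above), we could keep “!” and the main negation words while removing the rest:

# Hypothetical variant of preprocess(): keeps "!" and negation stopwords
def preprocess_keep_negation(corpus):
    kept_stopwords = {'not', 'no', 'nor'}               # negations we choose to keep
    punctuation = string.punctuation.replace('!', '')   # every punctuation mark except "!"
    for text in corpus:
        text = text.lower()
        text = re.sub(r'https?://[^\s\n\r]+', '', text)
        text = re.sub('[%s]' % re.escape(punctuation), '', text)
        text = re.sub(r'\w*\d\w*', '', text)
        yield ' '.join(word for word in text.split(' ')
                       if word not in stopwords_english or word in kept_stopwords)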

We apply the original preprocessing function to the entire dataset:

# We save the cleaned comments in a list to be easily manipulated
clean_comments = list(preprocess(train_data['comment_text']))

The corpus (the list of comments) is ready to be transformed into word embedding vectors!

Word embedding

Word embedding consists of transforming text into a numerical form. In this way, the machine is able to process the data and run algorithms on it. This is the foundation of Natural Language Processing.

It is also important to mention that the machine doesn’t understand words, technically speaking. The method chosen to create the word embedding (Bag of Words, Continuous Bag of Words, TF-IDF, …) determines how much of the meaning of the complete text, of each sentence, or of each word is captured.

For this project, let’s use the Bag-Of-Words (BOW) technique. It is important to note that choosing a different approach could change the final result!

The concept of BOW is very simple:

  • Each individual word in the entire corpus is extracted and grouped in a list, in a certain order (alphabetical, for instance). Let’s call this list the vocabulary.
  • For each comment, we count the frequency of each word of the vocabulary (the number of times that word appears in the comment). We therefore obtain a vector populated with counts and zeros, in the same order as the vocabulary. This kind of vector is called a sparse vector: most of its elements are 0.
Bag of words (source: OpenClassRoom)
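To make this concrete, here is a tiny illustrative example (toy sentences, not taken from the dataset) built with scikit-learn’s CountVectorizer, the same class we use below:

from sklearn.feature_extraction.text import CountVectorizer

toy_corpus = ["you are stupid", "you are nice", "stupid stupid comment"]
toy_vectorizer = CountVectorizer()
toy_bow = toy_vectorizer.fit_transform(toy_corpus)

print(toy_vectorizer.get_feature_names_out())   # get_feature_names() on older scikit-learn versions
# ['are' 'comment' 'nice' 'stupid' 'you']
print(toy_bow.toarray())
# [[1 0 0 1 1]
#  [1 0 1 0 1]
#  [0 1 0 2 0]]

Each row is one comment, each column one word of the vocabulary, and most entries are 0: a sparse representation.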

Let’s create this matrix for the train and the test sets, using CountVectorizer:

# Bag-of-words
from sklearn.feature_extraction.text import CountVectorizer
# Filter out words present in fewer than min_df documents or in more than 90% of all documents
vectorizer = CountVectorizer(min_df=3, max_df=0.9)
# Return a document-term matrix (n_samples, n_features)
bow = vectorizer.fit_transform(clean_comments)
# Same for the test set: we only transform, to keep the same vocabulary
bow_test = vectorizer.transform(test_clean_comments)

We have obtained a sparse matrix, with the comments as rows and the vocabulary as columns.

That’s it! Easy right?

Now the machine can process the corpus, and learn from it!

You guessed it: time to do some machine learning!

Naïve Bayes & Logistic Regression

Our goal is to recognize if a comment is toxic or not. We have a dataset filled with comments and 6 classifications of toxicity, labeled by people.

To determine whether a comment is toxic or not, we use Bayes’ theorem, the foundation of the Naïve Bayes classifier:

Naïve Bayes formula
Representation of the intersection of two domains

Bayes’ theorem can be summarized as: the number of elements of A knowing they are in B is equal to the number of elements in the intersection of A & B divided by the number of elements in B.
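Written out, Bayes’ theorem reads:

P(A | B) = P(A ∩ B) / P(B)

which matches the description above: among the elements of B, the fraction that also belongs to A.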

Applied to our project, here’s how it works:

  • For each label (“toxic”, “insult”, …), we count the frequency with which each word of the vocabulary appears in comments carrying that label, and also in comments that do not. Therefore, for each word in the vocabulary, we have two frequencies per label. This can be written as follows (the formulas are spelled out after this list):

with wᵢ representing a word and label indicating whether a comment is toxic or not (1 or 0)

  • Then we sum these frequencies over the whole vocabulary for each label value (1 and 0), giving the total word count per class.
  • Finally, for each word, we can calculate the probability of it appearing in a toxic or in a non-toxic comment. The following image shows an example of this representation:
Sentiment analysis using Naïve Bayes (from DeepLearning.AI course)
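To make the bullets above explicit (my reconstruction, consistent with the probNB function shown later), the quantities involved are:

freq(wᵢ, label) = number of times wᵢ appears in comments with that label (label ∈ {0, 1})
N_label = Σⱼ freq(wⱼ, label), the total number of word occurrences in comments with that label
P(wᵢ | label) = freq(wᵢ, label) / N_label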

With this matrix, we can calculate the likelihood of each word signaling toxicity, which is the ratio between the probability of the word given a toxic comment and its probability given a non-toxic one:

As we can see on the previous image, if P(wᵢ|0)=0, there is a problem: we can’t divide by 0…

For this reason, we use Laplace smoothing, with V the number of unique words in the corpus:

Laplace Smoothing in the Naïve Bayes machine learning algorithm
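Spelled out in the same notation (and matching the (p + 1) / (p.sum() + V) computation in the code below), the smoothed probability and the likelihood ratio are:

P(wᵢ | label) = (freq(wᵢ, label) + 1) / (N_label + V),   with V the vocabulary size
ratio(wᵢ) = P(wᵢ | 1) / P(wᵢ | 0)

With the “+1” in the numerator, no word ever gets a probability of exactly 0, so the ratio is always defined.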

In Python, we can write a function that returns these probabilities depending on whether a comment is toxic (cat=1) or not (cat=0), using the Bag-of-Words representation (bow) and the target (a vector of 1s & 0s labeled by human raters).

import numpy as np

def probNB(bow, target, cat):
    '''
    Naive Bayes probability for each word
    Inputs :
    bow : bag of words (documents in rows, words in columns)
    target : classification vector (filled with 1 and 0)
    cat : 1 or 0, in target
    Output :
    Vector of Naive Bayes probabilities with Laplace smoothing (n_words, 1)
    '''
    p = np.array(bow[target == cat].sum(axis=0))
    return np.transpose((p + 1) / (p.sum() + bow.shape[1]))
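As a quick sanity check (a hypothetical call, assuming train_data still holds the raw label columns):

# Smoothed probability of each vocabulary word, given that a comment is labeled "toxic"
p_toxic = probNB(bow, np.array(train_data['toxic']), 1)
print(p_toxic.shape)   # (n_words, 1): one probability per word of the vocabulary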

Now, with this matrix, we can define if a comment is toxic or not!

To do so, we simply calculate the product of the likelihood ratios of all its words, with m the number of words contained in the comment:

Formula to get the likelihood of a comment

However, to avoid numerical underflow due to the product of many small numbers, we instead calculate the log-likelihood. It also lets us use linear algebra with matrices and vectors, since the logarithm of a product becomes a sum of logarithms:
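Concretely (my reconstruction of the two formulas, consistent with the code below), for a comment containing the words w₁ … wₘ:

likelihood(comment) = ∏ᵢ₌₁..ₘ P(wᵢ | 1) / P(wᵢ | 0)
log-likelihood(comment) = Σᵢ₌₁..ₘ log( P(wᵢ | 1) / P(wᵢ | 0) )

This is exactly what bow.dot(log) computes below: each row of the bag-of-words matrix holds the word counts of one comment, and the dot product with the vector of per-word log ratios sums them up.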

We create a function that returns the log-likelihood of each unique word (log). Then, using log, we calculate the log-likelihood of a comment. With this, we train a Logistic Regression model for a specific label (target).

from sklearn.linear_model import LogisticRegression

def get_model(bow, target):
    '''
    Return a trained Logistic Regression model & the log-likelihood of each word
    Inputs :
    bow : bag of words (n_doc, n_words)
    target : classification of comments (n_doc, 1)
    Outputs :
    A fitted LogisticRegression model and the vector of per-word log-likelihoods (n_words, 1)
    '''
    log = np.log(probNB(bow, target, 1) / probNB(bow, target, 0))
    m = bow.dot(log)   # log-likelihood of each comment (n_doc, 1)
    model = LogisticRegression().fit(m, target)
    return model, log

Then we train the model for each label:

import pandas as pd

# We store the predictions in a DataFrame
df_classification = pd.DataFrame()
df_classification['Comments'] = test['comment_text']

for j in target.columns:
    print('fit', j)
    model, log = get_model(bow, target[j])
    df_classification[j] = model.predict(bow_test.dot(log))
    # Accuracy
    score = model.score(bow_test.dot(log), test[j])
    print(f"Accuracy : {score:.4}")
    print('----')

As we can see, looking at each label individually, our model’s accuracy is not bad at all.

However, the model can be improved as the following picture shows us:

Some examples of comments where our model didn’t do well

Also, we have to consider that our model was trained on the training dataset provided by the competition organizers, and tested on the test dataset provided as well.

But if we look at the train set, we notice that the numbers of non-toxic and toxic comments are highly unbalanced:

# Let's define target, which is the classification made by human raters
# (kept as a DataFrame so that target.columns can be used in the training loop)
target = train_data[['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']]
# Fraction of comments carrying each label
print(target.sum(axis=0) / target.shape[0])
Percentage of toxic comments in the train set

Same for the test set:

keys = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']
test[keys].sum(axis=0) / test.shape[0]
Percentage of toxic comments in the test set

Therefore, the accuracy we have calculated previously should be taken with caution!
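One way to get a picture that is less sensitive to this imbalance (a sketch, not part of the original notebook: it reuses the training loop above and takes the logistic regression probabilities via predict_proba) is to report the per-label ROC AUC, which is also the mean column-wise metric used by the Kaggle leaderboard:

from sklearn.metrics import roc_auc_score

for j in target.columns:
    model, log = get_model(bow, target[j])
    proba = model.predict_proba(bow_test.dot(log))[:, 1]   # probability of the positive class
    print(j, roc_auc_score(test[j], proba))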

However, to test the efficiency of our model, we submitted the notebook to the Kaggle competition. We obtained a score of 0.84, which is reasonable, but adjustments can still be made!

We have to note that the best model obtained a score of 0.98856. There is still a lot of work to do!

Conclusions

This article presented a machine learning method to classify and filter toxic comments, using a training dataset available on Kaggle in which comments were labeled into 6 categories of toxicity by human raters.

This kind of project is in high demand nowadays, because harmful comments create a bad user experience on social media platforms like Twitch or Twitter, and even in online games.

For this project, we used the Naïve Bayes machine learning algorithm, coupled with a Logistic Regression model, to classify comments.

We obtained good results overall, and a score of 0.84 on the Kaggle competition.

However, this model can still be improved considerably, and I will come back to it to obtain a higher score.

I hope you enjoyed reading this article!

Jérémy

Acknowledgments

This project notebook can be found on my Github.

Also, here’s the link to the notebook submitted to the Kaggle competition.

This work was a way for me to apply what I have learned to a real business case. It was inspired by the great work of Jeremy Howard. All credit goes to him!

Don’t hesitate to ask questions or leave me a comment. I will be glad to look at them!


Jeremy Arancio

NLP Engineer & AI-ndependent - I help companies leverage text using Machine Learning! - Website: https://linktr.ee/jeremyarancio