[Cover image: Daft Punk, created with Disco Diffusion]

BERT Model for a Classification Task: Ham or Spam Email?

Iva @ Tesla Institute · Published in Artificialis · 6 min read · Jan 18, 2023

BERT, which stands for “Bidirectional Encoder Representations from Transformers,” is a state-of-the-art natural language processing (NLP) model developed by Google. BERT is designed to understand the context of a given text by analyzing the relationships between the words in a sentence, rather than just the individual words themselves.

One of the key innovations of BERT is its use of a transformer architecture, which allows the model to process input text in a parallel, rather than sequential, manner. This allows BERT to analyze a sentence as a whole, rather than breaking it down into individual words or phrases.

Another important feature of BERT is its ability to perform “bidirectional” analysis, meaning that it takes into account the context both to the left and to the right of a given word. This allows BERT to better understand the meaning of a word in a sentence, since it can analyze the words that come both before and after it. For example, the word “bank” gets different representations in “the bank of the river” and “the bank approved the loan,” because the surrounding context differs.

TRAINING

BERT has been trained on a massive dataset of over 3 billion words, which allows it to understand a wide range of natural language text. This makes it well-suited for a variety of NLP tasks, including question-answering, sentiment analysis, and language translation.

In recent years, BERT has been used in a number of different applications, including search engines, chatbots, and automated customer service systems. It has also been used to improve the performance of other NLP models, such as those used for named entity recognition and part-of-speech tagging. Its use of a transformer architecture and bidirectional analysis makes it particularly well-suited for understanding the context of a text, which is crucial for many NLP applications.

DATASET

The Ham or Spam dataset is commonly used for training and evaluating ML models for the task of spam filtering. The dataset consists of a collection of email messages, some labeled as “ham” (legitimate, non-spam) and others labeled as “spam”. Once the model has been trained, it can be used to classify new, unseen emails as either ham or spam.

EXPERIMENT

First, we are going to import modules: TensorFlow, TensorFlow Hub and Pandas.

import tensorflow as tf
import tensorflow_hub as hub
import pandas as pd

First we load the dataset into a DataFrame, then look at its head (the first five rows) to check what kind of data we’re experimenting on:

df = pd.read_csv('SMSSpamCollection', sep='\t',
                 names=["label", "message"])
df.head()

The data is labeled; after running df.head() you should get output comprising two columns: one holds the label telling whether the email is ham or spam, and the other holds the message itself:

labels and messages in raw dataset
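For reference, the first rows of the public SMS Spam Collection look roughly like this (reconstructed from the dataset itself, since the original screenshot isn’t reproduced here):

  label                                            message
0   ham  Go until jurong point, crazy.. Available only ...
1   ham                      Ok lar... Joking wif u oni...
2  spam  Free entry in 2 a wkly comp to win FA Cup fina...
3   ham  U dun say so early hor... U c already then say...
4   ham  Nah I don't think he goes to usf, he lives aro...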

Let’s quickly rename the columns:

df.rename(columns={'label': 'Category', 'message': 'Message'}, inplace=True)
df.head()

… and explore our dataset further with the describe() function

df.groupby('Category').describe()

This is very informative: for each category we can see the message count, how many messages are unique, and the most frequent message with its frequency. Next, we use a Python lambda function to encode the labels numerically, 1 for spam and 0 for ham, since the BERT model needs numeric targets for classification:

df['spam']=df['Category'].apply(lambda x: 1 if x=='spam' else 0)
df.head()

If we now look at the head of the dataset, we’ll notice the new spam column holds a value of 1 or 0.

With preprocessing done, we can split the dataset into a training set and a test set using train_test_split from scikit-learn’s model_selection module, stratifying on the spam column so both sets keep the same ham-to-spam ratio:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(df['Message'], df['spam'], stratify=df['spam'])

Now let’s see what our training set looks like (by default, train_test_split holds out 25% of the data for the test set):
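The original post shows a screenshot here; calling head() on the split is one simple way to reproduce it:

X_train.head()
y_train.head()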

Our dataset is ready!

Wonderful! Our dataset is ready, so now we can go ahead, pip install tensorflow-text, and load the BERT preprocessing and encoder layers as KerasLayers from TensorFlow Hub:

!pip install tensorflow-text
import tensorflow_text as text

bert_preprocess = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")
bert_encoder = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4")
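If you’re curious what the preprocessing layer actually produces, here is a small inspection sketch (the key names below come from the TF Hub preprocessing model as documented; double-check them on the model page):

sample = bert_preprocess(['hello world'])
print(sample.keys())  # dict_keys(['input_word_ids', 'input_mask', 'input_type_ids'])
print(sample['input_word_ids'].shape)  # (1, 128): token ids padded/truncated to 128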

With these two layers loaded, we can define a helper function that turns raw sentences into BERT sentence embeddings:

def get_sentence_embedding(sentences):
    # tokenize and pad the raw strings, then run them through the BERT encoder
    preprocessed_text = bert_preprocess(sentences)
    # 'pooled_output' is a single fixed-size vector per input sentence
    return bert_encoder(preprocessed_text)['pooled_output']
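For example, calling it on two made-up messages (illustrative strings of mine, not from the original post):

get_sentence_embedding([
    "Free entry! Text WIN to claim your prize",
    "Hey, are we still meeting for lunch tomorrow?"
])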

As a result, we get a tf.Tensor with shape=(2, 768) and dtype=float32: one 768-dimensional embedding per input sentence (768 is the hidden size of BERT base).

Next, we will declare BERT layers and construct the Model:

# Bert layers
text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name='text')
preprocessed_text = bert_preprocess(text_input)
outputs = bert_encoder(preprocessed_text)

# Neural network layers
l = tf.keras.layers.Dropout(0.1, name="dropout")(outputs['pooled_output'])
l = tf.keras.layers.Dense(1, activation='sigmoid', name="output")(l)

# Use inputs and outputs to construct a final model
model = tf.keras.Model(inputs=[text_input], outputs = [l])

Let’s quickly check the summary of the model:
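The call itself isn’t shown in the original post, but it is the standard Keras one:

model.summary()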

model summary

We can see the complete architecture: the Keras layers we imported from TensorFlow Hub, plus the Dropout and Dense layers we added ourselves, with the final Dense layer having a single unit for binary classification. Now we compile and fit the model:

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

Since we are dealing with binary classification, we use ‘binary_crossentropy’ as the loss function.
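As a quick illustration of what this loss computes, here is a standalone sketch with made-up labels and predictions (not part of the original notebook):

# binary cross-entropy: -mean(y*log(p) + (1-y)*log(1-p))
bce = tf.keras.losses.BinaryCrossentropy()
y_true = [0.0, 1.0, 1.0]  # ham, spam, spam
y_pred = [0.1, 0.9, 0.4]  # hypothetical sigmoid outputs
print(bce(y_true, y_pred).numpy())  # ~0.376; confident correct predictions drive this lower

Back to our model, we fit it on the training data: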

model.fit(X_train, y_train, epochs=2, batch_size = 32)

The model is fit for two epochs, during which we monitor loss and accuracy. Lastly, we evaluate the model on the test set and generate predictions:

model.evaluate(X_test, y_test)

# predict probabilities on the test set
y_predicted = model.predict(X_test)
y_predicted = y_predicted.flatten()  # collapse shape (n, 1) to (n,)
print(y_predicted)

The predictions come back as raw sigmoid probabilities, so we import NumPy and use np.where to threshold them at 0.5, mapping each message to spam (1) or ham (0):

import numpy as np

y_predicted = np.where(y_predicted > 0.5, 1, 0)
y_predicted
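Finally, to make good on the earlier claim that the trained model can classify new, unseen messages, here is a minimal sketch (the two texts are made-up examples of mine, not from the original post):

new_messages = [
    "URGENT! You have won a free cruise. Reply YES to claim.",
    "Can you send me the meeting notes from yesterday?"
]
probs = model.predict(new_messages).flatten()
print(np.where(probs > 0.5, 'spam', 'ham'))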

CONCLUSION

In conclusion, BERT is a powerful NLP model that can be used for classification tasks. When applied here, it effectively captures the context of a text, which is crucial for accurately identifying the patterns and features indicative of spam, and the pre-trained model can be fine-tuned to improve performance further. BERT can also be used for other text classification tasks such as sentiment analysis and news categorisation; I have separate blog posts covering experiments on both.

Using a BERT model for spam filtering can significantly improve the overall efficiency and effectiveness of email systems, and it is a promising approach to the text classification problem.

RESOURCES:

TensorFlow Documentation

Colab Notebook with the experiment

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., 2018)
