
Fine-Tuning a BERT Model for Amazon Product Reviews and Deploying It to the Hugging Face Model Hub

Mina Mehdinia
9 min read · Sep 27, 2023


In this blog post, we will cover the steps to fine-tune a BERT model for sentiment analysis on Amazon product reviews and subsequently deploy the trained model to the Hugging Face Model Hub.

Setting Up the Environment

To start, let’s first install the required packages:

%pip install --upgrade pip
%pip install --disable-pip-version-check \
torch==1.13.1 \
torchdata==0.5.1 --quiet
%pip install \
transformers==4.27.2 \
datasets==2.11.0 --quiet

Data Loading and Pre-processing

After importing necessary libraries, we’ll load our Amazon product reviews dataset from a JSON gzip file:

import pandas as pd
import gzip
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from transformers import BertTokenizerFast, BertForSequenceClassification, Trainer, TrainingArguments
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from datasets import Dataset
import torch
import numpy as np

file_path = './AMAZON_FASHION.json.gz'

with gzip.open(file_path) as f:
    df = pd.read_json(f, lines=True, nrows=100000)

We then preprocess the data by selecting the relevant columns, filtering out unverified reviews, handling missing values, and shifting the ratings so they start from 0.

df = df[['overall', 'verified', 'reviewTime', 'reviewText']]
df = df[df['verified'] == True].reset_index(drop=True)
df.dropna(subset=['reviewText', 'overall'], inplace=True)
df['overall'] = df['overall'] - 1  # Adjust ratings to start from 0
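
Since matplotlib is already imported, it can help to eyeball the label distribution at this point; star ratings are usually heavily skewed toward 5 stars, which matters when interpreting accuracy later. A minimal sketch using the columns created above:

# Quick look at how the 0-indexed ratings are distributed
df['overall'].value_counts().sort_index().plot(kind='bar')
plt.xlabel('Rating (0 = 1 star, 4 = 5 stars)')
plt.ylabel('Number of reviews')
plt.title('Label distribution after preprocessing')
plt.show()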

Splitting the Dataframe into Train and Test Datasets

After preprocessing the data, the next step is to split our dataset into training and test sets. This allows us to evaluate our model’s performance on unseen data. We pick the first 1000 samples for the test set and the rest for the train set. Here’s how you can achieve this:

df_test = df.iloc[:1000, :].reset_index(drop=True)
df_train = df.iloc[1000:, :].reset_index(drop=True)
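
Note that this is a purely positional split: the first 1,000 rows become the test set. If the reviews are ordered by time or product, a shuffled split is often safer. Since train_test_split is already imported, a hedged alternative (the random_state value is arbitrary) could look like this:

# Alternative: a shuffled split instead of a positional one
df_train, df_test = train_test_split(df, test_size=1000, random_state=42)
df_train = df_train.reset_index(drop=True)
df_test = df_test.reset_index(drop=True)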

Converting Pandas Dataframe to HuggingFace’s Dataset Format

Before tokenizing our data, it’s essential to convert our pandas dataframe into a format that’s compatible with HuggingFace’s tools. Fortunately, the datasets library provides an easy way to do this:

train_dataset = Dataset.from_pandas(df_train)
test_dataset = Dataset.from_pandas(df_test)

Tokenizing the Data

Tokenization is the process of converting text into a sequence of tokens or integers that can be fed into a model. For this task, we’re utilizing BERT’s tokenizer. Here are the parameters we’re using:

Parameters:

  • padding = True: This ensures that all tokenized sequences (lists of token IDs) are of the same length. Any shorter sequences are padded with zeros until they match the length of the longest sequence in that particular batch. Padding is essential because deep learning models expect input data to have a consistent shape.
  • truncation = True: This ensures that any sequence exceeding our defined max_length will be truncated (shortened) to that length. Given that BERT has a maximum input length (512 tokens for the base model), it's crucial to ensure our tokenized sequences don't exceed it.
  • max_length = 128: This is the maximum length of a sequence after tokenization. If a text has more tokens than this after tokenization, it will be truncated to this length. Shorter texts are padded (because of the padding=True parameter) up to the length of the longest sequence in their batch, capped at this value. The choice of 128 is somewhat arbitrary, but it's a reasonable default for many tasks. You can adjust it based on the distribution of token lengths in your dataset and the computational resources available.
  • return_tensors = ’pt’: This means the tokenized sequences will be returned as PyTorch tensors ('pt' stands for PyTorch). If you're working with TensorFlow, you would use 'tf' instead. Returning tensors means the output is ready to be directly fed into a model without any further transformation.

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

def tokenize(batch):
    tokenized_inputs = tokenizer(batch['reviewText'], padding=True, truncation=True, max_length=128, return_tensors='pt')
    tokenized_inputs["labels"] = torch.tensor(batch['overall'])
    return tokenized_inputs

train_dataset = train_dataset.map(tokenize, batched=True)
test_dataset = test_dataset.map(tokenize, batched=True)

Setting the Format for the Datasets:

Once tokenization is complete, it’s essential to set the format for our datasets. This step ensures that the datasets have a consistent structure and only include the necessary columns when fed into our model:

train_dataset.set_format('torch', columns=['input_ids', 'attention_mask', 'labels'])
test_dataset.set_format('torch', columns=['input_ids', 'attention_mask', 'labels'])
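
Before moving on, it is worth spot-checking a single example to confirm that the tensors look the way the model expects. A small, purely illustrative check:

# Inspect one tokenized training example
sample = train_dataset[0]
print(sample.keys())              # expect input_ids, attention_mask, and labels
print(sample['input_ids'].shape)  # at most 128 token IDs (padded per batch)
print(sample['labels'])           # the 0-indexed star rating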

Fine-Tuning the Model

Fine-tuning is the process of further training a pre-trained model (like BERT) on a new dataset. Rather than training from scratch, we leverage the knowledge that the model has gained from a massive dataset and adjust its weights based on our specific dataset.

1. Initializing the Model:

# Initializing the model
model = AutoModelForSequenceClassification.from_pretrained(
    'bert-base-uncased',
    num_labels=len(np.unique(df['overall']))
)

Here we are using AutoModelForSequenceClassification.from_pretrained to load a pre-trained version of the BERT model ('bert-base-uncased'). This particular model version uses lowercase tokens and is one of the foundational BERT architectures.

  • bert-base-uncased: The “base” version of the BERT model, trained on lowercase English text (hence “uncased”).
  • num_labels: Specifies the number of unique labels in our dataset. This tells the model how many output units it needs in the final layer, which is essential for classification tasks. For instance, if we have 5 unique ratings in the dataset, the model will output a vector of 5 raw scores (logits) for each review; applying a softmax turns these into per-rating probabilities, as sketched right after this list.
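
To make the logits-versus-probabilities distinction concrete, here is a minimal sketch that scores a single made-up review with the freshly initialized model (before fine-tuning, the classification head is random, so the probabilities will be close to uniform):

# Turn the model's raw logits into per-rating probabilities for one review
inputs = tokenizer("Fits perfectly and looks great!", return_tensors='pt', truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits      # shape: (1, num_labels)
probs = torch.softmax(logits, dim=-1)    # probabilities over the 5 ratings
print(probs)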

2. Setting up Training Configurations:

training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    learning_rate=1e-5,
    weight_decay=0.01,
    logging_dir='./logs',
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    logging_steps=10,
    fp16=True
)

The TrainingArguments class from Hugging Face’s Transformers library lets us set several training configurations:

  • output_dir: Directory where the model checkpoints and results will be saved.
  • num_train_epochs: Number of training epochs. An epoch is one full pass over the training data. I trained for 3 epochs on a free T4 GPU on Google Colab.
  • per_device_train_batch_size & per_device_eval_batch_size: Batch size for training and evaluation. A smaller batch size gives more frequent (but noisier) weight updates, while a larger batch size tends to stabilize training and makes better use of GPU memory (a quick sketch after this list estimates how many optimizer steps these settings imply).
  • learning_rate: The step size used when updating the model's weights during training; smaller values mean slower but more cautious learning.
  • weight_decay: Regularization technique. It adds a penalty to the loss function, discouraging large weights, which can cause overfitting.
  • logging_dir: Directory for storing logs.
  • evaluation_strategy & save_strategy: How often the model is evaluated and saved. Here, it’s set to do both at the end of each epoch.
  • load_best_model_at_end: If set to True, the model weights will revert to the checkpoint with the best metric at the end of training.
  • logging_steps: Log metrics every logging_steps steps.
  • fp16: Use mixed precision training. It can speed up training and reduce memory usage.
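
To get a feel for how long training will take, you can estimate the number of optimizer steps implied by these settings. A rough sketch (the exact count depends on the size of your train split and on how many devices you train on):

# Rough estimate of optimizer steps for a single-GPU run
num_train_samples = len(train_dataset)
batch_size = training_args.per_device_train_batch_size
steps_per_epoch = (num_train_samples + batch_size - 1) // batch_size
total_steps = steps_per_epoch * int(training_args.num_train_epochs)
print(f"~{steps_per_epoch} steps per epoch, ~{total_steps} steps in total")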

3. Initializing the Trainer:

Here, we initialize the Trainer class, which will handle the training process. We pass the model, training arguments, datasets, and the metric computation function to it (compute_metrics is defined in the Model Evaluation section below; make sure it has been defined before running this cell).

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics,
)

4. Training the Model:

This step starts the training process based on the configurations set earlier. The trainer will use the training dataset to fine-tune the model and will evaluate it using the evaluation dataset.

trainer.train()

Model Evaluation

After training, it's essential to evaluate our model's performance. We'll use several metrics: accuracy, precision, recall, F1 score, mean absolute error (MAE), and ROC AUC.

# Metrics come from scikit-learn
from sklearn.metrics import precision_recall_fscore_support, accuracy_score, mean_absolute_error, roc_auc_score

# Function to compute metrics
def compute_metrics(pred):
    labels = pred.label_ids
    preds = pred.predictions

    # Hard predictions are needed for accuracy, precision, recall, F1, and MAE
    hard_preds = np.argmax(preds, axis=1)

    precision, recall, f1, _ = precision_recall_fscore_support(labels, hard_preds, average='weighted')
    acc = accuracy_score(labels, hard_preds)
    mae = mean_absolute_error(labels, hard_preds)

    # Compute ROC AUC for each class (one-vs-rest, using the raw scores)
    roc_auc = {}
    for i in range(preds.shape[1]):  # Iterate over each class
        roc_auc[f"roc_auc_class_{i}"] = roc_auc_score((labels == i).astype(int), preds[:, i])

    return {
        'accuracy': acc,
        'f1': f1,
        'precision': precision,
        'recall': recall,
        'mae': mae,
        **roc_auc  # Expands the dictionary to include the ROC AUC for each class
    }

# Evaluating the model on the test dataset
trainer.evaluate()

Deploying to Hugging Face Model Hub

Now, for the exciting part — deploying our trained model to the Hugging Face Model Hub!

The modern NLP ecosystem is teeming with pre-trained models. But how do you share your own trained models with the world? Here’s an elaborate walkthrough on how to upload your BERT model for sentiment analysis to the Hugging Face Model Hub.

Deep diving into the intricacies of Natural Language Processing (NLP), one can’t help but admire the capabilities of models like BERT. After investing hours in training a BERT model on Amazon product reviews, you’ve gleaned insights that others might benefit from. Sharing is, after all, the essence of the open-source spirit. And what better platform than Hugging Face?

1. Set the Stage: Hugging Face Authentication

  • If you’re new to Hugging Face, first create an account.
  • Next, log in from your notebook or terminal so your environment is linked to your Hugging Face account:
!huggingface-cli login

2. Craft Your Repository

A new repository is like an artist’s blank canvas:

!huggingface-cli repo create your-model-name --type model

Here, replace your-model-name with something artful and descriptive. For example I picked "BERT_Amazon_Review".

3. Save your model and tokenizer:

I am saving my model and tokenizer in the current directory (./ ). If you prefer, you can save them (or copy the resulting files) directly into the repository folder created in step 2, so they are ready to commit from there.

model.save_pretrained("./")
tokenizer.save_pretrained("./")

4. Push to the Hub: Launching your Masterpiece

Before you make your model public on the Hub, there are some technical steps you should be aware of. This ensures that large files, like your model weights, are properly managed, and your identity as the author remains intact.

Enter your model’s directory:

cd your-model-name

#create git repository
!git init

4.1. Using Git Large File Storage (LFS)

Models, especially ones like BERT, can be large and exceed typical Git file size limits. Here's where Git LFS comes into play: the Hugging Face Hub stores files larger than 10 MB with LFS. For very large files (over 5 GB), you also need to enable Hugging Face's custom transfer agent for Git LFS:

!huggingface-cli lfs-enable-largefiles .
!git lfs track "pytorch_model.bin"

With this command, you’re telling Git to track pytorch_model.bin (your model weights) using LFS. This will ensure efficient handling of this large file.

4.2. Add Your Files to Git

Now that your model weights are tracked by LFS, add them along with the other necessary files (you can also add the tokenizer files saved in step 3, such as vocab.txt and tokenizer_config.json, so the tokenizer is available on the Hub too):

!git add config.json pytorch_model.bin .gitattributes

Here:

  • config.json: This contains the configuration of your model.
  • pytorch_model.bin: The weights of your trained model.
  • .gitattributes: Ensures that LFS tracking is set up properly.

4.3. Set Up Your Git Identity

Before committing, ensure that Git knows who you are:

!git config --global user.email "YOUR_EMAIL"
!git config --global user.name "YOUR_USER_NAME"

Replace YOUR_EMAIL with your email and YOUR_USER_NAME with your preferred username. This configures Git to recognize you as the author of the commits.

4.4. Commit and Push

With everything set, it's time to commit and push (if you initialized the repository locally with git init rather than cloning it from the Hub, make sure you have added a remote pointing at your Hub repository first):

!git commit -m "Initial BERT model upload"
!git push

By following these enhanced steps, you ensure that your model is efficiently and correctly uploaded to the Hugging Face Model Hub. Large files are seamlessly managed, and your identity as the author is preserved.
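
As an alternative to the manual git workflow, the transformers library also offers push_to_hub, which uploads the model and tokenizer directly from Python once you are logged in. A minimal sketch (the repository id below is a placeholder; replace it with your own username and model name):

# Alternative: push the model and tokenizer straight from Python
model.push_to_hub("your-username/BERT_Amazon_Review")
tokenizer.push_to_hub("your-username/BERT_Amazon_Review")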

5. The Model Card: Your Model’s Story

Every masterpiece has a story. Your model card (a README.md in the directory) narrates:

  • Model Description: What does your model do?
  • Training Data: Where did it learn its wisdom?
  • Training Methodology: How was the training conducted?
  • Performance Metrics: Brag about its accuracy, F1 score, etc.
  • Usage Examples: How can someone use it?

Congratulations! 🎉 Your BERT model, fine-tuned for Amazon product reviews, is now available on the Hugging Face Model Hub for anyone to use. So, the next time you train an exceptional model, remember to share it on the Hugging Face Hub!
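
Once the files are on the Hub, anyone can load and use the model in a couple of lines. A hedged example (the repository id is a placeholder; by default the labels appear as LABEL_0 through LABEL_4 unless you configure id2label):

from transformers import pipeline

# Load the fine-tuned model directly from the Hub
classifier = pipeline("sentiment-analysis", model="your-username/BERT_Amazon_Review")
print(classifier("The fabric feels cheap and it ripped after one wash."))
# e.g. [{'label': 'LABEL_0', 'score': ...}] where LABEL_0 corresponds to a 1-star rating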

Conclusion and Model Comparison:

Having taken our model through its paces, let’s reflect upon its performance metrics:

  • Accuracy: 68.3%
  • F1 Score: 66.13%
  • Precision: 65.70%
  • Recall: 68.3%
  • Mean Absolute Error (MAE): 0.372
  • ROC AUC (per class): Varies with class 0 having the highest at 96.67% and class 3 having the lowest at 68.58%.
  • Evaluation Runtime: Approximately 1.975 seconds.

The Mean Absolute Error (MAE) can be interpreted as a measure of how far, on average, the model's predictions deviate from the actual values in the dataset. Specifically, it quantifies the average absolute difference between the predicted and actual values. Therefore, our model's predictions are, on average, only about 0.37 stars away from the actual review ratings.
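
As a concrete illustration of what an MAE in this range means (the numbers below are made up, not taken from the actual test set):

from sklearn.metrics import mean_absolute_error

# Toy example: actual 0-indexed ratings vs. predictions
y_true = [4, 4, 2, 0, 3]   # i.e. 5, 5, 3, 1, and 4 stars
y_pred = [4, 3, 2, 1, 3]   # two predictions are each off by one star
print(mean_absolute_error(y_true, y_pred))  # 0.4 -> off by 0.4 stars on average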

Contrasting these results against OpenAI’s GPT-3.5 (To find more about OpenAI’s result click HERE), our fine-tuned model showcases a commendable performance. While GPT-3.5 is a marvel in the realm of language models, its general-purpose nature means that it might not always be optimized for very specific tasks or datasets. Fine-tuning a model on a particular dataset, as we’ve done, often hones its performance on that specific task, even if it’s at the cost of generalizability.

In conclusion, the act of fine-tuning isn’t just a technical endeavor, but a strategic choice. It allows us to adapt powerful, general-purpose models to cater to the specific nuances and intricacies of our unique data. In our case, the results speak for themselves — fine-tuning has afforded us a more tailored and efficient model for our product review classification task, outperforming even giants like OpenAI’s GPT-3.5.

