How to fine-tune BERT on a text classification task?

Dhaval Taunk
Published in Analytics Vidhya · Jun 7, 2020
Source:- https://pytorch.org/tutorials/_images/bert.png

BERT (Bidirectional Encoder Representations from Transformers) is built on the Transformer architecture, which Google introduced in the 2017 paper "Attention Is All You Need". The BERT model itself was released in 2018 in the paper — "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". When it was released, it showed state-of-the-art results on the GLUE benchmark.

Introduction

First, I will give a brief overview of the BERT architecture, and then move on to the code showing how to use it for the text classification task.

The BERT architecture is a multi-layer bidirectional Transformer encoder, described in the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.

There are two architectures proposed in the paper: BERT_base and BERT_large. BERT_base has L=12, H=768, A=12 and a total of around 110M parameters, where L is the number of transformer blocks, H is the hidden size, and A is the number of self-attention heads. BERT_large has L=24, H=1024, A=16, with around 340M parameters.
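If you already have the transformers library installed, these numbers can be read directly from the pretrained configurations; a small sketch:

from transformers import BertConfig

for name in ["bert-base-uncased", "bert-large-uncased"]:
    config = BertConfig.from_pretrained(name)
    # L = num_hidden_layers, H = hidden_size, A = num_attention_heads
    print(name, config.num_hidden_layers, config.hidden_size, config.num_attention_heads)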

BERT: State of the Art NLP Model, Explained
Source:- https://www.kdnuggets.com/2018/12/bert-sota-nlp-model-explained.html

The input format of BERT is shown in the image above. I won't go into much detail here; you can refer to the link above for a more detailed explanation.
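In short, every input sequence starts with a [CLS] token and each sentence is terminated by a [SEP] token. A quick way to see this with the transformers tokenizer:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
input_ids = tokenizer.encode("the movie was great")
print(tokenizer.convert_ids_to_tokens(input_ids))
# ['[CLS]', 'the', 'movie', 'was', 'great', '[SEP]']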

Source Code

The code I will be following can be cloned from HuggingFace's GitHub repo -

https://github.com/huggingface/transformers/

Scripts to be used

We will mainly be modifying and using two scripts for our text classification task: glue.py and run_glue.py. The file glue.py lives at "transformers/data/processors/", and run_glue.py can be found at "examples/text-classification/".

Format of data

The data should be formatted as follows: the first column is the id column, the second column contains the labels, and the third column contains the text that is to be classified.

import pandas as pd

data = pd.DataFrame()
data['id'] = [i for i in range(num_text)]  # one id per data point
data['label'] = labels                     # list of labels
data['text'] = text                        # list of texts to classify

Here num_text is the number of data points to be used, text is the actual query that is to be classified, and labels holds the label associated with each text. Save your data in TSV format without headers.

#if data is your training file 
data.to_csv('train.tsv', sep='\t', index=False, header=None)

#if data is your validation file
data.to_csv('dev.tsv', sep='\t', index=False, header=None)

#if data is your test file
data.to_csv('test.tsv', sep='\t', index=False, header=None)

In your test file, you can leave out the labels column if you want. I kept it because it can be used to check the model's performance after prediction. The file names can also be chosen as you prefer, but the corresponding file names then need to be changed in glue.py.

Changes to be made in the script

glue.py

path — transformers/data/processors/glue.py

For classification purposes, one of these tasks can be selected — CoLA, SST-2, MRPC, STS-B, QQP, MNLI, QNLI, RTE, WNLI.

I will continue with the SST-2 task; the same changes can be made for other tasks as well. The class that will be used is Sst2Processor, sketched below.
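Below is a trimmed sketch of the Sst2Processor class from glue.py (based on the transformers source around this time; the exact code may differ slightly between versions):

import os
from transformers.data.processors.utils import DataProcessor, InputExample

class Sst2Processor(DataProcessor):
    # Processor for the SST-2 data set (GLUE version) - trimmed sketch

    def get_train_examples(self, data_dir):
        return self._create_examples(self._read_tsv(os.path.join(data_dir, "train.tsv")), "train")

    def get_dev_examples(self, data_dir):
        return self._create_examples(self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev")

    def get_labels(self):
        return ["0", "1"]

    def _create_examples(self, lines, set_type):
        examples = []
        text_index = 1 if set_type == "test" else 0
        for (i, line) in enumerate(lines):
            if i == 0:
                continue  # skips the first line (header row in the original GLUE files)
            guid = "%s-%s" % (set_type, i)
            text_a = line[text_index]
            label = None if set_type == "test" else line[1]
            examples.append(InputExample(guid=guid, text_a=text_a, text_b=None, label=label))
        return examples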

Following are the changes that need to be made (a sketch of the edited pieces follows the list) -

  1. Change the get_labels() function's return list from ['0', '1'] to the list of labels present in your data.
  2. In the _create_examples() function, change text_a = line[text_index] to text_a = line[-1].
  3. In the dictionary defined as glue_tasks_num_labels, change the value of the 'sst-2' key to the number of labels present in your data.
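For example, if your data has three labels, the edited pieces of glue.py might look roughly like this (a sketch; the labels '0', '1', '2' are placeholders for the labels in your own data):

# 1. Sst2Processor.get_labels(): return the labels present in your data
def get_labels(self):
    return ["0", "1", "2"]  # e.g. a three-label data set

# 2. Sst2Processor._create_examples(): read the text from the last column of the TSV
text_a = line[-1]

# 3. In the glue_tasks_num_labels dictionary, change the "sst-2" entry so that it
#    matches your data, e.g. "sst-2": 3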

run_glue.py

path — examples/text-classification/run_glue.py

Changing this file is optional. Only make changes in this file if you want to save probabilities along with the predictions.

Predictions can be saved by writing the predictions array produced in the prediction block of run_glue.py to a text file.
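A minimal sketch of that idea, assuming predictions is the (num_examples, num_labels) array of logits produced in the prediction block of run_glue.py:

import numpy as np

def save_probabilities(predictions, output_path="test_probabilities.txt"):
    # convert the logits to class probabilities with a numerically stable softmax
    shifted = predictions - predictions.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    # one row per test example, one column per label
    np.savetxt(output_path, probs, fmt="%.6f")

The predicted label for each example is then simply the argmax over its row of probabilities.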

How to run the scripts

To run the script, after setting the $GLUE_DIR and $TASK_NAME environment variables to your data directory and task name, use the following command -

python ./examples/text-classification/run_glue.py \
--model_name_or_path bert-base-uncased \
--task_name $TASK_NAME \
--do_train \
--do_eval \
--data_dir $GLUE_DIR/$TASK_NAME \
--max_seq_length 128 \
--per_device_eval_batch_size=8 \
--per_device_train_batch_size=8 \
--learning_rate 2e-5 \
--num_train_epochs 3.0 \
--output_dir /tmp/$TASK_NAME/

I will explain the main parameters one by one -

  1. --model_name_or_path — Specifies the BERT model type you want to use (bert-base-uncased here).
  2. --task_name — Specifies the GLUE task to use. One of CoLA, SST-2, MRPC, STS-B, QQP, MNLI, QNLI, RTE, WNLI.
  3. --do_train — Specifies whether you want to fine-tune the model. If you only want to use the pre-trained model, remove this flag.
  4. --do_eval — Run validation on the dev set.
  5. --do_predict — Generate predictions on the test data (add this flag to the command above if you need test predictions).
  6. --data_dir — Directory where the training, validation and test data are saved.
  7. --output_dir — Directory where the predictions and other generated files will be saved.

All other parameters should be evident from their names. Another setting which can be used is -

  • --save_total_limit — Specifies how many checkpoint directories you want to keep.

This is all you need to fine-tune BERT on a text classification task.

That's all from my side this time. I hope this blog post helps you complete the task.

