Using XLNet for Sentiment Classification

Shanay Ghag
Published in The Startup · Jun 16, 2020

Learn how to fine-tune a pretrained XLNet model from the Huggingface transformers library for sentiment classification.

Introduction

This article is aimed at giving you hands-on experience building a binary classifier using XLNet. If you are unfamiliar with XLNet, or need to revise it, I strongly recommend giving this paper a read. If you are unfamiliar with transformers, I suggest you read this paper, or check out this excellent article by Jay Alammar. You don’t need to know everything about XLNet or Transformers to follow this article, but the links above will help if you wish to study further.

The XLNet model was proposed by researchers at Carnegie Mellon University and the Google AI Brain team. XLNet is an extension of the Transformer-XL model, pre-trained with an autoregressive method that learns bidirectional contexts by maximizing the expected likelihood over all permutations of the input sequence’s factorization order.

XLNet leverages the advantages of both autoregressive and autoencoding methods in its pretraining, which helps it overcome the pretrain-finetune discrepancy.

XLNet can be adapted to a specific task easily: download the pretrained model and fine-tune it for the downstream task. To make this even easier, Huggingface Transformers already provides model classes for several downstream tasks with XLNet. We just need to download and fine-tune them, without writing a custom model class, i.e. additional layers on top of the XLNet model.

About Sentiment Classification

Sentiment classification is a type of text classification problem in NLP. Here we will perform binary text classification: the task is to classify whether a given text/document portrays positive or negative sentiment.

Classifying text by sentiment helps us gain better insight into the entity about which the feelings are expressed.

Example of Sentiment classification

In this article, I will walk you through the process of building a binary classifier using XLNet on the IMDB dataset. The code for this article is written in PyTorch. We will use the XLNetForSequenceClassification model from the Huggingface transformers library to classify the movie reviews.

Let’s dig into what we are going to do!

  1. Install and import all the required dependencies.
  2. Prepare the data.
  3. Write functions to perform the train and evaluation steps.
  4. Fine-tune the XLNet model.
  5. Evaluate the performance of the model.
  6. Make predictions on raw text.

Before we begin: the entire code used in this article is available in my GitHub repo.

Let’s start by installing and importing all the dependencies

As the saying goes, “Tell me and I forget, teach me and I may remember, involve me and I learn.” It is better to get your hands on the code, so that you get a clearer idea and understanding of the task.

I trained the model using Google Colab, and would recommend the same if you don’t have a high-end machine. We need to install the transformers library and PyTorch, since we are going to write code using both.

!pip install transformers
!pip install torch

Once the installation is done, import the following dependencies:
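
The import cell from the original notebook is not reproduced above; a minimal sketch of the imports this walkthrough relies on might look like the following (exact names may differ from the repo):

import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split
from transformers import XLNetTokenizer, XLNetForSequenceClassification

# Run on GPU if one is available (recommended on Colab).
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')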

Preparing Data so that we can feed it to the model

1. Load and preprocess data

We will use the IMDB dataset from Kaggle. It contains 50,000 movie reviews. Download the dataset and store it in your working directory. For faster computation, I clipped the original data and used 24,000 movie reviews. You can use more reviews if you like. Following are the steps for loading and cleaning the data:

  • Load the dataset into a pandas dataframe.
  • Remove tagged entities, hyperlinks, and emojis from the text.
  • Convert the labels to numbers: positive → 1, negative → 0.
  • Shuffle and clip the data if required.
The functions to load and preprocess the data are provided in the repo.
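
As a rough sketch (assuming the Kaggle CSV has review and sentiment columns; the repo’s functions may differ), they could look like:

import re

def load_data(path, n_rows=24000):
    # Load the CSV, map the labels to numbers, shuffle, and clip.
    df = pd.read_csv(path)
    df['sentiment'] = df['sentiment'].map({'positive': 1, 'negative': 0})
    df = df.sample(frac=1, random_state=42).reset_index(drop=True)
    return df[:n_rows]

def clean_text(text):
    # Strip HTML remnants, hyperlinks, tagged entities, and emojis.
    text = re.sub(r'<[^>]+>', ' ', text)
    text = re.sub(r'http\S+|www\.\S+', ' ', text)
    text = re.sub(r'@\w+', ' ', text)
    text = re.sub(r'[^\x00-\x7F]+', ' ', text)
    return re.sub(r'\s+', ' ', text).strip()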

2. Encode and pad data

These pretrained transformer models require input data in tokenized form, with some special tokens added to the original tokens. The steps for bringing the data into the required format are:

  • Tokenizing the text with the SentencePiece tokenizer
  • Adding special tokens
  • Creating attention masks
  • Padding to the max sequence length

All of this can be done easily with the encode_plus() function from the Huggingface transformers XLNetTokenizer. The following code snippet shows how:

from transformers import XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
input_txt = "Text data"
encodings = tokenizer.encode_plus(
    input_txt,
    add_special_tokens=True,
    max_length=16,
    return_tensors='pt',
    return_token_type_ids=False,
    return_attention_mask=True,
    pad_to_max_length=True)

The special tokens added for our task are:

  • <sep> : separator token, marking the end of a sequence
  • <cls> : classification token

In our case, we will use a single sequence along with the special tokens. For example:

  • single sequence: X <sep> <cls>

The attention_mask is an optional argument used when batching sequences together. This argument indicates to the model which tokens should be attended to, and which should not.

Padding adds pad tokens to bring variable-length text up to a fixed length, usually the max sequence length. The encode_plus() function from XLNetTokenizer adds the padding tokens on the left, i.e. pre-padding:

<PAD> <PAD> ... tokenized text ... <SEP> <CLS>

The encode_plus() function returns input_ids, attention_mask and, if requested, token_type_ids.

I have used the post-padding technique instead. For post-padding, I used the pad_sequences() function from keras.preprocessing.sequence and post-padded the input_ids and attention_mask. The following code snippet shows how:

from keras.preprocessing.sequence import pad_sequences

MAX_LEN = 512
input_txt = "Input text goes here"
encodings = tokenizer.encode_plus(
    input_txt,
    add_special_tokens=True,
    max_length=MAX_LEN,
    return_token_type_ids=False,
    return_attention_mask=True,
    pad_to_max_length=False)
# pad_sequences expects a list of sequences; "post" pads/truncates at the end
input_ids = pad_sequences([encodings['input_ids']], maxlen=MAX_LEN,
                          dtype='long', truncating="post", padding="post")
attention_mask = pad_sequences([encodings['attention_mask']], maxlen=MAX_LEN,
                               dtype='long', truncating="post", padding="post")

Finally, all these functions are wrapped up in a custom Dataset class; a sketch of such a class follows.
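
A minimal sketch, building on the snippets above (the class name and field names are my own, not necessarily the repo’s):

class ImdbDataset(Dataset):
    # Wraps reviews and labels; tokenizes, post-pads, and returns tensors per item.
    def __init__(self, reviews, targets, tokenizer, max_len):
        self.reviews = reviews
        self.targets = targets
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.reviews)

    def __getitem__(self, item):
        review = str(self.reviews[item])
        encodings = self.tokenizer.encode_plus(
            review,
            add_special_tokens=True,
            max_length=self.max_len,
            return_token_type_ids=False,
            return_attention_mask=True,
            pad_to_max_length=False)
        input_ids = pad_sequences([encodings['input_ids']], maxlen=self.max_len,
                                  dtype='long', truncating='post', padding='post')[0]
        attention_mask = pad_sequences([encodings['attention_mask']], maxlen=self.max_len,
                                       dtype='long', truncating='post', padding='post')[0]
        return {
            'input_ids': torch.tensor(input_ids, dtype=torch.long),
            'attention_mask': torch.tensor(attention_mask, dtype=torch.long),
            'targets': torch.tensor(self.targets[item], dtype=torch.long),
        }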

3. Custom DataLoader

At the heart of PyTorch’s data loading utility is the torch.utils.data.DataLoader class. It represents a Python iterable over a dataset, with support for map-style and iterable-style datasets, customized loading order, automatic batching, single- and multi-process data loading, and automatic memory pinning.

These options are configured by the constructor arguments of a DataLoader, which has signature:

DataLoader(dataset, batch_size=1, shuffle=False, sampler=None,
batch_sampler=None, num_workers=0, collate_fn=None,
pin_memory=False, drop_last=False, timeout=0,
worker_init_fn=None)

We will create Custom DataLoaders for our dataset.

We will split our dataset into training, validation and test sets and create dataloaders for each of them. The three dataloaders are:

  • train_data_loader
  • val_data_loader
  • test_data_loader
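
The create_data_loader helper is defined in the repo rather than shown above; a plausible sketch, using the Dataset class sketched earlier and an assumed 80/10/10 split, might be:

# df is the clipped dataframe loaded earlier, e.g. df = load_data('IMDB Dataset.csv').
# The split ratios are my assumption; the repo may divide the data differently.
df_train, df_rest = train_test_split(df, test_size=0.2, random_state=42)
df_val, df_test = train_test_split(df_rest, test_size=0.5, random_state=42)

def create_data_loader(df, tokenizer, max_len, batch_size):
    # Wrap a dataframe split in the custom Dataset and return a DataLoader.
    ds = ImdbDataset(
        reviews=df['review'].to_numpy(),
        targets=df['sentiment'].to_numpy(),
        tokenizer=tokenizer,
        max_len=max_len)
    return DataLoader(ds, batch_size=batch_size, num_workers=2)

The loaders themselves are then created as follows: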
BATCH_SIZE = 4

train_data_loader = create_data_loader(df_train, tokenizer, MAX_LEN, BATCH_SIZE)
val_data_loader = create_data_loader(df_val, tokenizer, MAX_LEN, BATCH_SIZE)
test_data_loader = create_data_loader(df_test, tokenizer, MAX_LEN, BATCH_SIZE)

Writing functions to perform train step and evaluation step

1. Setting up Hyperparameters

Model hyperparameters are properties that govern the entire training process. They are set before training begins, i.e. before optimizing the weights and biases.

The following are the hyperparameter values I used:

  • Batch size : 4
  • Epochs : 3
  • Optimizer : AdamW
  • Learning rate : 3e-5

2. Load pretrained XLNet model

We will be using the XLNetForSequenceClassification model from Huggingface transformers: the XLNet model with a sequence classification/regression head on top (a linear layer on top of the pooled output), e.g. for GLUE tasks.

We will load the pretrained model along with its weights. While loading the model, we need to pass the number of classes into which we want to classify the text. The following code snippet shows how:

from transformers import XLNetForSequenceClassification

model = XLNetForSequenceClassification.from_pretrained('xlnet-base-cased', num_labels=2)
model = model.to(device)
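
With the model in hand, the optimizer from the hyperparameter list can be created. A minimal sketch, assuming the AdamW implementation that ships with the transformers library (newer versions may prefer torch.optim.AdamW):

from transformers import AdamW

EPOCHS = 3
# Learning rate as listed in the hyperparameters above.
optimizer = AdamW(model.parameters(), lr=3e-5)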

3. Train step function

This function carries out the forward and backward pass and calculates loss as well as accuracy.

The training step function performs the following operations (sketched after the list):

  • Load the data from the DataLoader
  • Pass the data to the model
  • Calculate loss
  • Calculate accuracy
  • Back propagate and update weights
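
A sketch of the training step, assuming an older transformers version where the model returns a (loss, logits, …) tuple when labels are passed:

def train_fn(model, data_loader, optimizer, device):
    # One pass over the training data: forward, loss, backward, update.
    model.train()
    losses, correct, n_examples = [], 0, 0
    for batch in data_loader:
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        targets = batch['targets'].to(device)

        # With labels supplied, the model returns (loss, logits, ...).
        loss, logits = model(input_ids=input_ids,
                             attention_mask=attention_mask,
                             labels=targets)[:2]
        correct += torch.sum(torch.argmax(logits, dim=1) == targets).item()
        n_examples += targets.size(0)
        losses.append(loss.item())

        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    return correct / n_examples, np.mean(losses)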

4. Evaluation step function

This function is used to evaluate the model. It calculates and returns loss as well as accuracy.

The evaluation function performs the following operations (sketched after the list):

  • Load the data from the DataLoader
  • Pass the data to the model
  • Calculate loss
  • Calculate accuracy
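
A matching sketch of the evaluation step, under the same assumptions as train_fn:

def eval_fn(model, data_loader, device):
    # Evaluate without gradient tracking; return accuracy and mean loss.
    model.eval()
    losses, correct, n_examples = [], 0, 0
    with torch.no_grad():
        for batch in data_loader:
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            targets = batch['targets'].to(device)
            loss, logits = model(input_ids=input_ids,
                                 attention_mask=attention_mask,
                                 labels=targets)[:2]
            correct += torch.sum(torch.argmax(logits, dim=1) == targets).item()
            n_examples += targets.size(0)
            losses.append(loss.item())
    return correct / n_examples, np.mean(losses)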

Finally! Fine-tuning the model

We will call the previously defined train and eval functions for a set number of epochs. The model is fine-tuned on our dataset, and the checkpoint with the best validation accuracy is saved.
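
A minimal sketch of the loop, using the train_fn and eval_fn sketched above (the checkpoint filename is my own choice):

best_accuracy = 0
for epoch in range(EPOCHS):
    train_acc, train_loss = train_fn(model, train_data_loader, optimizer, device)
    val_acc, val_loss = eval_fn(model, val_data_loader, device)
    print(f'Epoch {epoch + 1}: train_acc={train_acc:.4f}, val_acc={val_acc:.4f}')
    # Keep the checkpoint with the best validation accuracy.
    if val_acc > best_accuracy:
        torch.save(model.state_dict(), 'best_model_state.bin')
        best_accuracy = val_acc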

Following were the results I achieved after fine-tuning the model for three epochs:

  • Training Accuracy : 98.5%
  • Train Loss : 0.086
  • Validation Accuracy : 94.03%
  • Validation Loss : 0.377

Disclaimer: Fine-tuning will take about 3 to 4 hours!

Evaluating the performance of the model on Test data

Now we will evaluate the model’s performance on the test data, to see how well the model has been trained. We will call the previously defined evaluation function.
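
In code, this amounts to a single call to the sketched eval_fn:

# Evaluate the fine-tuned model on the held-out test set.
test_acc, test_loss = eval_fn(model, test_data_loader, device)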

Following were the results I achieved after testing the model:

  • Test Accuracy : 95.6%
  • Test Loss : 0.274
Classification report

Writing a function to classify raw text using the fine-tuned model

Here, we will write a function to classify raw text. It performs the following operations (sketched after the list):

  • Encodes the text using encode_plus().
  • Pads the input_ids and attention_mask.
  • Passes the input_ids and attention_mask as parameters to the model.
  • Passes the model outputs through a softmax function.
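
A sketch of such a prediction function, reusing the encoding and padding steps from earlier (the function name is my own):

def predict_sentiment(text, model, tokenizer, max_len=MAX_LEN):
    # Encode and pad the raw text, run the model, and softmax the logits.
    model.eval()
    encodings = tokenizer.encode_plus(
        text,
        add_special_tokens=True,
        max_length=max_len,
        return_token_type_ids=False,
        return_attention_mask=True,
        pad_to_max_length=False)
    input_ids = pad_sequences([encodings['input_ids']], maxlen=max_len,
                              dtype='long', truncating='post', padding='post')
    attention_mask = pad_sequences([encodings['attention_mask']], maxlen=max_len,
                                   dtype='long', truncating='post', padding='post')
    input_ids = torch.tensor(input_ids, dtype=torch.long).to(device)
    attention_mask = torch.tensor(attention_mask, dtype=torch.long).to(device)
    with torch.no_grad():
        # Without labels, the model returns (logits, ...).
        logits = model(input_ids=input_ids, attention_mask=attention_mask)[0]
        probs = torch.softmax(logits, dim=1)
    return 'positive' if torch.argmax(probs, dim=1).item() == 1 else 'negative'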

Conclusion

In this post, the XLNet model was fine-tuned for binary classification on the IMDB dataset, but you can apply the same code to other datasets as well. You can even build a multi-class classifier using the XLNetForSequenceClassification model.

XLNet is an incredibly powerful language understanding model that shows great promise in a wide variety of NLP applications. The Huggingface transformers library has made it possible to use this powerful model with ease.

Here, I’ve tried to give you a basic intuition of how you might use XLNet for binary classification. I hope you enjoyed reading this article. Your suggestions and feedback are most welcome!

The code for this article is available in my GitHub repo.
