Text Classification with XLNet in Action

Bill Huang
4 min read · Aug 5, 2019


Hello friends, this is the second post in my series "NLP in Action". In this series, I share how to do NLP tasks with SOTA techniques using a "code-first" approach, which is inspired by fast.ai.

And I am also looking forward to your feedback and suggestions.
My series "NLP in Action" contains:

About Text Classification

Text classification is a common NLP task. Its purpose is straightforward: take texts (documents, sentences, queries, article titles, ...) and assign one or more labels to them.

With those labels we gain a deeper understanding of the texts, which makes them much easier to manage as a resource. Text classification is widely used, for example in news recommendation systems: once you know a user's preferences, and your news articles are already labeled, you can push relevant news to that user, which is a win-win situation. The user enjoys your recommendations, and your website is enjoyed in return ^-^

Here are two examples:

News title: “NBA schedule release 2019–20: Here are four things we already know about this season’s slate of games”
Classification: Sports

News title: "Deep Learning, AI Improve Accuracy of Breast Cancer Detection"
Classification: Technology

About XLNet

XLNet is a language model introduced in the paper XLNet: Generalized Autoregressive Pretraining for Language Understanding.

XLNet is a BERT-like pre-trained model. We can treat XLNet as an enhanced version of BERT: it outperforms BERT on several NLP tasks, including text classification, question answering, and others.

XLNet uses Transformer-XL as its feature-extraction architecture, which improves on BERT's Transformer because Transformer-XL adds recurrence to the Transformer. That recurrence gives XLNet a deeper understanding of language context.

Using XLNet for a specific task is very straightforward: we first download a pre-trained XLNet model, then fine-tune it to fit the needs of the downstream task. In other words, XLNet is a transfer learning method for NLP.

In this post, I will show how to use XLNet to do text classification.

In Action

As the saying goes, "No water, no swimming; no sailing, no boating." It is better to get your hands on the code, so that we can get a clearer understanding of how text classification is done.

Here, I will use the excellent transformers library developed by Hugging Face. This library contains state-of-the-art pre-trained models for Natural Language Processing (NLP), such as XLNet, GPT, BERT, etc.

The process of doing text classification with XLNet contains 4 steps:
1. Load data
2. Convert data into training embeddings
3. Train model
4. Evaluate model performance

All the code is shown in the Jupyter notebook here.

And I will give a brief introduction to each step.

1. Load data
In order to do text classification, we need a dataset in which the texts are labeled.
First, we load the dataset with pandas:

Then we take a look at the data and analyze the label distribution:
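Below is a minimal sketch of these two steps. The file name news.csv and the column names text and label are hypothetical placeholders; adjust them to your own dataset.

```python
import pandas as pd

# Hypothetical dataset: a CSV with a "text" column (e.g. a news title)
# and a "label" column (its category).
df = pd.read_csv("news.csv")

# Peek at the first rows to check that the data loaded correctly.
print(df.head())

# Check how the labels are distributed, so we know whether
# the classes are balanced before training.
print(df["label"].value_counts())
```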

2. Convert data into training embeddings
After we get the data, we need to convert each text into 3 kinds of inputs:
- Token embeddings (input ids)
- Mask embeddings (the attention/padding mask)
- Segmentation embeddings (segment ids)

The process of making embeddings for XLNet is different from BERT: first we tokenize the texts with SentencePiece, then we add the "<sep>" and "<cls>" special tokens and the padding mask to the encodings.
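Here is a sketch of that step using the transformers tokenizer (XLNetTokenizer requires the sentencepiece package to be installed); the example text is only illustrative:

```python
from transformers import XLNetTokenizer

# Load the SentencePiece-based tokenizer that matches the
# pre-trained XLNet checkpoint.
tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")

text = "NBA schedule release 2019-20: Here are four things we already know"

# The tokenizer appends the <sep> and <cls> special tokens for XLNet
# (they go at the end of the sequence, unlike BERT), pads the sequence
# to max_length, and returns the attention mask and segment ids.
encoding = tokenizer(
    text,
    max_length=64,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)

print(encoding["input_ids"].shape)       # token ids
print(encoding["attention_mask"].shape)  # padding mask
print(encoding["token_type_ids"].shape)  # segment ids
```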

3. Train model
When using transfer learning with XLNet, the process of training a new model on downstream data is called "fine-tuning". All we need to do is choose one of the XLNet pre-trained models and use our own data to update the model's parameters to fit our downstream NLP task.
For English, XLNet has 2 kinds of models: a base cased model and a large cased model.
The large model is bigger than the base model and performs better, but it needs more computing power and time.
In this example we will choose the XLNet base cased model for fine-tuning; if you are interested, you can try the large model instead, and it may give better results.
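A minimal fine-tuning sketch with a recent version of transformers is shown below. The train_loader and num_labels=4 are hypothetical placeholders for your own DataLoader (built from the encodings of step 2) and your label count.

```python
import torch
from torch.optim import AdamW
from transformers import XLNetForSequenceClassification

# Load the pre-trained base cased model with a classification head on top.
model = XLNetForSequenceClassification.from_pretrained(
    "xlnet-base-cased", num_labels=4  # set num_labels to your label count
)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):
    # train_loader is assumed to yield (input_ids, attention_mask,
    # token_type_ids, labels) batches built as in step 2.
    for input_ids, attention_mask, token_type_ids, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(
            input_ids=input_ids.to(device),
            attention_mask=attention_mask.to(device),
            token_type_ids=token_type_ids.to(device),
            labels=labels.to(device),
        )
        # When labels are passed, the model also returns the loss.
        outputs.loss.backward()
        optimizer.step()
```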

4. Evaluate model performance
After training a new model for text classification, we want to know how good the model is, so we evaluate it on data it has not seen.
The evaluation data can be split off when we set up the training data batches; it is recommended to hold out about 30% of the data as testing data for performance validation.
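Here is a sketch of that evaluation, assuming a hypothetical val_loader holding the held-out split and reusing the model and device from the training sketch above:

```python
import torch

model.eval()
correct, total = 0, 0
with torch.no_grad():
    for input_ids, attention_mask, token_type_ids, labels in val_loader:
        # Forward pass without labels: we only need the logits here.
        logits = model(
            input_ids=input_ids.to(device),
            attention_mask=attention_mask.to(device),
            token_type_ids=token_type_ids.to(device),
        ).logits
        preds = logits.argmax(dim=-1).cpu()
        correct += (preds == labels).sum().item()
        total += labels.size(0)

print(f"Validation accuracy: {correct / total:.4f}")
```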
After evaluating on the testing data, the result may look like this:

[Screenshot: validation accuracy output]

Summary

Text classification is an NLP task that assigns labels to texts; with those labels, we can understand the texts much better. To do text classification, we can use XLNet, a current SOTA pre-trained model, to easily fine-tune a model for the text classification downstream task.

Reference

In order to write this post, I learned from and was inspired by these articles, thank you ^-^

  1. XLNet — a clever language modeling solution
  2. Paper Dissected: “XLNet: Generalized Autoregressive Pretraining for Language Understanding” Explained
