Using RoBERTa with Fastai for NLP

Implementing the current state of the art NLP model in Fastai

Dev Sharma
Sep 2, 2019

This tutorial will walk you through integrating Fairseq’s RoBERTa model via Hugging Face’s Transformers and Fastai libraries. We will be building upon Keita Kurita’s article on Fine-Tuning BERT with Fast AI, and we will be using the IMDB dataset.

Update 2020.11: Fastai has been upgraded to v2 since the release of this article. The steps below have not been tested with v2, so Fastai v1 is recommended for following along with this article.

Fastai provides a streamlined interface to build datasets and train models. However, it doesn’t offer built-in support for current state-of-the-art NLP models such as RoBERTa, BERT or XLNet (as of Sep 2019). Integrating these models into Fastai lets you combine the convenience of Fastai methods with the strong predictive power of these pretrained models.

Transfer learning is still relatively new to NLP and is developing at a very rapid pace. It is therefore promising to see a model such as RoBERTa perform so well across the varied NLP tasks of the SuperGLUE benchmark.

RoBERTa vs. other models on SuperGLUE tasks

In essence, RoBERTa builds upon BERT by pretraining longer, on more data and with bigger batch sizes, while pretraining only on masked language modeling rather than on next sentence prediction as well. The underlying architecture remains unchanged, and both models rely on masked language model pretraining. You can read here for more information on the differences.

0. Prerequisites

You will need to have both the Fastai and transformers libraries installed, preferably with access to a GPU device. For Fastai, you can follow instructions provided here. For Transformers:

pip install transformers

1. Setting Up the Tokenizer

First, let’s import relevant Fastai tools:

from fastai.text import *
from fastai.metrics import *

and Roberta’s Tokenizer from Transformers:

from transformers import RobertaTokenizer
roberta_tok = RobertaTokenizer.from_pretrained("roberta-base")

RoBERTa uses different default special tokens from BERT. Instead of [CLS] and [SEP] for the starting and ending tokens, <s> and </s> are used respectively. For example, a tokenized movie review may look like:

“the movie was great” → [<s>, the, Ġmovie, Ġwas, Ġgreat, </s>]

We will now create a Fastai wrapper around RobertaTokenizer.
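As a rough, dependency-free sketch, such a wrapper might look like the following. The class name FastAiRobertaTokenizer and the max_seq_len argument are taken from the call used in this tutorial; the method bodies, the bos_token/eos_token parameters and the WhitespaceStub stand-in are assumptions made for illustration. With fastai v1 installed, the class would typically subclass BaseTokenizer; its interface is only mirrored here so the logic can be run on its own.

```python
from typing import List


class FastAiRobertaTokenizer:
    """Sketch of a fastai-style wrapper around a pretrained tokenizer.

    Assumption: `tokenizer` is any object exposing `tokenize(str) -> List[str]`,
    such as transformers' RobertaTokenizer.
    """

    def __init__(self, tokenizer, max_seq_len: int = 128,
                 bos_token: str = "<s>", eos_token: str = "</s>"):
        self._pretrained_tokenizer = tokenizer
        self.max_seq_len = max_seq_len
        self.bos_token = bos_token
        self.eos_token = eos_token

    def __call__(self, *args, **kwargs):
        # fastai v1's Tokenizer is expected to call tok_func(lang) to obtain a
        # tokenizer instance; returning self reuses the already-built object.
        return self

    def tokenizer(self, text: str) -> List[str]:
        # Reserve two positions for the start and end tokens.
        tokens = self._pretrained_tokenizer.tokenize(text)[: self.max_seq_len - 2]
        return [self.bos_token] + tokens + [self.eos_token]

    def add_special_cases(self, toks):
        # Nothing to do: the pretrained tokenizer has a fixed vocabulary.
        pass


class WhitespaceStub:
    """Hypothetical stand-in for RobertaTokenizer, used only for this demo."""

    def tokenize(self, text: str) -> List[str]:
        return text.split()


wrapped = FastAiRobertaTokenizer(WhitespaceStub(), max_seq_len=5)
print(wrapped.tokenizer("the movie was really great"))
# ['<s>', 'the', 'movie', 'was', '</s>']
```

With max_seq_len=5, only three content tokens are kept so that the start and end tokens still fit within the limit.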

Now we can initialize our Fastai tokenizer. (Note: we have to wrap our Fastai wrapper within the Tokenizer class for Fastai compatibility.)

fastai_tokenizer = Tokenizer(tok_func = FastAiRobertaTokenizer(roberta_tok, max_seq_len=256), pre_rules=[], post_rules=[])

Next, we will load Roberta’s vocabulary.

import json
path = Path()
# vocab.json maps each RoBERTa token to its integer id
with open('vocab.json', 'r') as f:
    roberta_vocab_dict = json.load(f)

fastai_roberta_vocab = Vocab(list(roberta_vocab_dict.keys()))
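Building the vocabulary from the dictionary keys relies on vocab.json storing its tokens in id order, since a Fastai Vocab looks tokens up by their position in the list. A small sketch that makes this assumption explicit by sorting on the id is shown below; the toy dictionary is made up for illustration and is not RoBERTa’s real vocabulary file.

```python
import json

# Toy stand-in for the contents of vocab.json (token -> id), deliberately
# stored out of order.
vocab_json = '{"</s>": 2, "<s>": 0, "<unk>": 3, "<pad>": 1, "the": 4}'
token_to_id = json.loads(vocab_json)

# Order tokens by id so that list position == token id.
itos = [tok for tok, _ in sorted(token_to_id.items(), key=lambda kv: kv[1])]

# Sanity check: ids must be contiguous from 0 for position-based lookup to work.
assert [token_to_id[t] for t in itos] == list(range(len(itos)))

print(itos)  # ['<s>', '<pad>', '</s>', '<unk>', 'the']
```

A list built this way could then be passed to Vocab(...) in place of list(roberta_vocab_dict.keys()).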

2. Setting Up the DataBunch

Before we can build our Fastai DataBunch, we need to create appropriate pre-processors for the tokenizer and vocabulary.

Now, we will create a DataBunch class specifically for Roberta.

And lastly, we will also need a Roberta specific TextList class:

class RobertaTextList(TextList):
    _bunch = RobertaDataBunch
    _label_cls = TextList

3. Loading the Data

Whew, now that we have finished the involved setup process, we can bring it all together to read in our IMDB data.

df = pd.read_csv("IMDB Dataset.csv")
feat_cols = "review"
label_cols = "sentiment"

We can now simply create a Fastai DataBunch with:

processor = get_roberta_processor(tokenizer=fastai_tokenizer, vocab=fastai_roberta_vocab)
data = RobertaTextList.from_df(df, ".", cols=feat_cols, processor=processor) \
    .split_by_rand_pct(seed=2019) \
    .label_from_df(cols=label_cols, label_cls=CategoryList) \
    .databunch(bs=4, pad_first=False, pad_idx=0)
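The pad_first and pad_idx arguments control how reviews of different lengths are combined into a single batch. A rough, library-free sketch of right-padding (pad_first=False) is given below; the helper name pad_batch and the token ids are invented for illustration and are not part of Fastai.

```python
from typing import List


def pad_batch(seqs: List[List[int]], pad_idx: int = 0,
              pad_first: bool = False) -> List[List[int]]:
    """Pad every sequence to the length of the longest one in the batch."""
    max_len = max(len(s) for s in seqs)
    padded = []
    for s in seqs:
        filler = [pad_idx] * (max_len - len(s))
        # pad_first=False appends padding after the tokens (right-padding).
        padded.append(filler + s if pad_first else s + filler)
    return padded


batch = [[10, 11, 12, 13], [20, 21], [30]]
for row in pad_batch(batch, pad_idx=0, pad_first=False):
    print(row)
# [10, 11, 12, 13]
# [20, 21, 0, 0]
# [30, 0, 0, 0]
```

Whatever value is used for pad_idx, it should match the id of the padding token in the vocabulary loaded earlier.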

4. Building a Custom Roberta Model

In this step, we will define the model architecture to pass to our Fastai learner. Essentially, we add a new final layer to the output of the RobertaModel. This layer will be trained specifically for the IMDB sentiment classification.

Initialize the model:

roberta_model = CustomRobertatModel()

5. Train the Model

Initialize our Fastai learner:

learn = Learner(data, roberta_model, metrics=[accuracy])

Start training:

learn.model.roberta.train() # set roberta into train mode
learn.fit_one_cycle(1, max_lr=1e-5)

After only a single epoch and without unfreezing layers, we achieve an accuracy of 94% on the validation set.

.941900 accuracy in a single epoch of training

You can now also utilize other Fastai methods such as:

# find an appropriate lr
learn.lr_find()
learn.recorder.plot()
# unfreeze layers
learn.unfreeze()
# train using half precision
learn = learn.to_fp16()

6. Creating Predictions

Fastai’s get_preds function does not return predictions in the original order of the dataset, so we can use the following helper to restore that order.

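def get_preds_as_nparray(ds_type) -> tuple:
    """Return (predictions, predicted class ids) in original dataset order."""
    preds = learn.get_preds(ds_type)[0].detach().cpu().numpy()
    # the sampler holds the order in which items were fed to the model
    sampler = [i for i in data.dl(ds_type).sampler]
    # argsort of that order gives the indices that undo the reordering
    reverse_sampler = np.argsort(sampler)
    ordered_preds = preds[reverse_sampler, :]
    pred_values = np.argmax(ordered_preds, axis=1)
    return ordered_preds, pred_values

# For Valid
preds, pred_values = get_preds_as_nparray(DatasetType.Valid)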
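The key step in this function is np.argsort(sampler): the sampler lists dataset indices in the order in which they were fed to the model, and the argsort of that permutation gives the positions needed to put the outputs back in dataset order. A small pure-Python sketch with made-up values (no Fastai or NumPy required) illustrates the idea:

```python
# Order in which a hypothetical dataloader visited the dataset items,
# for instance because texts were sorted by length.
sampler = [2, 0, 3, 1]

# Predictions come back in visiting order: preds[k] belongs to item sampler[k].
preds = ["pred_for_item_2", "pred_for_item_0", "pred_for_item_3", "pred_for_item_1"]

# Pure-Python equivalent of np.argsort(sampler).
reverse_sampler = sorted(range(len(sampler)), key=sampler.__getitem__)

# Indexing with the inverse permutation restores dataset order.
ordered_preds = [preds[k] for k in reverse_sampler]

print(reverse_sampler)  # [1, 3, 0, 2]
print(ordered_preds)
# ['pred_for_item_0', 'pred_for_item_1', 'pred_for_item_2', 'pred_for_item_3']
```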

Note: if we had a test set, we could have easily added a test set during step 3 earlier by initializing “data” like this:

data = RobertaTextList.from_df(df, ".", cols=feat_cols, processor=processor) \
    .split_by_rand_pct(seed=2019) \
    .label_from_df(cols=label_cols, label_cls=CategoryList) \
    .add_test(RobertaTextList.from_df(test_df, ".", cols=feat_cols, processor=processor)) \
    .databunch(bs=4, pad_first=False, pad_idx=0)

Hence, if we had a test set, we could derive preds via:

test_preds, test_pred_values = get_preds_as_nparray(DatasetType.Test)

You can now train on almost any text-based dataset using RoBERTa with Fastai, combining two very powerful tools to produce effective results. You can access this tutorial’s Jupyter notebook along with the data on my GitHub page or Kaggle kernel. (If you have trouble viewing the notebook on GitHub, use this link.) If you are interested in seeing a similar implementation for SuperGLUE tasks, read on to my following work on Using RoBERTa with Fastai for SuperGLUE Task CB.

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem

Written by Dev Sharma, MSc Analytics @ Columbia
