Building my first NLP text classification model using IPUs

Siman Acharya
7 min read · Nov 21, 2022


This blog aims to demonstrate, step by step, how to fine-tune a Graphcore Hugging Face (HF) Optimum model for a text classification task on the GLUE (General Language Understanding Evaluation) benchmark. I will take it further by showing how to train and test the model using your own datasets.

What is Natural Language Processing and why does it matter?

Natural Language Processing is a branch of artificial intelligence that has been around since the 1940s. Complex techniques such as machine learning and deep learning allow computers to understand, interpret and manipulate human language.

When we refer to NLP, we are concerned with a narrow subset of tasks. These tasks usually involve getting a computer to work with data such as spoken or written language that has been pre-processed using statistical and mathematical methods. The overarching goal of NLP is to bridge the gap between human communication and computer understanding.

There is a myriad of different NLP tasks, and you can read more about them here; however, we will be focusing on Textual Entailment for this example. As the name suggests, Textual Entailment takes a pair of sentences and decides whether the first sentence entails or contradicts the second.

For example, let us say we have two sentences :

Sentence 1: Children smiling and waving at the camera.

Sentence 2: The kids are frowning.

The above two sentences are a clear example of contradiction, i.e. the second sentence contradicts, rather than follows from, the first.

In this blog, we will work on an NLP model that takes sentences and detects whether they are neutral, contradictory or entailing.

NLP tasks on Intelligence Processing Units (IPUs)

Due to the IPU’s Multiple Instruction, Multiple Data (MIMD) architecture, each core can be utilised separately to do specific tasks. Each IPU core also has its own local memory, unlike a regular Graphics Processing Unit (GPU), where memory is shared between cores. This local memory gives the IPU exceptional bandwidth and very low latency; as a result, IPUs can be much faster than GPUs for machine learning tasks such as NLP.

Thanks to the IPU, we can also parallelise training using pipeline parallelism and data parallelism. With pipeline (model) parallelism, we split a large model across multiple IPUs. With data parallelism (model replication), we run a copy of the same model on each set of IPUs in parallel, increasing the effective batch size.

This parallelism significantly increases the batch size that can be processed per data-parallel replica, makes memory access more efficient, and reduces the time spent communicating data during training.
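As a rough sketch of how these two forms of parallelism are expressed with Optimum Graphcore (the attribute names below are the ones I believe IPUConfig exposes, and the values are placeholders rather than tuned settings):

```python
from optimum.graphcore import IPUConfig

# A minimal sketch, not a tuned configuration: ipus_per_replica pipelines one
# copy of the model across several IPUs, while replication_factor runs several
# such copies in data parallel.
ipu_config = IPUConfig(
    ipus_per_replica=4,              # pipeline (model) parallelism
    replication_factor=2,            # data parallelism (model replication)
    gradient_accumulation_steps=16,  # accumulate towards a larger global batch
)
```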

At present, there are many ready-to-use datasets provided on the Hugging Face Hub. By allowing users to run any public dataset out of the box, Optimum streamlines the overall development lifecycle of AI models.

BERT Transformers

In this walkthrough I will be using the BERT transformer NLP model, which can easily run on Graphcore IPUs as part of the growing Hugging Face Optimum Graphcore library. The BERT Base uncased model has been pre-trained on the English language using masked language modelling (MLM).

This particular “uncased” version does not distinguish case: “English” and “english” are treated the same.
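A quick way to see this for yourself, using the standard transformers tokenizer for this checkpoint:

```python
from transformers import AutoTokenizer

# The uncased checkpoint lowercases its input, so both spellings map to the
# same token IDs.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer("English")["input_ids"] == tokenizer("english")["input_ids"])  # True
```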

The GLUE benchmark

The Hugging Face Optimum model can run sequence classification on the GLUE benchmark. GLUE is a collection of nine text-classification tasks, which you are free to explore on the IPU here. For this example, however, we will use a large dataset for MNLI (Multi-Genre Natural Language Inference) classification (further details on downloading this dataset are given later on).
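For reference, the ready-made MNLI task can be pulled straight from the Hub with the datasets library; this is the out-of-the-box route, before we swap in a custom dataset later on:

```python
from datasets import load_dataset

# The MNLI task of the GLUE benchmark: each example pairs a premise with a
# hypothesis and a label (entailment / neutral / contradiction).
mnli = load_dataset("glue", "mnli")
print(mnli["train"][0])
```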

Setting up the environment

For this tutorial we will be using the Paperspace cloud platform, which provides free access to IPU-powered notebooks. You want to start off by making sure that you are signed up to Paperspace.

Here is a super helpful blog post which gets you up and running with IPUs on Paperspace. Once you’ve signed up, you’ll need to create your first notebook (exciting!).

selecting the correct runtime

For this case, we will start by selecting the Hugging Face on IPU runtime.

After you have selected your runtime, you need to expand advanced options and replace the workspace URL with:

 https://github.com/siman-ach/Text-Classification.git
example of replacing the default workspace URL with the custom repository URL

This ensures that the example source code files to run the model are cloned into your Paperspace notebook. You need not worry about environment and data configuration setups, as everything comes conveniently pre-installed and ready to go with this Paperspace runtime.

Getting the datasets

We will be using our own custom datasets for this task. The dataset I have used for this example is a subset of the Stanford Natural Language Inference (SNLI) corpus: a collection of 570k human-written English sentence pairs manually labeled for balanced classification with the labels entailment, contradiction, and neutral. I have taken this data and modified it to make it fit for use with the Hugging Face Optimum model.

You may also use your own CSV/JSON files for training and testing on the Optimum model, and I will show you the necessary steps to do this.

To start with, I took my test, training, and validation CSV files and uploaded them onto the Hugging Face Hub. This is because these files are pretty large and would be challenging to work with otherwise. Hugging Face provides a helpful and easy guide which shows how to upload custom datasets.

Downloading the dataset from the Hub is quite straightforward. Open a new terminal in your notebook by clicking the “Terminal” tab on Paperspace and run a download command there.
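Something along these lines will clone the CSV files (the repository path is a placeholder for the dataset repository you uploaded; Git LFS is needed for the larger files):

```bash
# Illustrative only: substitute the dataset repository you uploaded to the Hub.
git lfs install
git clone https://huggingface.co/datasets/<your-username>/textclassificationMNLI
```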

Simply input the above command into your notebook terminal for it to download the .CSV files.

If the download is successful, you will find that test.csv, train.csv and validation.csv have appeared in a new folder, textclassificationMNLI.

Preparing the model

Firstly, we need to install Optimum and the dependencies listed in requirements.txt in order to run the script, so we will quickly do so now. This step is necessary because the script will fail to run otherwise.
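A minimal sketch of that step, run from the notebook terminal (assuming requirements.txt sits alongside run_glue.py in the cloned repository):

```bash
# Install the Hugging Face Optimum Graphcore integration and the example's
# Python dependencies.
pip install optimum-graphcore
pip install -r requirements.txt
```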

To train this model using our local datasets, which we have now downloaded, we need to redefine some arguments.

Within the main script run_glue.py, we will change one of the arguments of the DataTrainingArguments class. The DataTrainingArguments class contains the arguments pertaining to what data we will be inputting into our model for training and evaluation. We could also specify these arguments on the command line in our terminal.

Let’s change max_predict_samples from its default value of None so that only a limited number of the prediction samples provided by our custom dataset is used. This will make the run quicker and, if required, debugging easier.

For this example, I have changed the value to 100.

You should note that there are also multiple other arguments within the DataTrainingArguments class which can be defined, e.g. max_train_samples and max_eval_samples, according to your dataset.

max_train_samples truncates the number of training examples and max_eval_samples does the same for the number of evaluation samples.
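For orientation, the relevant fields of DataTrainingArguments look roughly like this (a paraphrase of the standard run_glue.py definitions with the help text shortened, and max_predict_samples already set to 100):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DataTrainingArguments:
    # ... other data arguments (task_name, train_file, validation_file, ...) ...
    max_train_samples: Optional[int] = field(
        default=None,
        metadata={"help": "Truncate the number of training examples."},
    )
    max_eval_samples: Optional[int] = field(
        default=None,
        metadata={"help": "Truncate the number of evaluation examples."},
    )
    max_predict_samples: Optional[int] = field(
        default=100,  # changed from None for this example
        metadata={"help": "Truncate the number of prediction examples."},
    )
```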

We need to define a trainer using the IPUTrainer class to train a model on the IPU. This trainer class is similar to the original Hugging Face Transformers Trainer and works together with the IPUConfig object, which specifies the configuration parameters that make a model executable on the IPU.

In order to be able to use this model on the IPU, we need to load the IPU configuration, IPUConfig, which allows us to specify attributes and control configuration parameters specific to the IPU.
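Put together, the wiring looks roughly like this (run_glue.py already does all of this internally; the checkpoint and configuration names are illustrative, and train_dataset / eval_dataset stand for tokenised splits prepared beforehand):

```python
from optimum.graphcore import IPUConfig, IPUTrainer, IPUTrainingArguments
from transformers import AutoModelForSequenceClassification

# Three labels: entailment / neutral / contradiction.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)

# IPU-specific execution parameters, loaded from a pre-made configuration.
ipu_config = IPUConfig.from_pretrained("Graphcore/bert-base-ipu")

training_args = IPUTrainingArguments(output_dir="./outputs", do_train=True, do_eval=True)

trainer = IPUTrainer(
    model=model,
    ipu_config=ipu_config,
    args=training_args,
    train_dataset=train_dataset,  # assumed: tokenised training split
    eval_dataset=eval_dataset,    # assumed: tokenised validation split
)
```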

Running the model

To make the training more efficient, we will load the last checkpoint if it exists.
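In the script this is done with the get_last_checkpoint helper from transformers; a condensed sketch, continuing from the trainer above:

```python
import os
from transformers.trainer_utils import get_last_checkpoint

# Resume from an existing checkpoint in the output directory if one is found,
# rather than starting training from scratch.
last_checkpoint = None
if os.path.isdir(training_args.output_dir) and training_args.do_train:
    last_checkpoint = get_last_checkpoint(training_args.output_dir)

trainer.train(resume_from_checkpoint=last_checkpoint)
```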

We are now ready to run the model.

Many hyper-parameters have to be tuned to achieve a robust model that will be able to accurately classify sentences with our dataset. The run_glue.py file contains multiple parameters which may be modified to suit different experiments.

Running the script with python run_glue.py --help will show you the entire list of available parameters.

The command contains the key parameters to run our dataset.
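An invocation along these lines ties the key parameters together (the model checkpoint and IPU configuration names are illustrative, and the file paths assume the folder created when the dataset repository was cloned):

```bash
python run_glue.py \
  --model_name_or_path bert-base-uncased \
  --ipu_config_name Graphcore/bert-base-ipu \
  --train_file textclassificationMNLI/train.csv \
  --validation_file textclassificationMNLI/validation.csv \
  --test_file textclassificationMNLI/test.csv \
  --do_train --do_eval --do_predict \
  --pod_type pod8 \
  --num_train_epochs 3 \
  --per_device_train_batch_size 2 \
  --output_dir ./outputs
```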

  • the train_file, test_file, validation_file parameters specify the locations of our dataset files; you can change these accordingly.
  • pod_type may be pod4, pod8, pod16, pod32 or pod64. For this example, using pod8 is sufficient.
  • you can also specify the number of epochs in num_train_epochs. This is a hyper-parameter that defines the number of times that the algorithm will work through the entire training dataset.
  • it’s also possible to change the training batch size using per_device_train_batch_size. In practical terms, to determine the optimum batch size, it is recommended to try smaller batch sizes first (usually 32 or 64). It should be noted that using a larger batch size requires more memory.
  • output_dir is the directory where the output result files are written.

Evaluating our model

Now that the model has been trained, we can evaluate the effectiveness of the model by seeing how well it predicts the labels of unseen data using our validation dataset.
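If you are driving things from Python rather than the script, the same step is a single call on the trainer sketched earlier (test_dataset here stands for the tokenised test split):

```python
# Compute metrics on the validation split, then generate label predictions
# for the unseen test split.
metrics = trainer.evaluate()
print(metrics)

predictions = trainer.predict(test_dataset)
```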

The metrics below showcase how well the model performs after 3 epochs.

There are multiple ways in which we could try to improve the accuracy of this model, such as changing the learning rate or the loss scaling.
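Both can be adjusted through the training arguments; for example (the values are only starting points, and loss_scaling is assumed here to be a field of the IPU-specific training arguments):

```python
from optimum.graphcore import IPUTrainingArguments

# Illustrative starting points: a lower learning rate and an explicit loss
# scaling value to experiment with.
training_args = IPUTrainingArguments(
    output_dir="./outputs",
    learning_rate=2e-5,
    loss_scaling=16.0,
)
```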

Conclusion

In this blog post, I have shown how to run a Hugging Face model for NLP text classification on the IPU in Paperspace using a local dataset. To do this, we:

  • uploaded the dataset onto the Hugging Face dataset Hub and downloaded it to our notebook;
  • defined the IPUTrainer class to allow the model to run on the IPU;
  • compiled and ran the model; and
  • evaluated the model.

To improve the training of the model, device parameters were loaded and customised in the IPUConfig object and hyper-parameters were tuned in the IPUTrainer class.

More Resources for Hugging Face Optimum on IPUs
