Using XTREME For Evaluating Cross-lingual Generalization

“I love this dataset and can't wait until I start exploring it.”
This is one of the most common reactions from the data science community whenever a new dataset appears. I don't usually write blogs about data, but as the name suggests, this one is Xtreme ;).
“XTREME is a benchmark for zero-shot cross-lingual transfer from English.”
Whoa… that was a lot to take in at one go. Let's break it down and understand what it actually is.
Zero-Shot Learning
Let's start with what zero-shot learning is. In simple words, zero-shot learning is when a model is asked to handle classes or tasks it never saw labeled examples of during training. Example:- the task of recognizing an object without ever having been shown images of that object. This is called zero-shot learning.
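To make this concrete, here is a minimal sketch using the Hugging Face transformers zero-shot-classification pipeline. The checkpoint facebook/bart-large-mnli and the candidate labels are illustrative choices on my part, not something Xtreme prescribes; the point is that the model scores labels it never saw training examples for.
from transformers import pipeline

# Zero-shot classification: the model was never trained on these exact
# labels, yet it can still rank them by framing the task as entailment.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "The match went into extra time and ended in a penalty shootout.",
    candidate_labels=["sports", "politics", "cooking"],
)
print(result["labels"][0])  # highest-scoring label, here "sports"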
Xtreme is a multi-task benchmark for evaluating cross-lingual generalization. The languages in XTREME are selected to maximize language diversity, coverage in existing tasks, and availability of training data. Among these are many under-studied languages, such as the Dravidian languages Tamil (spoken in southern India, Sri Lanka, and Singapore), Telugu and Malayalam (spoken mainly in southern India), and the Niger-Congo languages Swahili and Yoruba, spoken in Africa.
Understanding Xtreme
So by now, we have a rough idea of what Xtreme is. Recent progress in NLP has largely been driven by benchmarks, but most of those benchmarks evaluate models on English only.
Let's understand this better with an example. Suppose we are building a NER model and training it on the BTC dataset. But how do we test our model across languages? Which dataset should we use for testing?
Xtreme is the solution.
Gone are the days when data and evaluation were an afterthought. And unless you have an ample amount of money and compute, building a model from scratch is rarely a good option; Hugging Face's pre-trained models make our work a lot cheaper and more time-efficient.
Despite the increasing interest in multilingual models in recent years, there was no standard benchmark that enabled evaluating a model across a wide variety of languages. Xtreme solves this issue: it evaluates cross-lingual generalization across 40 languages and 9 tasks.
Most important of all, it measures a model's ability to transfer what it learned in English to other languages, i.e. zero-shot cross-lingual transfer.
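As a rough illustration of that transfer (again, not part of the Xtreme scripts): a multilingual XLM-R checkpoint fine-tuned on NLI data that is mostly English can classify a Swahili sentence it was never fine-tuned on. The community checkpoint name and the example sentence below are assumptions for illustration:
from transformers import pipeline

# XLM-R fine-tuned for NLI; we hand it a Swahili sentence even though its
# task-specific fine-tuning did not target Swahili.
classifier = pipeline("zero-shot-classification", model="joeddav/xlm-roberta-large-xnli")

result = classifier(
    "Habari za leo zinahusu uchaguzi mkuu.",  # "Today's news is about the general election."
    candidate_labels=["politics", "sports", "technology"],
)
print(result["labels"][0])  # expected: "politics"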
Tasks and Languages
Xtreme's tasks cover sentence classification, structured prediction, sentence retrieval, and question answering. The full list of tasks can be seen in the image below.

Getting Started with Xtreme
The very first step is to clone the repo onto your local system. To do so, type this in your terminal:
git clone https://github.com/google-research/xtreme.git
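Then move into the cloned repository (the folder name follows from the repo URL):
cd xtreme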
With that done, the next step is to install the dependencies. To install them, type the following into your terminal:
bash install_tools.sh
The next step is to download the data. We start off by creating a directory named “download”:
mkdir -p download
Note: We run all of these commands from the root of the project directory.
The next step is to manually download the panx_dataset (for NER) from here.
Finally, type the command below in the project root to download the remaining data:
bash scripts/download_data.sh
Building a Baseline System
We fine-tune models that were pre-trained on multilingual data on the English labeled data of each Xtreme task. The fine-tuned model is then applied to the test data in the other languages to get predictions for that task.
A single script, scripts/train.sh, does the work. We can download the pre-trained models from the Transformers website.
Note: The currently supported models are:
bert-base-multilingual-cased
xlm-mlm-100-1280
xlm-roberta-large
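If you want to poke at one of these checkpoints before running the full script, here is a minimal sketch using the Hugging Face transformers library (illustrative only, not part of the Xtreme scripts):
from transformers import AutoModel, AutoTokenizer

# Load one of the supported multilingual checkpoints.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")

# One shared vocabulary and encoder serve every language the model covers.
inputs = tokenizer("XTREME covers 40 typologically diverse languages.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, tokens, hidden_size)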
Fine-Tuning Part-of-Speech Tagging
The command for fine-tuning a multilingual model on the English POS tagging data:
bash scripts/train.sh [MODEL] udpos
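For example, to fine-tune mBERT, replace [MODEL] with the model's name; the same substitution works for the commands below:
bash scripts/train.sh bert-base-multilingual-cased udpos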
Fine-Tuning Named Entity Recognition
The command for fine-tuning a multilingual model on the English NER data:
bash scripts/train.sh [MODEL] panx
Fine-Tuning Sentence Classification
The command for fine-tuning a multilingual model on the English sentence classification (XNLI) data:
bash scripts/train.sh [MODEL] xnli
You can find more commands in the official repo.
Xtreme is an important benchmark that was sorely needed for cross-lingual evaluation. It builds on zero-shot cross-lingual transfer, measuring how well models trained in English generalize to other languages.
Until now we were uncertain about which data to use to evaluate our multilingual NLP models. Xtreme solves this issue. For an in-depth exploration of the Xtreme data, follow this link.
If you liked this blog, do clap; it encourages us to write more such content. Follow me on Medium to receive a notification whenever I publish a new article.
My GitHub repo: Link

Follow me on LinkedIn to get in touch.