Facebook AI Releases XLM/mBERT PyTorch Models in 100 Languages!

XLM achieved state-of-the-art results on cross-lingual classification and machine translation.

elvis
DAIR.AI
Aug 20, 2019


Image source: https://arxiv.org/pdf/1901.07291.pdf

Earlier this year, Facebook AI published a paper introducing XLM, a method for efficiently pretraining cross-lingual language models built on the popular Transformer architecture. With this model, they achieved state-of-the-art results on cross-lingual classification and on both supervised and unsupervised machine translation. The preview below shows how XLM compares against BERT (Devlin et al., 2018) on both machine translation and cross-lingual classification tasks.

A few hours ago, members of the Facebook AI team released the code for the XLM pretrained models, which cover over 100 languages. All of the code is built on top of PyTorch, and they even include an iPython notebook for playing around with the model and its representations.
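If you just want to poke at the multilingual representations, the checkpoints are also reachable through third-party wrappers. Below is a minimal sketch using Hugging Face's transformers library and its xlm-mlm-100-1280 checkpoint (the 100-language MLM model); the wrapper library, checkpoint name, and mean-pooling step are choices made here for illustration, not part of Facebook's release.

```python
# Minimal sketch: sentence representations from the 100-language XLM
# checkpoint via Hugging Face transformers
# (pip install transformers sacremoses).
# Checkpoint name and mean-pooling are assumptions, not from the XLM repo.
import torch
from transformers import XLMModel, XLMTokenizer

tokenizer = XLMTokenizer.from_pretrained("xlm-mlm-100-1280")
model = XLMModel.from_pretrained("xlm-mlm-100-1280")
model.eval()

sentence = "Bonjour tout le monde !"  # any of the supported languages
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state: (batch, seq_len, 1280) token embeddings;
# mean-pool over tokens for a crude sentence representation.
sentence_embedding = outputs.last_hidden_state.mean(dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 1280])
```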

These are the languages the XLM model supports: en-es-fr-de-zh-ru-pt-it-ar-ja-id-tr-nl-pl-simple-fa-vi-sv-ko-he-ro-no-hi-uk-cs-fi-hu-th-da-ca-el-bg-sr-ms-bn-hr-sl-zh_yue-az-sk-eo-ta-sh-lt-et-ml-la-bs-sq-arz-af-ka-mr-eu-tl-ang-gl-nn-ur-kk-be-hy-te-lv-mk-zh_classical-als-is-wuu-my-sco-mn-ceb-ast-cy-kn-br-an-gu-bar-uz-lb-ne-si-war-jv-ga-zh_min_nan-oc-ku-sw-nds-ckb-ia-yi-fy-scn-gan-tt-am.

This is a really important release: it means you can now use the pretrained models, or train your own, to perform machine translation and cross-lingual classification in the languages listed above. It is also an important step toward addressing the low-resource problems present in many languages.
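One way to picture the cross-lingual classification use case: because the encoder maps all of its languages into a shared embedding space, a classifier head trained on English embeddings can, in principle, be applied to text in another language without labeled data in that language. A hypothetical sketch, reusing the same assumed checkpoint as above (the linear head, label count, and sentences are placeholders):

```python
# Hypothetical zero-shot cross-lingual classification sketch.
# Train a linear head on English XLM embeddings, then score text in
# another language with no labeled data in that language.
import torch
import torch.nn as nn
from transformers import XLMModel, XLMTokenizer

tokenizer = XLMTokenizer.from_pretrained("xlm-mlm-100-1280")
encoder = XLMModel.from_pretrained("xlm-mlm-100-1280")
encoder.eval()

def embed(sentence: str) -> torch.Tensor:
    """Mean-pooled XLM sentence embedding, shape (1, 1280)."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        return encoder(**inputs).last_hidden_state.mean(dim=1)

head = nn.Linear(1280, 2)  # in practice, trained on English examples
spanish = embed("La película fue excelente.")  # no Spanish training data
print(head(spanish).softmax(dim=-1))  # illustrative class probabilities
```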
