BETO: Spanish BERT

elvis · Published in DAIR.AI · Dec 27, 2019
BETO GitHub

This article is also available in Spanish.

Transformer-based models are having a tremendous impact on NLP, as they have proven effective across a wide range of tasks such as POS tagging, machine translation, named-entity recognition, and many text classification tasks.

This year saw the introduction of a whole family of transformer-based language models, such as BERT, Transformer-XL, and GPT-2, among others.

Language models, in general, offer desirable properties that can be leveraged in a transfer learning setting: you first train a model on large-scale data to learn the properties of language in an unsupervised way. The resulting model and weights can then be fine-tuned and applied in low-resource regimes to address different NLP tasks, as in the sketch below.
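To make the recipe concrete, here is a minimal sketch of the fine-tuning step using the Hugging Face transformers library. The checkpoint name, the two example sentences, and the labels are illustrative placeholders, not part of BETO itself.

```python
# Minimal fine-tuning sketch, assuming the Hugging Face `transformers` library.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Weights learned on large-scale unlabelled text (the pre-training step).
model_name = "bert-base-multilingual-cased"  # illustrative checkpoint
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Fine-tune on a small labelled dataset (hypothetical examples).
texts = ["me encantó la película", "el servicio fue terrible"]
labels = torch.tensor([1, 0])  # 1 = positive, 0 = negative

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**inputs, labels=labels)  # loss comes from the classification head
outputs.loss.backward()
optimizer.step()
```

In practice you would loop over many batches of your task data, but the idea is the same: the pre-trained weights do most of the work, and only a small labelled dataset is needed on top.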

In particular, it’s exciting to see BERT used in different domains such as text classification, text summarization, text generation, and information retrieval. However, where language models and transfer learning can be most heavily leveraged is in settings or research where labeled datasets are limited. That happens to be the case for Spanish and many other languages.

BETO is an initiative to make BERT pre-trained models available for Spanish NLP tasks. The authors recently published the library along with a few results, comparing their model with recent multilingual efforts; the results are summarized in the figure below:

As you can observe, these preliminary results are very impressive: BETO-cased and BETO-uncased outperform the multilingual alternatives on a variety of tasks.
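Trying BETO out is straightforward if you already use the Hugging Face transformers library. The checkpoint identifier below is the one the BETO authors published on the model hub; treat it as an assumption and check the BETO GitHub repository for the current names.

```python
# Minimal sketch of loading BETO and filling in a masked Spanish word,
# assuming the Hugging Face `transformers` library.
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="dccuchile/bert-base-spanish-wwm-uncased",  # assumed checkpoint name
)

sentence = f"Madrid es la {fill_mask.tokenizer.mask_token} de España."
for prediction in fill_mask(sentence):
    print(prediction["token_str"], prediction["score"])
```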

It would be great to see more of these efforts for other low-resource languages as well. If you are working on this type of effort, please feel free to reach out and we can work on a feature post for such libraries.

If you want to train your own transformer-based language models, the good folks at Hugging Face have developed this impressive library that makes it easy to work with transformer-based pre-trained models.
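As a rough sketch of what that looks like, the snippet below trains a masked language model from scratch on your own corpus with the transformers library. The file path, checkpoint used for the tokenizer, and hyperparameters are illustrative placeholders; the real BETO training setup differs.

```python
# Rough sketch of masked language model training with Hugging Face `transformers`.
from transformers import (
    BertConfig,
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    LineByLineTextDataset,
    Trainer,
    TrainingArguments,
)

# Reuse an existing tokenizer for illustration; a new corpus usually gets its own vocab.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")
model = BertForMaskedLM(BertConfig(vocab_size=tokenizer.vocab_size))  # weights from scratch

dataset = LineByLineTextDataset(
    tokenizer=tokenizer, file_path="corpus.txt", block_size=128  # hypothetical corpus file
)
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15  # BERT-style masking
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="my-lm", num_train_epochs=1, per_device_train_batch_size=8
    ),
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()
```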
