Language Model Fine-Tuning For Pre-Trained Transformers

Supercharging pre-trained Transformer models to deal with datasets containing domain-specific language.

Thilina Rajapakse
Skil-AI

--


Transfer Learning

Broadly speaking, Transfer Learning is the idea of taking the knowledge gained from performing some task and applying it towards performing a different (but related) task. Transformer models, currently the undisputed masters of Natural Language Processing (NLP), rely on this technique to achieve their lofty state-of-the-art benchmark results.

Transformer models are first trained on huge (and I mean huge) amounts of text in a step called “pre-training”. During this step, the models are expected to learn the words, grammar, structure, and other linguistic features of the language.

The text is represented by tokens, each of which has its own unique ID. The collection of all such tokens is referred to as the vocabulary of the model. The actual words in the text are iteratively split into pieces until the entire text consists only of tokens that are present in the vocabulary. This process is known as tokenization, and its purpose is to convert the raw text into a numerical representation that can be fed to neural network (NN) models.
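
To make this concrete, here is a minimal sketch of tokenization in practice. It assumes the Hugging Face transformers library and the bert-base-uncased vocabulary, neither of which is prescribed above; any pre-trained tokenizer would illustrate the same idea.

```python
# Minimal tokenization sketch (assumes the Hugging Face "transformers" library).
from transformers import AutoTokenizer

# Load the vocabulary and tokenization rules learned during pre-training.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "Transformers can handle domain-specific language."

# Split the text into sub-word pieces that exist in the model's vocabulary.
tokens = tokenizer.tokenize(text)

# Map each token to its unique ID, giving the numerical representation
# that is actually fed to the neural network.
ids = tokenizer.convert_tokens_to_ids(tokens)

print(tokens)  # the sub-word tokens
print(ids)     # their corresponding vocabulary IDs
```

Words that are not in the vocabulary get broken down into smaller pieces until every piece is a known token; the resulting IDs are what the model actually consumes.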
