Thoughts and Theory
Quantitative evaluation of a pre-trained BERT model
A prerequisite for using a pre-trained model as is, without fine-tuning
TL;DR
Self-supervised learning with transformers is being leveraged at scale, not only for text but lately also for images (CLIP, ALIGN), to solve traditionally supervised tasks (e.g., classification), either as is or with subsequent fine-tuning. While most, if not all, downstream NLP tasks are to date performed by fine-tuning a pre-trained transformer model, it is possible to use a pre-trained model as is, without any fine-tuning.
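To make that "as is" usage concrete, here is a minimal sketch (mine, not from the article) that loads a pre-trained BERT checkpoint with the Hugging Face transformers library and queries its masked language modeling head directly, with no fine-tuning step. The checkpoint name bert-base-cased and the example sentence are assumptions for illustration.

```python
# A minimal sketch (not the article's code) of using a pre-trained BERT model
# as is: no fine-tuning, no task-specific head beyond the pre-training masked
# language modeling head. "bert-base-cased" is an illustrative choice.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForMaskedLM.from_pretrained("bert-base-cased")
model.eval()  # inference only; the pre-trained weights are never updated

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Top vocabulary terms predicted for the masked position.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top_ids = logits[0, mask_pos].topk(5, dim=-1).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```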
For instance, the utility of a pre-trained BERT model as is, without any fine-tuning, is largely overlooked for a wide variety of NLP tasks. Examples of direct use of a pre-trained BERT model without fine-tuning include:
- Unsupervised NER. NER, a traditionally supervised task, can be done without having to tag individual terms in a sentence. Instead, a one-time labeling of BERT vocabulary vector clusters for the entity types of interest suffices, where the vocabulary vectors are obtained directly from a pre-trained model (a rough sketch of extracting and clustering these vectors follows this list).
- Unsupervised sentence representations. A BERT model well trained on next sentence prediction can be used to…
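As promised in the NER bullet above, the sketch below (again mine, and only one reasonable way to do it) pulls the context-free vocabulary vectors straight out of a pre-trained BERT checkpoint's input embedding matrix and clusters them with k-means; labeling such clusters with entity types would then be the one-time manual step. The checkpoint name and the cluster count are illustrative assumptions.

```python
# A rough sketch (not the article's code) of obtaining BERT vocabulary vectors
# directly from a pre-trained model and clustering them. "bert-base-cased" and
# n_clusters=500 are assumptions chosen only for illustration.
import numpy as np
from sklearn.cluster import KMeans
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertModel.from_pretrained("bert-base-cased")

# One context-free vector per vocabulary term (~29k x 768 for bert-base-cased).
vocab_vectors = model.get_input_embeddings().weight.detach().numpy()

kmeans = KMeans(n_clusters=500, n_init=10, random_state=0).fit(vocab_vectors)

# Inspect the cluster containing "Paris"; assigning entity types (person,
# location, drug, ...) to clusters like this is the one-time labeling step.
cluster_id = kmeans.labels_[tokenizer.convert_tokens_to_ids("Paris")]
member_ids = np.where(kmeans.labels_ == cluster_id)[0][:10]
print(tokenizer.convert_ids_to_tokens(member_ids.tolist()))
```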