
Thoughts and Theory

Quantitative evaluation of a pre-trained BERT model

A prerequisite to use a pre-trained model as is, without fine tuning

10 min read · Apr 10, 2021


Figure 1. Quantitative evaluation of a pre-trained BERT model. The test quantitatively evaluates (a) the pre-trained model's context-sensitive vectors, through its ability to predict a masked position, and (b) the quality of the [CLS] vector for the masked phrase. The clustering quality of the underlying vocabulary vectors, particularly how well entity types separate into clusters, plays an implicit role in both. The test uses a data set of triples: a sentence with a masked phrase, the masked phrase itself, and the entity type of the masked phrase in the context of that sentence. The model's performance on a sentence is determined by the entity types of its predictions for the masked position and by the [CLS] vector for the masked phrase. The entity type of a prediction for a masked position, or of a [CLS] vector, is determined by the clusters of context-independent vectors, whose quality is assessed qualitatively by how well the clusters separate entity types. The quantitative test yields a confusion matrix and F1-scores for each entity type. Image created by Author.
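The sketch below illustrates one way this test loop could look in code, assuming single-token masked phrases, a Hugging Face pre-trained model, and a small hypothetical `token_to_entity_type` mapping standing in for the one-time labeled vocabulary-vector clusters. It is illustrative only, not the exact implementation behind the figure.

```python
import torch
from collections import Counter
from sklearn.metrics import classification_report
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForMaskedLM.from_pretrained("bert-base-cased")
model.eval()

# Hypothetical test triples: (sentence with masked phrase, masked phrase, entity type)
test_triples = [
    ("He was treated for [MASK] last year.", "melanoma", "DISEASE"),
    ("She flew to [MASK] on Monday.", "Paris", "LOCATION"),
]

# Hypothetical stand-in for the labeled vocabulary-vector clusters.
token_to_entity_type = {"cancer": "DISEASE", "pneumonia": "DISEASE",
                        "Paris": "LOCATION", "London": "LOCATION"}

gold, predicted = [], []
for sentence, _phrase, entity_type in test_triples:
    inputs = tokenizer(sentence, return_tensors="pt")
    # Locate the single [MASK] position in the tokenized sentence.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
    with torch.no_grad():
        logits = model(**inputs).logits
    # Entity type of the masked position = majority entity type of the
    # model's top-k predicted tokens, looked up in the labeled clusters.
    top_ids = logits[0, mask_pos].topk(10).indices.tolist()
    votes = Counter(
        token_to_entity_type.get(tokenizer.convert_ids_to_tokens(i), "OTHER")
        for i in top_ids
    )
    gold.append(entity_type)
    predicted.append(votes.most_common(1)[0][0])

# Per-entity-type precision/recall/F1, i.e. the quantitative output of the test.
print(classification_report(gold, predicted, zero_division=0))
```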

TL;DR

Self-supervised learning is being leveraged at scale using transformers, not only for text but lately also for images (CLIP, ALIGN), to solve traditionally supervised tasks (e.g. classification), either as is or with subsequent fine-tuning. While most, if not all, downstream NLP tasks are performed to date by fine-tuning a pre-trained transformer model, it is possible to use a pre-trained model as is, without any subsequent fine-tuning.
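As a minimal sketch of what "as is" means in practice, the snippet below queries a pre-trained BERT model for a masked position with no task-specific training; the model name and sentence are example choices.

```python
from transformers import pipeline

# Fill-mask with a pre-trained model, no fine-tuning involved.
fill_mask = pipeline("fill-mask", model="bert-base-cased")
for prediction in fill_mask("The patient was diagnosed with [MASK].", top_k=5):
    print(f"{prediction['token_str']:>12}  {prediction['score']:.3f}")
```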

For instance, the utility of a pre-trained BERT model as is, without any fine-tuning, for a wide variety of NLP tasks is largely overlooked. Examples of direct use of a pre-trained BERT model without fine-tuning include:

  • Unsupervised NER. NER, a traditionally supervised task, can be done without having to tag individual terms in a sentence. Instead, a one-time labeling of BERT vocabulary vector clusters for the entity types of interest suffices, where the vocabulary vectors are obtained directly from a pre-trained model (a sketch of this one-time clustering step follows this list).
  • Unsupervised sentence representations. A BERT model well trained on next sentence prediction can be used to…
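A rough sketch of the one-time clustering step mentioned above: pull BERT's context-independent vocabulary vectors (the input embedding matrix) from a pre-trained model, cluster them, and inspect clusters by hand to label the entity types of interest. The cluster count and the inspection step here are illustrative assumptions, not the exact recipe.

```python
from sklearn.cluster import KMeans
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased")

# Context-independent vocabulary vectors: the model's input embedding matrix.
vocab_vectors = model.get_input_embeddings().weight.detach().numpy()

# Cluster the vocabulary; 100 clusters is an arbitrary illustrative choice.
kmeans = KMeans(n_clusters=100, n_init=10, random_state=0).fit(vocab_vectors)

# Inspect one cluster's member tokens to decide (by hand) which entity type,
# if any, it represents; this inspection is the "one-time labeling" step.
cluster_id = 0
members = [tokenizer.convert_ids_to_tokens(i)
           for i in range(len(vocab_vectors))
           if kmeans.labels_[i] == cluster_id]
print(members[:30])
```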

Ajit Rajasekharan