Text classification with transformers in TensorFlow 2: BERT
Introduction
Transformer-based language models have shown strong progress on a wide range of natural language processing (NLP) benchmarks, and combining transfer learning with large-scale transformer language models is becoming a standard in modern NLP. In this article, we first give a brief theoretical introduction to the transformer architecture and the text classification problem. We then demonstrate how to fine-tune a pre-trained BERT model for text classification in TensorFlow 2 with the Keras API.
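To give a sense of what the fine-tuning step looks like in practice, here is a minimal sketch in TensorFlow 2 with Keras. It assumes the Hugging Face transformers library as the source of the pre-trained model and tokenizer, and uses the "bert-base-uncased" checkpoint and a two-sentence toy dataset purely for illustration; the article's own setup may differ in these details.

```python
import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification

# Load a pre-trained BERT with a fresh classification head (2 labels here).
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Toy data; in practice this would be a real labelled corpus.
texts = ["great movie, loved it", "terrible plot and wooden acting"]
labels = [1, 0]

# Tokenize into the input_ids / attention_mask tensors BERT expects.
encodings = tokenizer(
    texts, truncation=True, padding=True, max_length=128, return_tensors="tf"
)

# Standard Keras compile/fit loop; a small learning rate is typical for fine-tuning.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(dict(encodings), tf.constant(labels), epochs=3, batch_size=2)
```

The key design point is that the pre-trained encoder weights are reused and only lightly updated (hence the small learning rate), while the classification head on top is trained from scratch on the target task.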
Text classification — problem formulation
Classification, in general, is the problem of identifying the category of a new observation. We have a dataset D, which contains sequences of text from documents and can be written as

D = {X_1, X_2, …, X_N},

where X_i is, for example, a single text segment and N is the number of such text segments in D.
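To make the notation concrete, the sketch below shows a toy version of such a dataset in Python. The example sentences and category names are invented for illustration; each X_i is paired with the category a classifier would be trained to predict.

```python
# A toy instance of the dataset D defined above (contents are illustrative only).
D = [
    ("The match ended in a dramatic penalty shootout.", "sports"),
    ("The central bank raised interest rates again.", "finance"),
    ("A new exoplanet was discovered last week.", "science"),
]

N = len(D)  # number of text segments in D
for X_i, label in D:
    print(f"{label:>8}: {X_i}")
```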
An algorithm that implements classification is called a classifier. Text classification tasks can be divided…