A hands-on tutorial with code

Learn how to build pretrained models for NLP classification tasks

Photo by Annie Spratt on Unsplash

This article provides a hands-on tutorial for building a RoBERTa (Robustly Optimized BERT Pretraining Approach) model for NLP classification tasks.

The code is available on GitHub [Click Here].

The problem with using the latest state-of-the-art models is that their APIs are not easy to use, and there is little documentation and few tutorials (unlike XGBoost or LightGBM).

Here, I try to simplify the build steps and add as many comments as possible. …

Classification models are widely used in various scenarios. This article discusses not only accuracy and F1 score, but also the KS and Kappa scores. The seven evaluation methods are listed below. Let’s dive into them!

  1. Accuracy
  2. Precision
  3. Recall
  4. F1
  5. AUC-ROC
  6. KS
  7. Kappa score
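As a rough orientation before going through the list one by one, here is a minimal pure-Python sketch of all seven metrics for a binary classifier. The function names, the toy labels and scores, and the 0.5 decision threshold are assumptions made for illustration; they are not taken from the article's own code. KS is computed here as the largest gap between the score distributions of the two classes, which is how it is commonly used in credit scoring.

```python
def threshold_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 and Cohen's kappa from hard 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # Kappa compares observed agreement with the agreement expected by chance.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_expected) / (1 - p_expected) if p_expected != 1 else 0.0

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}


def split_scores(y_true, y_score):
    """Separate predicted scores into positive-class and negative-class lists."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    return pos, neg


def auc_roc(y_true, y_score):
    """AUC as the probability that a random positive outscores a random negative."""
    pos, neg = split_scores(y_true, y_score)
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(y_true, y_score):
    """KS as the maximum gap between the two classes' cumulative score distributions."""
    pos, neg = split_scores(y_true, y_score)
    return max(
        abs(sum(s <= t for s in pos) / len(pos)
            - sum(s <= t for s in neg) / len(neg))
        for t in sorted(set(y_score))
    )


# Toy data (assumed for illustration only).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed 0.5 threshold

metrics = threshold_metrics(y_true, y_pred)
metrics["auc"] = auc_roc(y_true, y_score)
metrics["ks"] = ks_statistic(y_true, y_score)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that the first five metrics depend on the chosen threshold, while AUC-ROC and KS are computed from the raw scores. In practice a library such as scikit-learn would normally be used instead of hand-written functions.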


Accuracy is the percentage of predictions that are correct. Generally speaking, it can be used in most cases.

However, when the targets are highly imbalanced, accuracy is misleading. For instance, in fraud detection, 99.99% of transactions are good and only 0.01% are bad. If we simply assume 100% users…
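A minimal sketch of the imbalance problem, using the 99.99% / 0.01% split from the fraud example. The sample size of 10,000 transactions and the "always predict good" baseline are assumptions chosen for illustration.

```python
# Assumed setup: 10,000 transactions, of which 0.01% (one) is fraudulent.
n_total = 10_000
n_fraud = 1

y_true = [1] * n_fraud + [0] * (n_total - n_fraud)  # 1 = fraud, 0 = good
y_pred = [0] * n_total  # naive baseline: label every transaction as good

# Accuracy: share of predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n_total

# Recall on the fraud class: share of real fraud cases that were caught.
caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = caught / n_fraud

print(f"accuracy = {accuracy:.2%}")
print(f"fraud recall = {recall:.2%}")
```

The baseline scores very high on accuracy while catching no fraud at all, which is why the other metrics in the list matter for imbalanced targets.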


Data Scientist / Data Engineer
