GluonNLP v0.7.1 — BERT Reloaded

Faramarz A. Munshi
Published in Apache MXNet
4 min read · Jul 22, 2019


GluonNLP [7] has just received a major upgrade. The 0.7 release features a BERT Base model pre-trained on a large-scale corpus, whose performance is comparable with the BERT Large model from the original BERT paper. Other highlights include more versions of BERT trained on specialized corpora, new models (ERNIE, GPT-2, ESIM, etc.), and more datasets. See the full release notes for the complete list of changes.

After the BERT Big Bang (or, more plainly, the introduction of BERT into NLP), the community has been flooded with variations of BERT for specific datasets and use cases. Each of these variations has proved extremely useful for its specific application, from SciBERT [4], which significantly improves on BERT for scientific publications and corpora, to BioBERT [5], which significantly improves on the original model for biomedical text data. At the same time, this sudden influx of new models has added fuel to the figurative fire of upgrading and improving the BERT model itself.

Improving BERT itself is the crowning achievement of this release. Our BERT Base model has been updated drastically: it is now pre-trained on three corpora totaling a massive 60 gigabytes of text data, namely the OpenWebText Corpus, the BooksCorpus, and English Wikipedia. Because of the larger corpus, the accuracy of this base model (and I repeat, *base*) has shot up to surpass even the large model released in the original paper on six out of the seven tasks. Check out the table below for the full results (the bolded number is the best of the three compared).

Table describing GluonNLP’s BERT performance on various datasets vs. the original paper’s versions

We’ve also made it easier to tackle your domain adaptation problem: the SciBERT, BioBERT, and ClinicalBERT [3] models can each be loaded with a single line of code. On top of that, the release adds a whole host of new models, including the ERNIE model [1], the GPT-2 language model [6], and the ESIM model [2], all likewise accessible with a single line of code.
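To make the "single line of code" claim concrete, here is a minimal sketch built around `gluonnlp.model.get_model`. The dataset names in the table and the `load_bert` helper are assumptions for illustration (the helper is hypothetical, and the names are written from memory of the model zoo listing), so check them against the BERT model zoo page before relying on them. The pooler, decoder, and classifier heads are switched off on the assumption that the domain-specific checkpoints do not ship weights for them.

```python
# Hypothetical lookup table: domain -> (model name, pre-training dataset name).
# ASSUMPTION: these dataset names follow the GluonNLP model zoo naming;
# verify them against the model zoo listing for your installed version.
PRETRAINED_BERT = {
    "general": ("bert_12_768_12", "openwebtext_book_corpus_wiki_en_uncased"),
    "scientific": ("bert_12_768_12", "scibert_scivocab_uncased"),
    "biomedical": ("bert_12_768_12", "biobert_v1.1_pubmed_cased"),
    "clinical": ("bert_12_768_12", "clinicalbert_uncased"),
}


def model_args(domain):
    """Return the model name and keyword arguments for get_model."""
    model_name, dataset_name = PRETRAINED_BERT[domain]
    kwargs = dict(
        dataset_name=dataset_name,
        pretrained=True,
        # ASSUMPTION: the domain-specific checkpoints may not include these
        # heads, so only the encoder is requested here.
        use_pooler=False,
        use_decoder=False,
        use_classifier=False,
    )
    return model_name, kwargs


def load_bert(domain="general"):
    """Load a pre-trained BERT encoder and its vocabulary for a domain."""
    # Imported lazily so the table above can be inspected without MXNet.
    import gluonnlp as nlp

    model_name, kwargs = model_args(domain)
    # This is the "single line": it returns a (model, vocabulary) pair.
    return nlp.model.get_model(model_name, **kwargs)
```

Calling `model, vocab = load_bert("scientific")` would download the weights on first use, so it needs MXNet, GluonNLP, and network access.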

We’ve introduced more helper scripts, all with well-commented code, for those wishing to fine-tune BERT, filling in some of the gaps we previously had. There is now a fine-tuning script for NER (named entity recognition) on the CoNLL2003 dataset, a fine-tuning script for the Chinese XNLI dataset, and a fine-tuning script for intent classification and slot labeling on the ATIS and SNIPS datasets. Regardless of your downstream task, we provide you with a basic set of instructions and the right tool for the job, all of which are fully customizable for your NLP needs. But the good news doesn’t stop there.

In addition to the models and scripts, we have released the datasets that pair with these models so you can rigorously test, evaluate, and adapt them to your will. The latest release includes the CoLA, SST-2, MRPC, STS-B, MNLI, QQP, QNLI, WNLI, and RTE datasets for natural language understanding (NLU); CR and MPQA, two sentiment analysis datasets; and ATIS and SNIPS for testing intent classification and slot labeling.
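As a rough sketch of how these datasets might be pulled in, the snippet below maps each dataset named above to the class it is believed to correspond to under `gluonnlp.data`. The class names are assumptions written from memory of the API reference, and `load_dataset` is a hypothetical helper, so verify both against the documentation for your installed version.

```python
import importlib

# ASSUMPTION: class names as believed to be exposed under gluonnlp.data;
# confirm them in the GluonNLP API reference before use.
DATASET_CLASSES = {
    "CoLA": "GlueCoLA",
    "SST-2": "GlueSST2",
    "MRPC": "GlueMRPC",
    "STS-B": "GlueSTSB",
    "MNLI": "GlueMNLI",
    "QQP": "GlueQQP",
    "QNLI": "GlueQNLI",
    "WNLI": "GlueWNLI",
    "RTE": "GlueRTE",
    "CR": "CR",
    "MPQA": "MPQA",
    "ATIS": "ATISDataset",
    "SNIPS": "SNIPSDataset",
}


def load_dataset(name, **kwargs):
    """Instantiate a dataset by its short name (hypothetical helper).

    Keyword arguments are passed straight to the dataset constructor,
    for example segment="train" for datasets that are split into segments.
    """
    # Imported lazily so the mapping can be inspected without MXNet.
    data_module = importlib.import_module("gluonnlp.data")
    dataset_cls = getattr(data_module, DATASET_CLASSES[name])
    return dataset_cls(**kwargs)
```

For instance, `load_dataset("CoLA", segment="train")` would download the data on first use and return an indexable dataset, assuming the names above match your GluonNLP version.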

So what are you waiting for? Get out there and leverage these tools; happy NLP-ing!


Getting Started with GluonNLP

To get started with BERT using GluonNLP, visit our tutorial that walks through the code for fine-tuning BERT for multiple tasks. You can also check out our BERT model zoo for BERT pre-training scripts, and fine-tuning scripts for several datasets.

For other new features added in GluonNLP, please read the release notes. We are working on many new features, and are even more excited for our next release.


Authors: Faramarz Munshi, Haibin Lin
Editor: Thomas Delteil



We are thankful for the great contributions from the GluonNLP community: @davisliang @paperplanet @ThomasDelteil @Deseaus @MarisaKirisame @Ishitori @TaoLv @basicv8vc @rongruosong @crcrpar @mrchypark @xwind-h @faramarzmunshi @leezu @szha @imgarylai @xiaotinghe @hankcs @sxjscience @hetong007 @bikestra @haven-jeon @cgraywang @astonzhang @LindenLiu @junrushao1994


[1] Yu Sun, et al.: “ERNIE: Enhanced Representation through Knowledge Integration”, 2019; arXiv:1904.09223.
[2] Qian Chen, Xiaodan Zhu, Zhenhua Ling, Si Wei, Hui Jiang: “Enhanced LSTM for Natural Language Inference”, 2016; arXiv:1609.06038.
[3] Kexin Huang, Jaan Altosaar: “ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission”, 2019; arXiv:1904.05342.
[4] Iz Beltagy, Arman Cohan: “SciBERT: Pretrained Contextualized Embeddings for Scientific Text”, 2019; arXiv:1903.10676.
[5] Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, Jaewoo Kang: “BioBERT: a pre-trained biomedical language representation model for biomedical text mining”, 2019; arXiv:1901.08746.
[6] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever: “Language Models are Unsupervised Multitask Learners”, 2019.
[7] Jian Guo, He He, Tong He, Leonard Lausen, Mu Li, Haibin Lin, Xingjian Shi, Chenguang Wang, Junyuan Xie, Sheng Zha, Aston Zhang, Hang Zhang, Zhi Zhang, Zhongyue Zhang: “GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing”, 2019; arXiv:1907.04433.