Natural Language Generation using BERT

Text Generation using BERT Model

Prakhar Mishra
Intel Student Ambassadors
Apr 10, 2020


Natural Language Generation (NLG) is an active research area in both academia and industry. It is one of the major subfields, along with Natural Language Understanding (NLU), under the broader umbrella of Natural Language Processing (NLP). NLG is the task of turning data into natural language, that is, the way people talk and write. This need not be English; it can be any language used by humans.

Various approaches have been devised for this task in the past. The simplest, yet effective, technique for mimicking the generation process is to define a number of predefined, domain-specific templates with empty slots. Deciding what to fill those slots with can be seen as the task of the NLU system. Because of its simplicity, this solution often comes in handy and should not be overlooked when building an NLG system without any training data at all.
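The template approach can be sketched in a few lines of Python. This is a minimal illustration, not taken from any particular system: the weather and greeting templates, their slot names, and the `generate` helper are all hypothetical, and the slot values are assumed to come from an upstream component such as an NLU system or a database lookup.

```python
from string import Template

# Hypothetical domain-specific templates. Each $placeholder is an empty
# slot that an upstream component (e.g. NLU or a database lookup) fills.
TEMPLATES = {
    "weather_report": Template(
        "The weather in $city today will be $condition, "
        "with temperatures between $low and $high degrees Celsius."
    ),
    "greeting": Template("Hello $name, how can I help you today?"),
}


def generate(template_name: str, slots: dict) -> str:
    """Fill the named template's slots with the given values."""
    template = TEMPLATES[template_name]
    # substitute() raises KeyError if any slot is left unfilled, so
    # incomplete data never produces a half-filled sentence.
    return template.substitute(slots)


if __name__ == "__main__":
    print(generate("weather_report",
                   {"city": "Pune", "condition": "sunny", "high": 34, "low": 22}))
```

Using `substitute()` rather than `safe_substitute()` is a deliberate choice here: a missing slot raises a `KeyError` instead of leaking a raw `$placeholder` into the generated text.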

Another popular technique, and the basis for recent models, is Language Modelling. Language models try to model natural language: the way it is written, its words, grammar, syntax, and so on. Once a model has learned these intrinsic features of a language, it can be used to generate text given some input pre-text. I will not go into detail about how such models are trained; please refer to Neural Language Models for more details. Another approach worth trying is Sequence-to-Sequence (Seq2Seq) modelling, in which both the input and the output are text. Use cases such as language translation and document summarization fit this pattern: given an input text, we expect some output text. Recently, a model called T5 (Text-to-Text Transfer Transformer) outperformed previous results on various NLP tasks, setting a new state of the art (SOTA). At heart, T5 is also a Seq2Seq model built from Transformer units. Despite their great success, both language modelling and Seq2Seq require very large amounts of training data and computational resources, unlike template-based generation schemes.
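To make the idea of "generating from a pre-text" concrete, here is a minimal sketch of a count-based bigram language model. It is a deliberately tiny stand-in, not a neural language model and not how BERT or T5 work: it only learns which word most often follows another in a toy corpus, then extends a prompt greedily one word at a time. The corpus and the function names `train_bigram_lm` and `generate` are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram_lm(corpus: list[str]) -> dict[str, Counter]:
    """Count, for every word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        # <s> and </s> mark sentence start and end.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(lm: dict[str, Counter], prompt: str, max_words: int = 10) -> str:
    """Greedily extend the prompt with the most frequent next word."""
    words = prompt.lower().split()
    for _ in range(max_words):
        # An empty prompt starts from the sentence-start marker.
        followers = lm.get(words[-1] if words else "<s>")
        if not followers:
            break  # unseen context: the model has nothing to predict
        next_word, _ = followers.most_common(1)[0]
        if next_word == "</s>":
            break  # the model predicts the end of the sentence
        words.append(next_word)
    return " ".join(words)


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat",
        "the dog sat on the mat",
        "a cat chased the bird",
    ]
    lm = train_bigram_lm(corpus)
    print(generate(lm, "the dog"))
```

With this corpus, the prompt "the dog" is extended to "the dog sat on the mat". Neural language models replace these raw counts with learned probabilities over much longer contexts, which is exactly why they need the large amounts of data and compute mentioned above.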

In today’s blog post, we will try to answer a few interesting questions:
1. What is BERT?
2. Can BERT be used to generate Natural…