Introduction
In this article, I will explain what Zipf’s Law is in the context of Natural Language Processing (NLP) and how knowledge of this distribution has been used to build better neural language models. I assume the reader is familiar with the concept of neural language models.
Code
The code to reproduce the numbers and figures presented in this article can be downloaded from this repository.