Topic Modelling for Amazon Reviews
INTRODUCTION:
Reading a write-up and working out its subject is relatively easy for the human brain, but not for machines. A machine needs the text cleaned up first and then an algorithm to extract the topics. In this case we use an algorithm called Latent Dirichlet Allocation, usually shortened to LDA.
WHAT IS LDA?
Latent Dirichlet Allocation (LDA) is an extension of Probabilistic Latent Semantic Analysis (PLSA), which Thomas Hofmann developed in 1999; the two algorithms differ mainly in how they handle the per-document topic distribution. This article covers the implementation of LDA and does not go in-depth on the theory. If you need an in-depth treatment, please take a look at Thomas’s article; he explains the concept in a very fluent and easy-to-understand manner.
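To make the per-document distribution a little more concrete, here is a minimal sketch, assuming the standard LDA formulation in which each document's topic mixture is drawn from a Dirichlet prior. The number of topics, the alpha values, the number of reviews and the helper name sample_dirichlet are all made up for illustration; the code uses only Python's standard library and is not the implementation covered later in this article.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)        # fixed seed so the sketch is repeatable
num_topics = 3                # hypothetical number of hidden topics
alpha = [0.5] * num_topics    # hypothetical prior; values below 1 favour few topics per document

# One topic mixture per (hypothetical) review: each is a list of
# probabilities over the topics that sums to 1.
doc_topic_mixtures = [sample_dirichlet(alpha, rng) for _ in range(4)]

for i, mixture in enumerate(doc_topic_mixtures):
    print(f"review {i}: " + ", ".join(f"{p:.2f}" for p in mixture))
```

Each printed row is one review's mixture over the three topics. In a real model these mixtures are not sampled at random like this; they are inferred from the words in each review.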
To understand LDA better, it helps to look briefly at the meaning of each word in its name:
Latent: The Oxford dictionary defines latent as something (a state or quality of an object) that is not yet apparent; something that has not yet been discovered. In our case, it refers to the topics that we are yet to find with the algorithm.