Topic Modelling on NYT articles using Gensim, LDA
A guide to topic modelling on NYT articles to understand the trends
Imagine you are given some text data and asked to find out what it is about. A quick skim of the data would do the job. Now imagine having to go through a huge number of text documents to understand what they are about. Tedious, right? Topic modelling comes to the rescue in such cases.
Other interesting applications of topic modelling include indexing research papers, which helps researchers track research trends and identify papers to read; recommendation systems that match users with news articles through clustering; and sentiment analysis of product reviews.
In this article, we will go through the nitty-gritty of topic modelling and perform topic modelling on New York Times articles from the year 2020 using a Python library called Gensim.
The flow of the article will be as follows:
- A Brief Introduction to Topic Modelling
- Ingredients to achieve topic modelling
a. Gensim, a Python library to perform various NLP tasks
b. LDA, one of the most popular topic modelling algorithms
- Implementing LDA
a. Preprocessing the data
b. Creating the dictionary and corpus
c. Performing LDA
d. Visualizing the results