TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Topic Modelling on NYT articles using Gensim, LDA

A guide to topic modelling on NYT articles to understand the trends

7 min read · Jun 8, 2021

Imagine you are given some text and asked to find out what it is about. A quick skim would do the job. Now imagine having to go through a huge number of text documents to understand what they are about. Tedious, right? Topic modelling comes to the rescue in such cases.

Other interesting applications of topic modelling include indexing research papers to help researchers track trends and identify papers to read, recommendation systems that match users with news articles through clustering, and sentiment analysis of product reviews.

In this article, we will go through the nitty-gritty of topic modelling and apply it to New York Times articles from the year 2020 using a Python library called Gensim.

The flow of the article will be as follows:

  1. A Brief Introduction to Topic Modelling
  2. Ingredients to achieve topic modelling
    a. Gensim, a Python library to perform various NLP tasks
    b. LDA, one of the most popular topic modelling algorithms
  3. Implementing LDA
    a. Preprocessing the data
    b. Creating the dictionary and corpus
    c. Performing LDA
    d. Visualizing the results
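
To make steps 3a and 3b of the outline concrete, here is a minimal plain-Python sketch of what preprocessing, a dictionary, and a bag-of-words corpus look like. It is an illustration rather than the article's own code: the three toy headlines, the tiny stop-word set, and the helper names `preprocess`, `build_dictionary`, and `to_bow` are all made up for this example. Gensim offers the same idea through its `corpora.Dictionary` class and the `doc2bow` method.

```python
import re
from collections import Counter

# Hypothetical toy documents standing in for NYT articles.
documents = [
    "The election results are in and the voters have spoken",
    "Vaccine trials show promising results",
    "Voters worry about the vaccine rollout",
]

# A tiny illustrative stop-word set; a real pipeline would use a much larger one.
STOP_WORDS = {"the", "are", "in", "and", "have", "about", "show"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_dictionary(tokenized_docs):
    """Map each unique token to an integer id, in order of first appearance."""
    token2id = {}
    for doc in tokenized_docs:
        for token in doc:
            token2id.setdefault(token, len(token2id))
    return token2id


def to_bow(doc, token2id):
    """Convert one tokenized document into sorted (token_id, count) pairs."""
    counts = Counter(token2id[t] for t in doc if t in token2id)
    return sorted(counts.items())


tokenized = [preprocess(d) for d in documents]
dictionary = build_dictionary(tokenized)
corpus = [to_bow(doc, dictionary) for doc in tokenized]
```

Each document ends up as a list of `(token_id, count)` pairs, which is the kind of input an LDA model is trained on, together with the id-to-word mapping.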


Published in TDS Archive

Written by Ramya Vidiyala

Interested in computers and machine learning. Likes to write about it | https://www.linkedin.com/in/ramya-vidiyala/
