TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Hands-On Topic Modeling with Python

Idil Ismiguzel
Published in TDS Archive
11 min read · Dec 14, 2022

Photo by Bradley Singleton on Unsplash

Topic modeling is a popular technique in Natural Language Processing (NLP) and text mining for extracting topics from a given text. With topic modeling, we can scan large volumes of unstructured text to detect keywords, topics, and themes.

Topic modeling is an unsupervised machine learning technique, so it does not need labeled data for training. It should not be confused with topic classification, which is a supervised technique and needs labeled data to learn from. In some cases, the two can be combined: we first perform topic modeling to detect topics in a given text and label each record with its corresponding topic, then use this labeled data to train a classifier that performs topic classification on unseen data.
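To make the combined workflow concrete, here is a minimal sketch of it. It assumes scikit-learn is installed; the toy corpus, the choice of two topics, and the Naive Bayes classifier are illustrative assumptions rather than choices taken from this article.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus (hypothetical data, for illustration only).
docs = [
    "the team won the football match",
    "the striker scored a late goal in the match",
    "the coach praised the football team",
    "the bank raised interest rates again",
    "investors worry about interest rates and inflation",
    "the central bank fights inflation",
]

# Step 1: unsupervised topic modeling on bag-of-words counts.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # shape: (n_docs, n_topics)

# Step 2: label each record with its most probable topic.
labels = doc_topics.argmax(axis=1)

# Step 3: train a supervised classifier on the topic-labeled data.
clf = MultinomialNB()
clf.fit(counts, labels)

# Step 4: classify unseen text with the trained classifier.
unseen = ["the bank cut interest rates"]
predicted = clf.predict(vectorizer.transform(unseen))
print(labels, predicted)
```

The topic ids (0 and 1) carry no meaning on their own. In practice we would inspect the top words of each topic and give it a readable name before training the classifier.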

In this article, we will focus on topic modeling and cover how to prepare data with text preprocessing, choose the best number of topics using the coherence score, extract topics using Latent Dirichlet Allocation (LDA), and visualize topics using pyLDAvis.
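As a rough preview of the first three steps, the sketch below preprocesses a toy corpus, fits LDA for a few candidate topic counts, and picks the count with the best coherence. It rests on several assumptions: scikit-learn and NumPy stand in for whatever libraries the full article uses, the stopword list and corpus are hypothetical, and the coherence function is a hand-written UMass-style score, which is only one of several coherence measures. Visualization with pyLDAvis is left out.

```python
import re

import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Tiny hypothetical stopword list; real projects use a fuller one.
STOPWORDS = {"the", "a", "in", "and", "about", "of", "is", "on"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stopwords and very short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]


def umass_coherence(topic_word, doc_term, top_n=5):
    """Average UMass-style coherence over topics (higher is better)."""
    present = (doc_term > 0).astype(int)  # binary document-term matrix
    doc_freq = present.sum(axis=0)        # number of docs containing each word
    co_freq = present.T @ present         # number of docs containing both words
    scores = []
    for weights in topic_word:
        top = np.argsort(weights)[::-1][:top_n]  # most probable words first
        score = 0.0
        for i in range(1, len(top)):
            for j in range(i):
                score += np.log((co_freq[top[i], top[j]] + 1) / doc_freq[top[j]])
        scores.append(score)
    return float(np.mean(scores))


docs = [
    "The team won the football match",
    "The striker scored a late goal in the match",
    "The coach praised the football team",
    "The bank raised interest rates again",
    "Investors worry about interest rates and inflation",
    "The central bank fights inflation",
]

# Passing a callable as `analyzer` makes the vectorizer use our preprocessing.
vectorizer = CountVectorizer(analyzer=preprocess)
doc_term = vectorizer.fit_transform(docs).toarray()

# Fit one LDA model per candidate topic count and record its coherence.
coherence_by_k = {}
for k in (2, 3, 4):
    lda = LatentDirichletAllocation(n_components=k, random_state=0)
    lda.fit(doc_term)
    coherence_by_k[k] = umass_coherence(lda.components_, doc_term)

best_k = max(coherence_by_k, key=coherence_by_k.get)
print(coherence_by_k, best_k)
```

On a corpus this small the scores are not meaningful; the point is only the shape of the loop: one model per candidate topic count, one coherence score per model, and the highest score wins.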

While following the article, I encourage you to check out the Jupyter Notebook on my GitHub…


Written by Idil Ismiguzel

Data Scientist | Writing articles on Data Science & Machine Learning | MSc, MBA | https://de.linkedin.com/in/idilismiguzel