Hands-On Topic Modeling with Python
A tutorial on topic modeling using Latent Dirichlet Allocation (LDA) and visualization with pyLDAvis
Topic modeling is a popular technique in Natural Language Processing (NLP) and text mining for extracting topics from a given text. With topic modeling, we can scan large volumes of unstructured text to detect keywords, topics, and themes.
Topic modeling is an unsupervised machine learning technique, so it does not need labeled data for model training. It should not be confused with topic classification, which is a supervised machine learning technique and needs labeled data to train on. In some cases, the two can be used together: we first perform topic modeling to detect the topics in a given text and label each record with its corresponding topic. This labeled data is then used to train a classifier, which performs topic classification on unseen data.
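To make the two-stage idea concrete, here is a minimal sketch of it. It is an illustration under assumptions, not the workflow used in the rest of this article: it relies on scikit-learn's `LatentDirichletAllocation` and `MultinomialNB`, the six-sentence corpus is invented, and the number of topics (2) is picked by hand.

```python
# Minimal sketch: unsupervised topic modeling followed by supervised
# topic classification. scikit-learn and the toy corpus are assumptions
# made for illustration only.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Invented, unlabeled documents.
docs = [
    "the team won the football match",
    "the striker scored a late goal in the match",
    "the coach praised the team after the game",
    "the central bank raised interest rates",
    "stock markets fell as interest rates rose",
    "investors worry about inflation and rates",
]

# Step 1: turn the text into bag-of-words counts. No labels are involved.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Step 2: fit LDA. Each row of doc_topic is one document's
# probability distribution over the two topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(X)

# Step 3: label every document with its most probable topic.
labels = doc_topic.argmax(axis=1)

# Step 4: train a classifier on the derived labels and apply it to unseen text.
clf = MultinomialNB().fit(X, labels)
unseen = ["the goalkeeper saved the match"]
predicted = clf.predict(vectorizer.transform(unseen))

print("Derived labels:", labels)
print("Predicted topic for unseen text:", predicted[0])
```

The topic indices are arbitrary identifiers, so in practice we would inspect the top words of each topic and give it a readable name before using the labels downstream. On a corpus this small the topics themselves are not meaningful; the point is only the order of the steps.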
In this article, we will focus on topic modeling and cover how to prepare data with text preprocessing, choose the number of topics using the coherence score, extract topics using Latent Dirichlet Allocation (LDA), and visualize the topics using pyLDAvis.
As you follow along, I encourage you to check out the Jupyter Notebook on my GitHub…