Topic Modeling with LSA, pLSA, LDA, NMF, BERTopic, Top2Vec: a Comparison
A comparison between different topic modeling strategies including practical Python examples
Table of contents
- Introduction
- Topic Modeling Strategies
2.1 Introduction
2.2 Latent Semantic Analysis (LSA)
2.3 Probabilistic Latent Semantic Analysis (pLSA)
2.4 Latent Dirichlet Allocation (LDA)
2.5 Non-negative Matrix Factorization (NMF)
2.6 BERTopic and Top2Vec
- Comparison
- Additional remarks
4.1 A topic is not (necessarily) what we think it is
4.2 Topics are not easy to evaluate
- Conclusions
- References
1. Introduction
In Natural Language Processing (NLP), the term topic modeling encompasses a series of statistical and Deep Learning techniques to find hidden semantic structures in sets of documents.
Topic modeling is an unsupervised Machine Learning problem: the algorithm learns patterns in the absence of tags or labels.
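To make this concrete before examining the individual strategies, here is a minimal sketch (not the pipeline developed later in this article) that fits an unsupervised topic model on a tiny unlabeled corpus with scikit-learn; the toy documents, the choice of LatentDirichletAllocation and the number of topics are purely illustrative assumptions.

```python
# Minimal illustrative sketch: unsupervised topic modeling with scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus: no tags or labels are given to the algorithm.
docs = [
    "the stock market fell as investors sold shares",
    "the team won the match with a late goal",
    "quarterly earnings beat expectations and shares rose",
    "the coach praised the players after the game",
]

# Bag-of-words document-term matrix.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Learn 2 latent topics from word co-occurrence patterns alone.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Print the top words that characterize each discovered topic.
terms = vectorizer.get_feature_names_out()
for topic_idx, weights in enumerate(lda.components_):
    top_terms = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"Topic {topic_idx}: {', '.join(top_terms)}")
```

Even on four sentences, the model tends to separate the finance-flavored vocabulary from the sports-flavored one, which is exactly the kind of hidden semantic structure the techniques compared below try to recover at scale.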
Most of the information we generate and exchange as human beings has a textual nature. Documents, conversations, phone calls, messages, emails, notes, social…