
All too often, we treat topic models as black-box algorithms that “just work.” Fortunately, unlike many neural nets, topic models are actually quite interpretable and much more straightforward to diagnose, tune, and evaluate. Hopefully this blog post has been able to explain the underlying math, motivations, and intuition y…
LDA typically works better than pLSA because it can generalize to new documents easily. In pLSA, the document probability is a fixed point in the dataset. If we haven’t seen a document, we don’t have that data point. In LDA, the dataset serves as training data for the dirichlet distribution of document-topic distributions. If we haven’t seen a document, we can easily sample from the dirichlet distribution and move forward from there.