Topic Modeling… An Introduction

Ali Moayedi Azarpour
Jun 19, 2023

In recent years, a vast amount of information has been generated and stored in the form of document collections such as news articles, scientific papers, medical reviews, etc. The documents in these collections vary in size, from a few words (such as tweets) to many pages. Not all of the information in these collections is of interest to people. One technique for summarizing this information is to identify the topic of each document [1].

Topic modeling, a branch of Natural Language Processing (NLP), tries to discover the hidden topics in documents and to build a model that explains and summarizes them. Topic modeling is a fast-growing area at the interface of Text Mining, Artificial Intelligence (AI), and statistics [2]. The main reason for the growing attention to topic modeling is that more than 85% of the world's data is stored as text. In topic modeling, each document is made up of a set of words and each word is associated with a topic. Each topic is a probability distribution over words, and the model gives the degree (probability) to which each topic is represented in each document [2]. Assigning a topic to each word provides an efficient way to infer the topics of the documents in a corpus, and it makes it possible to retrieve related documents with a reasonable response time.
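To make this probabilistic view more concrete, here is a minimal Python sketch built on a hypothetical two-topic model. The topic names, vocabulary, probabilities, and the helper functions `word_prob`, `assign_topic`, `similarity`, and `most_related` are all invented for illustration. In a real topic model these probabilities would be learned from a corpus rather than written by hand. The sketch illustrates three ideas. First, a document's word probabilities are a topic-weighted mixture, p(w | d) = Σ_k p(w | k) · p(k | d). Second, each word occurrence can be assigned to the topic with the highest posterior, which is proportional to p(w | k) · p(k | d). Third, documents can be compared through their topic vectors. Cosine similarity is used for that last step only as one simple, assumed choice, not as the method of the cited works.

```python
import math

# Hypothetical, hand-picked probabilities for illustration only;
# a real topic model would learn these from a corpus.

# Each topic is a probability distribution over the vocabulary: p(word | topic).
TOPIC_WORD = {
    "sports": {"game": 0.4, "team": 0.4, "vote": 0.1, "law": 0.1},
    "politics": {"game": 0.1, "team": 0.1, "vote": 0.4, "law": 0.4},
}

# Each document is a mixture of topics: p(topic | document).
DOC_TOPIC = {
    "doc_a": {"sports": 0.8, "politics": 0.2},
    "doc_b": {"sports": 0.3, "politics": 0.7},
    "doc_c": {"sports": 0.7, "politics": 0.3},
}


def word_prob(word, doc):
    """p(word | doc) = sum over topics of p(word | topic) * p(topic | doc)."""
    return sum(TOPIC_WORD[t][word] * p for t, p in DOC_TOPIC[doc].items())


def assign_topic(word, doc):
    """Assign a word in `doc` to the topic with the highest posterior.

    p(topic | word, doc) is proportional to p(word | topic) * p(topic | doc),
    so the normalizing constant can be skipped when taking the argmax.
    """
    scores = {t: TOPIC_WORD[t][word] * p for t, p in DOC_TOPIC[doc].items()}
    return max(scores, key=scores.get)


def similarity(doc1, doc2):
    """Cosine similarity between two documents' topic vectors (assumed metric)."""
    topics = list(TOPIC_WORD)
    v1 = [DOC_TOPIC[doc1][t] for t in topics]
    v2 = [DOC_TOPIC[doc2][t] for t in topics]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)


def most_related(doc):
    """Return the other document whose topic mixture is closest to `doc`."""
    others = [d for d in DOC_TOPIC if d != doc]
    return max(others, key=lambda d: similarity(doc, d))


if __name__ == "__main__":
    print(word_prob("game", "doc_b"))     # 0.4*0.3 + 0.1*0.7, about 0.19
    print(assign_topic("game", "doc_b"))  # sports
    print(assign_topic("vote", "doc_b"))  # politics
    print(most_related("doc_a"))          # doc_c
```

In this toy example, "game" is assigned to the sports topic even in doc_b, which is mostly about politics. This happens because the word is far more likely under sports. Comparing short topic vectors instead of full word lists is one reason topic-based retrieval of related documents can be fast.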

References

[1] González-Santos, Carlos & Vega-Rodríguez, Miguel A. & Pérez, Carlos. (2021). Addressing topic modeling with a multi-objective optimization approach based on swarm intelligence. Knowledge-Based Systems. 225. 107113. 10.1016/j.knosys.2021.107113.

[2] Khalifa, Usama & Corne, David & Chantler, M. & Halley, Fraser. (2013). Multi-Objective Topic Modeling. 10.1007/978-3-642-37140-0_8.
