Natural Language Processing: Topic modelling (including latent Dirichlet allocation, or LDA, and its analysis) and Sentiment Analysis

  1. Stemming reduces inflected words to a root form, mapping a group of related words to the same stem even if the stem itself is not a valid word in the language.
  2. Lemmatization reduces inflected words to a root that is guaranteed to belong to the language. This root is called the lemma. A lemma (plural lemmas or lemmata) is the canonical form, dictionary form, or citation form of a set of words.
Input data
Output data
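
To make the difference between stemming and lemmatization concrete, here is a minimal toy sketch in plain Python. It is not the code used in this article: the suffix rules in `RULES` and the lookup table `LEMMA_TABLE` are hypothetical and hand-written for illustration, whereas a real pipeline would rely on a library stemmer and a dictionary-backed lemmatizer.

```python
# Toy illustration only. RULES and LEMMA_TABLE are hypothetical and
# hand-written; they are not taken from any library.

# (suffix, replacement) pairs, tried in order; the first match wins.
RULES = [("ies", "i"), ("ied", "i"), ("ing", ""), ("ed", ""), ("s", "")]


def toy_stem(word):
    """Chop a known suffix off the word; the result need not be a real word."""
    word = word.lower()
    for suffix, replacement in RULES:
        # Require at least three remaining characters so short words survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)] + replacement
            break
    # Normalise a trailing "y" to "i" so "study" and "studi" collapse together.
    if word.endswith("y") and len(word) > 3:
        word = word[:-1] + "i"
    return word


# A tiny stand-in for a real dictionary: inflected form -> lemma.
LEMMA_TABLE = {
    "studies": "study",
    "studying": "study",
    "studied": "study",
    "mice": "mouse",
}


def toy_lemmatize(word):
    """Look the word up; unknown words are returned unchanged."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)


words = ["studies", "studying", "studied", "mice"]
stems = [toy_stem(w) for w in words]
lemmas = [toy_lemmatize(w) for w in words]

print(stems)   # ['studi', 'studi', 'studi', 'mice']
print(lemmas)  # ['study', 'study', 'study', 'mouse']
```

The stemmer groups the three verb forms under "studi", which is not an English word, and leaves the irregular plural "mice" untouched. The lemmatizer returns dictionary forms for all four words, but only because each one is listed in its table.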

Topic Modelling

1. Latent Dirichlet Allocation topic modelling

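In natural language processing, latent Dirichlet allocation (LDA) is a generative statistical model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar. In topic modelling, the observations are the words in each document and the unobserved groups are the topics: each document is treated as a mixture of a small number of topics, and each word is attributed to one of them.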
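
The "generative" part of that definition can be sketched with the standard library alone. The example below is an illustration under stated assumptions, not the model fitted in this article: the two topics in `TOPICS`, their word probabilities, and the prior `alpha` are all hypothetical. In full LDA the topic-word distributions are themselves drawn from a Dirichlet prior; here they are fixed by hand to keep the sketch short.

```python
import random

# Hypothetical topics: each maps words to probabilities that sum to 1.
TOPICS = [
    {"goal": 0.5, "match": 0.3, "team": 0.2},      # a "sport"-like topic
    {"vote": 0.5, "party": 0.3, "election": 0.2},  # a "politics"-like topic
]


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalising Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, length, rng):
    """Generate one document following the LDA generative story."""
    # 1. Draw this document's topic mixture (the unobserved proportions).
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(length):
        # 2. Pick a topic for this word position according to the mixture.
        z = rng.choices(range(len(topics)), weights=theta)[0]
        # 3. Pick a word from the chosen topic's word distribution.
        topic = topics[z]
        word = rng.choices(list(topic), weights=list(topic.values()))[0]
        words.append(word)
    return theta, words


rng = random.Random(0)  # fixed seed so repeated runs give the same sample
theta, words = generate_document(TOPICS, alpha=[0.5, 0.5], length=10, rng=rng)
print("topic mixture:", theta)
print("document:", " ".join(words))
```

Fitting an LDA model runs this story in reverse: given only the words, it estimates the topic mixtures and topic-word distributions that most plausibly produced them.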

Corpora
20 topic words
Scoring random example against the topics
Interactive LDA result plot

2. spaCy topic modelling

A topic model is a type of statistical model for discovering the abstract “topics” that occur in a collection of documents. Topic modelling is a frequently used text-mining tool for discovering hidden semantic structures in a body of text.

3. Sentiment Analysis

I use two libraries for sentiment analysis: Vader and TextBlob.

1. TextBlob

2. Vader

Vader (Valence Aware Dictionary and sEntiment Reasoner) ships with NLTK and is a lexicon- and rule-based tool for sentiment analysis.
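
Lexicon-based sentiment tools share one core idea: individual words carry predefined scores, and simple rules adjust them for context. The sketch below shows that idea in plain Python. It is a toy under stated assumptions and does not reproduce the actual algorithm or API of Vader or TextBlob; the word scores in `LEXICON` and the negation handling are hypothetical.

```python
# Toy lexicon-based scorer. LEXICON and NEGATIONS are hypothetical and are
# not taken from Vader or TextBlob.
LEXICON = {
    "good": 1.0,
    "great": 2.0,
    "love": 2.0,
    "bad": -1.0,
    "terrible": -2.0,
    "hate": -2.0,
}
NEGATIONS = {"not", "never", "no"}


def toy_sentiment(text):
    """Sum word scores; a negation flips the sign of the next scored word."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    score = 0.0
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            negate = True
            continue
        value = LEXICON.get(token, 0.0)
        if value:
            score += -value if negate else value
            negate = False  # the negation has been used up
    return score


print(toy_sentiment("I love this great phone!"))  # 4.0
print(toy_sentiment("The movie was not good."))   # -1.0
print(toy_sentiment("The sky is blue."))          # 0.0
```

A positive total suggests positive sentiment, a negative total suggests negative sentiment, and zero means no word from the lexicon was found. Real libraries add much larger lexicons plus handling for intensifiers, punctuation, and capitalisation.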

Vivek Sasikumar


AI scientist with a passion for data analytics and martial arts!