Sentiment Analysis and Topic Modeling of Tweets on the Ukraine-Russia War

Ibekwe Kingsley
Machine learning Mastery
5 min read · Aug 27, 2022

Sentiment analysis and topic modeling are natural language processing techniques that can be used to uncover the underlying ideas and perspectives people hold about a given subject.

Sentiment analysis (or opinion mining) is a natural language processing (NLP) approach used to detect whether input is positive, negative or neutral. Sentiment analysis is typically performed on textual data to help organizations monitor brand and product sentiment in consumer feedback, and understand customer demands.

A topic model is a form of NLP model for detecting the abstract “themes” that appear in a collection of documents. Topic modeling is a regularly used text-mining approach for detection of latent semantic patterns in a text body.

In this post I will explain, step by step, how I performed sentiment analysis and topic modeling on tweets about the Ukraine-Russia war using Python.

SENTIMENT ANALYSIS

DATA COLLECTION: For the data collection I used bots to scrape data from Twitter. I scraped 50,000 tweets containing the term #ukrainerussiawar. Because of the enormous quantity of tweets, I collected data covering only 2 months. Check out my previous post on how to scrape data from Twitter without Twitter developer keys.

DATA PREPROCESSING: For the data preprocessing I wrote a function to remove tags, hyperlinks, emoticons, retweets and mentions from the dataset. I then dropped rows that contained non-English words, since my main focus was on English tweets.
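To make the cleaning step concrete, here is a minimal sketch of what such a function could look like. It is not the exact code from the analysis: the name `clean_tweet`, the regular expressions, and the choice to drop whole hashtags and emoji (rather than text emoticons) are all assumptions made for illustration.

```python
import re

# Hypothetical patterns for illustration; tune them to your own dataset.
RETWEET_RE = re.compile(r"^RT\s+@\w+:?\s*")    # leading "RT @user:" marker
URL_RE = re.compile(r"https?://\S+|www\.\S+")  # hyperlinks
MENTION_RE = re.compile(r"@\w+")               # @mentions
HASHTAG_RE = re.compile(r"#\w+")               # whole hashtags (assumption)
# Rough emoji and pictograph ranges; not exhaustive.
EMOJI_RE = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF]+"
)


def clean_tweet(text: str) -> str:
    """Strip retweet markers, links, mentions, hashtags and emoji."""
    text = RETWEET_RE.sub("", text)
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    text = HASHTAG_RE.sub("", text)
    text = EMOJI_RE.sub("", text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


sample = "RT @user: Stay safe 🙏 #ukrainerussiawar https://t.co/abc @friend"
print(clean_tweet(sample))
```

Filtering out non-English rows would be a separate step, for example with a language detection library; that part is left out of this sketch.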

MODEL BUILDING: For the model building I used TextBlob. TextBlob is a Python library for processing textual data. It provides a straightforward API for common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more. Applying the library to the preprocessed dataset gave me the polarity and subjectivity of each tweet.

Polarity refers to the direction and strength of an opinion, which can be positive or negative. TextBlob reports it as a score from -1 (most negative) to +1 (most positive). If a strong positive emotion such as admiration, trust or love is attached to something, text about it will lean positive. The same goes for negative polarity. A good example would be the following: ‘I don’t think I’ll buy this item because my previous experience with a similar thing wasn’t very good.’ That statement has a negative polarity.

The strength of positive and negative polarity can vary with the situation. What about weak sentiment? People often qualify their feelings with words like ‘quite’ or ‘slightly’, and sentiment analysis programs may treat such statements as mildly positive or mildly negative. These qualifiers can also show how subjectively a person regards an object, so subjectivity comes into play here as well.

Subjectivity refers to the degree to which a statement reflects personal feeling and experience rather than fact. What matters most here are personal connections and individual experiences with a thing, which may or may not differ from someone else’s point of view. For example: ‘I’m quite satisfied with my new smartphone because it has the highest performance accessible on the market.’ The statement is plainly subjective because the user is talking about their own experience and how they feel about an object. A strongly subjective statement can carry either positive or negative polarity. TextBlob scores subjectivity from 0 (fully objective) to 1 (fully subjective).
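The scoring step can be sketched roughly as follows. `TextBlob(text).sentiment` is the library's real API and returns a polarity and a subjectivity value; the helper names `score_tweet` and `label_polarity`, and the cut-off at exactly zero for the neutral class, are assumptions for illustration and may differ from the thresholds used in the actual analysis.

```python
from typing import Tuple


def score_tweet(text: str) -> Tuple[float, float]:
    """Return (polarity, subjectivity) for one tweet using TextBlob."""
    # Imported inside the function so the labeling helper below can be
    # used even when TextBlob is not installed (pip install textblob).
    from textblob import TextBlob

    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def label_polarity(polarity: float) -> str:
    """Map a polarity score in [-1, 1] to a class (assumed zero cut-off)."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"
```

With a pandas DataFrame of cleaned tweets, something like `df["text"].apply(score_tweet)` would produce one score pair per row, and `label_polarity` can then be applied to the polarity column.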

From the analysis, I found the following for polarity:

61% of the tweets were neutral, implying that the people tweeting expressed neither a favorable nor an unfavorable feeling about the war.

21% of the tweets had a positive polarity, and

18% of the tweets had a negative polarity.

For subjectivity, I found that 58% of the tweets had a neutral subjectivity score and 42% had a positive subjectivity score. In TextBlob a higher subjectivity score means the text leans towards personal opinion rather than factual information, so the positive group is the more opinionated one.
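Percentages like these come from counting how many tweets fall into each class. A minimal sketch of that calculation, using a hypothetical helper `label_share` and toy labels rather than the real dataset:

```python
from collections import Counter
from typing import Dict, Iterable


def label_share(labels: Iterable[str]) -> Dict[str, float]:
    """Return the percentage of items that carry each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Toy input for illustration only, not the tweet data from the post.
toy_labels = ["neutral", "neutral", "positive", "negative"]
print(label_share(toy_labels))
```

The same function works for the subjectivity classes, since it only looks at the label strings.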

I also went further and built a word cloud to surface the most frequent words in the tweets.
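A word cloud is essentially a picture of word frequencies, so the step can be sketched in two parts: count the words, then hand the counts to the `wordcloud` package. The helper names, the tiny stop word list and the output file name below are assumptions for illustration; a real run would use a fuller stop word list.

```python
import re
from collections import Counter

# Deliberately small stop word list, for illustration only.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
             "are", "on", "for", "with", "that", "this", "it"}


def word_frequencies(tweets, stopwords=STOPWORDS):
    """Count words across tweets, skipping stop words and very short tokens."""
    counts = Counter()
    for tweet in tweets:
        for word in re.findall(r"[a-z']+", tweet.lower()):
            if word not in stopwords and len(word) > 2:
                counts[word] += 1
    return counts


def save_word_cloud(frequencies, path="wordcloud.png"):
    """Render the frequencies as an image (needs: pip install wordcloud)."""
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(path)
```

Calling `word_frequencies(tweets).most_common(20)` is also a quick way to read off the top words as plain text before drawing anything.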

TOPIC MODELING

Next, I performed topic modeling on the data. Topic modeling is an unsupervised machine learning technique that is capable of analyzing a collection of documents, recognizing word and phrase patterns within them, and automatically clustering the word groups and similar expressions that best characterize the collection.

I performed the modeling with the LDA (Latent Dirichlet Allocation) model and then visualized the output with pyLDAvis. The model generated clusters for 5 separate topics that reflected the varied perspectives and feelings towards the war. Some of the topics included:

1. Destruction of life and property.

2. History: some tweets concentrated on the war’s roots and the rest on NATO.

3. Help: people asked for assistance and countrymen pleaded for evacuation.

4. Inflation: it was apparent that the inflation rate had surged in Russia and Ukraine, harming their citizens.
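For readers who want to try the LDA step, here is a rough sketch using gensim together with pyLDAvis. Which LDA implementation the analysis used is not stated here, so gensim is an assumption, as are the helper names, the simple tokenizer, and every parameter except the 5 topics mentioned above.

```python
import re


def tokenize(text, min_length=3):
    """Lowercase a tweet and keep alphabetic tokens of a minimum length."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if len(w) >= min_length]


def fit_lda(tweets, num_topics=5, passes=10, seed=42):
    """Fit an LDA model on tokenized tweets (needs: pip install gensim)."""
    from gensim import corpora, models

    docs = [tokenize(t) for t in tweets]
    dictionary = corpora.Dictionary(docs)               # word <-> id map
    corpus = [dictionary.doc2bow(d) for d in docs]      # bag-of-words vectors
    lda = models.LdaModel(corpus=corpus, id2word=dictionary,
                          num_topics=num_topics, passes=passes,
                          random_state=seed)
    return lda, corpus, dictionary


def save_visualization(lda, corpus, dictionary, path="lda_topics.html"):
    """Write the interactive topic view to HTML (needs: pip install pyLDAvis)."""
    import pyLDAvis
    # Older pyLDAvis releases expose this module as pyLDAvis.gensim instead.
    import pyLDAvis.gensim_models as gensimvis

    panel = gensimvis.prepare(lda, corpus, dictionary)
    pyLDAvis.save_html(panel, path)
```

After fitting, `lda.print_topics(num_words=10)` lists the top words per topic. Naming the topics (destruction, history, help, inflation) is still a manual reading of those word lists.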

Sentiment analysis and topic modeling can also be applied by brands and companies to understand how customers interact with or feel about their product or service.

You can access the code used to perform the analysis on my GitHub.

Follow me on LinkedIn.

If this post was useful to you please don’t forget to clap and follow the publication for more data science posts.
