“Analyzing YouTube Fans’ Feelings: Uncovering the Power of Sentiment Analysis!”

In this article, we learn what sentiment analysis is and how to perform it in Python.

7 min read · Feb 8



Sentiment analysis is the process of analyzing the sentiment (or feeling) behind text data. It is an application of natural language processing (NLP) that identifies, extracts, quantifies, and studies the sentiment expressed in a text. The goal of sentiment analysis is to determine the attitude, opinions, and emotions of a speaker or writer with respect to some topic, or the overall contextual polarity of a document.

Natural language processing (NLP), the foundation of sentiment analysis, is a branch of artificial intelligence that enables computers to analyze, understand, and generate human language. NLP algorithms are used for language translation, text analysis, speech recognition, and more.

Sentiment analysis can be used to better understand customer sentiment, identify trends and themes in a set of customer reviews, or even predict stock market movements. Sentiment analysis has applications in a variety of industries, including customer service, marketing, and finance.


The process of sentiment analysis involves a number of steps, including data acquisition, data pre-processing, feature extraction, model training, and evaluation.

Data acquisition involves collecting text data from sources such as social media, customer reviews, and surveys. Once the data has been acquired, it can be pre-processed to remove irrelevant information and prepare it for feature extraction.

Feature extraction involves extracting relevant features from the data, such as word frequencies, part-of-speech tags, and sentiment scores. These features are then used to train a model that classifies text as positive, negative, or neutral.
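Here is a minimal illustration of word-frequency features. The sketch below turns each text into a bag-of-words count vector over a shared vocabulary, using only the Python standard library. The helper names (tokenize, build_vocabulary, bag_of_words) are hypothetical. The whitespace tokenizer is deliberately simplistic; in practice, a library vectorizer would normally do this job.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def build_vocabulary(texts: list[str]) -> list[str]:
    """Collect every distinct token across all texts, in sorted order."""
    return sorted({token for text in texts for token in tokenize(text)})


def bag_of_words(text: str, vocabulary: list[str]) -> list[int]:
    """Return how many times each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


texts = ["great video great audio", "bad audio"]
vocab = build_vocabulary(texts)
features = [bag_of_words(t, vocab) for t in texts]
print(vocab)     # ['audio', 'bad', 'great', 'video']
print(features)  # [[1, 0, 2, 1], [1, 1, 0, 0]]
```

Each position in a feature vector corresponds to one vocabulary word. Any word that never appeared in the texts used to build the vocabulary is simply ignored.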

Once the model has been trained, it can be evaluated against a test set to measure its accuracy. This evaluation helps identify areas for improvement and fine-tune the model for better performance.
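To make the evaluation step concrete, here is a small sketch that compares a model's predicted labels with the true labels of a held-out test set. It reports accuracy and a count for each (true label, predicted label) pair. The function names are hypothetical, and the labels are made-up placeholders, not results from any real model.

```python
from collections import Counter


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of test examples whose predicted label matches the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_counts(y_true: list[str], y_pred: list[str]) -> Counter:
    """Count each (true label, predicted label) pair; pairs with different labels are errors."""
    return Counter(zip(y_true, y_pred))


# Hypothetical test-set labels and model predictions
y_true = ["Positive", "Negative", "Neutral", "Positive"]
y_pred = ["Positive", "Positive", "Neutral", "Positive"]

print(accuracy(y_true, y_pred))  # 0.75
print(confusion_counts(y_true, y_pred)[("Negative", "Positive")])  # 1
```

Look at which mismatched pairs are most frequent (here, a negative comment predicted as positive). This shows where the model needs improvement.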


1. Social Media Sentiment Analysis

Mining sentiment from social media can give you valuable insights about products, customers, and target demographics. It can reveal what customers like and dislike about products, brands, and advertising content, including through the comments on YouTube videos. By applying social listening across platforms such as TikTok, Instagram, and YouTube, you can build a granular market analysis and identify common themes across them.

2. Brand Experience Insights

Gathering brand experience insights can help you understand how customers feel about your brand, identify issues, retain customers, build loyalty, and increase sales.

3. Patient Insights

A sentiment analyzer is a valuable tool for healthcare organizations. It allows them to measure the efficiency of healthcare delivery, discover gaps in out-patient or in-patient services, improve pharmacy services, and gain insights into the needs of patients and caregivers. For example, a major healthcare organization in the Middle East analyzes millions of surveys annually to inform its healthcare delivery.

4. Improve Customer Service

Sentiment analysis can improve customer service by mining the emotions expressed in chatbot histories, customer service call transcripts, complaint emails, customer comments on returns and refunds, and customer surveys. The resulting insights help create a better customer experience, which leads to increased customer satisfaction.

5. News Trend Analysis

Sentiment analysis is used to extract trends from news sources such as websites, videos, articles, and magazines, as well as from online platforms like blogs, Twitter, and Facebook. These trends can help predict how markets will react to current affairs, for example in politics, crude-oil trading, and companies' share prices.

Industries such as banking, insurance, real estate, automotive, and cosmetics can use this data to make decisions such as planning supply chains, managing PR, and adjusting new product launches.

6. Data-Driven Marketing Insights

Some marketing strategies rely not just on quantitative engagement metrics (number of likes, followers, shares, etc.) but also on qualitative insights from comment analysis, summarized as the percentage of positive and negative sentiment about various aspects of a business. Such data-driven strategies have a higher chance of being successful.


We will perform the analysis on a dataset of YouTube comments hosted on Kaggle (ken-jee-youtube-data, loaded in the code below). As with any other type of data, it is crucial to clean text data so that the insights we draw are accurate and meaningful.

To kick things off, we start with the imports and write a function to clean the text.

import re

import pandas as pd
import seaborn as sns
from textblob import TextBlob

# Reading the data
Comments_df = pd.read_csv('/kaggle/input/ken-jee-youtube-data/All_Comments_Final.csv')

# This function takes a string of text and removes common punctuation, newlines, and extra spaces.
def clean_text(text):
    # Convert all characters to lowercase and strip any leading/trailing whitespace
    text = text.lower().strip()
    # Remove common punctuation and special characters
    text = re.sub(r"[-()\"#/@;:<>{}`+=~|.!?,']", "", text)
    # Replace newlines, tabs, and runs of spaces with a single space
    text = re.sub(r"\s+", " ", text)
    # Strip any remaining whitespace and return the cleaned text
    return text.strip()

Next, we apply this function to every comment and store the results in a new column.

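# Initialize an empty list to hold the cleaned comments
clean_comments = []
# Loop through each comment in the Comments_df dataframe
for comment in Comments_df['Comments']:
    # Try to clean the text, then append the result to the clean_comments list
    try:
        clean_comments.append(clean_text(comment))
    # If the comment is not a string (e.g. a missing value read as NaN), append an empty string
    except AttributeError:
        clean_comments.append('')
# Add the clean_comments list as a new column to the Comments_df dataframe
Comments_df['Clean Comments'] = clean_comments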

Next, we use the TextBlob library to compute the sentiment polarity of each cleaned comment.

Polarity refers to the overall sentiment conveyed by a particular text, phrase, or word, expressed as a numerical rating known as a “sentiment score”. In TextBlob, polarity is a float between -1.0 (most negative) and 1.0 (most positive), with 0 representing neutral sentiment. Other tools may use different scales, such as -100 to 100.
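To illustrate how a lexicon-based polarity score can be computed, the sketch below averages the scores of the known words in a text. Averaging values that lie in [-1.0, 1.0] keeps the result in that range. The tiny LEXICON dictionary and its scores are invented for this example. TextBlob's default analyzer uses its own, much larger lexicon plus additional rules (such as handling negations and intensifiers like "very"), so its scores will differ.

```python
# Hypothetical word scores on a -1.0 to 1.0 scale (not TextBlob's real lexicon)
LEXICON = {
    "great": 0.75,
    "good": 0.5,
    "helpful": 0.5,
    "boring": -0.5,
    "bad": -0.75,
    "terrible": -1.0,
}


def polarity(text: str) -> float:
    """Average the scores of lexicon words found in the text; 0.0 means neutral."""
    scores = [LEXICON[word] for word in text.lower().split() if word in LEXICON]
    if not scores:
        return 0.0  # no opinion words found, so treat the text as neutral
    return sum(scores) / len(scores)


print(polarity("great and helpful video"))      # 0.625
print(polarity("the audio was terrible"))       # -1.0
print(polarity("good video but boring intro"))  # 0.0
print(polarity("uploaded on tuesday"))          # 0.0
```

The third example shows a limitation of simple averaging: a mixed comment can score exactly 0 and be counted as neutral, just like a comment with no opinion words at all.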

# Initialize an empty list
polarity = []
# Loop through each comment in the 'Clean Comments' column of the DataFrame 'Comments_df'
for comment in Comments_df['Clean Comments']:
    # Create a TextBlob object for each comment
    blob = TextBlob(comment)
    # Append the polarity of each comment to the 'polarity' list
    polarity.append(blob.sentiment.polarity)
# Add the 'polarity' list as a new column to the 'Comments_df' DataFrame
Comments_df['polarity'] = polarity

Let’s now label each comment as positive, negative, or neutral based on its polarity.

# Initialize an empty list
sentiment = []
# Loop through the values of the 'polarity' column in Comments_df
for score in Comments_df['polarity']:
    # Append 'Positive' if the polarity is greater than 0
    if score > 0:
        sentiment.append('Positive')
    # Append 'Negative' if the polarity is less than 0
    elif score < 0:
        sentiment.append('Negative')
    # Otherwise (polarity equal to 0), append 'Neutral'
    else:
        sentiment.append('Neutral')
# Assign the sentiment list to the 'sentiment' column in Comments_df
Comments_df['sentiment'] = sentiment
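The loop above treats only a polarity of exactly 0 as neutral. A common variation labels scores inside a small band around 0 as neutral instead. The sketch below shows that variation; it is an assumption, not the method used in this analysis. The label_sentiment function and its neutral_band parameter are hypothetical, and with the default band of 0.0 the function follows the same rules as the loop.

```python
def label_sentiment(score: float, neutral_band: float = 0.0) -> str:
    """Map a polarity score to 'Positive', 'Negative', or 'Neutral'.

    Scores strictly above +neutral_band are positive, scores strictly below
    -neutral_band are negative, and everything in between is neutral.
    """
    if score > neutral_band:
        return "Positive"
    if score < -neutral_band:
        return "Negative"
    return "Neutral"


scores = [0.5, 0.05, 0.0, -0.02, -0.4]
print([label_sentiment(s) for s in scores])
# ['Positive', 'Positive', 'Neutral', 'Negative', 'Negative']
print([label_sentiment(s, neutral_band=0.1) for s in scores])
# ['Positive', 'Neutral', 'Neutral', 'Neutral', 'Negative']
```

In pandas, the function can be applied to a whole column with Comments_df['polarity'].apply(label_sentiment). Widening the band moves weakly worded comments into the neutral group, which changes the resulting sentiment proportions.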

Let’s now visualize the distribution of the sentiment column.

import matplotlib.pyplot as plt

# Plotting the count and proportional distribution of comments by sentiment (based on polarity)
plt.figure(figsize=(14, 6))
plt.subplot(1, 2, 1)
# Plotting the count of comments for each sentiment
sns.countplot(x=Comments_df['sentiment'])
plt.ylabel('Number of Comments')
plt.title('Distribution of Sentiments based on Polarity', fontsize=18)
plt.subplot(1, 2, 2)
# Plotting the proportional distribution of sentiments
plt.pie(x=[(Comments_df['polarity'] < 0).sum(), (Comments_df['polarity'] == 0).sum(),
           (Comments_df['polarity'] > 0).sum()],
        labels=['Negative', 'Neutral', 'Positive'], autopct='%1.1f%%', pctdistance=0.5,
        textprops={'fontsize': 14})
plt.title('Proportional Distribution of Sentiments')
plt.tight_layout()
plt.show()

Distribution of the various sentiments

The chart above shows that 69.5% of the comments are positive, 22.7% are negative, and 7.8% are neutral. From the YouTuber's perspective, it would be worth examining the specific words used in the negative comments to understand why 22.7% of the comments are negative.

Negative comments may stem from issues with a video's lighting, audio, or picture quality, from the creator's choice of words, or from statements that some audiences find unacceptable. By addressing these issues, the YouTuber can improve viewers' experience and help the channel grow.
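The percentages in the pie chart are each category's count divided by the total number of comments, multiplied by 100. The sketch below shows that arithmetic on a hypothetical list of labels. The sentiment_percentages helper is made up for illustration, and so are the counts, which are not the dataset's actual numbers.

```python
from collections import Counter


def sentiment_percentages(labels: list[str]) -> dict[str, float]:
    """Return each label's share of all labels, as a percentage rounded to one decimal."""
    counts = Counter(labels)
    total = len(labels)
    return {label: round(100 * count / total, 1) for label, count in counts.items()}


# Hypothetical labels: 7 positive, 2 negative, 1 neutral
labels = ["Positive"] * 7 + ["Negative"] * 2 + ["Neutral"]
print(sentiment_percentages(labels))
# {'Positive': 70.0, 'Negative': 20.0, 'Neutral': 10.0}
```

With the DataFrame from this article, pandas can produce the same proportions directly via Comments_df['sentiment'].value_counts(normalize=True) * 100.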

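P.S. This approach required no labeled data: TextBlob's default analyzer scores text with a pre-built sentiment lexicon rather than with a model trained on our comments. If labeled comments are available, you could instead take a supervised machine-learning approach and train a classification model or even a neural network.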
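If labeled comments are available, one simple supervised option is a multinomial Naive Bayes classifier. The sketch below trains one with add-one (Laplace) smoothing, using only the standard library. The class name NaiveBayesSentiment, the training comments, and their labels are all invented for illustration, and Naive Bayes is just one possible choice of classification model. A real project would train on many labeled comments, typically with a library such as scikit-learn.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Minimal multinomial Naive Bayes text classifier with add-one smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesSentiment":
        self.label_counts = Counter(labels)      # label -> number of training comments
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            # Log prior: how common this label is in the training data
            score = math.log(doc_count / total_docs)
            label_total = sum(self.word_counts[label].values())
            for word in text.lower().split():
                if word not in self.vocabulary:
                    continue  # ignore words never seen during training
                # Log likelihood with add-one smoothing so unseen (word, label) pairs are not zero
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (label_total + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train_texts = [
    "love this video",
    "great explanation thanks",
    "audio is terrible",
    "boring and too long",
]
train_labels = ["Positive", "Positive", "Negative", "Negative"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("great video"))     # Positive
print(model.predict("terrible audio"))  # Negative
```

Working in log space avoids numerical underflow when many small probabilities are multiplied. The smoothing keeps a single unseen (word, label) pair from zeroing out a label's score.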


  1. Sentiment analysis is the process of analyzing the sentiment (or feeling) behind text data.
  2. Sentiment analysis involves collecting text data from sources, pre-processing it to remove irrelevant information, extracting relevant features, training a model with those features, and then evaluating the model to see how accurately it can classify text into positive, negative, or neutral sentiments.
  3. Use cases of sentiment analysis include Social Media Sentiment Analysis, Brand Experience Insights, Patient Insights, Improve Customer Service, News Trend Analysis, Data-Driven Marketing Insights, and many more.
  4. The TextBlob library, built on top of the NLTK library, makes it easy to perform sentiment analysis on unlabeled data and generate insights. For supervised machine learning, one could train a classification model or even use a neural network.

Final Thoughts and Closing Comments

Many people overlook some vital points while pursuing their Data Science or AI journey. If you are one of them and looking for a way to fill those gaps, check out the certification programs provided by INSAID on their website.

If you liked this article, I recommend the Global Certificate in Data Science & AI, which covers the foundations, machine learning algorithms, and deep neural networks (basic to advanced).




One of India’s leading institutions providing world-class Data Science & AI programs for working professionals with a mission to groom Data leaders of tomorrow!