Source: https://www.kdnuggets.com/2020/06/5-essential-papers-sentiment-analysis.html

Sentiment Analysis of a Youtube Video

This is Part 1 of a series on implementing Sentiment Analysis.

Amit Ranjan
5 min read · Dec 12, 2020


QUICK RECAP

In my last article on Natural Language Processing, I mentioned Sentiment Analysis as one of the frequently used NLP techniques. Let's take a deep dive into it.

Sentiment Analysis
Sentiment analysis (or opinion mining) is a natural language processing technique used to interpret and classify emotions in subjective data. Sentiment analysis is often performed on textual data to detect sentiment in emails, survey responses, social media data, and beyond.

Now that we know what Sentiment Analysis is, let's see how it can be performed on social media data (in our case, YouTube comments).

IMPLEMENTATION OF SENTIMENT ANALYSIS

We will implement sentiment analysis on the YouTube video below, part by part. We are using the video "The uprising of India's farmers: What's behind the protests?" published by Global News.

This video is about “Farmers in India have been rallying for months against three agricultural laws enacted Sept. 20 by Prime Minister Narendra Modi’s government. The Indian government has argued the changes will give farmers more freedom, but farmers are concerned the new laws will drive down their products’ prices with no safeguards to protect them against corporate takeovers and exploitation. But the crisis in India’s agricultural sector is nothing new as the industry has been suffering for decades.”

Source: https://www.youtube.com/channel/UChLtXXpo4Ge1ReTEboVvTDg

Part 1: Getting the Data as Part of Data Cleaning

In this article, we will gather real-time comments from the above YouTube video on the ongoing issue in India.

For this, we have to scrape the website to get meaningful data. This can be done in multiple ways; I have used two methods to gather the data:

  1. Selenium
  2. Google API

Selenium

Step 1: Install Necessary Libraries

For Selenium, we first need to install it in Google Colab.
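Below is a minimal sketch of such an installation cell, assuming a Google Colab environment (the apt package names and the driver path are the usual ones for Colab's Ubuntu image, not verbatim from the article):

    # Install Selenium and a matching headless Chromium driver in Colab
    !pip install selenium
    !apt-get update
    !apt install -y chromium-chromedriver
    # Put the bundled chromedriver binary on the PATH
    !cp /usr/lib/chromium-browser/chromedriver /usr/bin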

It may take some time to install, depending on your internet speed.

Step 2: Import Necessary Libraries

After installing Selenium, we need to import all the libraries needed for scraping the website.
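A typical import cell for this workflow might look like the following (assumed, since the article's original gist is not reproduced here; re and pandas are included because the cleaning step later uses them):

    import re                 # regex-based cleaning of scraped comments
    import time               # pauses while the page loads
    import pandas as pd       # final DataFrame of authors and comments
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By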

Step 3: Set up ChromeDriver and Scrape the Comments

We create an authors list to store all the comments made by YouTube users.

Then, we configure the ChromeDriver for scraping.
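A sketch of this setup, reusing the headless Colab install from Step 1 (the exact options in the article's gist may differ):

    # List that will hold the raw text of each comment thread
    authors = []

    # Headless Chrome options needed to run inside Colab
    options = Options()
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    driver = webdriver.Chrome(options=options)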

If you go to the comment section on YouTube, you will find that it loads dynamically as you scroll down. To gather at least 100 comments, I iterated a for loop 6 times. I also added a 5-second pause to let the page load. If you have a slow internet connection, increase this delay so that the page loads successfully and you don't run into exceptions.
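The scrolling loop could look like this (the video URL is a placeholder for the Global News video, and the scroll script is a standard trick for YouTube's dynamically loading page):

    # Open the video and wait for the initial page load
    driver.get('https://www.youtube.com/watch?v=VIDEO_ID')
    time.sleep(5)

    # Each scroll makes YouTube load another batch of comments;
    # 6 iterations gathered at least 100 comments for me
    for _ in range(6):
        driver.execute_script(
            'window.scrollTo(0, document.documentElement.scrollHeight);')
        time.sleep(5)  # increase this on a slow connection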

Now comes the best part: we start scraping all the comments and their author details.
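A minimal sketch of this extraction step (the ytd-comment-thread-renderer tag name reflects YouTube's page structure at the time of writing; it is an assumption and may break if the design changes, a caveat discussed below):

    # Each comment thread element's text contains the username, timestamp,
    # comment body and reply count, all separated by newlines
    for thread in driver.find_elements(By.TAG_NAME, 'ytd-comment-thread-renderer'):
        authors.append(thread.text)
    driver.quit()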

We get uncleaned data, and we have to clean it, as it contains extra details like when the user commented and, at the end, how many replies the comment got.

In the first comment, there is "\n11 hours ago\n" after the username (Aspen Gaming) and "\n26\nREPLY" at the end of the comment.

We want our data to look like this: just a user and their comment.

To remove those extra details, we need to clean the data and then put it into a DataFrame.

Step 4: Clean and Put the Data into a DataFrame

In line 7, we remove the time of the comment after the username, like \n11 hours ago\n.

In line 8, we remove \nREPLY at the end of the comment.

In line 9, we remove the \n followed by the number of replies to that comment.

When a user edits their comment, YouTube shows the username followed by (edited). We have to remove this from the username, which is what lines 10-12 do!

Lastly, we put the author name as the row index and their comment in a column inside a DataFrame.
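The cleaning code itself was shared as a gist; below is a minimal sketch of those steps, with comments mapping to the line numbers mentioned above (the regular expressions are assumptions based on the patterns described):

    users, comments = [], []
    for raw in authors:
        # line 7: remove the time of the comment after the username, e.g. "\n11 hours ago\n"
        text = re.sub(r'\n[^\n]*ago\n', '\n', raw, count=1)
        # line 8: remove "\nREPLY" at the end of the comment
        text = re.sub(r'\nREPLY$', '', text)
        # line 9: remove "\n" followed by the number of replies, e.g. "\n26"
        text = re.sub(r'\n\d+$', '', text)
        # lines 10-12: strip the "(edited)" marker from the username
        user, _, comment = text.partition('\n')
        users.append(user.replace('(edited)', '').strip())
        comments.append(comment.strip())

    # Author name as the row index, comment as the single column
    author_comment = pd.DataFrame({'comment': comments}, index=users)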

Our final data after scraping the website through Selenium looks like:

Final DataFrame after scraping

What are the problems we found here?

  1. We had to do some cleaning before storing the data in a DataFrame.
  2. The scraper might stop working if YouTube changes its webpage design.

This is why I switched to the Google API.

GOOGLE API

Step 1: Create a Google developer account and your API key for YouTube data.

Don’t worry, it’s free! Follow the great video tutorial “Scrape YouTube Comments & Replies for Any Public Video or Channel” by Stevesie Data. Watch the first 9 minutes and you are good to go; all you need to know is how to create your API key. Once you have done that, you can start.

Step 2: Use the YouTube API to get all the comment data

You have to put the API key you created in Step 1 in place of “Your_API_KEY” as DEVELOPER_KEY.

If you want to increase the number of results, you can increase maxResults in line 19 as required (the API returns at most 100 results per page).

If you look at the comment section on YouTube, you can see that comments can be ordered by ‘Top Comments’ and by ‘Newest First’.

Use ‘relevance’ or ‘time’ as the order in line 20 to get the corresponding sorting.
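The article's code was embedded as a gist; the sketch below shows the shape of such a request using the google-api-python-client library, with comments pointing at the lines the text refers to ("Your_API_KEY" and the video ID are placeholders):

    from googleapiclient.discovery import build

    DEVELOPER_KEY = 'Your_API_KEY'  # paste the API key created in Step 1
    youtube = build('youtube', 'v3', developerKey=DEVELOPER_KEY)

    response = youtube.commentThreads().list(
        part='snippet',
        videoId='VIDEO_ID',   # placeholder for the video's ID
        maxResults=100,       # line 19: at most 100 results per page
        order='relevance',    # line 20: 'relevance' = Top Comments, 'time' = Newest First
        textFormat='plainText',
    ).execute()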

After this step, the YouTube API will return the data in JSON format.


If you look at the data, you might not understand it at first glance! Let me simplify it.

Go to items -> snippet -> topLevelComment -> snippet -> authorDisplayName for the author who posted the comment.

Go to items -> snippet -> topLevelComment -> snippet -> textOriginal for the original comment.

Step 3: Put the Data into a DataFrame

Unlike with Selenium, here we don't have to clean the data fetched by the API; we just have to access it.
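A minimal sketch of that access step, following the items -> snippet -> topLevelComment -> snippet path described above (df_1 matches the DataFrame name referenced at the end of this article):

    import pandas as pd

    rows = []
    for item in response['items']:
        snippet = item['snippet']['topLevelComment']['snippet']
        rows.append({'author': snippet['authorDisplayName'],
                     'comment': snippet['textOriginal']})

    df_1 = pd.DataFrame(rows)
    df_1.head()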

Our final data, after putting it into the DataFrame, will look something like this:

That's it! You can use either way to gather the data. I prefer using the API to scraping and cleaning the data through Selenium. Selenium is also a good approach; it's just that I like the API more.

From the next part onwards, we will use the DataFrame we created with the API. You can use "author_comment" as the DataFrame in the next part if you prefer Selenium. Otherwise, like me, you can use "df_1" as the DataFrame if you prefer the YouTube API.

Stay tuned!!
