War !!

Johann Baydeer
Published in Nerd For Tech
5 min read · Feb 17, 2021

During this bootcamp at General Assembly, I realized that it can be very challenging to find the balance between work and personal life. I think I am very lucky because I'm driven by this passion, but I can't count the sleepless nights anymore lol.

I can always count on music to relax, and I keep coming back to these Bob Marley songs in between assignments! There must be something in those songs.

Then I figured it would be a good idea to see what people feel when they listen to these songs, using a little bit of Natural Language Processing, just to keep the muscle going!

I will scrape the 100 most relevant comments from a Bob Marley video using the YouTube API and turn them into a DataFrame for further analysis. I chose the live War/No More Trouble video of the reggae superstar:

Luckily for me, the processing was very straightforward thanks to the extensive documentation on the YouTube API. I was able to get the comments using this code:
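Since the original screenshot of the code isn't reproduced here, a minimal sketch of that request might look like the following. It assumes the google-api-python-client library; the API key and video ID are placeholders you would fill in yourself.

```python
# Sketch of the commentThreads request; the actual API call is shown
# commented out because it needs a real key and network access.

def build_request_params(video_id, max_results=100):
    # The four parameters used: part, maxResults, videoId, order.
    return {
        "part": "snippet",        # which fields of each comment thread to fetch
        "maxResults": max_results,  # up to 100 per request
        "videoId": video_id,        # the unique ID from the video URL
        "order": "relevance",       # most relevant comments first
    }

# from googleapiclient.discovery import build
# youtube = build("youtube", "v3", developerKey="YOUR_API_KEY")
# response = youtube.commentThreads().list(**build_request_params("VIDEO_ID")).execute()
```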

The parameters here are part, which specifies the actual info I wanted to obtain; maxResults, set to 100; videoId, where I had to input the unique video ID; and order, based on relevance. I should also mention that I had to get an API key, which is pretty much standard for most of the APIs I have used up to this point. This is where I can shamelessly redirect you to my previous post for more details on APIs and web scraping!

Here you can see one of the 100 values I got back in JSON format. We are only interested in the comments here, so we'll need a little bit of indexing!

At this point I can append the 100 comments to an empty list and turn it into a DataFrame:
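A sketch of that indexing and list-building step, assuming the standard commentThreads response shape (the text sits under two nested snippet keys) and using a tiny hypothetical two-item response in place of the real 100:

```python
import pandas as pd

def extract_comments(response):
    # The comment text lives at snippet -> topLevelComment -> snippet -> textDisplay
    return [
        item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
        for item in response["items"]
    ]

# Hypothetical miniature response with the same nesting as the real API output:
sample_response = {"items": [
    {"snippet": {"topLevelComment": {"snippet": {"textDisplay": "One love!"}}}},
    {"snippet": {"topLevelComment": {"snippet": {"textDisplay": "A legend."}}}},
]}

comments = []                                  # the empty list
for text in extract_comments(sample_response):
    comments.append(text)                      # append each comment...

df = pd.DataFrame({"comment": comments})       # ...and turn it into a DataFrame
```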

Now I have a DataFrame with the 100 most relevant comments on this particular video. What do you think my next step would be? Cleaning and more cleaning! It's 80% of the job…

The first thing I did was remove all the numbers from the comments. The reason is that I will later "count vectorize" these comments, in other words split the comments into words to analyze their frequency. In the second line I created a word_count column, which indicates the number of words per comment. Wait! I haven't even shown you the data frame yet!
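Those two cleaning lines could look roughly like this; it's a sketch on a stand-in DataFrame, not the real comment data:

```python
import pandas as pd

# Stand-in data; the real df holds the 100 scraped comments
df = pd.DataFrame({"comment": ["Saw him live in 1980", "One love one heart"]})

# Line 1: strip every digit so numbers don't become "words" when vectorizing
df["comment"] = df["comment"].str.replace(r"\d+", "", regex=True)

# Line 2: word_count = number of whitespace-separated words per comment
df["word_count"] = df["comment"].str.split().str.len()
```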

Of course he's a legend! I could stop here if I only wanted to see whether people feel like me, but let's go a little further! Let's look at some statistics for the word_count column:
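These statistics come from a simple describe() call on the column; sketched here with stand-in values rather than the real comment lengths:

```python
import pandas as pd

# Stand-in values; the real series is df["word_count"] from the cleaning step
word_count = pd.Series([3, 19, 227, 8, 15])

stats = word_count.describe()  # count, mean, std, min, quartiles, max
```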

The average comment has 19 words and the longest has 227 words. That’s long. Let’s observe the word_count distribution:

I mean you don’t have to say a lot; I totally agree.

The next step was to initialize a count vectorizer, fit it on the comment column of the data frame, and create another data frame from the transformation results, assigning the words to the column names:

Each column is like a dummy variable and indicates whether or not a word is present in the comment. Now that I have done that, I can observe the 20 most common words:
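Getting the top 20 is then just a column sum; a sketch assuming word_df is the vectorized DataFrame from the previous step (a tiny stand-in here):

```python
import pandas as pd

# Stand-in for the vectorized DataFrame
word_df = pd.DataFrame({"bob": [1, 1, 1], "love": [1, 1, 0], "war": [0, 1, 0]})

# Sum each word column across all comments, then keep the 20 largest
top_words = word_df.sum().sort_values(ascending=False).head(20)
```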

Bob and Marley are pretty obvious. One thing I could have done is include more stop words to remove these values, but for the sake of time, and because I ran out of Red Bull, let's ignore the name. Now we have music, love, like, man, song, and strong words like peace, legend, evil and good. I got a sense of what it means, but let's observe some comments containing the word war, for example:
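Pulling out those comments is a one-line filter; sketched on stand-in comments, using a whole-word, case-insensitive match so something like "toward" wouldn't count:

```python
import pandas as pd

# Stand-in data; the real df holds the 100 scraped comments
df = pd.DataFrame({"comment": ["No more war!", "One love", "WAR in the ghetto"]})

# Case-insensitive whole-word match on "war"
war_comments = df[df["comment"].str.contains(r"\bwar\b", case=False)]
```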

Wow, somebody went as far as writing out the lyrics, I love it! I can quote one comment: "Until the color of a man's skin is of no more significance!!! WAR". I think it was relevant to observe this because war is mostly a negative word, but here it is employed for social justice. What about evil?

Ah… the lyrics again! Of course they contain every word. They also provide some context here: "Good over evil".

Overall, the comments are mostly positive:

"Medicinal music to heal the nation". This is what I feel when I listen to this music, so my null hypothesis is not rejected! There is something in those songs. I think I will need to collect all the comments and write a second part to this article with some visualization and sentiment analysis.

For now, “Stop that train: I’m leaving”

