As a fan of hip-hop music in the current political climate, I started noticing artists like Kendrick Lamar and A$AP Rocky referencing political figures in their lyrics. A question started to grow on me: how long has this been going on? Is there some way to analyze the spread of presidential references in recent hip-hop music? I had recently taken a data science class at Brown, and I thought, what a perfect time to figure this out. Let’s get a little more formal about defining the problem:
Analyze hip-hop lyrics for the mention of Trump and Obama and compare the annual average sentiment.
Of course, I knew that the results might not be perfect, but as a hip-hop head, I was excited to see them.
Here is the working hypothesis I started with: between 2000 and 2018, lyrics containing Trump will start higher in sentiment and go lower over the chosen time frame, while lyrics containing Obama will begin neutral and go higher in sentiment within the same time frame.
The data I used was collected with BeautifulSoup by crawling the lyrics website Genius. The data was originally collected in .csv format and then converted into .txt. The data has a few main columns: the specific date and year that the song was released, the artist, the name of the track, the number of views it received on Genius, the name of the .txt file created with all of the lyrics, the person of interest (Trump or Obama), and the average sentiment score of that track.
Originally, I tried to use the Spotify API, but it turned out to be limiting with regard to how many tracks it allowed me to download. It didn’t give me much freedom at all, so I decided to use BeautifulSoup and the Genius search function to get all of the tracks. I used regex to clean the data up a bit, but I also went through it by hand because I didn’t trust the Natural Language Toolkit, or NLTK, that we were using. As I expected, a lot of things got grouped into the data that shouldn’t have been there, like full novels. I also want to mention that my .csv file contains the number of views each track got on Genius because I used that number to weed a lot of tracks out. Any track under 10,000 views was not considered, since a lot of very random small tracks get posted to Genius. Below is a snapshot of the full dataset.
The data includes all sorts of tracks, everything from Yung Joc’s dirty south classic, “It’s Goin’ Down,” to much more recent additions by Kendrick Lamar, like “XXX.”
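The view cutoff described above is simple to express in code. Here is a minimal sketch of how it could look, assuming the rows live in a .csv with a `views` column. The column names, the `filter_by_views` helper, and the sample rows are all made up for illustration and are not taken from my actual files.

```python
import csv
import io

MIN_VIEWS = 10_000  # cutoff stated in the post


def filter_by_views(rows, min_views=MIN_VIEWS):
    """Keep only rows whose 'views' field is at least min_views."""
    return [row for row in rows if int(row["views"]) >= min_views]


# Hypothetical sample data: names and numbers are placeholders, not real results
sample_csv = """date,artist,track,views
2017-04-14,Artist A,Track One,250000
2016-01-01,Artist B,Track Two,900
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
kept = filter_by_views(rows)
print([row["track"] for row in kept])
```

Only the row at or above the cutoff survives, which is the same idea I used to drop the small, random pages.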
There are plenty of biases in the data. They come less from any one person influencing it than from Genius itself, which hosts a lot of random pages with large view counts that either have nothing to do with Trump but include the capitalized word somewhere, or have to do with Trump but are not tracks. Also, the sentiment analyzer is not perfect. There are professional Python coders who struggle with natural language analysis, and I am relying solely on NLTK to give me results. Furthermore, NLTK has a pool of bad words that bring the score down. It shouldn’t be news that hip-hop lyrics generally contain a lot of negative words or expressions, whether or not they are meant negatively. NLTK does not care, so a swear word used in a positive way will still bring the sentiment score down. When I started this project, I knew that it would not return a perfect dataset. I did it because the topic interests me, and I truly felt that it would be a great way to incorporate everything I have learned in this class.
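To make the swear-word problem concrete, here is a toy sketch of a lexicon-based scorer. This is not NLTK and not what my project runs. The `TOY_LEXICON` words and weights and the `toy_score` function are invented purely to show the failure mode, and a real analyzer is more nuanced than this.

```python
# Toy lexicon: words and weights are invented for illustration only
TOY_LEXICON = {"love": 2.0, "great": 2.0, "hate": -2.0, "damn": -1.5}


def toy_score(line):
    """Average lexicon weight over the matching words of a line (0.0 if none match)."""
    words = [w.strip(".,!?\"'").lower() for w in line.split()]
    hits = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


clean = toy_score("That verse is great")
slang = toy_score("That verse is damn great")
print(clean, slang)  # the second line scores lower despite being more enthusiastic
```

The swear word is used as emphasis in the second line, but a scorer that only looks up word weights treats it as a negative and pulls the average down.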
The project can be broken down into three main steps. I split up the code into three separate documents for this reason as well:
- The first step starts with grabbing all of the URLs from Genius as HTML, which has to be done manually. I went to the Genius website, searched for the person of interest, and kept pressing “load more” until I had every single search result. Then I used the browser’s inspect function to get all of the links and put them into one file. Now for the code: my first block, titled html2url, simply converts that HTML file into a clean list of URLs written to a .txt file.
- I took this new file to the next block of code, called processURL, which does exactly what its title says. It goes through each URL using BeautifulSoup and collects the views, the date, the names of the artist and song, and the location of the full lyrics. The lyrics get written in full to a folder titled lyrics in case I want to check the full lyrics of any song. It is during this step that all tracks under 10,000 views are filtered out.
- The final step uses the code called songsentiment. Here, the sentiment of each song is calculated using the file written at the end of the last step, along with the full lyrics. In other words, songsentiment goes through this file one song at a time, and for each song it also goes through that song’s full lyrics. When this code is called, the user has to specify the word of interest, so that the code knows to look for either Trump or Obama as it goes through the lyrics. It then analyzes the lines containing that word and returns the average sentiment for the song. The user also has to specify whether they want the data by year or by song. If by song is chosen, the code writes a .csv file that includes everything about each song: the date, the name, the artist, the lyric location, the views, the person of interest, and the sentiment for that specific song. If by year is chosen, the code writes a .csv file that includes just the person of interest and the average sentiment for each year.
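The steps above can be sketched roughly as follows. This is a simplified illustration and not my actual html2url or songsentiment code. It assumes several things that are not in the project:

- It uses Python’s built-in `html.parser` in place of BeautifulSoup so that it runs without extra installs.
- It takes the line scorer as a parameter, and the `stand_in_scorer` shown here is a hypothetical placeholder for whatever NLTK returns.
- It matches the keyword as a case-sensitive whole word.
- It computes the yearly figure as the mean of the per-song averages.

All function names, URLs, and lyrics below are made up.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect every href found on <a> tags (stand-in for the html2url step)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return all link targets in the saved search-results HTML."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def song_sentiment(lyrics, keyword, score_line):
    """Average score of the lines that mention keyword; None if no line does."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b")
    scores = [score_line(line) for line in lyrics.splitlines() if pattern.search(line)]
    return sum(scores) / len(scores) if scores else None


def yearly_average(songs, keyword, score_line):
    """Map year -> mean of the per-song sentiment (assumed aggregation)."""
    by_year = defaultdict(list)
    for year, lyrics in songs:
        score = song_sentiment(lyrics, keyword, score_line)
        if score is not None:
            by_year[year].append(score)
    return {year: sum(vals) / len(vals) for year, vals in by_year.items()}


def stand_in_scorer(line):
    # Hypothetical scorer: +1 if the line contains "good", -1 if "bad", else 0
    text = line.lower()
    return ("good" in text) - ("bad" in text)


# Made-up inputs for illustration only
html = '<a href="https://example.com/song-a">A</a><a href="https://example.com/song-b">B</a>'
links = extract_links(html)

songs = [
    (2016, "Obama had a good year\nunrelated line"),
    (2016, "Obama had a bad day\nObama had a good night"),
    (2017, "no mention here"),
]
per_year = yearly_average(songs, "Obama", stand_in_scorer)
print(links)
print(per_year)
```

Passing the scorer in as an argument keeps the keyword matching and averaging separate from the sentiment library, so a different analyzer could be swapped in without touching the rest.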
The graph below compares Trump’s and Obama’s sentiment in hip-hop music over the course of 18 years. While Obama isn’t mentioned until 2006, I decided to keep six years of just Trump to show how he was portrayed in hip-hop before he had anything to do with politics. Trump appears to have started off neutral before jumping drastically between positive and negative. Starting in 2008, he appears slightly positive before going on a downward trend past his election date. Obama starts off neutral, leaning negative, before a huge positive jump in 2010. During his second presidential term, he appears to bounce between positive and negative, before ending on a higher positive than ever before.
I also wanted to highlight some interesting points in the data. In Obama’s data, his most positive song, among many, appeared in 2018 in one of Chief Keef’s tracks. Sosa’s line, “feel like Obama up in the V.I.P,” may not seem like a very positive line, but keep in mind that there were many high-scoring songs, and Sosa’s happened to be a few decimal points more positive. Obama’s most negative song appeared in 2015 on Tech N9ne’s Special Effects. The line addresses Nina’s haters, saying, “hate me like Obama.” The song was the introduction to Nina’s album, and he touches on many political, religious, and personal topics. The song came out during Obama’s second term, a time that drew a lot of negativity from many in the hip-hop community.
In Trump’s data, his most positive feature appears in 2000, long before Trump had any significant part in politics. The line appears in Nelly’s first hit single, “Country Grammar (Hot Sh*t),” and has a fairly significant placement in the track. Around this time, and even years before, Trump was gaining traction in hip-hop music for the money he had. Many tracks actually started using his name as an adjective when referencing money. This reached its climax with Mac Miller’s 2011 track titled “Donald Trump,” a song Mac later said he regretted making. Then Trump started getting involved in politics, which was not taken lightly by many in the hip-hop community. There are a lot of negative songs about Trump, but the one with the worst rating was a 2017 track by the Gorillaz titled “Let Me Out.” The track has an interesting, albeit depressing, story about its conceptualization and recording. The Gorillaz originally came up with the idea and recorded their parts while Trump was still running for office. The song imagines a theoretical Trump presidency. The Gorillaz approached Pusha T, who is featured on the track, to record a verse, telling him that the album was a party for the end of the world, as if Trump had won the election. By the time Pusha wrote his verse, Trump had already won, which gave the completed song an eerie, crystal-ball-like feeling.
There were also some very odd, yet ironic, items that found their way into the dataset during the first round of data collection, before I sifted through the data. They also go to show the limitations of the program. One was Thomas More’s Utopia, and the other was George Orwell’s 1984. Both works are in essence about theoretical worlds: one dreams of the perfect society, and the other imagines what happens when that perfect society goes a bit too far. Fitting for the world we live in right now.
Working with the data collected through crawling was definitely a challenge. Figuring out what to call with BeautifulSoup was difficult as well. Overall, I am happy with the product, as it is what I hoped to have coded, even if the results are not perfect because the data collection was a bit flawed. Next time around, I would use a different sentiment analyzer. I am unaware of others at this moment, but I am sure there are some out there that might be more complicated but more accurate. I also chose not to include Hillary, to make the data clearer and simplify things overall. In the future, maybe I would consider adding more people. The code is set up such that I could technically look for any word in the lyrics of the songs that I have collected. Maybe I could expand beyond presidents and analyze other aspects of the data.
I want to thank my teacher, Chris Tanner, and all of the wonderful TAs for such a great class, as well as my parents, Apurva and Purvi Shah, for answering any and all questions I had.