Mac Miller: A Lyrical Analysis and Admiration
Using Python machine learning, the Genius API, and my robust knowledge of all things Easy Mac
New career, new city, new relationships, new priorities.
As anyone who began a new chapter in their life would expect, a lot has been going in and out of the door since I graduated college. As I take in all the excitement, I always try to take a moment to remember who I am and what I love. Although I consider all of my music a pillar of my growth and personality, one specific artist comes to mind. Not only did he shape my taste in music, but he also influenced my values and friendships. Of course, I'm referring to Mac Miller!
"Oh geez, another fanboy article that idolizes a celebrity. Did he really have that much of an impact?" It's true! My sister introduced me to Mac Miller in 7th grade. She showed me a video of this Jewish rapper from Pittsburgh who made a Fireflies remix. At that moment, I was hooked. My iPod Shuffle was filled with all his earlier mixtapes, from The Jukebox to K.I.D.S. That was the same year I got my first debit card and saved enough money to buy a wildly overpriced Thumbs Up T-Shirt. When his first official project, On and On and Beyond, was released, I checked iTunes every hour to see how it was doing on the charts. I learned about his journey of producing his own music and the respect that he garnered from the industry. In my eyes, Mac's career defined the zeitgeist of 21st-century Pittsburgh: young, artistic, and entrepreneurial. He could do no wrong.
September 7th, 2019 marked the one-year anniversary of Mac Miller's passing. Last year, I was in a state of shock when I heard the news. To those around me, a rapper overdosed on drugs. To me, an astonishingly talented kid from Point Breeze became yet another victim of the opioid epidemic. From signing a petition to get Party on 5th Ave played during Pens games, to learning about the impact of opioids on communities, to donating to The Mac Miller Fund, I did all I could to carry on the legacy of a warranted legend. So, after a year, I want to sustain that tradition by reflecting on Mac's career through the lens of my hobby: being a weird data scientist.
Deciding what to analyze (and how)
Since I wanted this article to be somewhat of an ode to Mac's career, I planned to collect Mac's lyrics over the years, group them by project, and perform A/B comparisons. Mac was one of the most consistent rappers in the industry, releasing at least one album or mixtape per year since 2009. Although his career was short-lived, collecting enough data is no problem.
So, I hit up the Genius API to get all Mac lyrics. No Larry Fisherman. No Delusional Thomas. Real Mac fans know that those are completely different artists that each deserve their own analysis.
Getting the Data
The first step is to get all the lyrics from Mac Miller's EPs, mixtapes, and albums and store them in a pandas.DataFrame object (just think of a DataFrame as a data table). Since the Genius API does not return lyrics (for legal reasons), you have to get a little creative.
First, I used the search API to get Mac Miller's artist id in Genius. Then, I used the artists/:id/songs request to get the URL of every single Mac Miller song. At that point, I used BeautifulSoup to scrape the HTML of every URL. After adding the song name, album, and lyrics into a DataFrame, I did some basic filtering to keep only the songs in Mac's twelve studio projects. Then, I stored the DataFrame in a .csv file for easy retrieval. I created a simple wrapper in my Python project to perform these tasks. I generalized it so any artist's songs can be obtained (my repo provides the results of analyzing both Mac Miller and Childish Gambino).
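The wrapper itself is not reproduced in this article. As a rough sketch of the three steps above (search, list songs, scrape), assuming the requests and beautifulsoup4 packages and a Genius API access token, it could look something like this. The function names and the CSS selector for the lyrics container are hypothetical; Genius changes its page markup from time to time, so the selector is left as a parameter.

```python
import requests
from bs4 import BeautifulSoup

API_ROOT = "https://api.genius.com"


def _get(path, token, **params):
    """Call a Genius API endpoint and return the 'response' part of the JSON."""
    reply = requests.get(
        f"{API_ROOT}/{path}",
        params=params,
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,
    )
    reply.raise_for_status()
    return reply.json()["response"]


def find_artist_id(name, token):
    """Search for the artist and return the id of the first exact name match."""
    for hit in _get("search", token, q=name)["hits"]:
        artist = hit["result"]["primary_artist"]
        if artist["name"].lower() == name.lower():
            return artist["id"]
    return None


def list_songs(artist_id, token):
    """Page through artists/:id/songs, collecting (title, url) pairs."""
    songs, page = [], 1
    while page is not None:
        body = _get(f"artists/{artist_id}/songs", token, per_page=50, page=page)
        songs.extend((song["title"], song["url"]) for song in body["songs"])
        page = body.get("next_page")  # None once the last page is reached
    return songs


def extract_lyrics(html, selector='div[data-lyrics-container="true"]'):
    """Pull lyric text out of a song page's HTML.

    The default selector is an assumption about Genius's markup, not a
    documented contract; pass a different one if the page layout differs.
    """
    soup = BeautifulSoup(html, "html.parser")
    blocks = soup.select(selector)
    return "\n".join(block.get_text(separator="\n").strip() for block in blocks)
```

From there, each song URL can be fetched with requests.get(url).text, passed through extract_lyrics, and appended as a row to the DataFrame before writing it out with DataFrame.to_csv. The song list returned by this sketch does not include the album name, so that field would have to come from the song page or a per-song API request.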
Analyzing Mac
Cool, so I have all of Mac's songs and their albums stored in a .csv- now what? This is the exciting part and why I got into data science in the first place- as long as you have an interesting, quality dataset, your creativity is limitless. For inspiration, I referred to Hip Hop By The Numbers, a blog that does rap analysis like this all day. They're definitely worth the follow!
After doing this research and playing around with the data, I created three visualizations in Python with the help of pandas, matplotlib, nltk, and sklearn.
Visualization #1: Lexical Richness
This first analysis should be considered the "101" of lyrical analysis. It requires no fancy math or machine learning. Lexical Richness is best defined as the ratio of unique words to total words in an album. Comparing the Lexical Richness of Mac's albums can give us quick insight into the points in Mac's career where his lyrics stood out. In this analysis, albums that use a wider variety of words score higher than albums with Mac's more straightforward, repetitive lyrics.
The function that lays out the logic of how I calculated Lexical Richness from the DataFrame can be found below:
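The original embedded function is not reproduced in this text. As a minimal sketch of the same idea, assuming the DataFrame has one row per song with hypothetical columns named album and lyrics, the unique-to-total ratio could be computed like this:

```python
import re

import pandas as pd


def lexical_richness(songs_df):
    """Return one row per album with its unique-to-total word ratio.

    Assumes hypothetical columns 'album' and 'lyrics' (one row per song).
    """
    rows = []
    # sort=False keeps albums in the order they first appear in the data.
    for album, group in songs_df.groupby("album", sort=False):
        text = " ".join(group["lyrics"].dropna()).lower()
        # Keep runs of letters and apostrophes so "Love," and "love" match.
        words = re.findall(r"[a-z']+", text)
        unique = set(words)
        rows.append({
            "album": album,
            "total_words": len(words),
            "unique_words": len(unique),
            "lexical_richness": len(unique) / len(words) if words else 0.0,
        })
    return pd.DataFrame(rows)
```

Calling graph_df = lexical_richness(songs_df) would give a small table that is ready for a bar chart. Note that the ratio naturally drops as an album gets longer, since common words repeat, so albums of very different lengths are not perfectly comparable.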
After writing code to plot the results of graph_df in matplotlib on a bar chart, I got the following:
Looks like the middle of Mac's career (2010-2013) represents the peak of Mac's creative wordplay. In I Love Life, Thank You, Mac was having a lot of fun finding his flow as he began hopping on riskier, more experimental beats. But with this new style, he still kept a lot of his quirky, homeboy rhymes:
They always want that fun sh- when you tryna spit a little somethin' real
They start runnin', turnin' chicken like a nugget meal
How the f- I feel? Kinda like a hundred mill
Spun the wheel, bought a vowel, I don't owe you nothin' still
- Mac Miller, Cold Feet
Watching Movies With the Sound Off and Faces brought us to the height of Mac's introspective career. Mac never released anything similar to the content that we heard in these two albums, which came as his fame, influences, and collaborators were expanding. For the first time, Mac's stereotypical audience of fratty suburbanites saw directly into the life of an artist that was filled with depression, addiction, and detachment from reality. All this was depicted in such a poetic cadence:
Close my eyes before I cross the street
If a car about to hit me, then he ought to beep
Watching Dawson's Creek 'til I fall asleep
It's harder than it seems, I'm under water in my dreams
I'm in awe, this jigsaw, puzzles not complete
I'm just an idea, nothing concrete
- Mac Miller, S.D.S.
Okay, my mind is Yoda, I'm on Ayatollah
These other rappers just a diet soda
I find Jehovah in the darkest places, empty as apartment basements
This a marathon gentlemen, go 'head and start the races
Save the coffin spaces
- Mac Miller, Diablo
Visualization #2: Sentiment Analysis
Sentiment Analysis can give the viewer a high-level understanding of the emotional tone found in text. It gives us the ability to quantify seemingly subjective content. We are able to do this with the help of nltk, the Natural Language Toolkit library, in Python. The nltk library has a SentimentIntensityAnalyzer class that uses a rule-based lexicon (VADER) to assign a sentence a compound score between -1 (very negative) and +1 (very positive).
The nltk library is far from 100% accurate. My buddy Jonathan Jiang made sure I noted that Sentiment Analysis is still in its early stages, and the algorithms are trained on limited datasets. Although not perfect, this tool can give us a simple illustration of how Mac's emotional state has changed throughout his albums. This code shows my Sentiment Analysis on Mac's albums:
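The original embedded code is not reproduced in this text. What follows is only a sketch of the general approach, assuming each lyric line is treated as one "sentence" and that a compound score is bucketed with the commonly used VADER cutoffs of +0.05 and -0.05. The cutoffs and the helper names are assumptions, not necessarily what the original analysis used.

```python
from collections import Counter

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer


def build_analyzer():
    """Download the VADER lexicon (needs network once) and return an analyzer."""
    nltk.download("vader_lexicon", quiet=True)
    return SentimentIntensityAnalyzer()


def label_compound(compound, threshold=0.05):
    """Bucket a compound score in [-1, 1] as negative, neutral, or positive."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def sentiment_shares(lines, analyzer):
    """Return the fraction of non-empty lines that fall in each bucket."""
    labels = Counter(
        label_compound(analyzer.polarity_scores(line)["compound"])
        for line in lines
        if line.strip()
    )
    total = sum(labels.values())
    return {
        name: (labels[name] / total if total else 0.0)
        for name in ("negative", "neutral", "positive")
    }
```

With analyzer = build_analyzer(), calling sentiment_shares(album_lyrics.splitlines(), analyzer) for each album's combined lyrics would produce the three fractions that go into a stacked bar chart.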
Each sentiment score is categorized as negative, neutral, or positive. The counts in each category are then divided by the total number of sentences to get each category's percentage. The graph below shows each album's sentiment breakdown:
To me, this insight is no surprise. Mac's most positive albums are The Divine Feminine, The Jukebox, and Best Day Ever, in that order.
Do I really have to explain why The Divine Feminine is Mac's happiest album? It's literally a love album for Ariana Grande! Every song revolves around the topics of love, relationships (of all types), and his optimistic role in the universe.
Said, you just don't know how beautiful you are
And baby that's my favorite part
You walk around so clueless to it all
Like nobody gonna break your heart
- Mac Miller, My Favorite Part
Ironically, Mac's most positive album was preceded by his most negative album, GO:OD AM. It only takes a quick Google search to learn that Mac was going through a serious drug addiction, and that was certainly reflected in his music.
Everybody saying I need rehab
'Cause I'm speedin' with a blindfold on and won't be long until they watching me crash
And they don't wanna see that
They don't want me to OD and have to talk to my mother
Tell her they could have done more to help me and she'd just be
Crying saying that she'd do anything to have me back
- Mac Miller, God Speed
Visualization #3: Topic Modeling
For my final visualization, I built a topic model of Mac Miller's discography. In a nutshell, Topic Modeling is searching through every lyric to find sets of words that tend to occur together. These sets can be viewed by a data scientist to create topics that best summarize them. Because we're grouping lyrics by albums, we can see how frequently Mac spoke on these topics over the course of his career.
Topic Modeling in Python can be done with the help of sklearn, a machine learning library. First, we apply a "term frequency-inverse document frequency" (TF-IDF) vectorizer, which is a fancy way of saying that we count the frequency of each word while penalizing the most common words (because words like "the" and "a" won't help us in choosing a topic). Then, we take those weighted counts and run a non-negative matrix factorization (NMF), which groups co-occurring words together. By combining the frequency of words with these groupings, our code can rank the most common topics in Mac's lyrics. The following code will output our sets of words:
Here are the six sets of twenty words that came from this function:
Topic #1: know time mind go never way home baby think feel alright little girl gonna really leave find tell come inside
Topic #2: love name baby take nothing gon one make something want heart forever know things round somebody away think day girl
Topic #3: bitch fuck shit money god drugs bitches fucking ass give still tryna million dick much little want made die damn
Topic #4: mac miller kid know rap bitch lot things back girl name say ass class flow first better fresh bad play
Topic #5: life shit night take around day world high boy party money feeling gettin wake good make better say best big
Topic #6: keep another one go weed back roll man new ride feel shoes people til smoke blow tell never real hope
Now for the creative (and subjective) part! My fellow LineByLine writer, Clayton Marshall, and I decided on the following topic labels:
Topic #1: Humble Life
Topic #2: Romance
Topic #3: Braggadocious
Topic #4: Music/Art
Topic #5: Partying
Topic #6: Smoking Weed
Not bad, right? Please comment if you have a recommendation on better names. To learn more about the Topic Modeling process, check out this Codecademy blog.
With the aid of these labels and the NMF matrix, we can plot a line chart of Mac's most featured topics over the years:
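The plotting code is not shown in this article, so the following is only one possible way to turn an NMF document-topic matrix into per-album numbers. It assumes each song is assigned to its single strongest topic, which may well differ from the aggregation actually used for the chart. The function and argument names are hypothetical.

```python
import numpy as np
import pandas as pd


def topic_share_by_album(albums, doc_topic, topic_labels):
    """Share of each album's songs whose strongest topic is each label.

    albums: album name per song, in the same order as the rows of doc_topic.
    """
    # Index of the highest-weighted topic for every song.
    dominant = np.asarray(doc_topic).argmax(axis=1)
    frame = pd.DataFrame({
        "album": list(albums),
        "topic": [topic_labels[i] for i in dominant],
    })
    counts = pd.crosstab(frame["album"], frame["topic"])
    # Make sure every label gets a column, even if no song was assigned to it.
    counts = counts.reindex(columns=topic_labels, fill_value=0)
    return counts.div(counts.sum(axis=1), axis=0)
```

The result has one row per album (sorted alphabetically, so it would need reordering by release date) and can be drawn with DataFrame.plot. Note that a hard assignment like this reports zero for any topic that is never a song's strongest one, even when that topic is clearly present in the lyrics.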
If you're like me, your interpretations of this chart are split. A lot of it is logical! Earlier in his career, Mac rapped a lot about his humble life back home and adamant love for Pittsburgh.
New kicks give me cushion like whoopie
Keep a smile like an Eat'n Park cookie
- Mac Miller, Knock Knock
Mac has also pleased his audience with a consistent output of party bangers. From his first hit "Donald Trump" to the horns-heavy "Goosebumpz", Mac is no hipster, but he can make your hips stir!
However, I can see some really inaccurate data points here. For example, there's no way that Mac did not mention "Smoking Weed" in K.I.D.S.! If you have ever listened to the first ten seconds of "Kool Aid & Frozen Pizza" or "Paper Route", then you know that this topic model cannot be right. Also, the finding that "Romance" was not mentioned once in The Divine Feminine goes against the entire theme of the project!
Although my final visualization conflicts with the expected outcome, perhaps this is the perfect way to end the article. I had a lot of fun using Sentiment Analysis and machine learning on Mac's lyrics to turn them into intriguing insights. But, maybe quantifying art is not the best way to appreciate it. Mac's music is so special because it can be relatable and personal to so many fans. Regardless of what the data says, I will continue to appreciate his music through the lens of my own experiences, vices, and virtues. From now until forever, I will always play "Knock Knock" as I exit the Fort Pitt Tunnel, receiving shivers through my entire body. Thumbs up.
For my full Github repository on this project, click here.
Have any comments or recommendations for the LineByLine Team? Please reach out to us at linebyline.team@gmail.com.