Mac Miller: A Lyrical Analysis and Admiration 👍

Using Python machine learning, the Genius API, and my robust knowledge of all things Easy Mac

Ben Wallace
LineByLine
10 min read · Sep 7, 2019

New career, new city, new relationships, new priorities.

As anyone beginning a new chapter in their life would expect, a lot has been coming and going since I graduated college. As I take in all the excitement, I always try to take a moment to remember who I am and what I love. Although I consider all the music I listen to a pillar of my growth and personality, one artist in particular comes to mind. Not only did he shape my taste in music, but he also influenced my values and friendships. Of course, I’m referring to Mac Miller!

“Oh geez, another fanboy article that idolizes a celebrity. Did he really have that much of an impact?” It’s true! My sister introduced me to Mac Miller in 7th grade. She showed me a video of this Jewish rapper from Pittsburgh who made a Fireflies remix. At that moment, I was hooked. My iPod Shuffle was filled with all his earlier mixtapes, from The Jukebox to K.I.D.S. That was the same year I got my first debit card, and I saved enough money to buy a wildly overpriced Thumbs Up T-shirt. When his first official project, On and On and Beyond, was released, I checked iTunes every hour to see how it was doing on the charts. I learned about his journey of producing his own music and the respect that he garnered from the industry. In my eyes, Mac’s career defined the zeitgeist of 21st-century Pittsburgh: young, artistic, and entrepreneurial. He could do no wrong.

September 7th, 2019 marked the one-year anniversary of Mac Miller’s passing. Last year, I was in a state of shock when I heard the news. To those around me, a rapper overdosed on drugs. To me, an astonishingly talented kid from Point Breeze became yet another victim of the opioid epidemic. From signing a petition to get Party on 5th Ave played during Pens games, to learning about the impact of opioids on communities, to donating to The Mac Miller Fund, I did all I could to carry on the legacy of a warranted legend. So, after a year, I want to sustain that tradition by reflecting on Mac’s career through the lens of my hobby: being a weird data scientist.

Deciding what to analyze (and how)

Since I wanted this article to be somewhat of an ode to Mac’s career, I planned to collect Mac’s lyrics over the years, group them by project, and perform A/B comparisons. Mac is one of the most consistent rappers in the industry, releasing at least one album or mixtape per year since 2009. Although his career was short-lived, collecting enough data is no problem.

So, I hit up the Genius API to get all Mac lyrics. No Larry Fisherman. No Delusional Thomas. Real Mac fans know that those are completely different artists that each deserve their own analysis.

Getting the Data

The first step is to get all the lyrics from Mac Miller’s EPs, mixtapes, and albums and store them in a pandas.DataFrame object (just think of a DataFrame as a data table). Since the Genius API does not provide access to lyrics (for legal reasons), you have to get a little creative.

First, I used the search API to get Mac Miller’s artist id in Genius. Then, I used the artists/:id/songs request to get the URL of every single Mac Miller song. At that point, I used BeautifulSoup to scrape the lyrics from the HTML at each URL. After adding the song name, album, and lyrics to a DataFrame, I did some basic filtering to keep only the songs from Mac’s twelve studio projects. I then stored the DataFrame in a .csv file for easy retrieval. I created a simple wrapper in my Python project to perform these tasks, and I generalized it so any artist’s songs can be obtained (my repo provides the results of analyzing both Mac Miller and Childish Gambino).
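
The request flow above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the wrapper from the project: the function names (genius_get, find_artist_id, song_urls) and the token argument are hypothetical, and the response fields used here (hits, primary_artist, songs, next_page) are assumptions about the Genius API's JSON that should be checked against its current documentation. The scraping step is left out because the markup of the lyrics pages changes over time.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.genius.com"


def genius_get(path, token, **params):
    """Send an authenticated GET request and return the "response" payload."""
    url = f"{API_ROOT}{path}?{urllib.parse.urlencode(params)}"
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)["response"]


def find_artist_id(name, token, fetch=genius_get):
    """Use the search endpoint to find the id of the first matching artist."""
    data = fetch("/search", token, q=name)
    for hit in data["hits"]:
        artist = hit["result"]["primary_artist"]
        if artist["name"].lower() == name.lower():
            return artist["id"]
    return None


def song_urls(artist_id, token, fetch=genius_get):
    """Page through artists/:id/songs and collect every song URL."""
    urls, page = [], 1
    while page:
        data = fetch(f"/artists/{artist_id}/songs", token, per_page=50, page=page)
        urls.extend(song["url"] for song in data["songs"])
        page = data.get("next_page")  # None on the last page ends the loop
    return urls
```

The fetch parameter exists only so the paging logic can be exercised without a network connection. Each returned URL can then be downloaded and parsed with BeautifulSoup to pull out the lyrics.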

Analyzing Mac

Cool, so I have all of Mac’s songs and their albums stored in a .csv- now what? This is the exciting part and why I got into data science in the first place- as long as you have an interesting, quality dataset, your creativity is limitless. For inspiration, I referred to Hip Hop By The Numbers, a blog that does rap analysis like this all day. They’re definitely worth the follow!

After doing this research and playing around with the data, I created three visualizations in Python with the help of pandas, matplotlib, nltk, and sklearn.

Visualization #1: Lexical Richness

This first analysis should be considered the “101” of lyrical analysis. It requires no fancy math or machine learning. Lexical Richness is best defined as the ratio of unique words to total words in an album. Comparing the Lexical Richness of Mac’s albums can give us quick insight into the points in Mac’s career where his lyrics stood out. In this analysis, albums that use a wider variety of words score higher than albums with more repetitive, straightforward lyrics.

The function that lays out the logic of how I calculated Lexical Richness from the DataFrame can be found below:
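
A minimal sketch of that calculation, written here with plain (album, lyrics) pairs instead of a DataFrame. It is an illustration under stated assumptions, not the project's exact function: the name lexical_richness, the tokenizing regex, and the sample data are all hypothetical.

```python
import re
from collections import defaultdict


def lexical_richness(songs):
    """Return {album: unique words / total words} for (album, lyrics) pairs."""
    words_by_album = defaultdict(list)
    for album, lyrics in songs:
        # Lowercase and keep only letters and apostrophes,
        # so that "Love," and "love" count as the same word.
        words_by_album[album].extend(re.findall(r"[a-z'’]+", lyrics.lower()))
    return {
        album: len(set(words)) / len(words)
        for album, words in words_by_album.items()
        if words  # skip albums with no usable lyrics
    }


# Made-up sample data, only to show the call shape.
songs = [
    ("Album A", "la la la la"),
    ("Album A", "la la love"),
    ("Album B", "every word here differs"),
]
print(lexical_richness(songs))
```

With pandas, the pairs could come from something like songs_df[["album", "lyrics"]].itertuples(index=False, name=None), where songs_df and its column names are assumed for illustration.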

After writing code to plot the results of graph_df in matplotlib on a bar chart, I got the following:

Visualization #1: Lexical Richness of each Mac Miller Album

Looks like the middle of Mac’s career (2010–2013) represents the peak of his creative wordplay. In I Love Life, Thank You, Mac was having a lot of fun finding his flow as he began hopping on riskier, more experimental beats. But with this new style, he still kept a lot of his quirky, homeboy rhymes:

They always want that fun sh-- when you tryna spit a little somethin’ real
They start runnin’, turnin’ chicken like a nugget meal
How the f-- I feel? Kinda like a hundred mill
Spun the wheel, bought a vowel, I don’t owe you nothin’ still
- Mac Miller, Cold Feet

Watching Movies With the Sound Off and Faces brought us to the height of Mac’s introspective career. Mac never released anything similar to the content that we heard in these two albums, as his fame, influences, and collaborators were expanding. For the first time, Mac’s stereotypical audience of fratty suburbanites saw directly into the life of an artist that was filled with depression, addiction, and detachment from reality. All this was depicted in such a poetic cadence:

Close my eyes before I cross the street
If a car about to hit me, then he ought to beep
Watching Dawson’s Creek ’til I fall asleep
It’s harder than it seems, I’m under water in my dreams
I’m in awe, this jigsaw, puzzles not complete
I’m just an idea, nothing concrete
- Mac Miller, S.D.S.

Okay, my mind is Yoda, I’m on Ayatollah
These other rappers just a diet soda
I find Jehovah in the darkest places, empty as apartment basements
This a marathon gentlemen, go ’head and start the races
Save the coffin spaces
- Mac Miller, Diablo

Visualization #2: Sentiment Analysis

Sentiment Analysis can give the viewer a high-level understanding of the emotional tone found in text. It gives us the ability to quantify seemingly subjective content. We are able to do this with the help of nltk, the Natural Language Toolkit library, in Python. The nltk library has a SentimentIntensityAnalyzer class (an implementation of the rule-based VADER model) that assigns a sentence a compound score between -1 (very negative) and +1 (very positive), along with separate negative, neutral, and positive proportions that each fall between zero and one.

The nltk library is far from 100% accurate. My buddy Jonathan Jiang made sure that I note to you all that Sentiment Analysis is still in its early stages, and the algorithms are trained on limited datasets. Although not perfect, this tool can give us a simple illustration of how Mac’s emotional state has changed throughout his albums. This code shows my Sentiment Analysis on Mac’s albums:
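
A minimal sketch of how such an analysis could be put together, not the project's exact code. It assumes each lyric line is scored with the compound value from nltk's SentimentIntensityAnalyzer (which ranges from -1 to +1) and bucketed with the conventional VADER cut-offs of ±0.05. The function names and the choice of threshold are assumptions for illustration.

```python
from collections import Counter


def label_sentence(compound, threshold=0.05):
    """Bucket a VADER compound score (-1 to +1) into a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def sentiment_shares(compound_scores):
    """Return the fraction of sentences that fall into each label."""
    counts = Counter(label_sentence(score) for score in compound_scores)
    total = sum(counts.values())
    return {
        label: counts[label] / total if total else 0.0
        for label in ("negative", "neutral", "positive")
    }


def score_lines(lines):
    """Score each lyric line with nltk's VADER analyzer.

    nltk is imported lazily so the bucketing above works without it.
    Requires a one-time nltk.download("vader_lexicon").
    """
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    return [analyzer.polarity_scores(line)["compound"] for line in lines]
```

For one album, the call would look like sentiment_shares(score_lines(album_lines)), where album_lines is a hypothetical list of that album's lyric lines.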

Each sentence’s sentiment score is categorized as negative, neutral, or positive. The counts in each category are then divided by the album’s total number of sentences to get a percentage for each category. The graph below shows a visual of each album’s sentiment analysis:

Visualization #2: Sentiment Analysis of each Mac Miller Album

To me, this insight is no surprise. Mac’s most positive albums are The Divine Feminine, The Jukebox, and Best Day Ever- in that order.

Do I really have to explain why The Divine Feminine is Mac’s happiest album? It’s literally a love album for Ariana Grande! Every song revolves around the topics of love, relationships (of all types), and his optimistic role in the universe.

Said, you just don’t know how beautiful you are
And baby that’s my favorite part
You walk around so clueless to it all
Like nobody gonna break your heart
- Mac Miller, My Favorite Part

Ironically, Mac’s most positive album was preceded by his most negative album, GO:OD AM. It only takes a quick Google search to learn that Mac was going through a serious drug addiction, and that was certainly reflected in his music.

Everybody saying I need rehab
’Cause I’m speedin’ with a blindfold on and won’t be long until they watching me crash
And they don’t wanna see that
They don’t want me to OD and have to talk to my mother
Tell her they could have done more to help me and she’d just be
Crying saying that she’d do anything to have me back
- Mac Miller, God Speed

Visualization #3: Topic Modeling

For my final visualization, I built a topic model of Mac Miller’s discography. In a nutshell, Topic Modeling searches through every lyric to find sets of recurring words. A data scientist can then review these sets and name the topics that best summarize them. Because we’re grouping lyrics by album, we can see how frequently Mac spoke on these topics over the course of his career.

Topic Modeling in Python can be done with the help of sklearn, a machine learning library. First, we run a “term frequency-inverse document frequency” (TF-IDF) vectorizer, which is a fancy way of saying that we count the frequency of each word while penalizing the words that are common across all songs (because words like “the” and “a” won’t help us in choosing a topic). Then, we take those weighted counts and run a non-negative matrix factorization (NMF), which groups co-occurring words together. By combining the frequency of words with these groupings, our code can rank the most common topics in Mac’s lyrics. The following code will output our sets of words:
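
A minimal sketch of this pipeline using scikit-learn's TfidfVectorizer and NMF. The function names, the English stop-word list, random_state=0, and max_iter=500 are assumptions for illustration and not the project's settings; the defaults of six topics and twenty words simply mirror the output listed after it.

```python
def top_words(components, vocabulary, n_words):
    """For each topic row of weights, return its n_words heaviest terms."""
    topics = []
    for weights in components:
        ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
        topics.append([vocabulary[i] for i in ranked[:n_words]])
    return topics


def fit_topics(documents, n_topics=6, n_words=20):
    """Fit TF-IDF + NMF on a list of lyric strings.

    Returns (word lists per topic, document-topic weight matrix).
    scikit-learn is imported here so top_words works without it installed.
    """
    from sklearn.decomposition import NMF
    from sklearn.feature_extraction.text import TfidfVectorizer

    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(documents)

    model = NMF(n_components=n_topics, random_state=0, max_iter=500)
    doc_topic = model.fit_transform(tfidf)  # one row per song

    # get_feature_names_out is available in scikit-learn 1.0 and later.
    vocabulary = vectorizer.get_feature_names_out()
    return top_words(model.components_, vocabulary, n_words), doc_topic


def print_topics(topics):
    """Print one 'Topic #k: word word ...' line per topic."""
    for number, words in enumerate(topics, start=1):
        print(f"Topic #{number}: {' '.join(words)}")
```

Calling print_topics(fit_topics(lyrics)[0]) on a hypothetical list of song lyrics named lyrics would print one line per topic. The second return value of fit_topics is the document-topic matrix, which is what a per-album chart would be built from.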

Here are the six sets of twenty words that came from this function:

Topic #1: know time mind go never way home baby think feel alright little girl gonna really leave find tell come inside
Topic #2: love name baby take nothing gon one make something want heart forever know things round somebody away think day girl
Topic #3: bitch fuck shit money god drugs bitches fucking ass give still tryna million dick much little want made die damn
Topic #4: mac miller kid know rap bitch lot things back girl name say ass class flow first better fresh bad play
Topic #5: life shit night take around day world high boy party money feeling gettin wake good make better say best big
Topic #6: keep another one go weed back roll man new ride feel shoes people til smoke blow tell never real hope

Now for the creative (and subjective) part! My fellow LineByLine writer, Clayton Marshall, and I decided on the following topic labels:
Topic #1: Humble Life
Topic #2: Romance
Topic #3: Braggadocious
Topic #4: Music/Art
Topic #5: Partying
Topic #6: Smoking Weed

Not bad, right? Please comment if you have a recommendation for better names 😃. To learn more about the Topic Modeling process, check out this Codecademy blog.

With the aid of these labels and the NMF matrix, we can plot a line chart of Mac’s most featured topics over the years:

Visualization #3: Topic Modeling of Mac Miller’s Discography
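
How the chart's values were derived from the NMF matrix is not spelled out, so the following is only one plausible aggregation, offered as an assumption: give each song the topic with its largest NMF weight, then compute the share of each album's songs that land on each topic. The function name and the sample numbers are hypothetical.

```python
from collections import Counter, defaultdict

TOPIC_LABELS = [
    "Humble Life", "Romance", "Braggadocious",
    "Music/Art", "Partying", "Smoking Weed",
]


def topic_share_by_album(albums, doc_topic, labels=TOPIC_LABELS):
    """Return {album: {label: share of songs whose dominant topic is label}}.

    albums:    one album name per song
    doc_topic: one row of topic weights per song, in the same order
    """
    counts = defaultdict(Counter)
    for album, weights in zip(albums, doc_topic):
        weights = list(weights)
        dominant = weights.index(max(weights))  # position of the largest weight
        counts[album][labels[dominant]] += 1
    return {
        album: {label: tally[label] / sum(tally.values()) for label in labels}
        for album, tally in counts.items()
    }


# Made-up weights for three songs, only to show the call shape.
albums = ["K.I.D.S.", "K.I.D.S.", "Faces"]
doc_topic = [
    [0.9, 0.0, 0.0, 0.0, 0.0, 0.1],
    [0.0, 0.0, 0.0, 0.0, 0.8, 0.2],
    [0.0, 0.0, 0.7, 0.0, 0.0, 0.0],
]
shares = topic_share_by_album(albums, doc_topic)
```

Under this kind of aggregation a topic can score zero for an album even when it appears in the lyrics, because only each song's single strongest topic is counted.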

If you’re like me, your interpretations of this chart are split. A lot of it is logical! Earlier in his career, Mac rapped a lot about his humble life back home and adamant love for Pittsburgh.

New kicks give me cushion like whoopie
Keep a smile like an Eat’n Park cookie
- Mac Miller, Knock Knock

Mac has also pleased his audience with a consistent output of party bangers. From his first hit “Donald Trump” to the horns-heavy “Goosebumpz”, Mac is no hipster, but he can make your hips stir!

However, I can see some really inaccurate data points here. For example, there’s no way that Mac did not mention “Smoking Weed” in K.I.D.S! If you ever listened to the first ten seconds of “Kool Aid & Frozen Pizza” or “Paper Route”, then you know that this topic model cannot be right. Also, the point that “Romance” was not mentioned once in The Divine Feminine goes against the entire theme of the project!

Although my final visualization conflicts with the expected outcome, perhaps this is the perfect way to end the article. I had a lot of fun using Sentiment Analysis and machine learning on Mac’s lyrics to turn them into intriguing insights. But, maybe quantifying art is not the best way to appreciate it. Mac’s music is so special because it can be relatable and personal to so many fans. Regardless of what the data says, I will continue to appreciate his music through the lens of my own experiences, vices, and virtues. From now until forever, I will always play “Knock Knock” as I exit the Fort Pitt Tunnel, receiving shivers through my entire body. Thumbs up 👍.

For my full Github repository on this project, click here.

Have any comments or recommendations for the LineByLine Team? Please reach out to us at linebyline.team@gmail.com.

Ben Wallace
Editor for LineByLine

Your data-driven, hip-hop-loving environmentalist! IU ’19, Program Manager @ Salesforce, proud Pittsburgh native