The Vocabulary of Buck 65, Quantified
Ever since I first heard his songs Wicked and Weird and Way Back When in high school, Buck 65 has been my favorite rapper, bar none, and one of my favorite artists of any genre. That’s why, when I first ran across Matt Daniels’ Rapper Vocabulary Chart in college, the first thing I did was go looking for him. It wasn’t surprising that he wasn’t on there (it’s not exactly like Buck was a household name), but it did leave me wondering where he would land on the chart.
In the years since, I would occasionally see updated versions of the chart pop up. I’d always check it out to see if Buck had made the list, but I was disappointed every time. It seemed like a real shame too: I had a feeling that he’d be up there in the rankings with lyrics like “I feel like a jellyfish, uncephalized, uncivilized/Unspecified, unspecialized/Currents carry me, my own endurance buries me/Deterrents make me weary, so I wear this ring for reassurance”. I thought that if anyone could unseat Aesop Rock (the largest vocabulary in rap), it just might be him.
More recently, I’ve found myself listening to a bunch of Buck 65 again, which has brought the question of where he would fall on Daniels’ chart back to the front of my mind. When I first saw the chart, I had no idea how I could go about finding the answer. In the intervening years, however, I’ve learned a lot about solving just that sort of problem with code, so I got to work figuring it out.
Getting the data
My first step was getting enough lyrics to analyze. Adhering to Daniels’ original methodology would require at least 35,000 words. A cursory look at Buck 65’s entry on Genius suggested that scraping it would clear that threshold, and a few Google searches led me to a great article and a very helpful Python package which let me easily gather the data I needed. If you want to follow along with the code I used to do it, you can find the repository here, and an HTML version of my Jupyter notebook here.
What I ended up with needed some cleaning and trimming, but that went relatively easily. I stripped out all the newlines and special characters in the lyrics (thank you, regular expressions), and extracted the album names from the messy JSON strings that were entered in the ‘album’ column.
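As a rough illustration of that cleaning step, here is a minimal sketch rather than the exact code from my notebook. The function name `clean_lyrics` and the choice of which characters count as "special" are assumptions made for this example.

```python
import re

def clean_lyrics(raw: str) -> str:
    """Flatten one song's lyrics into a single line of plain words.

    Assumption: anything that is not a letter, digit, apostrophe, or
    space counts as a "special character" and is replaced by a space.
    """
    # Normalize curly apostrophes so contractions like "don't" survive.
    text = raw.replace("\u2019", "'")
    # Newlines (and carriage returns) become spaces.
    text = text.replace("\r", " ").replace("\n", " ")
    # Replace every remaining special character with a space.
    text = re.sub(r"[^A-Za-z0-9' ]+", " ", text)
    # Collapse runs of whitespace left behind by the substitutions.
    text = re.sub(r"\s+", " ", text)
    return text.strip()

sample = "Currents carry me,\nmy own endurance buries me!"
cleaned = clean_lyrics(sample)
print(cleaned)
```

Replacing special characters with spaces, rather than deleting them, keeps words on either side of a slash or dash from being glued together.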
I also had some lyrics to remove. You see, Daniels tried to only use EPs and studio albums, which left me with a lot of songs to filter out. Most of these fell into the category of mixtapes and unreleased material, but there were a few special cases. I had to remove two studio albums from the dataset: Weirdo Magnet and This Right Here is Buck 65. The former only had one song on Genius, which was re-recorded on a later album. The latter is a best-of album, which meant that all but one or two of the songs in there were already accounted for. I didn’t see much value in counting songs twice, so I took them out.
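The filtering itself can be sketched in a few lines. The song records below are made up for illustration, and treating a missing album as "unreleased" is an assumption; in practice the exclusion set would also need the names of the mixtapes.

```python
# Hypothetical song records standing in for the scraped dataset.
songs = [
    {"title": "Song A", "album": "Vertex"},
    {"title": "Song B", "album": "This Right Here is Buck 65"},
    {"title": "Song C", "album": None},  # assumed: no album means unreleased
    {"title": "Song D", "album": "Weirdo Magnet"},
]

# The two studio albums removed in the post; mixtape names would go here too.
EXCLUDED_ALBUMS = {"Weirdo Magnet", "This Right Here is Buck 65"}

def keep_song(song: dict) -> bool:
    """Keep a song only if it has an album and that album is not excluded."""
    album = song.get("album")
    return album is not None and album not in EXCLUDED_ALBUMS

kept = [song for song in songs if keep_song(song)]
print([song["title"] for song in kept])
```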
Crunching the data
That process of cleaning and trimming left me with a total of 43,291 words of lyrics. I concatenated all the songs into one giant string, tokenized it into words with the Natural Language Toolkit, then converted the resulting list to a set to filter out duplicates, leaving me with a collection of unique words. In all of those lyrics, I found 7,521 unique words. However, to make the comparison with Daniels’ chart, I needed to cut down my sample to Buck’s first 35,000 words. For the Buck 65 nerds out there, that translates to the albums Language Arts, Vertex, Man Overboard, Synesthesia, Square, Talkin’ Honky Blues, Secret House Against The World, and most of Situation.
Narrowing the sample gave me a result of 6,557 unique words. That put Buck in 3rd place on the list: ahead of Jedi Mind Tricks at 6,424, but still well behind Busdriver and Aesop Rock, who both clock in at more than 7,000 unique words. In fact, Aesop Rock used more unique words in his first 35,000 words than Buck 65 has in his entire body of work.
With that long-standing question answered, I decided to have a bit of fun with the sample. Using his last 35,000 words gets us 6,537 unique words, implying a slight decrease in vocabulary over time. Using a series of random samplings of 35,000 words, we get results that tend to average out in the high 6,600s, but reach down to the 6,590s and up to the low 6,700s.
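For anyone who wants to try the same experiment, here is a hedged sketch of the two variations. It assumes a "random sampling" means drawing individual words without replacement from the full token list; drawing random songs or a random contiguous window would be equally reasonable readings. The function names and toy tokens are invented for the example.

```python
import random

def last_n_unique(tokens: list[str], n: int) -> int:
    """Count distinct tokens among the final n tokens."""
    return len(set(tokens[len(tokens) - n:]))

def random_sample_unique(tokens: list[str], n: int, trials: int,
                         seed: int = 0) -> list[int]:
    """Draw n tokens without replacement, `trials` times, and count
    the distinct tokens in each draw."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    return [len(set(rng.sample(tokens, n))) for _ in range(trials)]

# Toy token list; the real one held every word of the trimmed dataset.
toy_tokens = ["a", "b", "c", "d"] * 5

tail_count = last_n_unique(toy_tokens, 3)
full_draws = random_sample_unique(toy_tokens, n=20, trials=3)
partial_draws = random_sample_unique(toy_tokens, n=8, trials=5)
print(tail_count, full_draws, partial_draws)
```

Averaging the list returned by `random_sample_unique` over many trials gives a typical value, and its minimum and maximum give the spread.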
That was when I decided to add some more albums. You see, Buck 65 released two albums under the name Bike for Three as part of his collaboration with Belgian DJ Greetings from Tuskan. Since all of the lyrics were Buck’s, I decided to pull those albums into my dataset. I was expecting them to show up as a big bump in unique words since, in my opinion, those albums are some of his most poetic.
The total number of unique words went up to 8,197 (from an increased sample of 51,739 words, so no surprise there). The changes for each sample were small, but still surprising. The first 35,000 sample barely moved, going from 6,557 to 6,554. Both Bike for Three albums are later albums, so it makes sense that this sample changed only slightly. The last 35,000 sample went all the way down to 6,436 from 6,537, which was surprising. Like I said, I was expecting an increase, but as a fellow fan pointed out, those albums rely heavily on repetition as a rhetorical device, which would bring down the ratio of unique words to total words. The random samples tended to hover around the same place: the high 6,600s.
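The totals quoted above make the ratio point easy to check. The Bike for Three albums added 8,448 words to the dataset but only 676 words that had not already appeared elsewhere. This is a small worked calculation using only the figures in this post; the variable names are my own, and "new unique words" means new to the whole dataset, not unique within those two albums.

```python
# Totals quoted in the post, before and after adding Bike for Three.
total_before, unique_before = 43_291, 7_521
total_after, unique_after = 51_739, 8_197

added_words = total_after - total_before    # words the two albums added
new_unique = unique_after - unique_before   # words not seen anywhere before

# Unique-to-total ratio for the whole dataset, before and after.
ratio_before = unique_before / total_before
ratio_after = unique_after / total_after

print(f"added words: {added_words}, new unique words: {new_unique}")
print(f"unique/total before: {ratio_before:.3f}, after: {ratio_after:.3f}")
```

Part of that drop is expected for any larger sample, since common words keep repeating as a body of text grows, so the ratio alone does not isolate the effect of repetition.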
I had a feeling that Buck 65 would be up there in the rankings, so third place isn’t surprising. Part of me thought he would be a bit closer to Aesop Rock and Busdriver (or even surpass them) but that was mostly wishful thinking. Even though Buck uses a lot of unique words, his lyrics don’t tend to be quite as fast or as dense as Aesop’s. If you want a clear example, I’ll direct you to B. Dolan’s Jailbreak, the only song I know of that has both Aesop Rock and Buck 65 on it.
It would have been nice to track down and include more of Weirdo Magnet, since it’s the only album of any genre where I’ve ever heard the word “coelacanth”, but I don’t see it making much of a difference. To overtake Busdriver’s 767-word lead, there would need to be a lot of unique words in there other than “coelacanth”, and I don’t think there are all that many.
With that question finally answered, I’m not sure what more there is to say. Third place is not bad at all for a dude from rural Nova Scotia. Now, go listen to some Buck 65. He’s pretty damn good.