Rhymin’ nerdy: network analysis of rhyme pairs in rap lyrics

Fuaad Coovadia
Oct 8, 2021

Network visualization of the relationship between words in rap lyrics produced using NetworkX and PyVis

There is nothing the internet loves more than unlikely companions: Labradors and ducklings, babies and chickens, or early-20th-century sun-worship cults and coconuts. Another unlikely pair is hip-hop and quantitative analysis. It ranges from popular analyses, such as this one by The Pudding which ranks rappers by the size of their vocabularies, to journal articles, like the one by Finnish computer scientists entitled ‘DopeLearning: A Computational Approach to Rap Lyrics Generation’. The internet is awash with nerds trying to show love for a genre they’re not cool enough to join, the best way they know how: by swinging around their Pythons screaming “I think rap is really cool, nice and awesome!”.

My favorite footnote from the DopeLearning paper.

I thought I had buried my own urges to walk the line between hip-hop and nerdiness a decade ago, when teenage-me hit billboard rock-bottom with this banger. But then last week I was listening to the radio and heard Megan Thee Stallion rhyme ‘wine’ with ‘fine’ (“Lookin’ fine, sippin’ wine” — Beautiful Mistakes by Maroon 5 ft. Megan Thee Stallion).

Like a sleeper agent with a trigger phrase lodged unknowingly between the folds of their cortex, I was awake, aware that this rhyme between ‘fine’ and ‘wine’ must surely have been done dozens of times before. Which made me wonder: which rhyming pairs are most popular in rap? Which are the least popular? And could we visualize the rhyme choices of artists to answer those two questions?

To figure this out, I pulled the all-time top 500 rap songs from the Genius.com API, collected their 47 439 rows of lyrics, and used Python to analyze the relationship between rhymes¹. From each of these 47k rows I extracted only the last word, to get the end-of-line rhyming word. The goal was to produce a visualization of the relationship between rhyming words.
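A minimal sketch of that last-word extraction, in case it helps to see it spelled out. The function name `end_of_line_words` is made up for illustration, and skipping lines that start with a square bracket assumes section headers like `[Chorus]` appear in the raw lyrics; the actual project code may handle this differently.

```python
import re

def end_of_line_words(lyrics):
    """Return the last word of every lyric line, lowercased."""
    words = []
    for line in lyrics.splitlines():
        # Normalise curly apostrophes so "don’t" and "don't" match.
        line = line.strip().lower().replace("’", "'")
        # Assumption: bracketed lines such as "[Chorus]" are headers, not lyrics.
        if not line or line.startswith("["):
            continue
        # Treat apostrophes as word characters, then trim stray ones.
        tokens = [t.strip("'") for t in re.findall(r"[a-z']+", line)]
        tokens = [t for t in tokens if t]
        if tokens:
            words.append(tokens[-1])
    return words

sample = (
    "These expensive, these is red bottoms, these is bloody shoes\n"
    "Hit the store, I can get 'em both, I don't wanna choose"
)
print(end_of_line_words(sample))  # ['shoes', 'choose']
```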

Networks: it’s all about who you node

Below is my first attempt at this: a gigantic, indecipherable web which plots every ‘end-of-line word’ as a point (called a node) in the network and draws a line (called an edge) between two words if they appear one line apart in the lyrics. For example, in Cardi B’s Bodak Yellow lyrics, the end-of-line words ‘shoes’ and ‘choose’ would get a line drawn between them because they appear one line apart.

These expensive, these is red bottoms, these is bloody shoes
Hit the store, I can get ’em both, I don’t wanna choose

I feel you, Cardi; we shouldn’t need to choose between the left or right shoe. I don’t know who needs to hear this but: please 👏🏾 sell 👏🏾 shoes 👏🏾 by 👏🏾 the 👏🏾 pair!

Anyway, that’s all a network is, folks: a collection of ‘nodes’ and the relationships between them, called ‘edges’.

The word wide web
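The node-and-edge idea can be sketched with nothing but a dictionary. This is a rough illustration rather than the project’s code: the function name is hypothetical, and ‘foo’ and ‘bar’ are placeholder words standing in for the lines that follow. The same pairs could equally be handed to NetworkX via `Graph.add_edge`.

```python
from collections import defaultdict

def build_adjacent_word_graph(end_words):
    """Link each end-of-line word to the end word of the following line."""
    graph = defaultdict(set)  # node -> set of neighbouring nodes
    for first, second in zip(end_words, end_words[1:]):
        graph[first].add(second)
        graph[second].add(first)  # undirected: store the edge both ways
    return graph

# 'shoes' and 'choose' come from the example above; the rest are placeholders.
graph = build_adjacent_word_graph(["shoes", "choose", "foo", "bar"])
print(sorted(graph["choose"]))  # ['foo', 'shoes']
```

Notice that ‘choose’ gets linked to ‘foo’ purely because the two sit on neighbouring lines, whether or not they rhyme.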

However, just because a word is at the end of a line doesn’t necessarily mean it will rhyme with the word at the end of the next line. For example, if a verse has the rhyming structure AABB, then one of its three adjacent-line relationships (AB, between the end word of line 2 and the end word of line 3) is a non-rhyming relationship.

This was the first major challenge that presented itself: how could we isolate only the rhyming pairs?

One answer is to use the CMU Pronouncing Dictionary, which describes itself as an:

“open-source machine-readable pronunciation dictionary for North American English that contains over 134,000 words and their pronunciations.”

In practice, this meant downloading the entire dictionary as a JSON file and writing a function which determined whether any two words were a rhyming pair by checking whether their phoneme sets matched up².

The first block shows the output of the CMU Pronouncing Dictionary: a list of phonemes for the word ‘fox’. The second block tests the function which checks whether words rhyme (e.g. word1 ‘glue’ does indeed rhyme with word2 ‘flu’, so it gets an output of True).
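Here is a rough sketch of what such a check can look like. It assumes one simple rule: two words rhyme when their phonemes match from the last stressed vowel onward. The tiny `PRONUNCIATIONS` table is hand-typed in the dictionary’s ARPAbet style purely for illustration, and `rhyme_part` and `is_rhyme` are made-up names. The function actually used in this project also accepts half-rhymes, so it almost certainly works differently.

```python
# Hand-typed, illustrative entries in the CMU dictionary's ARPAbet notation;
# a real run would load the full dictionary instead.
PRONUNCIATIONS = {
    "fox": ["F", "AA1", "K", "S"],
    "glue": ["G", "L", "UW1"],
    "flu": ["F", "L", "UW1"],
    "shoes": ["SH", "UW1", "Z"],
    "choose": ["CH", "UW1", "Z"],
}

def rhyme_part(phonemes):
    """Return the phonemes from the last stressed vowel to the end."""
    for i in range(len(phonemes) - 1, -1, -1):
        if phonemes[i][-1] in "12":  # 1 = primary stress, 2 = secondary
            return phonemes[i:]
    return phonemes  # no stressed vowel found: compare the whole word

def is_rhyme(word1, word2):
    """True if both words are known and share the same rhyme part."""
    p1, p2 = PRONUNCIATIONS.get(word1), PRONUNCIATIONS.get(word2)
    if p1 is None or p2 is None:
        return False  # unknown words are treated as non-rhymes
    return rhyme_part(p1) == rhyme_part(p2)

print(PRONUNCIATIONS["fox"])    # ['F', 'AA1', 'K', 'S']
print(is_rhyme("glue", "flu"))  # True
print(is_rhyme("fox", "glue"))  # False
```

A strict rule like this would reject half-rhymes such as ‘stopped’ and ‘wept’, which is one reason to treat it as a starting point only.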

The function worked ‘phoneme-nally’ (I’m so sorry, I had to) and from a total of 47 439 end-of-line words, the function revealed that only 6 136 were actually rhyming pairs according to standard pronunciation (including half-rhymes like ‘stopped’ and ‘wept’). Now, as any good rap fan will tell you, there are plenty of ways to make words rhyme beyond ‘standard pronunciation’. I’d like to draw the class’ attention to Kanye West trying as hard as he can to make this point in ‘Hold My Liquor’ where he rhymed ‘in October’ with ‘Deepak Chopra’ (a non-rhyme according to the function I used here):

one cold night in October
Pussy had me floatin’, feel like Deepak Chopra

Another great example of unlikely companions is Kanye West and this cat he speaks so highly of.

Words of a feather plot together

Using only rhyming pairs to create the network led to a very different network structure. The picture shown above is a dense network with lots of connections, but as you can imagine, when you only link words that rhyme, you limit the scope for cross-network linkages. Instead you are left with many isolated mini-networks, as can be seen below (there must be a technical term for this? Hit me up if you’re actually a Graph Theorist reading this through gritted teeth).

Network of rhyming pairs from the top 500 rap songs

In the above, clusters of many words which rhyme together form constellation-type shapes, and words which rhyme with only one or two other words in the lyrics considered appear as short line segments. The single points are words which are only observed rhyming with themselves, like Pitbull’s infamous rhyming of “Kodak” with “Kodak” in Give Me Everything, and for this reason we shall henceforth call these nodes “Pointbulls”.
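For what it’s worth, graph theory calls these isolated mini-networks ‘connected components’. The sketch below finds them with a plain breadth-first search, so it needs no extra libraries (NetworkX offers `connected_components` for the same job). The edge list is illustrative only, and a word paired with itself stands in for a Pointbull.

```python
from collections import deque

def connected_components(edges):
    """Group nodes into connected components using breadth-first search."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    seen, components = set(), []
    for start in neighbours:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            # Visit every neighbour we have not reached yet.
            for nxt in neighbours[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        components.append(component)
    return components

# Illustrative edges: a small constellation, a line segment, and a self-rhyme.
edges = [("bed", "head"), ("head", "said"), ("bling", "thing"), ("kodak", "kodak")]
components = connected_components(edges)
print(sorted((len(c) for c in components), reverse=True))  # [3, 2, 1]
```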

Exploration of the rhyming pair network pictured above

Getting edgy: interactive plot to see frequency of rhyme pair occurrence

Going through these constellation structures gave me an idea of the kinds of rhyme patterns which appear in the lyrics, but it still didn’t tell me which ones were the most popular. How many times have wine and fine been rhymed in this dataset?

For this we need to apply ‘weights’ to the edges which connect the nodes. The weights here are simply the number of times a rhyming pair appears across all of the lyrics considered. The plot below includes these weighted edges. They’re not visible at first sight, but the plot is interactive, so swim around a bit. You may even find some new rhyme ideas for that next chart-topper you’ve got bouncing around your noggin.

https://chart-studio.plotly.com/~fuaadness/1#plot
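Counting those edge weights takes only a few lines. A minimal sketch, with a hypothetical function name and made-up sample pairs (not the real counts from the dataset):

```python
from collections import Counter

def count_rhyme_pairs(pairs):
    """Count how often each unordered rhyming pair occurs."""
    weights = Counter()
    for a, b in pairs:
        # Sort so ('head', 'bed') and ('bed', 'head') share one key.
        weights[tuple(sorted((a, b)))] += 1
    return weights

# Made-up sample pairs, for illustration only.
pairs = [("bed", "head"), ("head", "bed"), ("fine", "whine")]
weights = count_rhyme_pairs(pairs)
print(weights.most_common(1))  # [(('bed', 'head'), 2)]
```

Sorting each pair before counting means the order in which the two words appear in the song doesn’t split the tally.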

What have I learnt?

Well, that I owe Ms. Thee Stallion an apology: the rhyme between ‘fine’ and ‘wine’ did not appear once in these top 500 rap songs (only ‘fine’ and ‘whine’ appeared, one time). Moreover, I owe her a further apology for being so judgy about her rhyme choices. Even if it had turned out to be a very popular rhyme, like that of ‘bed’ and ‘head’, which *feigns surprise* is the most popular rhyming pair at 20 instances across the 500 songs, that wouldn’t make it a bad rhyme.

I started off believing that there were some rhymes which shouldn’t be in use anymore because they’re overused and tired, but that belief would imply that rap is nothing but an exercise in searching for new, interesting, ever more obscure rhymes. Left in my hands, and those of my fellow quantitatively questioning quest-mates, rap might be reduced to a mere optimization problem where each new song tries to find the least-used rhyming pair.

So if this exercise has taught me anything, it is that trying to quantify hip-hop has made me ever more attentive to the kind of beauty which cannot be validated or invalidated by rational analysis. Seeing that Drake’s Hotline Bling barely has any rhyming pairs (‘bling’ and ‘thing’ are doing the real heavy lifting) makes me even more appreciative of a song I really enjoy, because I can’t simply explain away why I like it with math or fancy pictures.

What lies beyond the reach of quantification is wonder, and I think each attempt like this exposes the gap between what we know about hip-hop and what we feel when we listen to hip-hop. So may the unlikely companionship of rap and rigor continue, and for those who would like to play around and pick up where I’ve left off, I’ve linked my truly shoddy code below.

Actual footage of what making network visualizations is like

Links

Plotly interactive network visualization: https://plotly.com/~fuaadness/1/

Github code for this project: https://github.com/fuaadness/rap_rhymes

Notes

[1] I only considered end-of-line rhymes, so no within-line rhymes. This is certainly something to add if anyone were to expand this analysis.

[2] Huge thanks to Ouflak from this StackOverflow thread, whose suggested answer became the basis for the function I ended up using to test rhyming pairs.

[3] Another huge thanks to Rebecca Weng whose code for visualizing Shakespeare character appearances helped me with the trickier interactive Plotly stuff.

But I’m not a computer scientist — so there is no doubt an easier, more code-efficient way of going about this. I undertook this project because I wanted to learn about network analysis, so the code is perfect for those of you who want to see a beginner muddle their way towards an answer.

Fuaad Coovadia

Chief Shill for the Data-Humour-Industrial complex | Twitter @fuaadness