Prelude

AI Fly Guy
7 min read · Mar 28, 2015

Big Data is the shiznit these days. Businesses be slanging Big Data, collecting billions like dope slanging in the streets. Despite the hype, many still don’t know what the fuck this is. I’m here to do fools a service and drop some knowledge about Big Data with real world examples from your favorite rap artists & songs so y’all can relate. Inspired by Matt Daniels’ work on the largest vocabulary in Hip Hop, I felt there’s hella more insights to be discovered in the vast universe of lyrical masterpieces throughout the years. Crawling the web and mining data is my thang, so let me demonstrate.

If it ain’t Big, it really ain’t shit. How big is the world of Hip Hop? Peep game, we gon’ do this shit by starting with the 3Vs of Big Data:

Volume: The size of the Hip Hop universe in numbers.

Contributing to this, the song with the most words ever used is Canibus’ Poet Laureate Infinity from the album For Whom the Beat Tolls, with a total of 9897 words (2992 unique).
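For the curious, here’s a minimal sketch of how a total vs. unique word count like that could be computed. The tokenizer (lowercase, runs of letters, digits, and straight apostrophes) and the `word_counts` name are my assumptions for illustration, not necessarily how the numbers above were produced.

```python
import re

def word_counts(lyrics):
    """Return (total words, unique words) for one song's lyrics."""
    # Assumed tokenizer: lowercase, then keep runs of letters, digits,
    # and straight apostrophes as words.
    words = re.findall(r"[a-z0-9']+", lyrics.lower())
    return len(words), len(set(words))

# Toy input, not real lyrics data.
total, unique = word_counts("Now you know, now you know")
print(total, unique)
```

A different tokenizer (say, one that splits contractions) would shift both numbers, so treat the counts as relative to whatever rule you pick.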

The artist with the most songs with lyrics available on Rap Genius is Lil B, with a total of 817 songs. Let me emphasize that these numbers are from artists identified on Wikipedia with songs listed on Rap Genius, which is a subset of all the rappers that ever existed and all the songs ever released. Lil Wayne fans may argue that he has released thousands of songs, so why isn’t he on top? Well, y’all better start adding his lyrics to Rap Genius, or tell me where I can find all of his songs with lyrics, players.

Velocity: The rate of songs released as data generated.

Songs released by artists per year. Click on chart to see more details on Tableau.

As seen in the chart above, the timeline kicks off in 1979 with Sugarhill Gang’s Rapper’s Delight and Grandmaster Flash & The Furious Five’s Superrappin’. Over the years there’s a steep increase in the number of active artists and the number of songs released per year, with a peak in 2013 of 4966 songs from 699 artists.

The chart does omit songs where I couldn’t find the release year, so for all y’all math nazis out there summing the counts and finding the total to be less than the earlier reported 67,389 songs, see me brushing my shoulders off.
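A rough sketch of that per-year tally, assuming each song comes as an `(artist, title, year)` tuple with `year` set to `None` when the release year is unknown. The tuple layout and the `yearly_stats` name are hypothetical, and the third row below is made-up toy data.

```python
from collections import defaultdict

def yearly_stats(songs):
    """Map year -> (songs released, active artists), skipping unknown years."""
    song_count = defaultdict(int)
    artists = defaultdict(set)
    for artist, _title, year in songs:
        if year is None:
            continue  # no release year found, so the song is left off the chart
        song_count[year] += 1
        artists[year].add(artist)
    return {y: (song_count[y], len(artists[y])) for y in sorted(song_count)}

songs = [
    ("Sugarhill Gang", "Rapper's Delight", 1979),
    ("Grandmaster Flash & The Furious Five", "Superrappin'", 1979),
    ("Toy Artist", "Toy Track", None),  # placeholder row with a missing year
]
stats = yearly_stats(songs)
print(stats)
```

Because rows with a missing year get skipped, the yearly counts add up to less than the full song count, which is the gap mentioned above.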

Variety: various data types from various sources.

To get the big picture, I had to join data sets across multiple sources like a DJ mixing beats. The following are the sources and data in the mix.

  • Lyrics from Rap Genius, with artists information.
  • Artists & Groups information from Wikipedia including what hoods artists represent, years active, record labels, and discography.
  • Google Maps to get the latitude and longitude of rappers’ locations.
  • Songs & Album information from Discogs, and several other music services to fill data gaps and cross validate information, just cuz dirty data can be a real bitch.
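To give a feel for the mixing, here’s a minimal sketch of joining lyrics rows to Wikipedia rows on a cleaned-up artist name. The field names (`artist`, `song`, `city`), the `norm` and `join_sources` helpers, and the sample rows are all assumptions for illustration; the real pipeline likely needed far more careful matching.

```python
def norm(name):
    """Turn an artist name into a join key: lowercase, alphanumerics only."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def join_sources(lyrics_rows, wiki_rows):
    """Left-join lyrics rows to Wikipedia rows on the normalized artist name."""
    wiki_by_key = {norm(w["artist"]): w for w in wiki_rows}
    joined = []
    for row in lyrics_rows:
        wiki = wiki_by_key.get(norm(row["artist"]), {})
        # Unmatched artists keep their row, with city left as None.
        joined.append({**row, "city": wiki.get("city")})
    return joined

# Toy rows: the same artist spelled two ways across sources.
lyrics_rows = [
    {"artist": "Jay Z", "song": "toy track A"},
    {"artist": "Unknown MC", "song": "toy track B"},
]
wiki_rows = [{"artist": "Jay-Z", "city": "New York City"}]
joined = join_sources(lyrics_rows, wiki_rows)
print(joined)
```

Keeping unmatched rows (a left join) makes the data gaps visible instead of silently dropping songs, which helps when cross-validating against another source.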

Now there may be haters out there challenging me that the volume & velocity of data here ain’t big enough to fit the definition of “Big Data”. See I ain’t even trippin’, this here is “Ghetto Big Data” son.

Frequent Words

To follow up, I’ll demonstrate what stories the data tells us and what knowledge we can drop from ‘em. First off, if you wanna rap, you gotta know the lingo: what are the common words used in rap songs? The following is a chart of the most commonly used words:

  • From left to right words are sorted by the number of songs they appear in, represented by the blue bars.
  • The green bar is the total number of times the word has ever been used.

Most common words in rap songs. Click on chart to see top 500 words on Tableau.

Not surprisingly, a couple of the common words you hear frequently in rap songs are on the leaderboard. At the top of the list, “like” is used in 88.08% of all rap songs. Turns out this granddaddy of all filler words is what keeps the rhymes flowin’.
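The two bars in that chart boil down to two counters per word: how many songs it shows up in, and how many times it’s used in total. Here’s a hedged sketch, assuming lyrics come as plain strings and using the same simple tokenizer assumption as before; `word_stats` is a made-up name and the songs are toy strings.

```python
import re
from collections import Counter

def word_stats(songs):
    """Map word -> (% of songs containing it, total number of uses)."""
    in_songs = Counter()  # number of songs each word appears in
    total = Counter()     # number of times each word is used overall
    for lyrics in songs:
        words = re.findall(r"[a-z0-9']+", lyrics.lower())
        total.update(words)
        in_songs.update(set(words))  # count each word once per song
    n = len(songs)
    return {w: (100.0 * in_songs[w] / n, total[w]) for w in total}

# Toy lyrics, not real data.
songs = ["like like that", "just like this", "no filler here"]
stats = word_stats(songs)
print(stats["like"])
```

Sorting the result by the first number gives the left-to-right order of the chart; the second number is the other bar.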

The following is a deep dive into a few top explicit words and the artists that used them the most; the ring represents the % of songs with the word used.

As shown, n-bombs are dropped in 56% of all rap songs, and Royce da 5'9", The Game, and Jeezy are droppin’ it in over 92% of their songs. Interesting finding: y’all may know that “Bitch” is Too $hort’s favorite word, but from the data it seems like Chief Keef & Curren$y have been using it more frequently in their songs. One reason could be that the “biotch” pronunciation that originated from Too $hort, along with other spelling variations like “beeotch” and “biatch”, is not captured here.
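One way those spelling variants could be captured is to fold them into a single form before counting. This is only a sketch of that idea, not what the analysis above did (the variants were not captured there). The `VARIANTS` map and `pct_songs_with_word` name are hypothetical, and the artists and lyrics are placeholders.

```python
import re
from collections import defaultdict

# Hypothetical variant map; the spellings are the ones named in the text.
VARIANTS = {"biotch": "bitch", "beeotch": "bitch", "biatch": "bitch"}

def pct_songs_with_word(songs, word):
    """songs: iterable of (artist, lyrics). Map artist -> % of songs using word."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for artist, lyrics in songs:
        tokens = re.findall(r"[a-z0-9']+", lyrics.lower())
        words = {VARIANTS.get(t, t) for t in tokens}  # fold variants together
        totals[artist] += 1
        hits[artist] += word in words  # True counts as 1
    return {a: 100.0 * hits[a] / totals[a] for a in totals}

# Placeholder artists and lyrics.
songs = [
    ("Artist A", "biotch please"),
    ("Artist A", "nothing here"),
    ("Artist B", "bitch"),
]
pct = pct_songs_with_word(songs, "bitch")
print(pct)
```

With the fold in place, "Artist A" gets credit for the variant spelling; without it, that song would be missed.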

By Regions

Next, let’s look at the geographical relationship of lyrics starting with origins of rappers (based on Wikipedia):

Rapper locations. Click on map to see more details on Tableau.

The major Hip Hop scenes are:

East Coast: New York City (334 artists)

West Coast: Los Angeles (197 artists), SF Bay Area (91 artists)

Dirty South: Atlanta (94 artists), New Orleans (52 artists), Houston (48 artists)

Midwest: Detroit (41 artists), Chicago (39 artists)

Separating Hip Hop into the four major regions in the States, we can see what words are uniquely used in each of the regions in the word clouds below. The words are sorted with a method similar to tf-idf, where unique frequent words of a region used by multiple artists are up-weighted, and words commonly used across the regions are down-weighted. The colors represent the following:

Green: names of crew one represents, their homies, or stating one’s own mother fucking name.

Tan: location, or what hood one represents.

Yellow: common local slangs, region specific topics, and some other shit.
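To make the tf-idf-like weighting behind the word clouds a bit more concrete, here’s one possible scoring sketch. It is my own stand-in, not the exact formula used for the clouds: the "tf" part is the share of a region’s artists using the word, and the "idf" part is the log of (number of regions / number of regions where the word appears). The `region_scores` name and the toy triples are assumptions.

```python
import math
from collections import defaultdict

def region_scores(usage):
    """usage: iterable of (region, artist, word). Returns {region: {word: score}}."""
    artists_by_region = defaultdict(set)
    users = defaultdict(set)         # (region, word) -> artists using the word
    regions_with = defaultdict(set)  # word -> regions where the word appears
    for region, artist, word in usage:
        artists_by_region[region].add(artist)
        users[(region, word)].add(artist)
        regions_with[word].add(region)
    n_regions = len(artists_by_region)
    scores = defaultdict(dict)
    for (region, word), artists in users.items():
        tf = len(artists) / len(artists_by_region[region])    # more artists, more weight
        idf = math.log(n_regions / len(regions_with[word]))   # everywhere means 0
        scores[region][word] = tf * idf
    return scores

# Toy triples with placeholder artists.
usage = [
    ("South", "A", "yeen"), ("South", "B", "yeen"), ("South", "A", "like"),
    ("West", "C", "like"), ("West", "C", "lowridin"),
]
scores = region_scores(usage)
print(scores["South"])
```

In this toy run, “like” shows up in every region and scores zero, while “yeen” is used by every South artist and nowhere else, so it floats to the top of the South cloud.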

East Coast

East Coast represent: Charles Hamilton, 50 Cent, Jay Z, Nas, Wiz Khalifa, Mac Miller, KRS-One, Killah Priest, Canibus

Above are East Coast artists that released the most songs (details).

Reppin’ ‘em hoods

From all dem tans, y’all can see that East Coast rappers love representin’ their hoods. The most commonly mentioned locations include Flatbush, Brownsville, the 7-1-8 area code, the tri-state area, and Queensbridge (Q.B.).

East Coast rappers frequently holler at NYC mayors Rudy Giuliani and Michael Bloomberg and New York governor George Pataki to express their dissatisfaction, and they also mention the correctional facilities in Comstock and Spofford.

Lovin’ ‘em wardrobes

Clarks & Benetton are frequently mentioned by Wu-Tang Clan.

West Coast

West Coast represent: Lil B, Tupac, Snoop, The Game, Kendrick Lamar, Ice Cube

Above are West Coast artists that released the most songs (details).

Reppin’ ‘em crew

From all dem greens, you can see that West Coast rappers love representin’ their crew. Most common shout outs include: Dogg Pound Gangsta Crips (DPG or DPGC), Tray Deee, and Tha Alkaholiks.

Love for ‘em whips

West Coast rappers have a thang for cars and do crazy shit with them, such as lowridin’, ghostridin’, rollin’ in a g-ride, and showin’ love for Daytona rims.

Dirty South

South Side represent: Lil Wayne, Gucci Mane, Chamillionaire, Big K.R.I.T., K-Rino, Curren$y, Soulja Boy, Three 6 Mafia

Above are South Side artists that released the most songs (details).

Most creative in local slangs

Compared to East & West, there’s relatively more dominant yellow in the word cloud, showing words frequently used uniquely in the south. Some be in their ‘bauds rollin’ with 84s swangers showin’ off. Some be dope dealin’ bales and re-rocking cocaine in their bandos and traphouses.

Other common words be just southern accents like some’ing (something), tal’n (talking), yeen (you ain’t), drankin (drinking), out’chea (out here), ery’body (everybody).

Midwest

Midwest represent: Eminem, Insane Clown Posse, Atmosphere, Chief Keef, XV, Twiztid, Tech N9ne, Lupe Fiasco

Above are Midwest artists that released the most songs (details).

Reppin’ ‘em crew

Similar to the West Coast, the Midwest rappers gotta represent their crew. We got Twiztid (Madrox & Monoxide) and Esham representin’ Detroit; Chief Keef’s Glory Boyz Entertainment (GBE) and Lil Durk’s Only the Family (OTF) representin’ Chicago; and Tech N9ne representin’ Kansas City.

Namin’ ‘em fans

Midwest rappers have a thang for naming their fans. Most notably Insane Clown Posse and the juggalos/juggalettes, Esham and the suicidalists, and XV and the squarians.

This here is just a small demonstration of the stories data can tell, and what Big Data is all about. Like Biggie say, “If you don’t know, now you know.”

I’ll carry on the work by analyzing topics frequently mentioned by rappers in their lyrics, how they trend throughout time at different locations, and joining other datasets to discover insightful correlations.

More greatness to follow, believe that!
