What makes Country Music… Country?

12 min read · Apr 30, 2019

Back in 2018, Lil Nas X, an artist from Atlanta, Georgia, released his single “Old Town Road”. By March 2019 the song was all over the Billboard charts: it appeared on the Hot 100, the Hot Country Songs and the Hot R&B/Hip-Hop charts. But a week after it charted on Hot Country Songs, Billboard took it off for not being country enough[1]. Even after being removed from the country charts, “Old Town Road” remained on the R&B/Hip-Hop charts.

I had always been under the impression that the Billboard charts were a representation of what’s playing on genre-specific radio stations. I never realized that Billboard measures the genre-ness of songs to decide what goes on what chart.

This got me thinking: if Billboard has indeed managed to quantify genre-ness, maybe we can come up with a similar measure. If we manage to create a metric to quantify the genre-ness of songs, we might be able to measure how country “Old Town Road” really is. And along the way, we might also be able to figure out what exactly makes country music… country.

The issue is that I am as musically inclined as a goldfish. However, to compensate I went and learned Natural Language Processing (NLP). I can now teach computers how to read language. They are not very good at it, but I still teach them. Unfortunately, language-reading computers are not very musically inclined either. They can, however, be used to analyze the textual part of music: the lyrics.

So hypothetically speaking, if we can get the lyrics of a bunch of country songs, and a bunch of R&B/Hip-Hop songs, we should be able to extract the patterns in the lyrics which make country music, country and not R&B/Hip-Hop.

Now the first thing we need to do is to get a list of songs that are representative of the various genres. There are lists of songs all over the internet, but these lists do not always agree on the genres. So we need to pick a source of the songs that we believe is the authority on genre and use that as “the” genre of the songs.

Well, Thank God for Billboard!

This all started because Billboard decided “Old Town Road” wasn’t country, and now we are here trying to figure out how country “Old Town Road” is. It seems fair to say that even before we started working on this problem, we had already assumed that Billboard is the authority on genre. So let’s just move forward with this assumption.

A quick look at Billboard’s website shows that it has 14 chart categories[2].

And each category has further sub-categories with weekly charts going as far back as 1958.

We can write a tiny script to crawl the Billboard website and get all the songs (but not albums) from these sub-categories. But do we really need 14 categories? We are only concerned with demystifying country music, and to do that we should look at all the songs that have charted on any of the Country sub-categories. But we are also interested in what differentiates country music from the other genres. Since we mostly care about “Old Town Road”, it makes sense to compare country songs with R&B/Hip-Hop, and now we have two genres. Since we are already going to all this trouble, let’s just add a couple more genres to our mix to see what happens (too many genres never hurt anyone). Rock and Pop seem like reasonable choices. This gives us a list of 48,733 unique songs.

But before we actually start our analysis, we need to decide what we want to do with multi-genre songs. In our set of songs, we have songs with a clear genre, like Post Malone’s “Blame It On Me”, which charted on the R&B/Hip-Hop list back in May 2018. On the other hand, we have songs like Post Malone & Swae Lee’s “Sunflower”, and even “Old Town Road”, which charted on multiple lists.

If we have just a handful of multi-genre songs, I doubt they will actually make a difference to our results, and we can just ignore them. But before we make that decision, we need to see how many songs overlap.

The table above shows that 85.21% of R&B/Hip-Hop songs do not overlap with any other genre, while the remaining 0.10%, 0.60% and 14.09% overlap with Country, Rock and Pop respectively. Country seems to be part of a tighter clique, with just 5% of the songs also charting on non-country charts. In the case of Rock songs, around 84.32% charted exclusively on Rock charts, while 1.00%, 0.23% and 14.45% also charted on the R&B/Hip-Hop, Country and Pop charts respectively. Only about 56% of the Pop songs charted on just the Pop charts.

I think it would be good hygiene to remove all songs that span multiple genres from our lists. This means that out of a total of 48,733 songs we are only going to consider 43,803 songs and ignore the remaining 10.1%. Doing so will leave us with genre-pure songs.
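The filtering step can be sketched in a few lines of Python. This is a minimal illustration on made-up data, not the script used for the article: the `chart_entries` list and the function names are hypothetical, and it assumes each chart appearance has already been reduced to a (song, genre) pair.

```python
from collections import defaultdict

# Hypothetical toy data: one (song, genre) pair per chart appearance.
chart_entries = [
    ("song_a", "Country"),
    ("song_b", "Country"),
    ("song_b", "Pop"),
    ("song_c", "R&B/Hip-Hop"),
    ("song_d", "Rock"),
    ("song_d", "Pop"),
    ("song_e", "Pop"),
]


def genres_by_song(entries):
    """Map each song to the set of genres it charted on."""
    genres = defaultdict(set)
    for song, genre in entries:
        genres[song].add(genre)
    return dict(genres)


def keep_genre_pure(song_genres):
    """Keep only the songs that charted on exactly one genre."""
    return {
        song: next(iter(genres))
        for song, genres in song_genres.items()
        if len(genres) == 1
    }


song_genres = genres_by_song(chart_entries)
pure_songs = keep_genre_pure(song_genres)
dropped_share = 1 - len(pure_songs) / len(song_genres)

print(pure_songs)
print(f"dropped {dropped_share:.1%} of the songs")
```

On the real data, the same idea would take the 48,733 unique songs down to the genre-pure subset.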

A recap: We have a list of songs for each genre, based on the assumption that they actually belong to that genre because they appeared only on that genre’s charts, and also assuming that Billboard is the authority on music genres. We seem to be working with quite a lot of assumptions. I doubt our NLP techniques are going to be representative of the real world. But regardless, let’s move forward with our spherical cow models!

The Billboard charts only give us the song titles and not the actual lyrics of the songs. We can probably try to come up with some predictive model by just using the song titles, but I think such models would be a bit too brittle.

Fortunately, websites like lyrics.com and genius.com have reasonable collections of song lyrics. On top of that, genius.com has a Python API that we can use to query their database. A little Python code later, we can query the genius.com database for all the songs that were on the Billboard charts for our 4 genres.

Disclaimer: We might not be able to get the lyrics of every song from genius.com. We can ignore all the songs that don’t have lyrics on genius.com, which should leave us with a smaller dataset, but one still big enough for our purpose.

Once we have the lyrics for our songs, we need to come up with a method to extract the representative words for each of our genres. The first thing we need to do is concatenate the lyrics of all the songs of a genre into a single document, giving us four documents, each corresponding to a genre. Now we can create a simple method to find the representative words: count how often each word (case-insensitive) occurs in each of our genre-documents and extract the words with the highest frequencies.
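A rough sketch of this counting step, using only the Python standard library. The tokenizer (a lowercase regex split) and the two toy lyric strings are assumptions made for illustration; the real pipeline may tokenize differently.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(lyrics_list, n=20):
    """Concatenate a genre's lyrics into one document and count its words."""
    document = " ".join(lyrics_list)
    return Counter(tokenize(document)).most_common(n)


# Hypothetical stand-in for the lyrics of one genre.
country_lyrics = ["The truck is on the road", "The road is long"]

print(top_words(country_lyrics, n=3))
```

Even on this tiny example the most frequent word is “the”, which hints at why raw counts alone are not very informative.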

However, if we do that, we will probably end up with stop words like “the” and “is”. Even though these are the most frequently occurring words, they aren’t really that informative. To extract the most informative words, we can use one of my favorite algorithms: TFIDF.

How the Term Frequency Inverse Document Frequency (TFIDF, but not tiffydif) algorithm works: if a word occurs a lot in a genre, it has a high term frequency, but if it appears in a lot of genres, it has a low inverse document frequency. Thus TFIDF penalizes common words like stop words, while giving preference to words that occur frequently in one genre but rarely in the others. A high TFIDF score is indicative of a more informative word.

We can calculate the TFIDF scores for all the words in our 4 genre-documents and extract the 20 words with the highest TFIDF scores.
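As a rough illustration, here is one common TFIDF variant written out by hand: term frequency is the word’s share of the genre-document, and inverse document frequency is log(N / df). This is a sketch under those assumptions, with toy documents; the exact weighting used for the results in this article may differ (libraries often add smoothing, for example).

```python
import math
from collections import Counter


def tfidf_scores(documents):
    """documents maps genre -> list of tokens; returns genre -> {word: score}."""
    n_docs = len(documents)

    # Document frequency: in how many genre-documents does each word appear?
    doc_freq = Counter()
    for tokens in documents.values():
        doc_freq.update(set(tokens))

    scores = {}
    for genre, tokens in documents.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[genre] = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return scores


def top_k(word_scores, k=20):
    """Return the k words with the highest scores."""
    return sorted(word_scores, key=word_scores.get, reverse=True)[:k]


# Hypothetical toy genre-documents.
genre_docs = {
    "Country": "the truck the tractor the truck".split(),
    "Rock": "the guitar the fire".split(),
}

scores = tfidf_scores(genre_docs)
print(top_k(scores["Country"], k=2))
```

With this variant, a word that appears in every genre-document (like “the” here) gets a score of exactly zero, no matter how often it occurs.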

The most informative words for each genre

Finally! We have results. According to the most informative words, R&B/Hip-Hop songs seem to be about “n****s rapping about f*cking h**s”. In the case of Country music, the songs seem to be talking about “hillbillies driving their pickups and tractors in Tulsa”. We can also see that most rock musicians are suffocating and need to be liberated from the vultures of society. In contrast to all these genres, there doesn’t seem to be any particular trend in Pop Music. It just seems like random words. Who would’ve thought!?

In one of our earlier tables, we saw that Pop Music has a high overlap with other genres, and it might be because of this mishmash of genres that there isn’t any discerning pattern in pop lyrics.

We can plot the word clouds for all four genres and scale the words by their TFIDF scores, to get a better idea about the scope of the informative words. One thing to note is how prominent “n***a” is in R&B/Hip-Hop compared to the other words (and the sparsity of the cloud). There doesn’t seem to be a comparable word in any of the other genres. One interpretation of this can be that “n***a” is an integral part of the R&B/Hip-Hop identity, and the other genres do not seem to have an equivalent identifier.

Word Cloud For Rock (left) and R&B/Hip-Hop(right)

Word Cloud for Pop (left) and Country (right)

We can take our analysis further by trying to create a model to predict the genre of a song based on the words used in the song. For this, we can use a Naive Bayes classifier. Wikipedia has a nice article on the Naive Bayes classifier, but for the sake of this project, the only thing worth remembering is that the Naive Bayes classifier is extremely easy to implement.

We could’ve used any classifier. We could’ve even used our TFIDF scores to measure cosine similarities and build a k-NN classifier. But the ease of the Naive Bayes classifier, combined with its robustness, makes it ideal for text classification. There is also a lot of historical precedent for its use in text-analysis products, like spam filters.
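To give a feel for how little code this takes, here is a minimal multinomial Naive Bayes sketch in plain Python. It assumes add-one (Laplace) smoothing and works in log space; the class name, the toy training data and the choice to skip unseen words are illustrative assumptions, not a description of the exact model behind the numbers in this article.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """A tiny multinomial Naive Bayes classifier with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_scores(self, tokens):
        """Return log P(genre) + sum of log P(word | genre) for each genre."""
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_label in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n_label / n_docs)
            for word in tokens:
                if word not in self.vocab:
                    continue  # words never seen in training carry no signal
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total + len(self.vocab)))
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Hypothetical toy training set.
train_docs = [
    ["truck", "road", "horse"],
    ["tractor", "road"],
    ["money", "club"],
    ["club", "night", "money"],
]
train_labels = ["Country", "Country", "R&B/Hip-Hop", "R&B/Hip-Hop"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["horse", "tractor"]))
```

The “naive” part is the assumption that every word contributes independently, which is why the per-word log probabilities can simply be added up.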

We now go full machine learning and split our data into 10 folds, with 9 folds acting as our training set and 1 fold as a test set, and use cross-validation to calculate the average accuracy of our model.
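The fold-and-average procedure can be sketched as follows. The helper names and the trivial majority-class “model” used in the demo are hypothetical; any pair of `fit` and `predict` functions could be plugged in instead. The sketch assumes there are at least as many items as folds.

```python
import random
from collections import Counter


def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(items, labels, fit, predict, k=10):
    """Train on k-1 folds, test on the held-out fold, and average the accuracy."""
    accuracies = []
    for train, test in k_fold_splits(len(items), k):
        model = fit([items[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, items[i]) == labels[i] for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


# Demo with a hypothetical majority-class "model" on toy data.
def fit_majority(train_items, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, item):
    return model


toy_items = list(range(20))
toy_labels = ["Pop"] * 20
average_accuracy = cross_validate(toy_items, toy_labels, fit_majority, predict_majority)
print(average_accuracy)
```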

Average Accuracy of a 10-fold Naive Bayes Classifier

And we can see that we have quite an awful model! Every music genre is being misclassified as Pop. Which just means:

Everything sounds like Pop!

This could be because the set of Pop songs (3,065) is much smaller than the sets for the other genres (approximately 5,500–5,700), leading to a bias in our model. Because of this smaller set, each individual Pop song would have a higher impact on the final model compared to the songs from the other genres. Think how much more power each voter in Wyoming has (which gets just 3 electoral college votes) compared to a voter in California (which has 55 electoral college votes), even though California is the more populated state.

Fortunately for us, the question we want to answer is whether “Old Town Road” is more R&B/Hip-Hop-like or more Country-like. So if we ignore the fact that our model has a skew towards misclassifying everything as Pop, we can still have a decent model. Our model was able to correctly classify R&B/Hip-Hop songs 34.62% of the time. But if we ignore the 49.95% of the songs that it classified as Pop, then out of the remaining 50.05% it classified the songs correctly 69.17% of the time. This means that if we do ignore Pop, our model is actually quite reasonable.

If we consider the case of country songs, our model misclassified 39.09% of them as Pop. But of the remaining 60.91% of the songs, a whopping 88.95% were classified correctly.
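These “ignoring Pop” accuracies are just the raw accuracy divided by the share of songs that were not classified as Pop. A quick check of the arithmetic, using the figures quoted in this article (the 54.18% raw accuracy for Country appears further down); the function name is made up for this sketch:

```python
def accuracy_ignoring(correct_pct, ignored_pct):
    """Accuracy among the songs that were NOT assigned to the ignored genre."""
    return 100 * correct_pct / (100 - ignored_pct)


rnb_accuracy = accuracy_ignoring(correct_pct=34.62, ignored_pct=49.95)
country_accuracy = accuracy_ignoring(correct_pct=54.18, ignored_pct=39.09)

print(f"R&B/Hip-Hop: {rnb_accuracy:.2f}%")
print(f"Country: {country_accuracy:.2f}%")
```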

We can now try to use this classifier on “Old Town Road” to get the likelihood of it being part of a particular genre. We assign each word a probability of being in a genre and then take the product of these probabilities to get the final likelihood of the song being part of that genre. Our model is 99.98% sure that “Old Town Road” is a country song, while it is less than 0.00000098% confident that the song is actually R&B/Hip-Hop.
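Multiplying many small probabilities quickly underflows, so in practice the product is usually computed as a sum of logarithms and then normalized so the genres add up to 100%. The sketch below shows that idea; the per-word probabilities are made-up numbers for illustration, not values from the actual model.

```python
import math


def normalize_log_scores(log_scores):
    """Turn per-genre log-likelihoods into probabilities that sum to 1."""
    peak = max(log_scores.values())  # subtract the max for numerical stability
    exps = {genre: math.exp(score - peak) for genre, score in log_scores.items()}
    total = sum(exps.values())
    return {genre: value / total for genre, value in exps.items()}


# Hypothetical per-word probabilities for a three-word song.
word_probs = {
    "Country": [0.02, 0.01, 0.03],
    "R&B/Hip-Hop": [0.001, 0.002, 0.001],
}

# The product of probabilities becomes a sum of logs.
log_scores = {
    genre: sum(math.log(p) for p in probs) for genre, probs in word_probs.items()
}
confidence = normalize_log_scores(log_scores)

for genre, value in confidence.items():
    print(f"{genre}: {value:.4%}")
```

Because every word multiplies the gap between genres, even modest per-word differences compound into very lopsided confidences on a full song.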

The likelihood of each outcome: we predicted Country and the song was actually Country, versus we predicted Country but the song was actually R&B/Hip-Hop

Let us assume that “Old Town Road” was actually an R&B/Hip-Hop song, which our model thinks has a 0.00000098% chance of being true. Our model has a tendency to be wrong as well. We can see from the above diagram that there is a 1.86% chance that it misclassifies a Country song as R&B/Hip-Hop, and a 34.62% chance that it correctly classifies an R&B/Hip-Hop song as such.

If we want to know the likelihood that our model labeled “Old Town Road” as R&B/Hip-Hop when it was actually a Country song, we can simply take the product of the confidence of our prediction and the probability of our model misclassifying a Country song as R&B/Hip-Hop. This gives us a likelihood of 1.86% x 0.00000098% = 0.000000018%.

Similarly, we can also get the likelihood that “Old Town Road” was actually an R&B/Hip-Hop song and our model correctly predicted it as such. We get this by again multiplying the confidence of our prediction with the probability of classifying an R&B/Hip-Hop song as an R&B/Hip-Hop song, giving us a measly 34.62% x 0.00000098% = 0.00000034%.

Now let us assume that the song was predicted incorrectly as a country song by our model, but in actuality, it was an R&B/Hip-Hop song. The likelihood of this happening can be calculated by taking the product of the probability of us incorrectly classifying R&B/Hip-Hop songs as country, 9.49%, and the confidence of our prediction, 99.98%. This gives us a likelihood of 9.49% x 99.98% ≈ 9.49%. Similarly, we can get the likelihood that our model correctly predicted “Old Town Road” as a country song. This is the product of our model correctly predicting country songs, at 54.18%, and the confidence of our prediction, at 99.98%, giving us a likelihood of 54.18% x 99.98% = 54.17%.

So, according to our model, lyrically “Old Town Road” is more likely a Country song than an R&B/Hip-Hop song, with a likelihood of 54.17% vs 9.49%, respectively. Now, does this mean that “Old Town Road” is actually a Country song, and that the definitive authority on genre-ness, Billboard, misclassified it? I don’t know! Like I mentioned earlier, I am not a music expert, so I can’t give a definitive answer. But you can say, “Hey, you made a model and the model said it is a country song, so it MUST be a country song”. In response, I would like to reiterate that our model is definitely not the best model in the world, and we made a lot of assumptions while making it (we assumed that we live in a universe where the cows are spherical and the math is simple). It is literally called a “Naive” Bayes classifier. We shouldn’t treat it as the music expert.
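For anyone who wants to double-check the four products used in this comparison, they can be reproduced in a few lines. The rates and confidences are the ones quoted in the text; the helper name and the dictionary layout are just for this sketch.

```python
def joint_pct(rate_pct, confidence_pct):
    """Multiply two percentages and return the result as a percentage."""
    return rate_pct * confidence_pct / 100


country_confidence = 99.98    # model's confidence that the song is Country
rnb_confidence = 0.00000098   # model's confidence that the song is R&B/Hip-Hop

outcomes = {
    "predicted Country, actually Country": joint_pct(54.18, country_confidence),
    "predicted Country, actually R&B/Hip-Hop": joint_pct(9.49, country_confidence),
    "predicted R&B/Hip-Hop, actually R&B/Hip-Hop": joint_pct(34.62, rnb_confidence),
    "predicted R&B/Hip-Hop, actually Country": joint_pct(1.86, rnb_confidence),
}

for outcome, likelihood in outcomes.items():
    print(f"{outcome}: {likelihood:.9f}%")
```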

We might be tempted to ask: if the model can’t give a definitive answer, then what’s the point of this entire exercise? Well, the “world” is complicated and there are rarely any definitive answers to anything. Our model abstracted the complicated world into a grossly simpler one and then found an answer for that simpler world. We can definitely use our model’s answer to inform our own decisions, but that doesn’t mean that we should treat its answer as gospel.

Before ending this exercise, there’s one last thing we can do, and that is visualizing the lyrics of “Old Town Road”. Each word in the lyrics contributed to our model’s decision, but some words are more indicative of a genre than others. We could color every word in the lyrics with the genre it most likely corresponds to. However, it might be visually more appealing to only color words which are more closely tied to a genre, since this reduces unnecessary noise. If this was all random, a word would have a 25% chance of belonging to a genre. We can define a word as being closely tied to a genre if it has more than a 50% probability of belonging to it. Given these constraints, “Old Town Road” looks like this:

Lil Nas X’s “Old Town Road” Lyrics. Country Words are in Red, and R&B/Hip-Hop Words are in Green
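The coloring rule itself is simple to sketch: for each word, pick the most likely genre and keep it only if its probability is above 50%. The per-word genre probabilities below are made-up numbers, and the function names are hypothetical; the actual rendering in red and green is left out.

```python
def closely_tied_genre(genre_probs, threshold=0.5):
    """Return the most likely genre if it clears the threshold, else None."""
    best = max(genre_probs, key=genre_probs.get)
    return best if genre_probs[best] > threshold else None


def tag_lyrics(tokens, word_genre_probs, threshold=0.5):
    """Pair each word with the genre it is closely tied to (or None)."""
    tagged = []
    for word in tokens:
        probs = word_genre_probs.get(word)
        genre = closely_tied_genre(probs, threshold) if probs else None
        tagged.append((word, genre))
    return tagged


# Hypothetical per-word genre probabilities (each row sums to 1).
word_genre_probs = {
    "horses": {"Country": 0.70, "R&B/Hip-Hop": 0.05, "Rock": 0.15, "Pop": 0.10},
    "road": {"Country": 0.40, "R&B/Hip-Hop": 0.20, "Rock": 0.25, "Pop": 0.15},
    "porsche": {"Country": 0.05, "R&B/Hip-Hop": 0.80, "Rock": 0.05, "Pop": 0.10},
}

tagged = tag_lyrics(["horses", "road", "porsche", "the"], word_genre_probs)
print(tagged)
```

Words that fall below the threshold, or that the model has never seen, stay uncolored.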

Fin!

Written by Osama Khalid