Predicting the Grammys with data

Musixmatch
Feb 11, 2016 · 7 min read

Originally published at lab.musixmatch.com

Since 1959, the National Academy of Recording Arts and Sciences has awarded a Grammy for Song of the Year, choosing from 5 or more nominees each year.

58 songs have won this award (there was a tie in 1978) and 237 other songs have been nominated, for a total of 295 songs.

At Musixmatch, we studied the trends in the lyrics and music of these songs. As a fun exercise, we used these trends to build a computational model which predicts the winner of the Song of the Year for 2016.

Song of the Year lyrics trends

We calculated 110 metrics from the lyrics. 100 of them come from an unsupervised machine learning model (Doc2Vec by Google). Although these 100 are better correlated with winning, it is difficult to explain their semantic significance in words. The following are some interesting trends from the other metrics, which are easier to explain.

Language

Musixmatch automatically analyzes the language of the lyrics of every song in its database. Volare by Domenico Modugno, performed in Italian, is the only non-English song ever to win this award. La Bamba by Los Lobos, performed in Spanish, is the only other non-English song to be nominated. The other songs are mainly in English or instrumental.

Word counts

The wordiest song to win Song of the Year, by both total words and unique words, is The Battle of New Orleans by Johnny Horton. The wordiest nominee by total words is I’d Do Anything For Love (But I Won’t Do That) by Meat Loaf, but if we count unique words, Lose Yourself by Eminem comes out on top.
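The two counts used above can be sketched in a few lines. This is only an illustration: the article does not describe how Musixmatch tokenizes lyrics, so the rule used here (lowercase the text and treat runs of letters, digits and apostrophes as words) and the function name `word_counts` are assumptions.

```python
import re
from typing import Tuple

def word_counts(lyrics: str) -> Tuple[int, int]:
    """Return (total words, unique words) for one lyric, ignoring case."""
    # Assumed tokenization: runs of letters, digits and ASCII apostrophes.
    words = re.findall(r"[a-z0-9']+", lyrics.lower())
    return len(words), len(set(words))

# Made-up lyric line, used only to show the difference between the counts.
print(word_counts("la la la, sing it again: la la"))  # (8, 4)
```

A repetitive chorus raises the total count without raising the unique count, which is why the two rankings can disagree.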

TOTAL WORD COUNT

The total word count rises by almost 4 words every year.

UNIQUE WORD COUNT

The unique word count rises by almost 1 word every year.
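A "words per year" figure like the ones above is the slope of a straight line fitted through the yearly values. The article does not say how the slope was obtained, so the sketch below assumes an ordinary least-squares fit; the function name `trend_per_year` and the sample numbers are invented for illustration.

```python
def trend_per_year(years, values):
    """Least-squares slope of values against years (units per year)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    # Slope = covariance(x, y) / variance(x).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    var = sum((x - mean_x) ** 2 for x in years)
    return cov / var

# Invented word counts that grow by exactly 4 words per year.
years = [1960, 1970, 1980, 1990]
total_words = [100, 140, 180, 220]
print(trend_per_year(years, total_words))  # 4.0
```

The same function applies unchanged to any other yearly metric, such as unique word count.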

Profanity

Rolling in the Deep by Adele is the only song containing a profanity (shit) to win this award. 17 other nominated songs contain profanity.

SONGS CONTAINING PROFANITY

Instrumentals

Musixmatch automatically analyzes whether a song is instrumental or not. Only 6 instrumental songs have been nominated for Song of the Year, and only one of them, Theme of Exodus by Ernest Gold (1961), has won the award.

INSTRUMENTAL SONGS

Deep learning model of lyrics

Doc2Vec, a deep learning model by Google, automatically infers the semantics of pieces of text and places each text in an n-dimensional space, such that similar texts are located closer together.
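To make "located closer" concrete, here is a minimal sketch of comparing two texts once each has been turned into a vector. It does not train Doc2Vec. The three-dimensional vectors are invented, and cosine similarity is an assumption: it is a common way to compare such vectors, but the article does not say which measure was used.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical lyric vectors; a real model here would produce 100 numbers each.
ballad_a = [0.9, 0.1, 0.0]
ballad_b = [0.8, 0.2, 0.1]
protest_song = [0.0, 0.2, 0.9]

sim_ballads = cosine_similarity(ballad_a, ballad_b)
sim_mixed = cosine_similarity(ballad_a, protest_song)
print(sim_ballads > sim_mixed)  # True
```

Each coordinate of such a vector is one of the "metrics" fed to the prediction model, which is why individual coordinates are hard to describe in words.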

We trained this model on 150k lyrics with n = 100. For each nominee and winner it outputs 100 lyrics metrics (in addition to the ones above), which we used to train the award prediction model.

It is difficult to explain which characteristic of the text corresponds to each of the metrics output by this Doc2Vec model, but a more detailed explanation can be found here.

Song of the Year music trends

Echonest, a music intelligence platform, provides many interesting musical metrics calculated from the audio signal of a song. We scanned the Grammy Song of the Year nominees and winners in the Echonest database and these are some of the interesting musical trends.

Key

In music theory, the key of a piece is the tonic note and chord that provides a subjective sense of arrival and rest. Other notes and chords in the piece create varying degrees of tension, resolved when the tonic note or chord returns — Wikipedia

Popular music generally has a well-defined key, and so do the songs studied here.

WINNERS

A major is the key most used by the winners followed by D major, C major and F major.

NOMINEES

C major is used most by the nominees followed by G major and F major.

Loudness

Echonest averages loudness across the whole song and returns a negative number with 0 being the loudest level possible.
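A scale that is negative with 0 as the loudest possible level resembles decibels relative to full scale. The sketch below shows that idea with a plain root-mean-square average over samples in the range [-1.0, 1.0]. This is an assumption for illustration only: the article does not describe how Echonest actually computes its loudness figure, and `loudness_dbfs` is a hypothetical name.

```python
import math

def loudness_dbfs(samples):
    """Average level of samples in [-1.0, 1.0], in dB relative to full scale."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")  # digital silence has no finite level
    return 20 * math.log10(rms)

# A full-scale square wave sits at the 0 dB ceiling.
print(loudness_dbfs([1.0, -1.0, 1.0, -1.0]))            # 0.0
# Halving the amplitude lowers the level by about 6 dB.
print(round(loudness_dbfs([0.5, -0.5, 0.5, -0.5]), 2))  # -6.02
```

On such a scale, a value closer to 0 means a louder track.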

The loudest song to win this award is Rehab by Amy Winehouse while the quietest one is Don’t Worry Be Happy by Bobby McFerrin. The quietest song overall is Feel Like Makin’ Love by Roberta Flack while the loudest one is also Rehab.

LOUDNESS

Across the nominees and winners, loudness rises steadily by 0.17 units per year.

Experts call this increase the loudness war. For curious readers, there is a very detailed website dedicated to this.

Duration in seconds

We Are the World by USA for Africa is the longest song (427 seconds) to win the award. I’d Do Anything For Love (But I Won’t Do That) by Meat Loaf is the longest song (721 seconds) to be nominated; it is also the song with the highest word count.

DURATION IN SECONDS

We see a steady rise in duration until the mid-90s and then a decline. This could perhaps be due to more hip-hop and dance songs (which are shorter than songs in other genres) getting nominated.

DURATION AND TOTAL WORD COUNT TREND

Overlaying the duration trend with the total word count trend, we can see a decrease in duration but an increase in word count after the 90s. This could be because more hip-hop songs and songs containing rap verses are getting nominated.

Danceability

Describes how suitable a track is for dancing using a number of musical elements (the more suitable for dancing, the closer to 1.0 the value). The combination of musical elements that best characterize danceability include tempo, rhythm stability, beat strength, and overall regularity — Echonest

It is an interesting metric, even though it is not decisive in choosing the winner (according to our model, as discussed later).

DANCEABILITY

The danceability of Song of the Year nominees and winners has been rising (0.0007 units per year).

Winners and nominees are getting more danceable over the years

Predicting a winner

A MACHINE LEARNING CLASSIFIER — R2D3.US

We trained a model similar to the one illustrated above: at each node of the tree, a decision is made depending on the value of a lyric or audio metric. For those wanting to dig deeper, we recommend this visual introduction to machine learning.

These are the steps involved in our prediction process.

  • Analyse lyrics and audio metrics for all the nominees and winners.
  • Using the above metrics for the winners and nominees up to 2015, train a machine learning classifier (a random forest classifier).
  • Input the metrics for the 2016 nominees into this classifier to get the probable winners.
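The steps above can be sketched with a toy stand-in for a random forest. This is not the authors' model: it is a much simplified ensemble of one-split trees, each trained on a bootstrap sample and a random subset of features, which is the core idea of a random forest in miniature. All names and numbers are invented for illustration; in practice a library implementation such as scikit-learn's `RandomForestClassifier` would be the usual choice, and the article does not say which one was used.

```python
import random

def stump_predict(stump, row):
    """Return 1 (winner) or 0 (nominee) for one song using a one-split tree."""
    feature, threshold, polarity = stump
    above = row[feature] > threshold
    return int(above) if polarity == 1 else int(not above)

def fit_stump(rows, labels, features):
    """Pick the feature/threshold split with the fewest training errors."""
    best = None  # (errors, feature, threshold, polarity)
    for f in features:
        # -inf lets a stump predict a constant class when that is best.
        for t in [float("-inf")] + sorted({row[f] for row in rows}):
            for polarity in (1, -1):
                errors = sum(stump_predict((f, t, polarity), row) != y
                             for row, y in zip(rows, labels))
                if best is None or errors < best[0]:
                    best = (errors, f, t, polarity)
    return best[1:]

def fit_forest(rows, labels, n_trees=25, n_features=2, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and random features."""
    rng = random.Random(seed)
    all_features = list(range(len(rows[0])))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        features = rng.sample(all_features, n_features)
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], features))
    return forest

def win_probability(forest, row):
    """Fraction of trees that vote 'winner' for this song."""
    return sum(stump_predict(s, row) for s in forest) / len(forest)

# Invented training data: [total words, unique words, duration in seconds].
rows = [[400, 150, 260], [420, 160, 250], [390, 155, 270], [410, 165, 255],
        [200, 80, 180], [220, 90, 190], [210, 85, 200], [230, 95, 185]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = past winner, 0 = past nominee

forest = fit_forest(rows, labels)
song_a = [405, 158, 262]  # hypothetical 2016 nominee resembling past winners
song_b = [190, 70, 170]   # hypothetical 2016 nominee resembling past nominees
print(win_probability(forest, song_a), win_probability(forest, song_b))
```

Ranking the new nominees by this vote fraction gives the "probable winners" of the last step. Because real winners are far rarer than nominees, a real model would also need to handle that class imbalance.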

Predictions — Song of the year 2016

Lyrics more important than music in predicting a winner

Apart from making a prediction, the above model can also tell us the relative importance of the metrics (feature importances).

For example, it can say whether total word count is more important in predicting a winner than the unique word count or the key of the song.

When we sum the importances of the 17 most important lyrics and audio metrics, lyrics turn out to be twice as decisive as music in predicting the winner for Song of the Year.
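The summing step can be illustrated as follows. The feature names and importance values below are placeholders chosen so the arithmetic is easy to follow; they are not the article's results, and `importance_by_group` is a hypothetical helper.

```python
def importance_by_group(importances, groups):
    """Sum per-metric importances within each group (e.g. lyrics vs audio)."""
    totals = {}
    for feature, weight in importances.items():
        group = groups[feature]
        totals[group] = totals.get(group, 0.0) + weight
    return totals

# Placeholder importances, as a classifier might report them.
importances = {"total_words": 0.5, "unique_words": 0.25,
               "loudness": 0.125, "danceability": 0.125}
groups = {"total_words": "lyrics", "unique_words": "lyrics",
          "loudness": "audio", "danceability": "audio"}

totals = importance_by_group(importances, groups)
print(totals)                             # {'lyrics': 0.75, 'audio': 0.25}
print(totals["lyrics"] / totals["audio"])  # 3.0
```

The ratio of the two sums is the kind of figure behind a statement like "lyrics are twice as decisive as music".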

Importance in deciding a winner

Conclusion

Combining all the nominees and winners in this category gives us a total of 295 songs, of which 290 (58 winners and 232 nominees) remain once the 2016 nominees are excluded. This is not enough data to build an accurate model, and there are many factors (social impact, popularity, etc.) which haven’t been modeled here. Thus, these predictions should be taken with a very big pinch of salt.

We have observed that the lyrics are getting longer while the songs themselves are getting shorter. At the same time the loudness is increasing. We also observe computationally that lyrics are more important than music in choosing a winner.

We will continue exploring these trends in more detail in the future. Please follow us here or subscribe to our mailing list to be notified of new articles.



Varun Jewalikar and Federica Fragapane

