Decoding Songwriting With Data

A recap of a talk given by Eric Boam and Paul Jacobsen at SXSW 2017. (Audio)

How much of songwriting comes from the heart? How much comes from the brain? Do songwriters follow any sort of best practices? Do they have a measurable style? Are there other quantifiable aspects of a lyricist? What data could help us decode songwriting?


Dissecting Lyrics

Every songwriter has an opinion on the importance of lyrics, but most arguments fit into two camps: compose words because of their meaning, or arrange them based on the sound they make and the feelings they evoke. The narrative songwriter telling a story will employ the former strategy, while the lyricist trying to evoke a feeling may use the latter. Woody Guthrie, Billy Bragg, Rosanne Cash, Solange, and Frank Ocean tend to be narrative-driven, while David Byrne, Matt Berninger, and Jeff Tweedy have been vocal about caring more about the sound and feeling of the words. That’s not to say some writers (or songs) don’t straddle both camps or wander back and forth.

From the angle of analysis, lyrics are data: a common denominator that spans artists, albums, and genres and offers myriad dimensions to analyze. Words are also classifiable by part of speech, length, syllable count, and readability, to name a few.

The body of recorded songs is impossible to count and grows by the second. Collecting and processing the lyrics of those songs is therefore an insurmountable task, not to mention capturing all their measurable dimensions. That hasn’t stopped some from trying, and the payoffs for those who have tried have been handsome. Here are a few example projects:

Tahir Hemphill & The Hip Hop Wordcount Project

The Largest Vocabulary In Hip Hop

Wall Street Journal: Hamilton Rhyme Analysis

Modern Music Sentiment Analysis

An Impossible Project

One day, someone will be able to analyze the collective lyrical corpus of mankind. That day is not today. Any subset of songs and lyrics would be arbitrary, but that didn’t stop us from identifying a group of albums and giving it a shot. Because we have a decade-long history of debating music, particularly using Top 10 Lists, we decided to take our favorite albums from 2016 (Eric’s, Paul’s) and combine them with the top albums on Rob Mitchum’s very helpful Album of the Year List Project. This gave us 25 albums and close to 300 songs to work with: a set with some diversity and relevance that most everyone could relate to, which made our plunge into the heart of lyric-writing a little broader and a little less self-important.

The lyrics came from Genius, were compiled in spreadsheets, and were then processed through a series of word analysis tools like readable.io, wordcounttools.com, and databasic.io. From these sources we were able to measure every conceivable metric outside of rhyming: word count, syllable count, nouns, verbs, pronouns, adverbs, adjectives, contractions, words over 7 letters, common words, unique words, monosyllabic words, polysyllabic words, and readability grades, to name a few.
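As a rough illustration of what a few of these metrics involve (this is not the tooling used for the talk, which relied on the web tools named above), here is a minimal Python sketch. The function name `lyric_metrics`, the tokenizing rule, and the sample line are all made up for the example. Note that it treats any word containing an apostrophe as a contraction, so possessives such as "river's" would be counted too.

```python
import re

def lyric_metrics(lyrics: str) -> dict:
    """Compute a handful of simple word-level metrics for one song."""
    # Lowercase and normalize curly apostrophes so "don’t" and "don't" match.
    text = lyrics.lower().replace("’", "'")
    # Assumption: a word is a run of letters, optionally with internal apostrophes.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text)
    return {
        "word_count": len(words),
        "unique_words": len(set(words)),
        # Crude proxy: any word with an apostrophe (also catches possessives).
        "contractions": sum("'" in w for w in words),
        "words_over_7_letters": sum(len(w.replace("'", "")) > 7 for w in words),
    }

# Made-up sample line, not taken from any album in the data set.
sample = "I don't know where the interstate goes, I don't know where it ends"
print(lyric_metrics(sample))
```

Running the same function over every song in a spreadsheet export would give one row of metrics per song, which is the shape of table the rest of the analysis works from.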

From this data set we began to explore — sorting, charting, filtering, and interpreting until stories and patterns began to emerge.

Big Words

One of the simplest data points turned out to be one of the most insightful. The longest words in each song often correlated with the identity of the musician and the theme of the album. This suggested to us that songwriters tend to build their verses from the same (or similar) short, common words, so it’s the longer, more unusual words that make the verses and songs distinct. Here is an example of the longest words in each song for a couple of different albums:

Laura Gibson’s long words have a Portland-esque feeling, while Billy Bragg/Joe Henry’s hearken back to a time when trains ruled transportation. Solange’s long words reinforce the theme of her album that she described as “identity, empowerment, independence, grief, and healing”. Kanye has an amazing mix of words that are Kanye-esque in every way.
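Pulling the longest words out of each song takes very little code. The sketch below is one hedged way to do it; `longest_words`, the placeholder song titles, and the lyric lines are invented for illustration and do not come from the albums discussed.

```python
import re

def longest_words(lyrics: str, n: int = 3) -> list[str]:
    """Return the n longest distinct words in a song's lyrics."""
    text = lyrics.lower().replace("’", "'")
    words = set(re.findall(r"[a-z]+(?:'[a-z]+)*", text))
    # Longest first; ties are broken alphabetically so the result is stable.
    return sorted(words, key=lambda w: (-len(w), w))[:n]

# Hypothetical songs with made-up lines.
songs = {
    "Song A": "the locomotive rolled past the depot at midnight",
    "Song B": "we walked through the evergreens in the rain",
}
for title, lyrics in songs.items():
    print(title, longest_words(lyrics, n=2))
```

Sorting on a set rather than a list means a long word repeated in every chorus still appears only once.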

Let the Listener In

One way songwriters make songs more accessible and personal to listeners is through the use of pronouns. Pronouns anonymize the lyrics and allow listeners to put themselves into the story of the song. When we queried the lyrics for pronouns, we found a clear signal in the data: “I” and “you” are the most common. In fact, for nearly every album, those are the two most used pronouns. Was this surprising? Not necessarily. But was it insightful? Absolutely. Songs can be very personal, and “I” and “you” are the two most intimate pronouns. The data suggests that listeners respond to personal songs that they can inhabit.

Every song is about you and I
The first-person perspective dominates songwriting
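A pronoun tally like the one charted above can be approximated with a counter over a fixed word list. This is only a sketch: the `PRONOUNS` set is an assumed, incomplete list, and folding contractions such as "I'm" and "you're" into their pronoun is a choice made for this example, not something the talk specifies.

```python
import re
from collections import Counter

# Assumed list for illustration; a real analysis might use a part-of-speech tagger.
PRONOUNS = {"i", "me", "my", "you", "your", "we", "us", "our",
            "he", "him", "his", "she", "her", "they", "them", "their", "it"}

def pronoun_counts(lyrics: str) -> Counter:
    """Count pronoun occurrences, folding contractions into their pronoun."""
    text = lyrics.lower().replace("’", "'")
    # Keep only the part before an apostrophe: "i'm" -> "i", "you're" -> "you".
    tokens = [t.split("'")[0] for t in re.findall(r"[a-z']+", text)]
    return Counter(t for t in tokens if t in PRONOUNS)

# Made-up line, not from the data set.
counts = pronoun_counts("I'm yours and you're mine, I told you so")
print(counts.most_common(2))
```

Summing these counters across every song on an album gives the per-album ranking.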

The Percussive Nature of Words

Instruments aren’t the only aspects of a song that create rhythm and melody. The syllables of each word have their own meter and sound. Calculating the syllable density of each song can give us a sense of how dominant the words are in driving the pace and sound of the song.
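Syllable density is simply syllables divided by song length. The sketch below shows the arithmetic under a big assumption: syllables are estimated with a crude vowel-group heuristic, which will miscount plenty of English words. The function names and the sample text are invented for the example, and the talk's own syllable counts came from the tools mentioned earlier.

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels."""
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return 0
    count = len(re.findall(r"[aeiouy]+", word))
    # Drop a typical silent trailing "e" ("love"), but keep "-le" ("little").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def syllables_per_minute(lyrics: str, duration_seconds: float) -> float:
    """Total estimated syllables divided by the song length in minutes."""
    total = sum(count_syllables(w) for w in lyrics.split())
    return total / (duration_seconds / 60)

# Five estimated syllables sung over 30 seconds.
print(syllables_per_minute("hello little song", 30))  # 10.0
```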

A box-and-whisker plot shows the range and emphasizes the median and critical mass of the distribution, while neutralizing the outliers.
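For readers who want the numbers behind such a plot, here is a small sketch of the summary a box-and-whisker chart draws. It assumes the common convention of flagging anything more than 1.5 times the interquartile range beyond the box as an outlier; the article does not say which convention its charts used, and the sample values are invented.

```python
import statistics

def box_plot_summary(values: list[float]) -> dict:
    """Quartiles, whisker ends, and outliers for one album's songs."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inliers = [v for v in values if low <= v <= high]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inliers),
        "whisker_high": max(inliers),
        "outliers": sorted(v for v in values if v < low or v > high),
    }

# Hypothetical syllables-per-minute values for the songs on one album.
print(box_plot_summary([100, 110, 120, 130, 140, 400]))
```

In this made-up album, the one 400-syllable-per-minute song is reported as an outlier and does not stretch the whiskers, which is what keeps the median and the middle half of the songs easy to read.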

Some albums have a wide range of syllables per minute. Others are much tighter. What seems to separate the two categories is genre. While it’s debatable whether more syllables per song makes each syllable more or less important, we can compare them against the beats per minute of the song to find out if the song is more syllable dominant or beat dominant.

Subtracting the syllables per minute of each song from the beats per minute gives us a sense of which is more dominant in the song
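The subtraction itself is a one-liner; the sketch below adds the per-album median that the comparison relies on. The function names and the numbers are hypothetical, and reading a positive result as "beat dominant" and a negative one as "syllable dominant" is this example's interpretation of the chart.

```python
import statistics

def beat_syllable_balance(bpm: float, syllables_per_minute: float) -> float:
    """Beats per minute minus syllables per minute for one song.

    Positive: more beats than syllables. Negative: more syllables than beats.
    Near zero: the two are roughly matched.
    """
    return bpm - syllables_per_minute

def album_median_balance(songs: list[tuple[float, float]]) -> float:
    """Median balance across an album given (bpm, syllables_per_minute) pairs."""
    return statistics.median(beat_syllable_balance(b, s) for b, s in songs)

# Hypothetical album: one beat-heavy song, one word-heavy song, one even song.
album = [(120, 110), (90, 95), (100, 100)]
print(album_median_balance(album))  # 0
```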

Most revealing are the three albums whose median value is close to neutral, meaning that the beats and syllables per minute are close to equal. The highlighted albums (Beyoncé, Solange, Frank Ocean) are self-described narrative albums, telling a story from start to finish. It makes sense, then, that the words and the beat would be closely interconnected.

Proving Mastery

The Pitchfork review of We Got It from Here… Thank You 4 Your Service gives high praise to the late-career mastery of A Tribe Called Quest, including a quote from Q-Tip in which he credits mastery of craft, not the exuberance of youth, as the key to great art. As we worked through our analysis, we found that the data echoed the sentiment. We Got It from Here… had the most impressive statistics by all measurable metrics: longest words, word count, unique words, readability (grade level), and polysyllabic words, to name a few.

One example of Tribe’s data dominance

We dug deeper to see how We Got It from Here… stacked up against their prior work. Using the lyrics from their five previous albums, we looked at the word counts and the readability scores of each song, hoping to find a pattern or progression over their career.
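The article does not say which readability score was used, so the following is only an illustration with one widely used option, the Flesch-Kincaid grade level. Two assumptions are baked in and both are rough: each non-empty lyric line stands in for a sentence (lyrics rarely have sentence punctuation), and syllables are estimated with a simple vowel-group heuristic. The function names are invented for the example.

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels."""
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return 0
    count = len(re.findall(r"[aeiouy]+", word))
    # Drop a typical silent trailing "e" ("love"), but keep "-le" ("little").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(lyrics: str) -> float:
    """Flesch-Kincaid grade level, treating each lyric line as a sentence."""
    text = lyrics.lower().replace("’", "'")
    lines = [ln for ln in text.splitlines() if ln.strip()]
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text)
    if not lines or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(lines) + 11.8 * syllables / len(words) - 15.59

def song_stats(lyrics: str) -> tuple[int, float]:
    """(word count, grade level) for one song, ready to chart by album."""
    return len(lyrics.split()), flesch_kincaid_grade(lyrics)

print(song_stats("the cat sat\nthe dog ran"))
```

Averaging these pairs per album and ordering the albums by release date would give the kind of career-long trend line described here.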

From the exuberance of youth to the mastery of craft

Over time, their readability score has risen while the word count per song has dropped. According to the data, they have honed their vocabulary and learned to say more with less. For me, seeing the quote from Q-Tip borne out in the data was an inspiring moment: a clear convergence of the craft of songwriting and a blind, analytical dissection.


Process == Formula

Among the best compliments anyone could pay to a musician, songwriter, designer, or artist, outside of affecting someone, is that they have established a great process. Most know that it is the process that yields the art. Possibly the worst comment made to the same group is that their work is formulaic. In reality, the two concepts are not that different. One’s process is akin to a personal algorithm, written and refined over time. The kind of consistency generated by an algorithm is seen within the data of song lyrics, with each lyricist’s unique code on full display.

Certain songwriting tropes resonate with some listeners more than others, but we all have our predilections. While a songwriter probably won’t use our analytical method to alter their writing process, it certainly had a great effect on us. As lifelong, self-proclaimed lyrics-first listeners, we will forever listen to music with an expanded appreciation for the gravity of each word.


Eric’s data + music work is compiled on his website: www.ericboam.com. Copies of his printed reports can be purchased here.

Paul’s music lives in all the usual places: Spotify, Bandcamp, and his website.