tayloR

I used R + audio features from Spotify + lyrics from Genius to analyze Taylor Swift’s music over the years.

If you know me, you probably know I love Taylor Swift. More than 10 years of listening to Swift and following her in the media has given me insight into just how deliberate she is. From coded messages in her albums’ liner notes to sharply specific lyrics, Swift fills her body of work with Easter eggs, playing into and enhancing fans’ propensity for noticing details. For example, the song “The Lucky One” was No. 13 – famously Swift’s lucky number – on the Red track list; that’s something many fans would notice and appreciate, me included.

Anyway, overanalysis and Taylor Swift are two of my favorite things. The programming language R is another, so when a friend showed me Charlie Thompson’s spotifyr package, I knew Swift’s music would be the first thing I analyzed.

Spotify defines certain audio features for each track it streams. A full list of these, along with their verbal definitions, can be found on Spotify’s page for developers. The spotifyr package reduces the process of pulling data for these audio features from Spotify’s web API to just a few lines of code. I also adapted Thompson’s code from his analysis of Radiohead’s music to scrape Swift’s lyrics from Genius, and I used those in lyric and sentiment analysis.
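As a rough sketch of what that looks like (the exact function names are assumptions here, since spotifyr’s interface has shifted slightly across versions, and the calls require Spotify API credentials set as environment variables):

```r
library(spotifyr)

# Authenticate against the Spotify web API, then pull every track by
# the artist along with its audio features (valence, danceability, etc.).
access_token <- get_spotify_access_token()
taylor_audio <- get_artist_audio_features("taylor swift")
```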

All my R code, as well as my final dataset taylor_with_lyrics (which I assign to spotify_genius), can be found here; I’ve only included the specific code that produces each visualization in this post. Speaking of the visualizations, this DataCamp blog post was a great (and fun-to-read!) resource.

So that’s the rundown. I guess there’s only one thing left to ask now.

… R you Ready foR it?

The music

My favorite audio feature, and arguably the one that best complements sentiment analysis, is valence, defined by Spotify as:

“A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).”

I used a ridgeline plot to produce an overview of valence:

spotify_genius %>%
  ggplot(aes(x = valence, y = ordered_albums, fill = ..x..)) +
  geom_density_ridges_gradient(scale = 0.9) +
  scale_fill_gradient(low = "white", high = "maroon3") +
  theme_fivethirtyeight() +
  theme(panel.background = element_rect(fill = "white"),
        plot.background = element_rect(fill = "white"),
        legend.position = "none") +
  xlim(0, 1)

The plot makes it pretty evident that reputation’s mean valence is the lowest by far. I think that confirms it: the old Taylor really is dead. reputation-era Swift’s music is more mellow and tempered, even when she’s undeniably happy, such as on “King Of My Heart” (valence: 0.298) and “Call It What You Want” (valence: 0.237).
Here are Swift’s albums arranged by mean valence. Note that reputation’s mean valence is about 0.1 less than the next lowest.

spotify_genius %>%
  group_by(album_name) %>%
  summarise(mean_valence = mean(valence)) %>%
  arrange(desc(mean_valence)) %>%
  kable() %>%
  kable_styling(full_width = F, position = "left") %>%
  row_spec(row = 1:6, background = "#fffce4", color = "red")

And here are her top five tracks by valence. Each of her albums besides reputation is represented here. reputation’s first entry, “Look What You Made Me Do,” only comes in at No. 29.

spotify_genius %>%
  select(track_name, album_name, valence) %>%
  top_n(5, valence) %>%
  arrange(-valence) %>%
  kable() %>%
  kable_styling(full_width = F, position = "left") %>%
  row_spec(row = 1:5, background = "azure", color = "deeppink")

Another thing: The ridgeline plot distributions for Speak Now and Red are bimodal, i.e. each has two local maxima. This feels fitting, since the two are Swift’s transitional albums, with which she makes inroads into pop, Red more evidently so. (Never forget that beat drop on “I Knew You Were Trouble.”) It’s also interesting to note that neither won Album of the Year at the GRAMMYs, while the two that flank them – Fearless and 1989 – both did. It’s not unreasonable to theorize that Speak Now and Red’s scattered sounds had something to do with this.


Swift herself has acknowledged the scatteredness of Red. She said in a 2014 interview:

“Red actually taught me that I should probably make a much more sonically cohesive album the next time around.”

Around the same time, she also said 1989 was “the most sonically cohesive album (she’d) ever made.”

To me, “sonic cohesiveness” sounds like a measurable quantity, so I thought it would be fun to put Swift’s statement to the test and create my own “sonic score” using the audio features Spotify provides. In picking the combination of features I would use, my criteria were the following:

  • Must be numerical, continuous and range from 0 to 1 (features like tempo and loudness would be hard to standardize or weigh equally with those that do go from 0 to 1)
  • Must be a descriptor, not a confidence measure. I judged this as best I could from the definitions provided.

That left me with:
1) valence, defined already.
2) danceability: “Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.”
3) energy: “Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.”

I used a minimalist pirate plot to map the sonic score of each track. Each point represents one track.

pirateplot(valence + danceability + energy ~ album_release_year, spotify_genius,
           pal = c(wes_palettes$GrandBudapest2, wes_palettes$Moonrise3[1:2]),
           xlab = "album", ylab = "sonic score",
           theme = 0, point.o = 0.7, avg.line.o = 1, jitter.val = .05,
           bty = "n", cex.axis = 0.6, xaxt = "n")
axis(1, cex.axis = 0.6, lwd = 0)
legend("topright",
       c("1: Taylor Swift", "2: Fearless", "3: Speak Now", "4: Red", "5: 1989", "6: reputation"),
       bty = "n", cex = 0.6)

I was pretty surprised, because, by my interpretation/approximation of sonic cohesiveness, the difference between 1989 and Swift’s other albums is quite pronounced. The sonic scores of the other five albums’ tracks are spread quite evenly across intervals of width around 1 to 1.2, and each of those albums has at most one sonic outlier. 1989, on the other hand, has three outliers, with its other 10 tracks clustered into an interval of around 0.4. Let’s take a closer look at 1989’s sonic scores:

spotify_genius %>%
  filter(album_name == "1989") %>%
  mutate(sonic_score = valence + danceability + energy) %>%
  select(album_name, track_name, sonic_score) %>%
  arrange(desc(sonic_score)) %>%
  kable() %>%
  kable_styling(full_width = F, position = "left") %>%
  row_spec(row = 1:13, background = "seashell", color = "#b39db2")

Not including the three outliers, the album’s sonic score range is a mere 0.366. So the math agrees; 1989 seems to be Swift’s most sonically cohesive work. It also has the highest mean sonic score. That means it beats out Red (which has a higher mean valence) in at least one category out of danceability and energy.

spotify_genius %>%
  group_by(album_name) %>%
  summarise(mean_danceability = mean(danceability)) %>%
  arrange(desc(mean_danceability)) %>%
  kable() %>%
  kable_styling(full_width = F, position = "left") %>%
  row_spec(row = 1, background = "seashell", color = "#b39db2")

Sure enough, 1989 is Swift’s most danceable album, with reputation coming in second, which makes sense, since the two are her purely pop efforts. Note the gap between the top three albums (all of which contain pop tracks) and the bottom three (all country).


The lyrics

I tokenized the lyrics I scraped from Genius, and then removed stop words (overly common words such as “I,” “me,” “they” etc.) using the tidytext package’s stop_words data frame. Here’s a word cloud representing Swift’s most used words:
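The tokenization step itself is short. Here’s a minimal sketch, assuming the scraped lyrics sit in a data frame with one lyric line per row (the names lyrics and lyric are illustrative, not necessarily those in my dataset):

```r
library(dplyr)
library(tidytext)

# Split each lyric line into one word per row, drop stop words,
# and tally the remaining words for the word cloud.
word_count <- lyrics %>%
  unnest_tokens(word, lyric) %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)
```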

wordcloud(words = word_count$word, freq = word_count$n,
          max.words = 100, random.order = FALSE,
          colors = c(wes_palettes$Moonrise3[c(1:2, 5)], wes_palettes$Royal2[5]))

Time and love seem to be her major preoccupations. There’s obviously a lot of romantic vocabulary: “heart,” “night,” “beautiful” and “hold,” for example. She also seems to use a fair amount of both positive and negative words.

I found the prominence of the word “remember” interesting; it doesn’t feel particularly common or generic, nor can I pinpoint it as a word repeated many times within a single song (unlike “shake” and “stay,” for example). In fact, it appears on 15 tracks, underscoring the major role memories play in the narratives Swift weaves.

Here are the word clouds for Swift’s first and most recent albums, Taylor Swift and reputation, respectively. How much do you think her language has changed?

wordcloud(words = word_count_ts$word, freq = word_count_ts$n,
          max.words = 25, random.order = FALSE,
          colors = c(wes_palettes$GrandBudapest2[3:1]))

wordcloud(words = word_count_rep$word, freq = word_count_rep$n,
          max.words = 25, random.order = FALSE,
          colors = c(wes_palettes$GrandBudapest2[3:1]))

On to something else: lexical diversity, the ratio of unique words to total words. This is possibly the most direct way to measure how repetitive a text is. I left the stop words in for this.

Here’s a pirate plot of lexical diversity. (This time, I used a color scheme based on the album covers.)

pirateplot(lex_div ~ album_release_year, lexical_diversity,
           pal = c("cyan3", "darkgoldenrod1", "maroon4", "red3", "#b39db2", "black"),
           xlab = "album", ylab = "lexical diversity",
           theme = 0, point.o = 0.5, avg.line.o = 1, jitter.val = .05,
           bty = "n", cex.axis = 0.6, xaxt = "n")
axis(1, cex.axis = 0.6, lwd = 0)
legend("topright",
       c("1: Taylor Swift", "2: Fearless", "3: Speak Now", "4: Red", "5: 1989", "6: reputation"),
       bty = "n", cex = 0.6)

Swift’s pure pop albums are far less lexically diverse. This makes sense; pop songs are traditionally repetitive both musically and lyrically, which is what makes them stick.

Fearless and Red once again display very similar, spread-out distributions, while Speak Now and Taylor Swift have smaller spreads. And 1989 might be Swift’s most sonically cohesive album, but in terms of lexical diversity, its tracks fall into two distinct clusters: one in line with her older music, and another that reaches new levels of repetitiveness for her. In fact, four of Swift’s five least lexically diverse tracks are from 1989.

tidy_taylor %>%
  group_by(track_name, album_name) %>%
  mutate(lex_div = length(unique(word)) / length(word)) %>%
  select(track_name, lex_div, album_name) %>%
  arrange(lex_div) %>%
  distinct() %>%
  head(5) %>%
  kable() %>%
  kable_styling(full_width = F, position = "left") %>%
  row_spec(row = 1:5, background = "azure", color = "palevioletred")

What would an analysis of lyrics be without some sentiment analysis? I’ve used each of tidytext’s three sentiment lexicons – AFINN, bing and nrc – to do so.

The AFINN lexicon assigns words scores from -5 to 5 based on how positive or negative they are. So, for example, “masterpiece” (“All Too Well”) has a score of 4, while “torture” (“Blank Space”) is given a -4. Not all words are assigned a sentiment or score, since many words are inherently neutral.
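For reference, a data frame like taylor_AFINN can be built by joining the tokenized lyrics to the lexicon. A sketch, assuming tidy_taylor holds one row per word:

```r
library(dplyr)
library(tidytext)

# inner_join() keeps only the words that appear in the AFINN lexicon.
# Note: newer tidytext releases name the score column "value" rather
# than "score", so the column name below may need adjusting.
taylor_AFINN <- tidy_taylor %>%
  inner_join(get_sentiments("afinn"), by = "word")
```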

I used the AFINN lexicon to create a bar graph for each album’s scaled score. The value each album’s bar represents is the sum of the scores of all the words in the album, scaled to account for differences in the number of words per album.

taylor_AFINN %>%
  group_by(ordered_albums) %>%
  summarise(sum_score = sum(score)) %>%
  mutate(scaled = sum_score * 229 / dim$n) %>%
  ggplot(aes(x = ordered_albums, y = scaled, fill = ordered_albums)) +
  geom_bar(stat = "identity") +
  ylim(-200, 200) +
  coord_flip() +
  theme_fivethirtyeight() +
  theme(panel.background = element_rect(fill = "white"),
        plot.background = element_rect(fill = "white"),
        legend.position = "none") +
  scale_fill_manual(values = c("palevioletred", "violetred3", "greenyellow", "lightpink", "olivedrab3", "mediumseagreen"))

1989 is Swift’s most lyrically negative album by far. I counted only unique instances of each word, thereby discounting repetition. This works out to a better measure of her vocabulary than the alternative because, as we’ve already explored, her repetition can be off the charts: “Shake It Off,” for example, really threw off the graph, since “shake,” “break,” “hate” and “fake” are all negative words, each repeated several times in every chorus.

Even with each of these words counted only once (as well as “bad” and “blood” from “Bad Blood,” despite the 15 times the phrase appears in the song), 1989 is still the most negative by far. My money was actually on reputation, since its musical positivity was so much lower than the other albums’. Let’s take a closer look at the profiles of 1989’s songs, in a pyramid plot made using the bing lexicon. Here each numerical measure represents a word count, since the bing lexicon uses the two categories “positive” and “negative” instead of scores; for instance, the graph shows that the track “This Love” has five positive words and nine negative words.

sent_taylor_1989 %>%
  ggplot(aes(x = track_name, y = n, fill = sentiment)) +
  # ggplot2 no longer supports the plyr-style subset argument;
  # pass a filtering function as each layer's data instead
  geom_bar(data = function(d) subset(d, sentiment == "positive"), stat = "identity") +
  geom_bar(data = function(d) subset(d, sentiment == "negative"), stat = "identity") +
  scale_y_continuous(breaks = seq(-20, 20, 5), limits = c(-20, 10)) +
  coord_flip() +
  theme_fivethirtyeight() +
  theme(panel.background = element_rect(fill = "white"),
        plot.background = element_rect(fill = "white"),
        legend.position = "none") +
  scale_fill_manual(values = c("palevioletred", "olivedrab3"))

Only two of the album’s 13 songs contain more positive words than negative. The album’s two most successful tracks, and two of her most successful overall, are both very negative lyrically, yet Swift is not unhappy in either; in fact, on “Shake It Off,” she’s positively jubilant. On “Blank Space,” she is endlessly satirical; the unreality of the song’s narrative – unlike most of Swift’s songs, which are based in truth – results in a certain detachment that allows her to use multiple sharply positive and negative words in the same song.

Anyway, I’m still surprised 1989 is Swift’s most lyrically negative album; it’s so upbeat! And her 1989 persona was by far her most positive to date, from being happily single to having a large and prominent “squad” to even temporarily repairing her relationship with Kanye West. 1989 was also her most successful album in terms of sales and charting, according to this Billboard list: five of its tracks are in her top 10 tracks overall. It appears Swift is at her most appealing when her songs aren’t internally consistent, i.e. when their music and lyrics don’t agree on a mood.


The AFINN and bing lexicons are quite limited, however, as they only measure sentiment along a single positive-to-negative axis. The nrc lexicon, on the other hand, also sorts words into eight emotional categories: joy, anticipation, trust, surprise, sadness, anger, disgust and fear. For the last visualization of this post, I used a radar chart to see how Swift’s lyrics by album stack up against each other in terms of these eight emotions.

chartJSRadar(radar_chart,
             polyAlpha = 0.1,
             lineAlpha = 0.8,
             maxScale = 25,
             colMatrix = matrix(c(0, 255, 255, 255, 185, 15, 139, 0, 139,
                                  255, 0, 0, 201, 167, 198, 0, 0, 0),
                                byrow = F, nrow = 3))

I put all the positive emotions on one side and the negative on the other to make trends more evident. The scale represents the percentage of the words in the corpus joined to nrc that fall under a particular emotional category. For example, over 20% of the words in the albums Taylor Swift and Fearless fall under the “joy” category. Words can fall under more than one category, but since this is accounted for in the dataset, each album’s percentages sum to 100.
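A hedged sketch of how a radar_chart data frame along these lines could be assembled; the exact shape chartJSRadar expects (a Label column of emotions followed by one column of percentages per album) is an assumption here:

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Join the tokenized lyrics to nrc, keep only the eight emotion
# categories, and convert each album's counts to percentages.
radar_chart <- tidy_taylor %>%
  inner_join(get_sentiments("nrc"), by = "word") %>%
  filter(!sentiment %in% c("positive", "negative")) %>%
  count(album_name, sentiment) %>%
  group_by(album_name) %>%
  mutate(percent = round(100 * n / sum(n), 1)) %>%
  ungroup() %>%
  select(-n) %>%
  pivot_wider(names_from = album_name, values_from = percent) %>%
  rename(Label = sentiment)
```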

These sentiment profiles clearly show that Swift’s lyrics have grown more negative over time. Her first four albums display similar profiles, with a gradual shift toward the negative: Speak Now scores noticeably lower on joy than its predecessors, while Red scores slightly higher on three of the four negative sentiments. 1989 and reputation’s sentiment profiles differ greatly from the first four; they’re much lower on positivity and higher on all negative emotions. Between the two, there isn’t much to choose: 1989 ranks slightly higher on all measures of negative sentiment, while reputation is significantly higher on trust. I’d like to think that’s because Swift feels she can trust again, now that she’s in a long-term relationship.

It’s interesting to see how markedly more negative Swift’s later, pop album lyrics have been. This possibly signals a correlation between sentiment and genre, but it would be gross extrapolation to claim from this that country music contains more positive lyrics than pop on average. It is evident, however, that Swift’s country music in particular is far more lyrically positive than her pop – but that seems more a consequence of her growing up, and being less starry-eyed and more disillusioned about love, fame, feuds and everything else.

That’s all I’ve got! Let me know if you have other suggestions or interpretations – I’m always down to talk about either data stuff or Taylor Swift :)