Data Bowie: The Math Behind the Music Legend

Michael Alwill
May 31, 2018

The verdict’s in: David Bowie’s lyrics are meaningless.

Or at least that’s what we’re to believe if the New Yorker, the Atlantic, and David Bowie himself have anything to say about it. And while we might debate the opinions of the highbrow mags, there’s no denying that Bowie invoked all manner of randomness in his lyrical concoctions: just look at his embrace of Brian Eno’s Oblique Strategies, his cut-up method for songwriting, and his mid-90s Verbasizer Mac-app approach to lyric writing.

But is that the entire story? Or is there more to it than that?

“There, in the chords and melodies, is everything I want to say. The words just jolly it along. It’s always been my way of expressing what, for me, is inexpressible by any other means.” — David Bowie

I wish Bowie were still around to ask, but in lieu of his personal guidance I decided to turn to the only thing I had: data. Which, as a writer and creative entity myself, I know is more than a bit sacrilegious. The machines gain on us every day, and one of the last things humanity has left is art; we don’t want to believe the data-blooded robots can do that too. Still, I think Major Tom himself would’ve gotten a kick out of using numbers to plumb the depths of his subconscious, whatever sordid details might follow.

I. Table Setting

I’m going to be showing a lot of charts throughout this piece, and I have no doubt some of you are not math people. When possible, I’m going to boil down the pictures into bolded insights and takeaways, so keep an eye out. But what you need to know about the data I’m using is this:

  • All albums, lyrics, and songs are from Genius.com.
  • I considered Bowie’s original studio albums, along with the Tin Machine songs he was credited with writing; I did not include re-releases, live versions, or remixes.
  • For songs on multiple albums, I used the earliest release year, as my intention was to get at Bowie’s state of mind when he wrote the song.

For those of you who are a bit mathy, here are some extra nice-to-knows:

  • My data scraping was done 90% in R (geniusR) and 10% from web sources; all analysis and charting was done in R.
  • For sentiment analysis, I did not perform any stemming or lemmatization or word replacement, but I did remove stop words.
  • Remember, data isn’t a stand-in for critical thinking; it can provide possible answers to our questions, but we shouldn’t be afraid to question it.
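For the mathy folks, here is a minimal Python sketch of the two preparation rules described above: dropping stop words without any stemming, and keeping the earliest release year for songs that appear on more than one album. The original work was done in R, so this is not that code; the stop-word list, the function names, and the example titles are all assumptions made up for illustration.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "i", "you"}

def tokenize(lyrics):
    """Lowercase the text and split it into words, keeping apostrophes."""
    return re.findall(r"[a-z'’]+", lyrics.lower())

def remove_stop_words(tokens):
    """Drop filler words; no stemming or lemmatization is applied."""
    return [t for t in tokens if t not in STOP_WORDS]

def earliest_release(records):
    """Map each song title to the earliest year it appears under.

    `records` is an iterable of (title, year) pairs, one per album appearance.
    """
    earliest = {}
    for title, year in records:
        if title not in earliest or year < earliest[title]:
            earliest[title] = year
    return earliest

# Hypothetical records: "Song A" appears on two albums, so 1975 wins.
years = earliest_release([("Song A", 1980), ("Song A", 1975), ("Song B", 1990)])
```

Skipping stemming means "dance" and "dancing" stay separate words, which keeps the counts literal at the cost of splitting related terms.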

Oh, and before we get started, perhaps you’d like a Bowie Spotify playlist to accompany your read?

II. The Basics of Bowie

Okay, enough prelude. Let’s get to the data:

  • David Bowie was an active musician from 1962 until his death in 2016. That’s 54 years of artistry.
  • He crafted 366 original studio album songs, including songs he wrote for Tin Machine from 1989–1991. Our data set will use those 366 songs.
  • He produced 25 studio albums.
  • In the US and UK alone, 50 of his songs hit the Hot 100 (Billboard, US) and Official Charts (UK).

Here’s a look at Bowie’s output over his 5+ active decades:
(remember to open charts in a new window if they appear too small)

Songs released by decade by charting status

Insight: The 1970s were Bowie’s most prolific decade, but during the 1980s he had the most charting success.

With nearly 120 songs produced during the 1970s, there’s no doubt Bowie was a force to be reckoned with at that time, though we can see he had the most songs chart in the 1980s. We also see lulls in the 2000s and 2010s, and as Bowie historians know, there was a decade-long stretch around then when Bowie went without releasing any studio albums.

But defining Bowie by decade seems a bit… odd. After all, he’s the original many-faced god, sporting personas like Ziggy Stardust, Aladdin Sane, the Thin White Duke, and Major Tom. He was a man unafraid to move on to new creative periods while leaving old ones behind, and I decided to factor this in as Bowie’s Creative Eras.

(of the 1972 “Space Oddity” video by Mick Rock) “I really hadn’t much clue why we were doing this, as I had moved on in my mind from the song.” — David Bowie

I’m going to be using these Creative Eras a bit, so let’s define them (courtesy of Wikipedia):

  1. 1962–1967: Early Career to Debut Album
  2. 1968–1971: Space Oddity to Hunky Dory
  3. 1972–1973: Ziggy Stardust
  4. 1974–1976: Plastic Soul and the Thin White Duke
  5. 1977–1979: Berlin Era
  6. 1980–1988: New Romantic and Pop Era
  7. 1989–1991: Tin Machine
  8. 1992–1998: Electronic Period
  9. 1999–2012: Neoclassicist Bowie
  10. 2013–2016: The Final Years
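Since the eras run back to back with no gaps, assigning a song to one comes down to finding the last era whose start year is not after the release year. Here is a small Python sketch of that lookup; the era boundaries come from the list above, while the function name and the use of `bisect` are my own assumptions rather than anything from the original R analysis.

```python
from bisect import bisect_right

# Start year and name of each Creative Era, copied from the list above.
ERAS = [
    (1962, "Early Career to Debut Album"),
    (1968, "Space Oddity to Hunky Dory"),
    (1972, "Ziggy Stardust"),
    (1974, "Plastic Soul and the Thin White Duke"),
    (1977, "Berlin Era"),
    (1980, "New Romantic and Pop Era"),
    (1989, "Tin Machine"),
    (1992, "Electronic Period"),
    (1999, "Neoclassicist Bowie"),
    (2013, "The Final Years"),
]
ERA_STARTS = [start for start, _ in ERAS]
LAST_YEAR = 2016  # end of the final era

def creative_era(year):
    """Return (era_number, era_name) for a release year, numbered from 1."""
    if not ERA_STARTS[0] <= year <= LAST_YEAR:
        raise ValueError(f"{year} is outside 1962-2016")
    # bisect_right counts how many era start years are <= the given year.
    index = bisect_right(ERA_STARTS, year) - 1
    return index + 1, ERAS[index][1]
```

For example, `creative_era(1975)` lands in Era #4 and `creative_era(1980)` in Era #6.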

As you can see, these eras are not equal length. But let’s see what Bowie’s releases look like according to them anyway:

Songs released by Creative Era by charting status

Whoa! Much, much different. We now have the 1970s releases spread across Bowie’s many creative phases, and we can see just how much of his charting success came during Era #6 (1980–1988). No wonder there’s so much association between Bowie and the 80s!

III. Charting a Course

Earlier I mentioned that Bowie had 50 songs chart throughout his career and, as you saw above, many of those were in the 1980s. But exactly which songs of his charted? And how does his charting compare between the US and the UK?

Let’s have a look:

Bowie’s Top 5 (well, 6) Charting Songs in the US, now with chart typo!
Bowie’s Top 5 Charting Songs in the UK, also with chart typo!

Insight: The US and UK have different tastes in Bowie’s music.

Well, duh, I suppose. Since Bowie was originally from the UK, it’s not surprising that some of his earlier hits like The Jean Genie and Space Oddity charted there. And does it really come as a surprise that Fame and Young Americans would catch on in the US?

Can we more specifically compare US and UK taste though? In fact, we can — by looking at songs that charted in both the UK and the US. Conveniently there were 10 songs that made both charts (phew). Here they are:

Comparison of Bowie’s US vs UK Charting Songs, sans typo!

Now this is fun. Look how US favorites Fame and Young Americans sit in the mid-teens on the UK charts and how, *gasp*, UK’s top ranker is nowhere to be found AND how one of its #2s is at an appalling #71 on the Billboard Hot 100 chart. Talk about sacrilege.

Still, the two nations have a few things in common: both feel similarly about Day In Day Out and very similarly about Tonight. Listen here if you’ve never heard that one before.

All this talk of charts and charting is dandy, but any true audiophile — and Bowie diehard — knows how much gold there is in all the tracks that never reach that blistering state of lowest common love-ominator. I think it’s time we had a look at what Bowie was saying all those years.

IV. Common (and Not So Common) Ground

“The truth is of course is that there is no journey. We are arriving and departing all at the same time.” — David Bowie

Most Common Words in Bowie’s Discography, by count of songs words appear in

Insight: Love. Time. Eyes. Life. World. These are the most common words you’ll find in Bowie’s lyrical catalog.

Here’s another way to look at those same words (and many more):

Most Common Word Wordcloud

Underwhelmed yet?

Despite the fancy wordcloud, these hardly seem like surprising words, right? After all, so much music talks about love and time and life and blah blah blah; it’s not surprising to see Bowie talking about these universal themes as well. Let’s dive in a little deeper though with views by decade and creative era:

Most Common Words by Decade
Most Common Words by Creative Era

Okay, there’s a little variation. But just a little. We still see Love showing up all over the place, as well as Life and (for the most part) Time. I guess Tin Machine wasn’t too much into talking about Time. But this still feels on some level unsatisfying. Did Bowie, the master chaotician when it came to his lyrics, really just use the same heap of words again and again? Isn’t there some other way we can get at the subconscious themes throughout his creative career?

Why, yes. Yes there is.

For you non-mathy people, this is about to get a little scary. I’m sorry. But hang on and I’ll do my best to make it as easy to understand as possible. What we’re going to do is this:

Instead of looking at regular ol’ common words, we’re going to look at words that are common in one group of songs but rare in the others. In other words, we’re going to favor words that show up a lot in a given decade or creative era and are less likely to show up in all the others.

(Psst. For you non-non-mathy people, this is known as TF-IDF, or term frequency-inverse document frequency, and it’s a pretty, pretty, pretty popular measure among information retrieval algorithms.)

So, if every decade has Love and Time and Eyes in its lyrics, we’re basically going to throw those out and see what’s left. These will act like representatives of what may have been going on in Bowie’s mind (consciously or subconsciously).
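To make the "throw those out" idea concrete, here is a hedged Python sketch of one common TF-IDF formulation: term frequency is a word’s share of a group’s words, and inverse document frequency is the natural log of (number of groups / number of groups containing the word). I’m assuming that formulation for illustration; the exact variant used for the charts isn’t stated. The toy tokens are made up, not real lyrics data.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Score each word in each document by tf * idf.

    `documents` maps a label (say, a decade) to a list of word tokens.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for tokens in documents.values():
        doc_freq.update(set(tokens))

    scores = {}
    for label, tokens in documents.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[label] = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return scores

# Made-up toy tokens, not real lyrics data.
toy = {
    "1970s": ["love", "time", "star", "star"],
    "1980s": ["love", "time", "dance"],
    "2010s": ["love", "time", "tomb"],
}
scores = tf_idf(toy)
```

Because "love" and "time" appear in every toy decade, their idf is ln(1) = 0 and they score zero everywhere, while a word unique to one decade floats to the top.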

Shall we take a look?

Weighty Words by Decade (note: the scale is a weighting ratio, not # of songs)
Weighty Words by Creative Era (note: the scale is a weighting ratio, not # of songs)

Insight: There are significant changes from decade to decade and era to era in the weightiest words in Bowie’s lyrical catalog. Towards the end of his life, Bowie increasingly surfaced deathly topics.

Undoubtedly Bowie covered many topics in his 25 studio albums, and we can see some markers of his album topics in the charts above. Just take a look at Creative Era #4, during which he released Diamond Dogs, to see traces of the 1984-inspired dystopian album. Or the nature-focused qualities present in Creative Era #9, or the grim words showcased in Creative Era #10, shortly before his death.

The words are still eclectic — as is to be expected due to Bowie’s creative processes — but nevertheless he was choosing from the randomness presented to him and I believe we see some of that in these charts, just as we definitely see a shift in sentiment from one decade or era to the next.

“ I always had a repulsive need to be something more than human.” — David Bowie

V. Diversity Higher

We’ll dive deeper into Bowie’s word choices, along with their associated sentiments, in a minute, but before we do let’s look at the diversity in his word choices. Knowing how lexically diverse Bowie was during any given period should also give us some food for thought about both his own creative journey and the general journey of pop music over the ages (about which The Pudding has done an AMAZING write-up that I recommend you read).

Here’s a distribution of the word count in his songs:

Distribution of Word Count by Song by Charted Status

Insight: Most of Bowie’s songs had between 150 and 250 words. Most charting songs fell into this range as well.

See that song allllll the way on the right that’s nearly 600 words? That’s John I’m Only Dancing (Again), a significant overhaul of Bowie’s response to John Lennon regarding a criticism about Bowie’s cross-dressing. If you’re anything like me, you probably didn’t even know that happened!

The small bar all the way on the left isn’t 0 by the way, but a binned (grouped) set of songs with very low word count, such as V-2 Schneider and Warszawa.

Word count is interesting, but what’s even more interesting are the unique words used in a song (i.e. Pitbull’s Achilles Heel). And to look at that we’re going to use a very (funk to) funky and curious set of charts:

Lexical Diversity by Decade
Lexical Diversity by Creative Era (Remember to scroll up if you forgot the eras!)

A few things about these funky bean-charts:

  • The “bean” is a smoothed distribution; the fatter the bean, the more songs have that number of distinct words per song.
  • The black bar is the central tendency; you can think of it like an average.
  • The dots jitter left and right, but this is only so you can see them better; there is no data meaning there.

Insight: Bowie’s songs have (mostly) the same lexical diversity over time.

Granted, there were ebbs and flows, but for the most part Bowie’s lexical diversity stayed the same over time, albeit with fewer peaks post-1980s.

The all-time peak of his diversity was during the Space Oddity period, which included his most lexically diverse track: Cygnet Committee. This track was also Bowie’s second-longest, clocking in at 9:36 with 153 unique words (and 557 total). Ironically, his longest track — Station to Station — is 10:15 long, with only 60 unique words (469 total).
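For anyone who wants to poke at this themselves, here is a small Python sketch of the measure behind the bean charts: the number of distinct words in a song. The ratio of distinct to total words (often called the type-token ratio) is an extra normalization I’m adding for comparison; it is an assumption on my part, not something plotted above.

```python
def lexical_diversity(tokens):
    """Return (distinct_words, total_words, type_token_ratio) for one song."""
    total = len(tokens)
    distinct = len(set(tokens))
    # Guard against instrumentals with no words at all.
    ratio = distinct / total if total else 0.0
    return distinct, total, ratio

# Ratios implied by the counts quoted above.
cygnet_ratio = 153 / 557    # Cygnet Committee: 153 unique of 557 total
station_ratio = 60 / 469    # Station to Station: 60 unique of 469 total
```

Using the counts quoted above, Cygnet Committee comes out at roughly 0.27 and Station to Station at roughly 0.13, so the longer track repeats itself about twice as much.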

So, now we know what Bowie was saying during his 5+ decades of making music and how diverse his word choices were, but we still need some way to consider the emotion and sentiment of his lyrics. For that, we turn to…

VI. Getting Sentimental (Analysis)

MATH WARNING WOOP WOOP WOOP

Sentiment analysis is still a young field and so I’m going to throw the disclaimer down to take everything presented here with a grain of salt. For those who don’t know, sentiment analysis is typically done like this:

  • You take your words (reports, essays, lyrics, novels, whatever).
  • You take the available lexicons already created, which typically means one of the following four: AFINN, Bing, NRC, or Loughran (there’s also the Syuzhet lexicon, but it’s for longform text); these lexicons have been created by humans who rate each word in them as positive, negative, and/or part of one of a set of emotional categories.
  • You apply the lexicon sentiment to your words, usually removing “stop words” (filler words) and, possibly, performing tasks to make your word analysis cleaner or more comprehensive.
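The three steps above can be sketched in a few lines of Python. Big caveat: the mini-lexicon below is entirely hypothetical and its entries are NOT real NRC ratings; it exists only to show the mechanics of matching words to categories. The function names and the stop-word list are also assumptions.

```python
from collections import Counter

# Hypothetical mini-lexicon for illustration only; these are NOT real NRC entries.
TOY_LEXICON = {
    "love": {"positive", "joy"},
    "hero": {"positive", "trust"},
    "afraid": {"negative", "fear"},
    "grave": {"negative", "sadness", "fear"},
}
STOP_WORDS = {"i", "am", "the", "of", "a", "my"}

def score_sentiment(tokens, lexicon=TOY_LEXICON, stop_words=STOP_WORDS):
    """Count how many tokens fall into each sentiment category."""
    totals = Counter()
    for token in tokens:
        if token in stop_words:
            continue
        # A word can carry several sentiments; unmatched words are skipped.
        for category in lexicon.get(token, ()):
            totals[category] += 1
    return totals

def net_polarity(totals):
    """Positive count minus negative count."""
    return totals["positive"] - totals["negative"]

totals = score_sentiment("i am afraid of the grave".split())
```

Note that words missing from the lexicon contribute nothing, which is why the share of lyric words a lexicon actually matches matters so much when choosing one.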

The thing is, there’s not really a lexicon for music (or poetry for that matter). That means we have to make do with what we’ve got and, in the case of Bowie’s lyrics, that means using the NRC lexicon. I chose this one because it had the most matching words of all the lexicons with the words present in Bowie’s lyrics.

Still, there’s directional value in these imperfect lexicons. Let’s look at what the NRC lexicon has to say about overall sentiment in Bowie’s lyrics:

Overall Sentiment in Bowie’s Discography

Insight: As you can see, Bowie’s music rates as more positive than negative, with emotions of joy, trust, and anticipation topping the list of sentiments. If we look at the positive/negative split over time, we also get this:

Sentiment Polarity (blue = positive, red = negative) of Bowie Songs by Year

This provides a more nuanced picture and allows us to pick out some patterns, such as the dominating positivity of Bowie’s 1970s, the negative lump of the 1. Outside album (1995), and the negativity towards Bowie’s death that’s present in albums like Blackstar (2016).

If we go a step even further and layer this sentiment polarity with events in Bowie’s life, we get this:

Sentiment Polarity by major events in Bowie’s life

Of course it takes time to write music before it’s released, but if this chart is anything to go by then we see that 1975 was a very, very, very good year for David Bowie (incidentally during his coke binge years), a trend that continues until 1980, as he moved to Berlin to kick his drug habit. Things get much darker in 1989 (when Bowie, for the first time in 20+ years, steps down from singular stardom), rise again with his second marriage, and then hit a low the year of his death.

Looking at this chart it’s hard not to think that Bowie was transmuting his personal struggles with cancer and the possibility of his impending death into his music; surely there’s nothing meaningless in that.

“That’s the shock: All cliches are true. The years really do speed by. Life really is as short as they tell you it is. And there really is a God — so do I buy that one? If all the other cliches are true… Hell, don’t pose me that one.” — David Bowie

Now, let’s look at the emotional categories of joy, anticipation, sadness, fear, disgust, anger, trust, and surprise.

Radial Charts by Sentiment for Select Years

If we look at a few choice years — 1975, 1983, 1995, and 2016 — we can see the sentiment expressed by Bowie’s lyrics changes substantially. Fear and anger are heavily present towards the last 20 years of his life, whereas joy is much more prevalent at the beginning of his career, with very little sadness to speak of.

Insight?: Is this the story of a man becoming more bitter and hardened, or simply more mature? Could it be Bowie seeing more of the world and trying to say something about it? It’s impossible to say, but with Bowie’s tendency to explore the world both outside and in, I would be surprised if he didn’t develop a more skeptical attitude as he aged.

Here’s an even more comprehensive view of sentiment by decade:

If you’ve never seen a chord diagram before, don’t freak out. Here’s how to read it:

  • The slices at the top represent emotional categories of sentiment; those at the bottom represent decades.
  • The lines from the decades extend to the sentiments, showing how each decade is split among the emotional categories — and vice versa.
  • The scale used is total words for a given sentiment or decade.
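Under the hood, a chord diagram like this is just a two-way count table. Here is a rough Python sketch, assuming each sentiment-bearing word has already been tagged with a (decade, emotion) pair; the pairs below are made up, and the function names are hypothetical.

```python
from collections import defaultdict

def flow_matrix(pairs):
    """Build decade -> emotion -> word count from (decade, emotion) pairs."""
    matrix = defaultdict(lambda: defaultdict(int))
    for decade, emotion in pairs:
        matrix[decade][emotion] += 1
    return matrix

def slice_sizes(matrix):
    """Return total words per decade and per emotion (the slice widths)."""
    decade_totals = {d: sum(e.values()) for d, e in matrix.items()}
    emotion_totals = defaultdict(int)
    for emotions in matrix.values():
        for emotion, count in emotions.items():
            emotion_totals[emotion] += count
    return decade_totals, dict(emotion_totals)

# Made-up tagged words, not real data.
pairs = [("1970s", "joy"), ("1970s", "joy"), ("1970s", "fear"),
         ("2010s", "fear"), ("2010s", "sadness")]
matrix = flow_matrix(pairs)
decade_totals, emotion_totals = slice_sizes(matrix)
```

Each cell of the matrix becomes one ribbon, and the row and column totals set the width of the decade and emotion slices.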

There’s a lot to take from this chord diagram, and I think it works best as a general reference for flow between the two categories. What we can also do is to pick songs from Bowie’s Creative Eras and see how they show sentiment:

Sentiment by Select Songs from each of Bowie’s Creative Eras

Insight: I’m Afraid of Americans is not a happy song. Anyone who’s heard it likely knows that, but this sentiment chart really shows that. Interestingly we also see a lack of trust in Heroes, intense trust in Space Oddity, joy and sadness in Young Americans, an odd quotient of fear in China Girl, and so, so much anger in Lazarus.

Lastly, I want to dig even deeper into a few of these songs to pull out not just sentiment, but the words that our NRC lexicon is keying off of to produce this sentiment. What you’ll find below are “Sentiment Maps”, or links between sentiment words and their sentiment categories for five songs (note, for I’m Afraid of Americans I included positive/negative to fill out the word list).

These Sentiment Maps should give you some idea of why our lexicon thinks these songs are the way they are.

My question to you is: Do you agree?

(remember: words can have multiple sentiments!)

At the start of this piece, I threw out the assertion that David Bowie’s lyrics are meaningless. The New Yorker’s said it, the Atlantic’s said it, and David Bowie’s said it. But is it true?

For sure his lyrics can be enigmatic and obtuse, with meaning put on them by fans and critics alike, and sure he’s deployed plenty of forms of randomness when writing his songs. But based on the words and themes that emerge — both commonly and in a weighty fashion — and how the sentiment of his music shifts over time, I’m inclined to believe his lyrics weren’t meaningless. There’s something to them, something deep and buried and tucked away, a kind of map of emotion and subconscious thought waiting to be uncovered.

And wouldn’t that be perfectly Bowie?

“I don’t know where I’m going from here, but I promise it won’t be boring.” — David Bowie

I hope you enjoyed our little data voyage and, if you did, please consider hitting the clap button 👏 to help others find this piece too! For more of my work, check out my website http://michaelalwill.com, or find me on LinkedIn or Instagram.

Michael Alwill

Writer, analyst, designer, and all around polymath. Brooklyn-born and universe-raised. Turning over stones and gathering no moss. www.michaelalwill.com