Lexical Song Recommendation

Adorno’s Culture Industry + NLP + Indie Music

Web-App hosted @ songreco.shaham.me

The Sociology and Motivation

“Resistance is regarded as the mark of bad citizenship, as inability to have fun, as highbrow insincerity, for what normal person can set himself against such normal music?” So wrote Theodor Adorno, the critical theorist, about resistance to the popular music of his day, and the line applies just as well to any rebellion today against the mainstream popular trash that we are socially coerced to call ‘art’. Anyone who dares to look down on the top-selling artists is condemned as an elitist, a snob, jealous, or, in a term that encompasses all negativity, a hater. But if we could put this liberal pride and prejudice aside, I think there is a compelling case for us to collectively agree that a critical re-interpretation of the culture we propagate is necessary. In fact, in the absence of a religious authority in the public sphere and of any worthy competitor (realistically speaking) to status-quo capitalism, it could be argued that culture and art become the caretakers of our human freedom and morality. For Adorno, however, this sentiment is not as innocent and will-affirming as it sounds. Rather, the scenario is quite sinister: it is precisely the mass belief in art as an almost spiritual liberation from the monotony of our lives that allows for a radical mass deception.

Adorno frames a manipulative, systematic ‘Culture Industry’ which, as the name implies, monopolizes the cultural domain in an industrial fashion. To cut to the chase, this ultimately leads to the standardization of art and culture — following the typical flow of a product that flourishes in a capitalist society.

Under monopoly all mass culture is identical, and the lines of its artificial framework begin to show through. The people at the top are no longer so interested in concealing monopoly: as its violence becomes more open, so its power grows. Movies and radio need no longer pretend to be art. The truth that they are just business is made into an ideology in order to justify the rubbish they deliberately produce. They call themselves industries; and when their directors’ incomes are published, any doubt about the social utility of the finished products is removed.

All the ‘popular’ music you listen to (Adorno maintains a dichotomy between popular and serious music) is, in fact, one song endlessly immortalized through a series of mutations and artificial perversions. The Culture Industry formulates and designs the product with predefined patterns and chords, and the artist uses his/her creativity merely to dress and decorate the package. Meanwhile, the consumers/listeners remain under the illusion that they are in control and free to pick and choose their preferences (...which are actually not their preferences at all but formulated desires). It’s all the same chords and the same patterns, and all the ‘top hits’, awards, and ranking charts are part of a theatrical drama to keep the audience entertained. This single product is kept alive through social conditioning and ‘plugging’, so you assume that the repetition of the music in public, mainstream media must indicate some inherent element that makes it successful. (“Well, yeah sure it’s garbage but I mean, it must be popular for a reason, right?”) For Adorno, this ideology, this belief that you almost involuntarily carry with you as you observe culture, is no accident but a systematic removal of your resistance to monotony. This is the ‘mass deception’: though our passionate, divergent Self turns to culture to fight monotony, culture plays Judas and sells us back to the industry (...for free!).

(Why does the system even care about something as abstract as your resistance to monotony? Adorno explains: “The less the mass discriminates, the greater the possibility of selling cultural commodities indiscriminately.” That is to say, why create multiple original products when you can convince the consumer to keep buying the same one?)

Note: Re-reading this section, I can see that it almost sounds ‘conspiracy’-like, but I genuinely promise this is not the case; it is only a result of summarizing. Adorno presents detailed examples and logic that outline this general message, and he scrutinizes the Culture Industry on specifics.

The above preamble could more or less be called the theoretical motivation for the NLP project I am about to present: a Lexical Song Recommender System. The practical motivation was simple: if I am put in the privileged position of recommending music to someone who is not familiar with my kind of non-mainstream, indie music, how should I proceed?

The majority of the recommender systems currently used in industry (and not just for music) are, manifesting Adorno’s concerns, a class of what-the-other-guy-who-was-like-you-liked (in practice usually a hybrid of collaborative-filtering and content-based recommender systems). Simply put (really simply!), these systems look at consumer behavior: if you listened to A, B, and C, and someone else who listened to A, B, and C also listened to D, you will be recommended D. If you understood the heart of Adorno’s claims against the Culture Industry, you will understand why this is a symptom of a disease. It indicates that, mirroring the endless variants of a single product (the one song mutilated ten times over), we are also pushed to converge into an endless supply of one consumer. Theoretically, this is the ideal of the Industry: Endless supply of one product + Endless supply of one consumer = Recurring supply of Capital. But, let’s be honest, can we really call this ‘manipulative’ if it... works?

Yes, it works! It works great, people love it, it’s easy, it’s relaxing — the same product mutilated and presented over and over again has a certain familiarity to it that brings about a sense of ownership (“Hey, this is my song!”). And I think this brings about another important insight of Adorno’s work: we are past the days of direct and forceful coercion — the best coercion, it seems, is the one that doesn't seem like coercion at all. Consumers are still waiting for a “You must now all listen to so-and-so or your family will be killed” ideological kind of directive to accept a lack of freedom. But this expectation in itself is part of the ideology that governs us. The dirty secret is that the Culture Industry doesn’t need to make you do what you don’t want when it can just make you want what it wants you to do.

The analysis Tocqueville offered a century ago has in the meantime proved wholly accurate. Under the private culture monopoly it is a fact that “tyranny leaves the body free and directs its attack at the soul. The ruler no longer says: You must think as I do or die. He says: You are free not to think as I do; your life, your property, everything shall remain yours, but from this day on you are a stranger among us.” Not to conform means to be rendered powerless, economically and therefore spiritually — to be “self-employed.” When the outsider is excluded from the concern, he can only too easily be accused of incompetence.

Therefore, if the recommender system refers you only to variants of the same fundamental song, how can we break the cycle? And how do I recommend something you will not dismiss right away (and might even ‘like’) while also maintaining authenticity in the process, so as not to propagate the agenda of the Culture Industry? The safe route: screw authenticity, recommend hits and only hits and nothing else, ever. The eccentric route: screw the industry, recommend that one obscure, weird song your artist released on an EP when they were still living in their mom’s basement. Here is a possible compromise: recommend the songs from your own playlist (here we will be snobs and assume our playlists are ‘serious’ music) with the most similar qualities to the user’s selection; in this case specifically, a recommender system based on the lexical qualities of songs. Once again, this approach to recommendation is certainly not as profitable as industry standards (nor does it attempt to be). We are committing the socially suicidal act of deliberately choosing not to cater to the user’s taste (the customer, here, is not always right) and using it only as a guideline.
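For contrast with the lexical approach that follows, here is a minimal sketch of the what-the-other-guy-who-was-like-you-liked logic described above, reduced to plain co-occurrence counting between listeners. The function name and listening histories are hypothetical, and real industry systems are far more elaborate (matrix factorization, hybrid models, and so on); this only shows the core idea of recommending D because people who played A, B, and C also played D.

```python
from collections import Counter


def co_listen_recommend(my_songs, other_histories, top_n=3):
    """Recommend songs that overlapping listeners played but I haven't.

    Hypothetical, minimal user-based collaborative filtering: every other
    listener votes for their unseen songs, weighted by how many songs they
    share with me.
    """
    mine = set(my_songs)
    scores = Counter()
    for history in other_histories:
        theirs = set(history)
        overlap = len(mine & theirs)  # similarity = number of shared songs
        if overlap == 0:
            continue  # no common ground, so this listener casts no vote
        for song in theirs - mine:
            scores[song] += overlap
    # highest score first; ties broken alphabetically for a stable order
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [song for song, _ in ranked[:top_n]]


histories = [
    ["A", "B", "C", "D"],
    ["A", "B", "E"],
    ["X", "Y"],
]
print(co_listen_recommend(["A", "B", "C"], histories))
```

Nothing about the songs themselves enters this calculation; only the crowd’s behavior does.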

The Tech

Let’s shift away from the sociological discussion and get down and dirty with the technology. I tried a couple of models and techniques but settled on something quite basic: Euclidean distance calculations. The idea is to generate feature vectors of lexical data from the lyrics of the user’s song and of the songs in the selected playlist. Then, I calculate the Euclidean distance between the user’s song vector and each song vector in the playlist, and present the songs with the smallest distances to the user.
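A minimal sketch of this ranking step, assuming every song has already been turned into a scaled feature vector with the same features in the same order. The function name, the three example features, and the numbers are hypothetical; `math.dist` (Python 3.8+) computes the Euclidean distance between two points.

```python
import math


def rank_by_distance(user_vec, playlist, top_n=5):
    """Return (title, distance) pairs from the playlist, closest first.

    `playlist` maps a song title to its (already scaled) feature vector.
    """
    distances = [
        (title, math.dist(user_vec, vec))  # Euclidean (L2) distance
        for title, vec in playlist.items()
    ]
    distances.sort(key=lambda pair: pair[1])
    return distances[:top_n]


# Hypothetical standardized vectors: [lexical diversity, avg word length, verb ratio]
playlist = {
    "Song A": [0.5, -0.2, 1.1],
    "Song B": [-1.3, 0.8, -0.4],
    "Song C": [0.4, 0.0, 0.9],
}
user_song = [0.6, -0.1, 1.0]
for title, d in rank_by_distance(user_song, playlist, top_n=2):
    print(f"{title}: {d:.3f}")
```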

Analytical Features (Python, NLTK, Scikit-learn, Web-scraping)

  1. Creating a Corpus: This would be the set of lyrics from my own playlist that were automatically found by web-scraping them off of a popular lyric website. (This is also how I get the user’s song lyrics).
  2. Feature Extraction: To keep it simple, I focused on 3 sets of features: lexical/word stats, sentiment, and Parts-of-Speech ratios. Lexical stats include data like # of unique words, lexical diversity, # of unusual words, average word length, etc. What these stats provide is an overview of (in crude terms) the ‘intellect’ level of the song. Sentiment analysis extracted the amount of positive, neutral, and negative words or bi-grams in the lyrics. Lastly, Parts-of-Speech ratios (ex. how many nouns, verbs, adverbs, pronouns, etc.) helped to define perspectives and qualities of the song: whether the song was personal, whether it involved a lot of actions, whether it contained a lot of references, its general ‘vibe’, etc. All features were scaled using standard scaling (how many standard deviations each value differs from the mean, with the mean and standard deviation calculated from my corpus). All this was done through simple math and the NLTK Python library.
  3. Distance Calculations: As I mentioned before, a simple Euclidean distance measure between vectors.
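Here is a rough, standard-library-only sketch of the lexical-stats half of step 2 plus the standard scaling. The feature names, the regex tokenizer (standing in for NLTK’s tokenizers), and the treatment of ‘unusual words’ as words missing from a reference vocabulary (the approach of the NLTK book’s `unusual_words` example, which uses the `words` corpus) are assumptions, not necessarily what the app does. The scaler is fit on the corpus and then applied to the user’s song, mirroring scikit-learn’s `StandardScaler` fit/transform split, including its use of the population standard deviation.

```python
import re
import statistics


def lexical_stats(lyrics, vocabulary=None):
    """A few simple word statistics for one song (hypothetical feature set).

    If a reference `vocabulary` set is given, distinct words missing from it
    are counted as "unusual".
    """
    text = lyrics.lower().replace("\u2019", "'")  # normalize curly apostrophes
    # crude tokenizer: lowercase words, keeping contractions like "don't"
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text)
    unique = set(words)
    n = len(words)
    stats = {
        "total_words": n,
        "unique_words": len(unique),
        "lexical_diversity": len(unique) / n if n else 0.0,  # unique / total
        "avg_word_length": sum(len(w) for w in words) / n if n else 0.0,
    }
    if vocabulary is not None:
        stats["unusual_words"] = len(unique - vocabulary)
    return stats


def fit_scaler(corpus_rows):
    """Per-feature mean and population std over the corpus."""
    keys = list(corpus_rows[0])
    means = {k: statistics.fmean([r[k] for r in corpus_rows]) for k in keys}
    # a constant feature has std 0; use 1.0 so it scales to 0 instead of dividing by 0
    stds = {k: statistics.pstdev([r[k] for r in corpus_rows]) or 1.0 for k in keys}
    return means, stds


def scale(row, means, stds):
    """Z-score one song: standard deviations each feature lies from the corpus mean."""
    return {k: (row[k] - means[k]) / stds[k] for k in means}


# Made-up lines standing in for the scraped playlist lyrics
corpus = [lexical_stats(text) for text in [
    "the river runs slow under quiet stars",
    "baby baby baby oh",
    "we talk and we talk and we walk home",
]]
means, stds = fit_scaler(corpus)
print(scale(lexical_stats("oh baby, don't you go"), means, stds))
```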

Web-related Features (Flask, Spotify API, Chart.js)

  1. Create a Spotify Playlist: It took me some time playing around with the Spotify API so that users could automatically create a Spotify playlist from my recommendations, assuming their account has playlist privileges.
  2. Graphing: I used some JavaScript libraries to display semi-interactive charts that help users visualize the data.
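Creating the playlist comes down to two Spotify Web API calls (create a playlist, then add tracks to it). In Python these would typically go through a client such as spotipy (`user_playlist_create` and `playlist_add_items`) after an OAuth flow requesting the `playlist-modify-public` or `playlist-modify-private` scope. Those network calls are left out here; the hypothetical helper below only sketches the bookkeeping around them, assuming a search step has already mapped each recommendation to a track ID (or to `None` when nothing was found). It builds track URIs, batches them under the API’s limit of 100 items per add request, and keeps a list of songs the search missed.

```python
def plan_playlist_uploads(matches, batch_size=100):
    """Split matched recommendations into batches of Spotify track URIs.

    `matches` maps each recommended song to the Spotify track ID its search
    returned, or None if nothing was found; unmatched songs are reported
    rather than silently dropped.
    """
    uris, missing = [], []
    for song, track_id in matches.items():
        if track_id is None:
            missing.append(song)
        else:
            uris.append(f"spotify:track:{track_id}")
    # The "add items to playlist" endpoint accepts at most 100 URIs per request.
    batches = [uris[i:i + batch_size] for i in range(0, len(uris), batch_size)]
    return batches, missing


# Hypothetical search results; the IDs are placeholders, not real tracks
batches, missing = plan_playlist_uploads({
    "Song A": "placeholderId1",
    "Obscure EP track": None,
    "Song B": "placeholderId2",
})
print(batches)
print(missing)
```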


Objectives

  1. To provide a transition point between a user’s current taste and another set of (serious) music (e.g. my playlist).
  2. To provide recommendations at a similar lexical and verbosity level to the user’s selection.
  3. To visualize the lexical quality of songs and how they relate to other songs.

Try It Yourself

I have a (beta) webpage up currently that will let you try it out experimentally: Shaham’s Lexical Song Recommender System

It should be pretty self-explanatory (I hope!), but the instructions are simple: enter the name of a song/artist that you’re currently listening to (or, as an alternative route, enter a poem in the custom text section!) and see your results. Once the recommendations are shown along with their Euclidean distances, you will be presented with the option to automatically create a Spotify playlist if you have an account that supports playlist creation. You will also be shown some charts and graphs that can help you visualize the lexical features mentioned earlier.

Some things to note:

  • As a trial (to make processing fast), I selected ~20 random artists, mainly from pop/indie/rock and singer-songwriter kinds of music, to generate my playlist of up to ~1000 songs. Partially because that’s what I’m really into at the moment, and partially because I feel they are verbose and original enough to spread out the results. This selection also seems to work because each artist has an ‘indie hit’ but also stuff no one’s heard of.
  • Unfortunately, since some of the more obscure indie songs do not show up as the top results of search queries, web-scraping their lyrics is a little annoying/difficult. Keep that in mind if you search for something really out there and it doesn’t bring up results. (This is also why automatic playlist creation misses some of the recommended songs.)
  • Hip-hop as a genre will probably not work well with this (I came to this conclusion when a friend put in a Tupac song) because it contains a lot of slang, which the current algorithm is not smart enough to parse and categorize effectively.
  • (Lastly, feel free to message me if you’d like a set of artists featured as a select playlist. I’ll see what I can do.)


Draw your own conclusions for the most part, but I will list out a few of my thoughts after creating this:

What Worked

  • Recommendations were generally the same ‘intellect’ and verbosity level as the user’s selections
  • The sentiment stats were a bit iffy but the basic word stats (what you see on the radar chart) and parts-of-speech (POS) ratios tended to highlight many discriminating qualities of songs (ex. songs that were repetitive with a lot of basic references to personal pronouns were indicated as ‘close to each other’ due to low lexical diversity and high ratios of specific POS features)
  • I thought lexical diversity stood out (qualitatively) as a powerful discriminating feature; some PCA or other feature-reduction analysis in the future may help confirm that.
  • Contrary to what I had thought, the recommendations were well spread out across artists while certain groups of artists (Editors & The National or Bleachers and CHVRCHES) generally came together at the top. This turned out to be a good balance — providing recommendations of various kinds of music while also being accurate enough to note which artists were lexically close to each other.
  • Interestingly, selecting top songs by an artist tended to recommend other top songs by other artists. This is not that surprising (it’s sort of what we would expect after Adorno’s analysis), except that it means that even without the music, on the lyrics alone, ‘popular’ favorites can be well classified. (It was also cool to note that a lexical analysis alone could classify the ‘top’ songs of artists; it’s an idea for another project.)
  • Part of the beauty of music is subjectivity, and therefore I think this is one of those tools for which everyone has to perform their own qualitative analysis (which is the fun part). In discussions with friends and family, I found that some were convinced of a meaning behind their playlist and some not so much, but no ‘objective’, ‘mathematical’, quantitative analysis would sway their vote.
  • Aside from song recommendation through the user’s song, I thought what worked especially well was the custom text option! I tried a couple of different text styles here, including a paragraph from a philosophy book, a Sylvia Plath poem, a translated foreign poem, and a random user forum post. All of these gave some interesting results, and generating playlists from them was, quite frankly, a lot more fun than generating playlists from other songs. I suggest trying it with a poem you really liked recently; you may find the perfect songs to complement the writing.
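One way to start the PCA check mentioned above is to look at the loadings of the first principal component: on standardized features, they show which features dominate the main axis of variation across the corpus, an imperfect but useful proxy for which features spread songs apart. In practice scikit-learn’s `PCA` (via its `components_` attribute) would do this; the pure-Python power iteration below is only a sketch of what is being computed, and the function name and example rows are hypothetical.

```python
import math


def first_principal_component(rows, iterations=500):
    """Loadings of the first principal component, found by power iteration.

    `rows` are equal-length feature vectors (ideally already standardized),
    with at least two rows. Returns a unit vector with one loading per
    feature; the overall sign is arbitrary.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # sample covariance matrix, d x d
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # initial guess; assumed not to be orthogonal to the dominant eigenvector
    vec = [1.0] * d
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            raise ValueError("features show no variance to analyze")
        vec = [x / norm for x in nxt]
    return vec


# Hypothetical standardized rows: [lexical diversity, avg word length, sentiment]
songs = [[1.2, 0.9, 0.1], [-0.8, -0.7, -0.2], [0.3, 0.4, 0.2], [-0.7, -0.6, -0.1]]
names = ["lexical_diversity", "avg_word_length", "sentiment"]
for name, loading in zip(names, first_principal_component(songs)):
    print(f"{name}: {loading:+.2f}")
```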

What Didn’t Work (and could be improved)

  • I think more samples are needed (which I plan to add in the future), since some of the closest distances were still above 3.5 units, which could be improved.
  • All the features are weighted equally, and that seems silly at times: certain songs get inaccurately lumped together due to the closeness of minor, negligible features.
  • There were recommendations which made sense from a lexical point of view but their instrumentals were too wildly different to be a smooth transition (objective #1). This was obviously a possibility as this system is completely agnostic to the instrumentals. On the plus side, I felt this honestly provided lexical links between completely different genres of music which is partially what we were trying to extract. However, a future possibility might be to integrate this with an audio module that performs a similar analysis on audio waves. With some careful weighting and trial and error, we may end up with a recommender that allows for diversity of genre while still keeping the audio and lyrics close to the selection.
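One possible fix for the equal-weighting problem is a weighted Euclidean distance, where each feature’s squared difference is multiplied by a weight before summing. The sketch below is a hypothetical illustration: the two features, the vectors, and the weights are all placeholders, and real weights would have to come from the kind of careful trial and error mentioned above (or from an analysis such as PCA).

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance with a per-feature weight on each squared difference.

    All weights = 1.0 gives the plain equal-weight distance; 0.0 ignores a feature.
    """
    if not len(a) == len(b) == len(weights):
        raise ValueError("vectors and weights must have the same length")
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


# Hypothetical standardized features: [lexical diversity, some minor feature]
user = [0.2, 1.5]
song_x = [0.3, -0.5]  # close on lexical diversity, far on the minor feature
song_y = [1.8, 1.4]   # far on lexical diversity, close on the minor feature

equal = [1.0, 1.0]
tuned = [1.0, 0.1]    # placeholder weights, not tuned values
for label, w in [("equal", equal), ("tuned", tuned)]:
    dx = weighted_distance(user, song_x, w)
    dy = weighted_distance(user, song_y, w)
    print(label, "closest:", "song_x" if dx < dy else "song_y")
```

With equal weights, the large gap on the minor feature makes `song_y` the closer match; down-weighting that feature flips the ranking toward `song_x`.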

Sample Case

You Stopped Making Sense — The Radio Dept

Figure 1.0: Radar Chart of Select Features
Figure 2.0: Line Chart of All Features

“You Stopped Making Sense” is an older song by The Radio Dept. (though I discovered it recently) in the dream-pop/shoegaze genre. Though it was already in my playlist (slightly cheating), I used it as a test case to find lyrically similar songs that I would enjoy. First, if we were to cluster these, certain features stand out:

  • Fewer than average total words but an average number of unique words, resulting in above-average lexical diversity in the lyrics
  • An overall above-average use of verbs and pronouns
  • An overall below-average use of adjectives
  • A significantly above-average use of the ‘existential there’

Generally, I think it’s these features that tie all these songs together. Listening to the recommendations, I found they’re all sad, soft songs with simple but honest lyrics. I especially like the link with “Rejoice” by Julien Baker, which, like “You Stopped Making Sense”, also has to do with friends and God (though from a different view). (Note that this system is completely agnostic to the actual meanings of words or messages within the lyrics.) “Tether”, “Just a Game”, and “Lucky You” also follow a similar theme of bonds and their (potential) loss, with varying takes that make the playlist quite interesting and vibrant. Lyrically, they’re all from the perspective of someone addressing facts to themselves about another in nostalgic, wistful language. Instrumentally, though some are similar, they span quite a variety, from acoustic finger picking to indie rock to Birdy’s piano to The Radio Dept.’s dreamy mix. Ideally, this is the kind of playlist I’m looking to make: one with significant lexical links to my original song that still introduces me to different kinds of music across genres.
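The ‘existential there’ in the feature list above corresponds to the Penn Treebank tag `EX`, part of the tag set that NLTK’s `pos_tag` returns by default. As a rough sketch, POS ratios like the ones in that list can be computed from already-tagged `(word, tag)` pairs. The tag groupings, the `pos_ratios` name, and the hand-tagged example line are my assumptions, not necessarily how the app groups tags.

```python
from collections import Counter

# Penn Treebank tag groups (hypothetical grouping for illustration)
TAG_GROUPS = {
    "verbs": ("VB", "VBD", "VBG", "VBN", "VBP", "VBZ"),
    "pronouns": ("PRP", "PRP$"),
    "adjectives": ("JJ", "JJR", "JJS"),
    "existential_there": ("EX",),
}


def pos_ratios(tagged_tokens):
    """Share of tokens falling in each tag group, from (word, tag) pairs."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values()) or 1  # avoid dividing by zero on empty input
    return {
        group: sum(counts[t] for t in tags) / total
        for group, tags in TAG_GROUPS.items()
    }


# Hand-tagged example line; in practice the pairs could come from
# nltk.pos_tag(nltk.word_tokenize(lyrics)), which needs NLTK's tokenizer
# and tagger data downloaded first.
line = [("there", "EX"), ("is", "VBZ"), ("nothing", "NN"),
        ("left", "VBN"), ("for", "IN"), ("you", "PRP")]
print(pos_ratios(line))
```

After standard scaling against the corpus, a positive value for one of these ratios is what the bullets above describe as ‘above average’.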