Categorization: Textual Features to predict genre of a song based on lyrics

karanjude
MLFeatureEngineering
1 min readJan 7, 2018

Detailed Problem Statement: Assuming you had a collections of lyrics belonging to different genres. You wanted to build a supervised learnt model that given some lyrics could classify what genre the song might belong to. What kind of features would you use to build the model.

Textual Feature Pre processing

  • Tokenize the words in the lyrics
  • Remove stop words from the lyrics
  • Apply Stemming to the words
  • Create paragraphs of sentences in the lyrics

Textual Feature

  • Apply Word2Vec on paragraphs of lyrics
  • Generate an embedding averaged over all the paragraph vectors, divided by the number of vectors.

A feature like this can be fed into a model like a neural network / logistic regression to perform multi-class classification.

--

--