Content-Based Music Recommendation System

Dibyendu
5 min read · May 17, 2020



Why a content-based music recommendation system?

Current state-of-the-art music recommender systems use user-generated metadata, such as previous purchases and listening history, as the basis for recommendations. However, such metadata-based systems cannot recommend artists or songs for which no data is available (i.e., new songs or artists). This "cold start" problem has pushed researchers to improve content-based recommender systems, which instead use audio and lyrical features extracted automatically from the song content as the basis for recommendations. We therefore aim to build a music recommendation system based on musical content, avoiding the cold-start problem.

A solution to the cold-start problem

The cold-start problem arises when new songs or genres appear: because no usage data exists for them yet, the application cannot suggest them. To tackle this problem, research shifted to content-based music recommendation. Systems based on this approach extract underlying features from the music, such as MFCCs, zero-crossing rate (ZCR), and spectral coefficients, and use them for recommendation. Since a new song can have musical content similar to an already popular song, it has a good probability of being recommended. The question then becomes whether to use low-level features such as MFCCs, ZCR, and spectral coefficients, or high-level ones such as genre, mood, instrument, and lyrics. Many researchers have done comparative studies on this topic, and there is sufficient research supporting that high-level descriptors give good results for recommendation.

Dataset Description

We have worked on two different datasets.

Dataset 1: We used the publicly available Free Music Archive (FMA)[4] dataset. Although many datasets are available on the web, FMA[4] stands out because it contains metadata (genre, artist, tracks) and a large number of good-quality songs; the metadata is used to create high-level features. In this dataset, all songs are grouped by genre. FMA[4] comes in four sizes; we worked on the small one, which consists of 8 genres with 1,000 songs each, for a total of 8,000 songs of 30 seconds each.

Dataset 2: We used the freely available Million Song Dataset[1], which provides metadata (lyrics, genre, artist, tracks; approximately 55 fields[1]) for all songs. We worked with a 1.5 GB portion of this dataset, taking 20,585 English songs, each with associated lyrics. All songs are grouped into 7 genres (Rap, Pop Rock, Country, RnB, Latin, Electronic, and Religious), and the number of songs per genre is imbalanced.

Proposed Solution

We combined two models, applied at different weights, to generate our final model:

Phase 1:

The feature generation library Librosa[2] is used to compute Mel-frequency cepstral coefficients (MFCCs), with 20 dimensions per frame. These are then summarized by 7 statistical descriptors: mean, std, skew, kurtosis, median, min, and max, giving 7 × 20 = 140 features per song. We then used a bagged random forest to convert these low-level MFCC features into a higher-level genre feature vector.
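A minimal sketch of this extraction step, assuming 30-second clips and Librosa's default MFCC settings (the post does not state the exact parameters):

```python
import numpy as np
from scipy.stats import skew, kurtosis
import librosa

def extract_mfcc_descriptor(path):
    # Load a 30-second clip (FMA small ships 30 s excerpts)
    y, sr = librosa.load(path, duration=30.0)
    # 20 MFCC coefficients per frame -> array of shape (20, n_frames)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    # Summarize each coefficient over time with the 7 statistics
    stats = [
        mfcc.mean(axis=1), mfcc.std(axis=1),
        skew(mfcc, axis=1), kurtosis(mfcc, axis=1),
        np.median(mfcc, axis=1), mfcc.min(axis=1), mfcc.max(axis=1),
    ]
    # 7 statistics x 20 coefficients = 140-dimensional descriptor
    return np.concatenate(stats)
```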

General scheme of the proposed Model

Phase 2:

We extracted features from the lyrics by taking the top 5,000 words and building their TF-IDF vectors[3]. We then trained our classifier (bagging with a random forest base model) on audio and lyrics separately, converting those features into higher-level genre feature vectors. After that, we applied a weight x to the lyrics model and (1 − x) to the audio model, combined both results to obtain the genre, and returned the songs most similar to a given song in this higher-level feature space based on cosine similarity.
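A sketch of how the two genre spaces might be built with scikit-learn; the classifier hyperparameters and the variable names (`lyrics_texts`, `audio_feats`, `genres`) are assumptions, not the authors' exact configuration:

```python
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

def genre_spaces(lyrics_texts, audio_feats, genres):
    """Map raw lyrics and 140-d audio descriptors to genre-probability vectors."""
    # Lyrics: TF-IDF over the top 5,000 words, as described above
    tfidf = TfidfVectorizer(max_features=5000)
    lyric_feats = tfidf.fit_transform(lyrics_texts)

    # One bagged random forest per modality
    lyric_clf = BaggingClassifier(RandomForestClassifier()).fit(lyric_feats, genres)
    audio_clf = BaggingClassifier(RandomForestClassifier()).fit(audio_feats, genres)

    # Per-song genre probability vectors form the higher-level feature space
    return (lyric_clf.predict_proba(lyric_feats),
            audio_clf.predict_proba(audio_feats),
            lyric_clf.classes_)

# Usage (assuming preloaded data):
# lyric_space, audio_space, classes = genre_spaces(lyrics, mfcc_matrix, labels)
# combined = x * lyric_space + (1 - x) * audio_space
```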

Combination of the two models

We tried three classifiers: SVM, a neural network, and a bagged random forest. The best accuracy, around 72%, was achieved with the bagged random forest.

Comparison between different models

We used different weights to combine the two parts and calculated the genre-prediction accuracy for each weight. The accuracy achieved using this approach is 71%, with the best result at weight 0.5. Monitoring this accuracy matters, because a more accurate genre classifier produces a better higher-level feature space.
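The sweep itself can be expressed in a few lines. This sketch reuses the `lyric_space`, `audio_space`, and `classes` arrays from the Phase 2 sketch and scores genre prediction as the argmax of the combined probabilities (an assumed decision rule; the post does not spell it out):

```python
import numpy as np

def sweep_weights(lyric_space, audio_space, genres, classes, weights):
    # Combine the two probability spaces at each weight and measure
    # how often the argmax genre matches the true label
    accuracy = {}
    for x in weights:
        combined = x * lyric_space + (1 - x) * audio_space
        preds = classes[np.argmax(combined, axis=1)]
        accuracy[x] = float(np.mean(preds == np.asarray(genres)))
    return accuracy

# e.g. sweep_weights(lyric_space, audio_space, labels,
#                    classes, np.linspace(0.0, 1.0, 11))
```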

Weight vs Accuracy

Finally, we combined both higher-level feature spaces, giving equal weight to each model. Similar songs lie close to one another in this genre space and are therefore recommended using a k-NN approach.
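A minimal k-NN recommender over the combined space, using cosine distance to match the similarity measure from Phase 2 (`combined` is the equally weighted space from the sketches above):

```python
from sklearn.neighbors import NearestNeighbors

def recommend(combined, query_idx, k=25):
    # k + 1 neighbours, because the nearest point is the query song itself
    knn = NearestNeighbors(n_neighbors=k + 1, metric='cosine').fit(combined)
    _, idx = knn.kneighbors(combined[query_idx:query_idx + 1])
    return [i for i in idx[0] if i != query_idx][:k]
```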

Evaluation

The recommender's performance is evaluated using the Precision at K metric.

Precision@K = (number of correct recommendations within the top K) / K
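In code this is a one-liner. The post does not state the relevance criterion (what counts as a "correct" recommendation), so the sketch below assumes a precomputed set of relevant song IDs:

```python
def precision_at_k(recommended, relevant, k=25):
    # Fraction of the top-k recommended song IDs that appear in
    # the set of relevant (ground-truth) song IDs
    top_k = recommended[:k]
    return sum(1 for song in top_k if song in relevant) / k
```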

The value of Precision at 25 for our model averages around 14.5.

References:

[1] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere, "The Million Song Dataset", in Proc. of the 12th International Society for Music Information Retrieval Conference (ISMIR), 2011.

[2] Brian McFee, Colin Raffel, Dawen Liang, Daniel P. W. Ellis, Matt McVicar, Eric Battenberg, and Oriol Nieto, "librosa: Audio and Music Signal Analysis in Python", in Proc. of the 14th Python in Science Conference (SciPy 2015).

[3] Teh Chao Ying, Shyamala Doraisamy, and Lili Nurliyana Abdullah, "Lyrics-Based Genre Classification Using Variant TF-IDF Weighting Schemes", 2014.

[4] Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, and Xavier Bresson, "FMA: A Dataset for Music Analysis", in Proc. of the 18th International Society for Music Information Retrieval Conference (ISMIR), 2017.

Contributions of the team members

Dibyendu Roy Chaudhuri, MT19034: conceived the idea, built the lyrics-based model from the Million Song Dataset, built the audio-based model from the Million Song Dataset, report and documentation.

Saif Ahmad Khan, MT19135: conceived the idea, built the audio-based model from the FMA dataset, combined the audio- and lyrics-based models into the final model and recommended songs using k-NN, report and documentation.

Sameena Firdos, MT19136: conceived the idea, built the lyrics-based model from the FMA dataset, extracted features from audio files using the Librosa tool, report and documentation.
