North Coast Music Festival, a 3-day music festival with a daily attendance of 20,000

M.A.R.S. — Music Analysis & Recommendation System

Kashish Kohli
SFU Professional Computer Science
15 min read · Apr 14, 2019


CMPT 733 Project — Prof. Jiannan Wang, Prof. Steven Bergner — Group Targaryens

By Kashish Kohli, Kanksha Masrani

Motivation and Background

The global music industry today is worth an estimated USD 130 billion. With the rise of streaming services like Spotify, YouTube Music and Amazon Music, paired with the decline of piracy, the industry is expected to grow even faster. Music is not only one of the largest established industries in the world, it is also considered an indispensable one.

Analyzing the specifics of this industry can have far-reaching consequences for everyone involved: listeners, who benefit from music that suits their taste more accurately than they could have guessed themselves; artists, who can shape their music towards what is likely to succeed; and corporations, who learn which music will give them the best return on investment.

Problem Statement

The crucial questions this project aims to address are the following:

  1. Can we devise a method of predicting if a song will be popular even before it is released?
  2. What external factors should be kept in mind before creating a track, and which factors most affect how music is received?
  3. Can we build a model that gives song recommendations based on a user's history?

With the above problems in mind, we built our project around 3 modules.

  1. Music Popularity Predictor
  2. Global Music Sentiment Analyzer
  3. Music Recommender — Popularity based & User Similarity based

Below we explain each of the 3 modules in turn.

Data Science Pipeline

As our project has 3 very different modules, we present each pipeline separately.

  1. Music Popularity Predictor Pipeline
  2. Global Music Sentiment Analyzer Pipeline
  3. Music Recommender Pipeline

Methodology

The music popularity predictor predicts the popularity of songs based on several attributes jointly derived from the Million Song Dataset and Spotify.

We first construct box plots of the data to reveal outliers, which are later removed or retained depending on how significantly they affect the data.

Outlier Visualization for each parameter using Boxplots
Heatmap showing correlation between each attribute

Outliers certainly exist in our data, and later on we retain some of these parameters while dropping others. This keeps the model suited to whatever data is submitted to it without being prone to over-fitting. Next we create a correlation map, which tells us how the parameters influence each other. The relations between these parameters are demonstrated later in Results.
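For readers who want to reproduce these plots, a minimal sketch with pandas, seaborn and matplotlib follows; the DataFrame name df and its columns are illustrative stand-ins for our cleaned data, not our exact code.

```python
import seaborn as sns
import matplotlib.pyplot as plt

# df is assumed to hold the numeric song attributes
# (popularity, valence, energy, loudness, danceability, ...).
plt.figure(figsize=(12, 6))
sns.boxplot(data=df)        # one box per attribute to expose outliers
plt.xticks(rotation=45)

plt.figure(figsize=(10, 8))
sns.heatmap(df.corr(), annot=True, cmap='coolwarm')  # pairwise correlations
plt.show()
```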

This shows that Loudness and Energy are directly related

Next we look at the relations between selected combinations of attributes, since not all of them are of interest to us. We are primarily concerned with Popularity and Valence (the quality of a song that provokes emotion), though factors like Loudness, Danceability and Energy also significantly affect both. Here we can observe that Loudness and Energy are directly related, and the heatmap shows that both are also directly related to Popularity, so a song that aims for popularity should keep these parameters high.

The field 'artist_name' holds a large number of unique values, so we cannot directly apply one-hot encoding; however, the artist could have an important influence on the popularity of a song, which means we ought to preserve the column. To resolve this, we decided that artists with fewer than 50 songs in the dataset would be grouped under 'Others', since that count is not enough for our model to learn from properly. After doing so, we applied one-hot encoding to the column.
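A minimal sketch of this grouping-and-encoding step with pandas is below; the DataFrame df and the column name artist_name are assumptions about the cleaned data, and the threshold of 50 songs matches the text.

```python
import pandas as pd

# df is assumed to be the merged Spotify / Million Song Dataset frame
# with an 'artist_name' column; names are illustrative.
def group_rare_artists(df: pd.DataFrame, min_songs: int = 50) -> pd.DataFrame:
    counts = df['artist_name'].value_counts()
    rare = counts[counts < min_songs].index
    df = df.copy()
    df.loc[df['artist_name'].isin(rare), 'artist_name'] = 'Others'
    # One-hot encode the (now much smaller) set of artist labels.
    return pd.get_dummies(df, columns=['artist_name'], prefix='artist')
```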

We also drop the only remaining non-numeric column, the track name, since it does not meaningfully affect our target. The scaled and transformed DataFrame is then fed to the machine learning models.

For our machine learning stage we define the following models:
K-Nearest Neighbours (KNN), Support Vector Classifier, Adaptive Boosting, Logistic Regression, Convolutional Neural Network & Random Forest.

These models are tuned with GridSearchCV, which suggests the optimal hyperparameters. The highest accuracy we obtain for each model is:

K-Nearest Neighbours (KNN) : 62%
Support Vector Classifier : 65%
Adaptive Boosting : 65.5%
Logistic Regression : 66%
Convolutional Neural Network : 67.5%
Random Forest : ~70%

We have also performed cross-validation and generated a classification report for each model.
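A hedged sketch of the tuning-and-reporting step is shown below; it pairs scikit-learn's GridSearchCV with a random forest as an example, and the feature names, labels and parameter grid are illustrative rather than our exact setup.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import classification_report

# df_encoded is the scaled, one-hot-encoded frame from the previous step;
# the label column name 'popularity_class' is illustrative.
X = df_encoded.drop(columns=['popularity_class'])
y = df_encoded['popularity_class']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

param_grid = {
    'n_estimators': [50, 100, 150],
    'criterion': ['gini', 'entropy'],
    'min_samples_leaf': [5, 10],
    'min_samples_split': [4, 8],
}
grid = GridSearchCV(RandomForestClassifier(random_state=42),
                    param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)

print(grid.best_params_)
print(classification_report(y_test, grid.predict(X_test)))
```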

Classification Report

Here Category 3 indicates popular and Category 1 indicates not popular.

The Global Sentiment Analyzer was built with the aim of understanding the kind of music each country likes. Understanding this matters to the people who make music in those countries. For example, as the Sentiment Analyzer result below shows, the American audience listens to the most negative/depressive songs (~50%) among all the analyzed countries. Hence music with a sad undertone has a higher chance of succeeding in the American market than in the French market, which skews heavily positive.

Sentiment Analyzing of music around the world

Using the iTunes RSS Feed Generator, we extract the top 100 iTunes songs for a country of our choice. We chose Canada, the USA, Australia, Brazil, France, Russia, China and South Africa for our analysis. The data is extracted from the generated JSON feed using the Requests Python package.
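A small sketch of the feed-fetching step, assuming the JSON layout the RSS Feed Generator produced at the time; the URL pattern and field names are indicative and may have changed since.

```python
import requests

# URL pattern is indicative only; the iTunes RSS Feed Generator supplies the
# exact feed URL for a chosen country, feed type and result count.
def top_songs(country_code: str = 'us', limit: int = 100) -> list:
    url = (f'https://rss.itunes.apple.com/api/v1/{country_code}/'
           f'itunes-music/top-songs/all/{limit}/explicit.json')
    feed = requests.get(url, timeout=10).json()
    # Each entry carries at least the song name and the artist name.
    return [(e['name'], e['artistName']) for e in feed['feed']['results']]

canada_top_100 = top_songs('ca', 100)
```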

Word-count for each song

We fetch lyrics with Lyrics Extractor, a PyPI wrapper around a Google-powered lyrics lookup (backed by sources such as genius.com), which we access by generating a Custom Search API key for every 100 songs whose lyrics are returned. From the returned lyrics we plot a word-count bar graph, which tells us how many words a song uses on average.

However, this figure is often skewed by repeated words, so we also use a unique word count plot, which indicates the lexical richness of a song. Stop-words, imported from the NLTK corpus, are removed before counting.

Lexical richness gives us a better idea of the originality of an artist, and of how original the artists in each country are.

Lexical richness for the same songs (above)
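A minimal sketch of the unique word count after stop-word removal, using NLTK's English stop-word list; the whitespace tokenisation here is an assumption rather than our exact pre-processing.

```python
import nltk
from nltk.corpus import stopwords

nltk.download('stopwords', quiet=True)
STOP = set(stopwords.words('english'))

def unique_word_count(lyrics: str) -> int:
    # Lowercase, keep alphabetic tokens, drop stop-words, count distinct words.
    words = [w.lower() for w in lyrics.split()
             if w.isalpha() and w.lower() not in STOP]
    return len(set(words))
```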

If the country in question uses a language other than English (France, Russia, China, Brazil, etc.), we can use GoogleTrans or GoSlate to translate the lyrics into English.
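As an illustration, a translation pass with the googletrans package might look like the sketch below; GoSlate offers a similar interface, and the usual quota and reliability caveats apply.

```python
from googletrans import Translator  # pip install googletrans

translator = Translator()

def to_english(lyrics: str) -> str:
    # Auto-detect the source language and translate the lyrics to English.
    return translator.translate(lyrics, dest='en').text
```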

Using these lyrics, we then perform sentiment analysis with SentimentIntensityAnalyzer from nltk.sentiment.vader, which relies on the vader_lexicon. The analyzer returns the positive, neutral and negative shares of any text passed to it.
Each song's lyrics are passed to it individually, and VADER returns the positive/neutral/negative content percentage of the song.
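The scoring step, sketched with NLTK's VADER analyzer; polarity_scores returns the pos/neu/neg shares plus a compound score, and only the shares are kept here.

```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download('vader_lexicon', quiet=True)
sia = SentimentIntensityAnalyzer()

def song_sentiment(lyrics: str) -> dict:
    # polarity_scores returns 'pos', 'neu', 'neg' shares and a 'compound' score.
    scores = sia.polarity_scores(lyrics)
    return {k: scores[k] for k in ('pos', 'neu', 'neg')}
```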

These percentages are plotted for each song as a stacked bar plot. As we can see, the positive share of the songs is generally greater than the negative share.

Positive (top), Neutral (middle) and Negative (bottom) shares of the songs after sentiment analysis

We then compute the share of songs in each country's top 100 that are predominantly positive or negative. This insight shows how much of each chart is negative and hence hints at the outlook of the listeners in that country.

Australian music is clearly much more positive than negative: computing the percentages, 44% of Australia's top 100 songs are happy and 22% are sad.

The countries, ranked by how positive their taste in music is, are:

Recommendation engines generally come in 3 types: popularity based, which suggest the top items in the current catalogue; user-similarity based, which make suggestions based on similar users; and item-similarity based, which make suggestions from item attributes. We took up the 2 most popular of these, popularity based and user-similarity based, both of which are used extensively in industry.

We start with 2 separate datasets, one with user listening data and the other with song data. With a left outer join we merge these datasets, removing any redundancy, to create one large table of users and their corresponding songs.

User ID Dataset
Total songs Dataset
Combined Dataset
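A sketch of the merge with pandas; the file names and column names below are illustrative stand-ins for the actual user-play and song-metadata files.

```python
import pandas as pd

# Column names are illustrative; the real files come from the
# Million Song Dataset user-play and song-metadata exports.
users = pd.read_csv('user_plays.csv')      # user_id, song_id, listen_count
songs = pd.read_csv('song_metadata.csv')   # song_id, title, artist_name, year

combined = (users.merge(songs, on='song_id', how='left')
                 .drop_duplicates())
```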

The majority of our data is from after 2000, as expected given the dramatic rise in streaming of modern music. Using this data we chart the distribution of songs over the years.

Distribution of Songs over years

Now we calculate the popularity of each song from the number of times it has been played in the dataset. From this sub-DataFrame we visualize the most popular songs, aggregating over user_id to see how popular each song is. The resulting list of songs, ranked by the number of users who have played them, serves as the score for the popularity-based recommender. Below are its results.

The table above is recommended to every user, as these are the most popular songs in the entire dataset. This is akin to Amazon suggesting the top deals of the day irrespective of the user, or Spotify suggesting the Billboard Hot 100 irrespective of the listener.
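Continuing the illustrative DataFrame from the merge sketch above, the popularity-based ranking reduces to a group-by over distinct listeners per song.

```python
# Count how many distinct users have played each song, then rank.
song_popularity = (combined.groupby(['song_id', 'title'])['user_id']
                           .nunique()
                           .reset_index(name='listeners')
                           .sort_values('listeners', ascending=False))

# The same top-N list is served to every user.
top_10_for_everyone = song_popularity.head(10)
```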

The user-similarity-based recommender, on the other hand, gives a more personalized analysis: it compares the user's current songs with the entire song database and calculates the Jaccard index of each combination in a co-occurrence matrix.

Co-occurrence matrix of the songs

We start by creating a matrix of size (personal songs × all songs) filled with zeroes.

We then iterate through all the songs and collect the user_ids for each of them. These user_ids are compared with the user_ids of each personal song to get the intersection of users who have heard both. The total is the union of the users of the current personal song and the current song from the full list. The ratio of the sizes of these two sets is stored in the co-occurrence matrix.

Jaccard Index
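For two songs with listener sets A and B, the Jaccard index is J(A, B) = |A ∩ B| / |A ∪ B|, i.e. the number of users who have heard both songs divided by the number of users who have heard either.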

co_matrix[j,i] = float(len(common_users))/float(len(total_users))

The average of each column gives an overall similarity between that song and the user's listening history; we suggest the highest-scoring songs as the most appealing to the user's profile, as long as they do not already belong to the personal song list.
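Putting the pieces together, a hedged sketch of the whole similarity-and-ranking step; the users_of mapping and the function name are illustrative, not our exact implementation.

```python
import numpy as np

def recommend(personal_songs, all_songs, users_of, top_n=10):
    """users_of maps a song id to the set of user_ids who have played it.
    Returns candidate songs ranked by average Jaccard similarity to the
    user's own songs."""
    co_matrix = np.zeros((len(personal_songs), len(all_songs)))
    for i, song in enumerate(all_songs):
        for j, own in enumerate(personal_songs):
            common_users = users_of[own] & users_of[song]
            total_users = users_of[own] | users_of[song]
            co_matrix[j, i] = len(common_users) / len(total_users) if total_users else 0.0
    # Average each column to score every candidate against the whole history.
    scores = co_matrix.mean(axis=0)
    ranked = sorted(zip(all_songs, scores), key=lambda x: x[1], reverse=True)
    owned = set(personal_songs)
    return [(s, sc) for s, sc in ranked if s not in owned][:top_n]
```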

This methodology is employed by companies like Netflix to generate personalized recommendations, as opposed to general ones.

Evaluation

We have evaluated our multi-pronged approach in a sequential fashion.

Starting with the Music Popularity Prediction module, we derived the relations between various musical parameters. The relation between the components that make up a piece of music is crucial to its success, so understanding them is of primary importance if we want a track to be popular.

Heatmap showing correlation between all music attributes
Chord Diagram depicting relations between Popularity and other attributes

The heatmap and the chord diagram show the correlation between all variables; however, we are chiefly concerned with Valence (the emotion a song evokes) and Popularity. The correlation of Popularity with all the other parameters reveals highly significant details.

Relation between Valence and Popularity

Several further analyses like the one shown here also bear on which musical attributes deserve the most attention. Songs of the early 70s and 80s showed a tighter concentration of valence, which has more or less spread out today even as popularity has skyrocketed, suggesting that earlier songs were more meaningful and evoked more emotion.

The machine learning models are evaluated on accuracy by comparing their predictions against the test dataset, generating a classification report and then calculating their accuracy.

K-Nearest Neighbours (KNN) : 62%
Support Vector Classifier : 65%
Adaptive Boosting : 65.5%
Logistic Regression : 66%
Convolutional Neural Network : 66.8%
Random Forest : ~70%

The Confusion Matrix for Random Forest Model

The highest accuracy in these reports came from the Random Forest model, which we therefore chose as our best model at ~70% accuracy. The best parameters for the Random Forest model were suggested by GridSearchCV:

The parameters combination that would give best accuracy is : 
{'criterion': 'entropy', 'min_samples_leaf': 10, 'min_samples_split': 8, 'n_estimators': 150}

From the confusion matrix we can see that our model predicts unpopular songs correctly in 72% of cases, while it accurately predicts popular songs in 64% of cases.
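These per-class figures can be read off a row-normalised confusion matrix; continuing the earlier scikit-learn sketch (grid, X_test and y_test come from there), the computation is roughly:

```python
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, grid.predict(X_test))
# Row-normalising gives per-class recall, i.e. the share of each true
# class ('not popular' / 'popular') that the model labels correctly.
per_class_recall = cm.diagonal() / cm.sum(axis=1)
print(per_class_recall)
```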

By supplying the metadata of any song, we can estimate with reasonable accuracy whether that song will be popular.

Moving on to the Music Sentiment Analyzer, the results are clear-cut. We can take the top 100 chart songs of any of these countries and analyze the share of positive and negative songs. The current Canadian top 100 includes songs like 'Who Do You Love', 'Miss Me More' and 'With You', which are clearly more negative in sentiment than positive.

Moreover, if a country is more inclined towards positive songs than negative ones, it makes sense to go with the flow and create positive songs rather than negative ones. In the case of the USA, half the chart is filled with melancholic songs, showing what appeals to that audience.

United States of America’s music sentiment (Less Positive ; More negative)
Canada’s music sentiment (More Positive ; Less negative)

We have also generated wordclouds for Canadian positive and negative music to understand the appeal that people have towards these songs.

Positive Wordcloud
Negative Wordcloud

These results depict what each country's taste in music looks like and what it favors. An upcoming artist can clearly benefit from such a streamlined approach and devise music better suited to his or her country's palate.

Finally, for the recommendation engine, the results are straightforward as well. The popularity-based engine produces uniform recommendations irrespective of the user, highlighting the top suggestions for everyone. For the user-similarity-based collaborative filtering model, the results are shown below.

Based on a user’s history, the song recommendations are as below:

The User History
The recommended songs (from position 1, left, to position 10)

The results are in sync, as Joy Division's songs are generally positive and 5 of the 10 recommendations come from Joy Division itself. Similarly, if we send just one song to get similar songs back, we also get sensible results. Sending the song 'Historia Del Portero — Ricardo Arjona' produces the songs below, shown as a treemap in which a larger block area means a stronger recommendation.

Treemap for Similar songs suggestions

Data Product

The data product we deliver is the set of findings from our analysis, which can give any upcoming musician a better sense of how to make their track popular.

1. Popularity of a song is highly related to the Loudness, Danceability & Energy of the song. The higher these are, the better for the track.

2. Popularity of a song drops sharply with an increase in the Instrumentalness, Liveness, Mode and Speechiness of a song.

3. Valence is also highly related to Loudness, Danceability & Energy.

4. Valence drops sharply with an increase in Instrumentalness, Duration, Acousticness and Liveness.

5. Energy and Loudness combined with Acousticness and Instrumentalness present the worst combinations.

6. Depending on the target country, the lexical richness and the sentiment of a song should be adjusted accordingly. A lexically rich song might register as speechiness in the USA, which is bad for popularity, while it might be appreciated in France.

7. Russian songs are the most lexically rich, so redundancy in these songs might be frowned upon. The reverse is true for Canadian songs, which record only mediocre lexical richness.

8. A graphical analysis of the relations between individual parameters reveals further interesting patterns. For instance, the graph of Valence against Popularity shows how 1960s songs had a greater spread of valence and popularity, reflecting the range of emotions they provoked, whereas today's songs are comparatively low on valence, and arguably less meaningful, while still registering high popularity.

Our recommendation engines also serve as a data product: with user-based collaborative filtering we can build an accurate picture of what a user will want and like.

Lessons Learnt

The project had a steep learning curve, and it took us a while to build an in-depth understanding of several components. One early hurdle was getting data from Spotify, which we did with a Python wrapper called Spotipy; its interface and the necessary code were challenging at first.
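For context, a basic Spotipy call to pull a track's audio features looks roughly like the sketch below; the credentials and the query are placeholders, not our actual values.

```python
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Client ID and secret come from the Spotify developer dashboard.
sp = spotipy.Spotify(client_credentials_manager=SpotifyClientCredentials(
    client_id='YOUR_ID', client_secret='YOUR_SECRET'))

result = sp.search(q='track:Bohemian Rhapsody artist:Queen', type='track', limit=1)
track = result['tracks']['items'][0]
features = sp.audio_features([track['id']])[0]  # danceability, energy, valence, ...
```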

Secondly, the data from the Million Song Dataset is in H5 (HDF5) format. Converting the ~3 GB of H5 files into CSV was another challenging task because of the highly structured, tree-like layout of HDF5 files. The conversion was necessary since our data-cleaning steps cannot operate on the H5 format directly.
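A rough illustration of reading one such file with h5py follows; the group and field names reflect the Million Song Dataset layout as we understood it, so treat them as indicative rather than exhaustive.

```python
import h5py
import pandas as pd

def h5_to_row(path: str) -> pd.DataFrame:
    """Flatten one Million Song Dataset HDF5 file into a one-row DataFrame."""
    with h5py.File(path, 'r') as f:
        meta = f['metadata']['songs'][0]       # artist/title metadata table
        analysis = f['analysis']['songs'][0]   # audio analysis table
        return pd.DataFrame([{
            'artist_name': meta['artist_name'].decode(),
            'title': meta['title'].decode(),
            'duration': float(analysis['duration']),
            'tempo': float(analysis['tempo']),
        }])
```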

We also created Apache Pig scripts to fetch and manipulate the data. It was easier to fetch data through Pig and work with it than to load the files manually, and the exposure to this new technology was a great learning experience.

Using NLTK, GoogleTrans, GoSlate and other NLP packages also posed a challenge, as neither of us was familiar with NLP. GoogleTrans and Lyrics_Extractor posed further challenges due to the cost of obtaining more than 100 responses per day.

We also worked with models that were new to us, such as Adaptive Boosting and the Support Vector Classifier, which strengthened our machine learning foundations. The one-hot encoding required to convert categorical data into numeric form also took significant effort.

Summary

To summarize, we aimed to draw inferences from our data that would help companies and startups fare better in the music market. To that end we extracted several findings and useful results by mining songs from the last 5 decades. These insights feed a Popularity Predictor module that can predict whether a song will be famous from its metadata alone, with an accuracy of ~70%. Secondly, we created a Sentiment Analyzer that reports the sentiment of the music heard in major countries; this can help an artist shape songs that resonate with a wider audience. Thirdly, we created a Recommendation Engine that, based on a user's taste in music, suggests other songs suited to him or her, and that can also suggest the top songs on the charts at the time, as Netflix and Hulu do.

Future Work

The future work we plan to implement is as follows:

  1. In the future, there are several optimizations and improvements that we wish to make on our model. We also wish to be able to fetch the meta data live from the database when a song name is entered and show if the song will be popular or not as well as recommendations.
  2. Addition of more countries to the global sentiment analyzer module to reveal the sentiment of billboard of any country rather than just the 8 we have done now.
  3. Creating another recommendation engine with item collaborative filtering along with user and popularity based.
  4. Creation of a dashboard to optimize viewing the results that we have derived.

