Building a Facial Expression Music Recommender

SFHS ML Club · Published in Visionary Hub
Jun 7, 2022 · 7 min read

Models developed in Python using VGG16, TensorFlow, Keras, and scikit-learn

Written by: Fazal Mittu, Ruhi Yusuf
Project Team: Fazal Mittu, Ruhi Yusuf, Koena Jaware, Sandeep Bajamahal, Paree Merchant, Siddhant Hullur

Introduction

Whether it’s the beat of a drum or a melodic voice, music can transport us to a new world where we can express our creativity and drown out the world around us. Music is all around us and can be found in party settings, religious functions, and cultural celebrations.

With over 70 million songs on Spotify, there is a song for every party, holiday, or even mood. The issue is that with so many songs, it can be hard to find the perfect song for every occasion.

Introducing the Facial Expression Music Recommender — a system that analyzes a person’s facial expression and recommends them a song based on their current mood.

Quick Overview

This project has 3 main parts: facial expression detection, music mood classification, and putting it all together for a seamless user experience.

Facial Expression Detection: the model should first identify a face in the image, then learn to classify the expression into sad, happy, calm, or energetic.

Music Mood Classification: the model should use song characteristics such as tone or valence to determine what emotion a song represents.

UI: the user interface is made with Flask/HTML and allows the user to input their image and, optionally, a playlist. The app then presents the user with the detected emotion and a randomly selected song that corresponds to that emotion.

Facial Expression Model

A CNN (Convolutional Neural Network) is a type of neural network designed for image processing and detection. A CNN contains an input layer, convolutional layers, dense layers, and an output layer. The convolutional layers extract features from the image, which help determine the specific expression.
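As a rough illustration, a small Keras model along these lines captures the structure described above (the layer sizes are illustrative and assume 48x48 grayscale inputs and 4 emotion classes, not our exact configuration):

```python
# Minimal sketch of a CNN for expression classification
# (illustrative sizes; assumes 48x48 grayscale inputs and 4 classes).
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(48, 48, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),   # convolutional layers extract features
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),           # dense layers combine the features
    layers.Dense(4, activation="softmax"),          # output layer: one probability per emotion
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```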

To properly detect emotions, we first needed to detect a face within an image and then run a model to detect the facial expression.

We ended up using the Haar cascade frontal face default model (haarcascade_frontalface_default.xml) to detect the faces themselves. We tested different Haar cascade models for different parts of the face but concluded that the frontal face default version was the most accurate and best suited to our project’s purpose. The model returns the four coordinates of the bounding box surrounding each face it detects. We discovered that when using the Haar cascade model, the face in the picture must be facing the camera, should not be tilted, and all parts of the face must be visible.
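A minimal sketch of this detection step with OpenCV’s pretrained cascade looks roughly like the following (the image path is a placeholder):

```python
# Sketch of face detection with OpenCV's pretrained Haar cascade
# (haarcascade_frontalface_default.xml); "face.jpg" is a placeholder path.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)
image = cv2.imread("face.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Returns one (x, y, w, h) bounding box per detected face
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    face_crop = gray[y:y + h, x:x + w]  # crop passed on to the expression model
```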

Using a custom model with custom parameters did not provide good results. While the training accuracy was above 90%, the accuracy on the testing dataset was only around 55%, a sign that the model was overfitting.

The first model predicted false positives for fear, classifying happy faces as ones filled with fear. Additionally, there were only a few false positives for the emotion “sad.”

Next, we tried the VGG16 model. This model achieved 61% accuracy on the testing dataset and was better than the previous custom model. However, as the confusion matrix showed, quite a few sad pictures were classified as neutral. While the VGG16 model was better at classifying more pronounced emotions such as happiness and anger, it did not do as well at classifying more subtle emotions such as neutral or sad. The VGG16 model also took up a lot of space.

The final solution was to use a VGG16 model with custom parameters set by Ashadullah Shawon, which was more successful than our previous models at predicting the more subtle emotions such as neutral and sad. Our final model still occasionally confuses sad and neutral faces, but it gave us our best results and is the one used in production.
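A rough sketch of how a VGG16 base can be extended with a custom classification head is shown below (the head layers here are illustrative; our exact hyperparameters follow Shawon’s configuration):

```python
# Rough sketch of building a classifier on top of a pretrained VGG16 base
# (illustrative head; not the exact hyperparameters used in production).
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

base = VGG16(weights="imagenet", include_top=False, input_shape=(48, 48, 3))
base.trainable = False  # keep the pretrained convolutional features frozen

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(5, activation="softmax"),  # e.g. angry/fear/happy/neutral/sad
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```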

Music Mood Classifier

The larger focus of this project was being able to classify music into different emotions. We needed to train a model that could take different features of a song (tone, danceability, acousticness, energy, tempo, etc.) and use them to determine if a song was Happy, Sad, Calm, or Energetic. To start, we found premade Spotify playlists that each contained a different type of music. For happy music, we used mainly pop songs. For energetic songs, we used pop as well as EDM. For sad music, we used more mellow songs. For calm songs, we used LoFi.

We used 300 songs per mood category to create a total dataset of 1,200 songs. We used one-hot encoding to give each song a label to use as training data for the model.
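A quick sketch of that labeling step with pandas (the column and label names here are illustrative, not our exact schema):

```python
# Sketch of one-hot encoding the mood labels (illustrative column names).
import pandas as pd

songs = pd.DataFrame({
    "track_id": ["id1", "id2", "id3", "id4"],
    "mood": ["happy", "sad", "calm", "energetic"],
})
labels = pd.get_dummies(songs["mood"])  # one column per mood, 1 for the song's mood
```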

Now with 2 dataframes containing song links and labels respectively, it was time to actually get the song data we needed to make the classifications. We used the Spotify API and the Spotipy Python library to extract song information and store it in our datasets. In the end, we were left with a dataframe containing the following attributes for each song: [‘danceability’, ‘acousticness’, ‘energy’, ‘instrumentalness’, ‘liveness’, ‘valence’, ‘loudness’, ‘speechiness’, ‘tempo’].
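A condensed sketch of that extraction step with Spotipy looks roughly like this (the credentials and playlist ID are placeholders, and playlists longer than 100 tracks would need paging):

```python
# Sketch of pulling audio features with Spotipy
# (client ID/secret and playlist ID are placeholders).
import pandas as pd
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(
    client_id="YOUR_CLIENT_ID", client_secret="YOUR_CLIENT_SECRET"))

items = sp.playlist_tracks("PLAYLIST_ID")["items"]
track_ids = [item["track"]["id"] for item in items]

features = sp.audio_features(track_ids)  # one dict of attributes per track
cols = ["danceability", "acousticness", "energy", "instrumentalness",
        "liveness", "valence", "loudness", "speechiness", "tempo"]
df = pd.DataFrame(features)[cols]
```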

The next step was training our classifier, for which we used TensorFlow and Keras. To make the pandas dataframes compatible with the model, we had to convert them to NumPy arrays. After this, we used sklearn to split our dataset (song attributes/labels) with a 66/33 train/test split.

Our final model was created with Keras and contained a single dense layer with the ReLU activation function; it was trained using the KerasClassifier wrapper, which lets a Keras model be used as a scikit-learn classifier.
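Putting the split and training steps together, a sketch looks roughly like the following (features_df and labels_df stand in for the dataframes built earlier, the hidden layer size and epoch count are illustrative, and newer TensorFlow versions provide KerasClassifier through the separate scikeras package instead):

```python
# Sketch of the split/train step (illustrative layer size and epochs).
from sklearn.model_selection import train_test_split
from tensorflow.keras import layers, models
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

X = features_df.to_numpy()   # the 9 audio attributes per song
y = labels_df.to_numpy()     # one-hot mood labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

def build_model():
    model = models.Sequential([
        layers.Dense(16, activation="relu", input_shape=(X.shape[1],)),  # single hidden dense layer
        layers.Dense(4, activation="softmax"),  # happy / sad / calm / energetic
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

clf = KerasClassifier(build_fn=build_model, epochs=50, batch_size=32, verbose=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```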

In the end, our model finished with a 74% training accuracy and 70% test accuracy.

There were many challenges that we faced in developing the music emotion classifier. For starters, obtaining the song data proved to be a challenge, as we ran into many errors with the Spotipy library/API. When we finally got it to work, we tried several different datasets (song playlists), but it was difficult to find one playlist that had songs for every emotion, and it would have been tedious to label each song individually. To address this issue, we decided to instead find playlists for each emotion and concatenate them all into one large dataset. This made labeling easier and also gave us songs that fit better within their respective emotions.

With so many songs, keeping track of the data became a large problem. One of the biggest errors we faced was that the Spotify API was unable to retrieve information for every single song in the dataset. We had to essentially go through the playlist song by song to determine which songs were giving us errors and then manually remove them.

To make data analysis/training easier, we split all the code for the music classifier into 2 separate notebooks: one for data and one for training.

In our data notebook, we used pandas to help us visualize our dataframes and spent a lot of time ensuring all our data sizes were consistent, all the songs worked with the API, etc. We labeled all of our data using one hot encoding and exported the dataframes as .csv files which could be saved for later use.

Using the .csv files from the first notebook, our second notebook dealt with importing the data, preprocessing/splitting, model creation, training, and testing. We also exported the final model as a .pkl file so that we could save hyperparameters and use it for future predictions.
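A minimal sketch of that export/reload step (the filename and new_song_features are placeholders, and how cleanly the wrapped Keras model pickles depends on the TensorFlow version):

```python
# Sketch of saving/reloading the trained classifier as a .pkl file
# ("music_mood_classifier.pkl" and new_song_features are placeholders).
import pickle

with open("music_mood_classifier.pkl", "wb") as f:
    pickle.dump(clf, f)

with open("music_mood_classifier.pkl", "rb") as f:
    loaded = pickle.load(f)

predicted_mood = loaded.predict(new_song_features)  # features for a new song
```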

App/User Interface

In our final design, the user is prompted to input an image to begin the classification/song recommendation. Once the user selects a file and clicks the submit button, a function is called that detects and returns a specific emotion.

For convenience, the app is preloaded with a default playlist that has already been entirely classified by the music mood classifier. A function takes the detected emotion as a parameter and finds a corresponding song that fits that emotion. The app then shows the user a Spotify embedding of the song along with the picture that they inputted.
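A simplified sketch of that flow in Flask (detect_emotion, songs_by_mood, and the template names are placeholders, not our exact code):

```python
# Rough sketch of the Flask flow (helper and template names are placeholders).
import random
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        image = request.files["image"]                # uploaded picture
        emotion = detect_emotion(image)               # face detection + expression model
        song = random.choice(songs_by_mood[emotion])  # preclassified default playlist
        return render_template("result.html", emotion=emotion, song=song)
    return render_template("index.html")
```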

Conclusion/Takeaways

One key takeaway from the project was learning how to use multiple models in tandem to create a seamless system that produces an accurate output. With such complexity came the need for good planning and an organized code base, which was crucial to the success of the project. Halfway through, it became apparent that our organization was poor and more structure was necessary.

Code

Link to Webapp

http://fermusicrecommender.herokuapp.com/

