EMOTION-BASED MUSIC RECOMMENDATION SYSTEM USING A DEEP REINFORCEMENT LEARNING APPROACH
Do you know that sounds and music communicate with our emotional brain? Amazing, isn’t it?
Indeed, several studies have shown that listening to sounds and music has a significant effect on our feelings and emotions [1][2]. And our emotions, in turn, play an important role in recommendation [3].
Most existing recommendation systems are based on ratings provided by users. Well, there is a catch! Those systems only offer a static user experience: recommendations are made from the ratings history alone, without regard to other parameters that might impact the prediction, such as the user's reaction, behavior, feeling, or emotion [4].
The trick is that by taking those behavioral and emotional features into account, the user can benefit from a more dynamic, customized, and tailored experience. And it gets better: emotions are very interesting implicit indicators, and one way to exploit them is to use them in the reward system of a deep reinforcement learning approach.
An affective recommender system
Have you ever imagined a recommender system that combines both music and emotion? A system that can propose a custom playlist according to your feelings and emotions, and adapt itself to your behavior based on your past experience? That curious question prompted me to develop this emotion-based music recommender system project.
This project is based on Deep Reinforcement Learning (DRL): it uses the music's emotions and genre, together with user information (mainly the user's age, mood, and gender), as state features, and a reward system based on the user's feeling, namely the stress level detected from his Heart Rate Variability (HRV). It aims to recommend the most suitable and tailored top-k songs from a given playlist, based on previous recordings of the user's feelings.
Here’s how it works:
Suppose we have previously observed the reactions of several users through their HRVs each time they listened to a given playlist, as well as the emotions they felt when listening to each song. We then have historical observation and feedback data which can be used to feed a reinforcement learning environment such that:
- The state is composed of the music features, mainly the genre and the emotions, and the user information, namely the user's age, gender, and mood, measured on a scale of 1 to 5 where 1 denotes sadness and 5 denotes happiness,
- The reward is the user's reaction after listening to a particular song (whether it made him feel calmer or more stressed), mainly his stress level identified from his HRV (see the encoding sketch below).
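To make this concrete, here is a minimal sketch of how such a state vector could be assembled. The category lists, feature order, and normalization are my own assumptions for illustration, not the project's exact encoding.

```python
import numpy as np

# Illustrative category lists (assumed, not the project's exact vocabulary)
GENRES = ["classical", "pop", "rock", "electronic"]
EMOTIONS = ["tenderness", "calmness", "power", "joy", "sadness", "tension"]
GENDERS = ["female", "male"]

def encode_state(genre_probs, emotion_probs, age, mood, gender):
    """Concatenate music features and user information into one state vector.

    genre_probs   : 4 probabilities from the genre/emotion classifier
    emotion_probs : 6 probabilities from the genre/emotion classifier
    age           : user's age in years
    mood          : 1 (sad) .. 5 (happy)
    gender        : "female" or "male"
    """
    gender_onehot = np.eye(len(GENDERS))[GENDERS.index(gender)]
    user_part = np.array([age / 100.0, mood / 5.0])   # simple normalization
    return np.concatenate([genre_probs, emotion_probs, user_part, gender_onehot])

state = encode_state([0.1, 0.7, 0.1, 0.1],
                     [0.6, 0.8, 0.1, 0.2, 0.5, 0.1],
                     age=27, mood=3, gender="female")
print(state.shape)  # (14,)
```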
Processing and extracting music genre and emotion features
As the user will be inputting raw audio files, it is necessary to identify the genre and emotion features before they can be used as state features. To do so, I built a model for analyzing musical genres and emotions: a Neural Network that fits a multi-label classification problem. This model can identify a set of target emotions (tenderness, calmness, power, joy, sadness, tension) that a user might feel, and is capable of identifying classical, pop, rock, and electronic music genres.
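As a rough sketch of what such a multi-label classifier could look like in PyTorch, under the assumption of a single shared network with one sigmoid output per label (the layer sizes and input dimension below are placeholders, not the project's exact architecture):

```python
import torch
import torch.nn as nn

N_FEATURES = 512     # assumed size of the extracted audio feature vector
N_LABELS = 6 + 4     # 6 target emotions + 4 genres, predicted jointly

class MusicTagger(nn.Module):
    """Multi-label classifier: one independent probability per emotion/genre."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_FEATURES, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, N_LABELS),
        )

    def forward(self, x):
        return self.net(x)   # raw logits; apply torch.sigmoid for probabilities

model = MusicTagger()
criterion = nn.BCEWithLogitsLoss()   # the standard loss for multi-label problems
```

Because the labels are not mutually exclusive (a song can evoke both calmness and sadness), each output is treated independently rather than with a softmax.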
The audio files were processed using librosa, a Python library for audio and music processing: high-frequency signals were boosted using pre-emphasis before the timbral and tempogram features were extracted to feed the classifier model.
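For illustration, this preprocessing could look roughly like the snippet below, where MFCCs and spectral descriptors stand in for the timbral features; the exact feature set and parameters used in the project may differ.

```python
import numpy as np
import librosa

def extract_features(path, sr=22050):
    # Load the raw audio and boost high frequencies with pre-emphasis
    y, sr = librosa.load(path, sr=sr)
    y = librosa.effects.preemphasis(y)

    # Timbral features: MFCCs plus a few spectral descriptors
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)

    # Rhythm features: tempogram computed from the onset strength envelope
    onset_env = librosa.onset.onset_strength(y=y, sr=sr)
    tempogram = librosa.feature.tempogram(onset_envelope=onset_env, sr=sr)

    # Summarize each feature over time and concatenate into one vector
    return np.concatenate([
        mfcc.mean(axis=1),
        centroid.mean(axis=1),
        rolloff.mean(axis=1),
        tempogram.mean(axis=1),
    ])
```

The resulting vector is what the genre/emotion classifier consumes to produce one probability per label.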
Identifying stress level from Heart Rate Variability (HRV)
You must be wondering what Heart Rate Variability is. Let me clarify!
HRV is a measure of the variation in time between successive heartbeats, commonly used to assess what is going on with a person's body, including his productivity, energy, and stress levels. You can measure HRV with most wearable devices or heart monitoring applications, and you can easily export the data into a CSV file [5].
You are up to date! Let us keep going.
In support of the DRL's reward system, I built an HRV-based stress level analyzer. It is a CatBoost model capable of detecting high, medium, and low stress levels from HRV. The following metrics were used to predict a user's stress level: the heart rate, 4 time-domain metrics (SDNN, Mean-RR, pNN50, and RMSSD), and 3 frequency-domain metrics (Very Low Frequency, Low Frequency, and High Frequency). These are the most commonly used metrics in HRV measurement.
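Training such a classifier with CatBoost could look like the sketch below; the file name, column names, and hyperparameters are assumptions, and only the general setup (8 HRV metrics in, 3 stress classes out) follows the description above.

```python
import pandas as pd
from catboost import CatBoostClassifier

# Assumed column names for the 8 HRV metrics described above
FEATURES = ["HR", "SDNN", "MEAN_RR", "pNN50", "RMSSD", "VLF", "LF", "HF"]

df = pd.read_csv("hrv_dataset.csv")          # hypothetical training file
X, y = df[FEATURES], df["stress_level"]      # y in {low, medium, high}

model = CatBoostClassifier(
    iterations=500,
    learning_rate=0.1,
    loss_function="MultiClass",
    verbose=False,
)
model.fit(X, y)

# Predict the stress level for a new HRV reading
sample = df[FEATURES].iloc[[0]]
print(model.predict(sample))   # e.g. [['low']]
```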
Here is the bottom line!
I know I can almost hear you thinking: “What does the stress level analyzer have to do with the reward system of the model?”. Stick with me, I’ll explain it.
The reward system of the DRL model is based on the feedback given by the user, here measured from his stress level. Thus, a high stress level, interpreted as a sign of dissatisfaction, is converted into a zero reward for the model. Conversely, a moderate or low stress level is interpreted as a sign of satisfaction and is converted into a positive reward.
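In code, that mapping could be as simple as the sketch below; the exact reward values are illustrative assumptions.

```python
def stress_to_reward(stress_level: str) -> float:
    """Convert the stress level predicted by the HRV analyzer into a reward.

    High stress (dissatisfaction) yields no gain, while medium or low stress
    (satisfaction) yields a positive reward.
    """
    rewards = {"high": 0.0, "medium": 0.5, "low": 1.0}
    return rewards[stress_level]
```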
To build the DRL-based system, I came up with an approach based on a PyTorch implementation of the A2C (Advantage Actor-Critic) model, trained on a custom environment developed with the gym library, an OpenAI toolkit widely used for developing reinforcement learning algorithms. The DRL model scored 0.80 after 10,000 learning steps.
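To give a feel for what that custom environment might look like, here is a heavily simplified sketch following the classic gym API (observation returned by reset, 4-tuple returned by step). The class name, episode logic, and the use of stable-baselines3's PyTorch A2C are my own assumptions for illustration; the project's actual environment is richer than this.

```python
import gym
import numpy as np
from gym import spaces

class PlaylistEnv(gym.Env):
    """Toy environment: the agent picks a song, and the reward is the
    stress-based feedback recorded for that song (classic gym API, pre-0.26)."""

    def __init__(self, song_features, recorded_rewards, user_info):
        super().__init__()
        self.song_features = song_features        # (n_songs, n_music_features)
        self.recorded_rewards = recorded_rewards  # (n_songs,) from HRV feedback
        self.user_info = np.asarray(user_info, dtype=np.float32)
        n_feats = song_features.shape[1] + len(self.user_info)
        self.action_space = spaces.Discrete(len(song_features))
        self.observation_space = spaces.Box(-np.inf, np.inf,
                                            shape=(n_feats,), dtype=np.float32)
        self._current = 0

    def _obs(self):
        return np.concatenate([self.song_features[self._current],
                               self.user_info]).astype(np.float32)

    def reset(self):
        self._current = 0
        return self._obs()

    def step(self, action):
        reward = float(self.recorded_rewards[action])  # stress-based feedback
        self._current = action
        done = True        # one recommendation per episode in this toy version
        return self._obs(), reward, done, {}

# Training with a PyTorch A2C implementation (stable-baselines3, for example):
# from stable_baselines3 import A2C
# env = PlaylistEnv(song_features, recorded_rewards, user_info)
# model = A2C("MlpPolicy", env, verbose=0)
# model.learn(total_timesteps=10_000)
```

Once trained, ranking the policy's action preferences over the playlist gives the top-k songs to recommend.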
Retrieving Feedback from the user
To keep the system dynamic, I set up a database that serves as a replay buffer, storing newly provided feedback and observations. The model can later be retrained offline using these feedback data, to improve the quality of future recommendations and to better adapt the system to the user's behavior.
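One simple way to implement such a feedback store is a small SQLite table, sketched below; the schema and column names are purely illustrative.

```python
import sqlite3

def save_feedback(db_path, user_id, song_id, state, stress_level, reward):
    """Append one (state, reward) observation to the feedback buffer."""
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS feedback (
                        user_id TEXT, song_id TEXT,
                        state TEXT, stress_level TEXT, reward REAL)""")
    conn.execute("INSERT INTO feedback VALUES (?, ?, ?, ?, ?)",
                 (user_id, song_id, ",".join(map(str, state)),
                  stress_level, reward))
    conn.commit()
    conn.close()
```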
And what about the dataset in all this?
I used the following two different data sources to develop the project:
1- Dataset on Induced Musical Emotion from Game with a Purpose Emotify, available here. This dataset consists of 400 song excerpts labeled with their respective genre and with annotations of the emotions that participants felt most strongly when listening to each song [6].
2- Biometrics for stress monitoring kaggle dataset, available here [7]. This dataset contains Heart Rate Variability (HRV) and Electrodermal activity (EDA) data computed from the SWELL [8][9] and the WESAD datasets [10].
Now, demo time!
The system has been deployed using the Streamlit framework and its built-in sharing feature. To interact with the system:
1- Input a playlist of your choice and your information (age, mood, and gender),
2- The system will suggest the most suitable top 2 songs from your playlist,
As can be seen in the image above, the system suggested songs that share common characteristics: they both evoke tenderness, calmness, and sadness with high confidence.
3- Listen to the songs,
4- Upload your HRV metrics for feedback analysis,
5- Save your feedback.
Simple as that! 👌
Acknowledgements
I am grateful to the Africa 2020 Data Science Intensive (DSI) Program for the wonderful opportunity to develop this project. Heartfelt thanks to the DSI tutors for their valuable support throughout the realization of this project.
References
[2] Shafron, Gavin. (2010). The Science and Psychology Behind Music and Emotion.
[5] What Is Heart Rate Variability (HRV) & Why Does It Matter?
[6] A. Aljanaki, F. Wiering, R. C. Veltkamp. Studying emotion induced by music through a crowdsourcing game. Information Processing & Management, 2015.
[7] S. Koldijk, M. A. Neerincx, and W. Kraaij, “Detecting Work Stress in Offices by Combining Unobtrusive Sensors,” IEEE Trans. Affect. Comput., vol. 9, no. 2, pp. 227–239, 2018.
[8] S. Koldijk, M. Sappelli, S. Verberne, M. A. Neerincx, and W. Kraaij, “The SWELL Knowledge Work Dataset for Stress and User Modeling Research,” Proc. 16th Int. Conf. Multimodal Interact. — ICMI ’14, pp. 291–298, 2014.
[9] Kraaij, Prof.dr.ir. W. (Radboud University & TNO); Koldijk, MSc. S. (TNO & Radboud University); Sappelli, MSc M. (TNO & Radboud University) (2014): The SWELL Knowledge Work Dataset for Stress and User Modeling Research. DANS. https://doi.org/10.17026/dans-x55-69zp
[10] Schmidt, P., Reiss, A., Duerichen, R., Marberger, C., & Van Laerhoven, K. (2018). Introducing WESAD, a Multimodal Dataset for Wearable Stress and Affect Detection. Proceedings of the 2018 on International Conference on Multimodal Interaction — ICMI ’18, 400–408. https://doi.org/10.1145/3242969.3242985