“I believe if there’s any kind of God it wouldn’t be in any of us, not you or me but just this little space in between. If there’s any kind of magic in this world it must be in the attempt of understanding someone sharing something.”
(R. Linklater, Before Sunrise)
This is one of my favorite quotes from any movie. Interpreting it outside the religious context, I see it as a way to establish the meaning of art: if art is ever to exist, it lies not within us but in the spaces between what exists. I am fascinated not only by artistic works themselves but also by the beautiful connections they create between people, things, and concepts. In light of the development of AI and my interest in music and media art, I want to explore the current technology that offers connections between AI and music, color, and emotion. Further research on this topic could provide insights into the relationships between different artistic mediums through the lens of AI, and potentially contribute to future AI-generated visuals incorporated into music.
With the current technology available, I looked into a machine-learning model developed by the researcher Theodoros Giannakopoulos. This model maps audio to emotional classes. It uses the Spotify API for mood detection and plots the emotion on a valence-energy chart, with valence on the x-axis and energy on the y-axis. Valence refers to how positive or negative the emotion is, and energy refers to the energy level of the song; both values lie on a continuous scale from -1 to 1. The model was trained on over 5,000 songs. For each analysis, it takes a 5-second segment of the song, processing the audio signal at 8 kHz, 16 kHz, and 32 kHz sampling rates. It uses pyAudioAnalysis to extract 130 statistical audio features per segment, forming a 130-dimensional feature vector.
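To make the feature-extraction step concrete, here is a minimal sketch of per-segment feature extraction with pyAudioAnalysis. The file name and window/step sizes are my own assumptions, not the model's published configuration, and the exact feature count depends on the library's settings:

```python
# Sketch: per-segment statistical feature extraction with pyAudioAnalysis.
# "song.wav" and the window/step sizes below are assumptions.
from pyAudioAnalysis import audioBasicIO, MidTermFeatures

sampling_rate, signal = audioBasicIO.read_audio_file("song.wav")
signal = audioBasicIO.stereo_to_mono(signal)

# Non-overlapping 5-second segments; 50 ms short-term frames with 50% overlap.
mid_feats, short_feats, names = MidTermFeatures.mid_feature_extraction(
    signal, sampling_rate,
    5.0 * sampling_rate,    # mid-term window: one 5-second analysis segment
    5.0 * sampling_rate,    # mid-term step
    0.050 * sampling_rate,  # short-term window
    0.025 * sampling_rate)  # short-term step

# Each column of mid_feats is one segment's statistical feature vector
# (mean and standard deviation of each short-term feature).
print(mid_feats.shape, len(names))
```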
However, as the researcher himself states, the emotion classification is only 70% accurate, and the valence estimation only around 55% accurate. Additionally, there isn't deeper research on the psychological factors behind the color representation of emotion. Therefore, I would like to test the model's accuracy in emotion detection and color representation.
There has been previous research on the colors people associate with different musical genres. Researcher K. L. Whiteford surveyed a large number of people and compiled a palette of the colors they associated with each genre. She also created a valence-arousal chart for different genres of music.
However, I want to take this concept further by running the experiment on people while also comparing their results to those the AI produces.
For the experiment, I first created an online questionnaire to send to people (sample questionnaire: https://bit.ly/3JRJJLc). The questionnaire contains 30-second samples of 4 songs from 4 different genres. The listeners must first plot the emotion of each song on the valence-arousal chart. Each plotted point is then compared with the AI's average coordinate for the same sample by computing the vector difference, i.e. the Euclidean distance d = √((x₁ − x₂)² + (y₁ − y₂)²) between the two points.
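As a concrete sketch of this comparison (the two coordinate pairs below are hypothetical, not survey data):

```python
import numpy as np

# Euclidean (vector) distance between a listener's plotted (valence, energy)
# point and the AI's average coordinate for the same sample.
def vector_distance(p1, p2):
    return float(np.linalg.norm(np.asarray(p1) - np.asarray(p2)))

listener_point = (0.6, -0.4)  # hypothetical listener: positive, low energy
ai_average = (-0.2, -0.5)     # hypothetical AI average for the same sample

print(vector_distance(listener_point, ai_average))  # ~0.81
```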
Then, the listener will be asked to assign a color to the emotion they pinned. This color will be compared to the color the AI produces at its average point.
The color difference will be calculated by converting both RGB colors to CIELAB and applying the Delta E 94 (CIE94) color-difference equation.
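One way to compute this is sketched below with scikit-image (my choice of library, not necessarily the one used in the original analysis; the two RGB triples are placeholders):

```python
import numpy as np
from skimage.color import rgb2lab, deltaE_ciede94

# Convert 0-255 RGB picks to CIELAB, then apply the CIE94 Delta E formula.
def delta_e94_rgb(rgb1, rgb2):
    lab1 = rgb2lab(np.asarray(rgb1, dtype=float).reshape(1, 1, 3) / 255.0)
    lab2 = rgb2lab(np.asarray(rgb2, dtype=float).reshape(1, 1, 3) / 255.0)
    return deltaE_ciede94(lab1, lab2).item()

print(delta_e94_rgb((178, 34, 34), (255, 105, 180)))  # deep red vs. bright pink
```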
The questionnaire includes an introduction and instructions to make sure the participants are well aware of why I'm conducting the research, how the valence-arousal chart works, and how to plot points and use colors. There is also advice on how to customize colors more precisely, to reduce bias in the color choices. Additionally, the whole questionnaire is in black and white, and the song links are given with no visuals. This is done to avoid color and visual bias while the participants listen to the music, so that the color choices come from the emotional response to the music itself.
The four songs used in my experiment are from 4 different genres, chosen to differ in valence and arousal levels. The first is a lofi/jazz song named Neopolitin by Guustavv, with a slow tempo and a looped beat. The second is an alternative rock/metal song by The 1975. The third is a future funk song by the artist Night Tempo. The fourth is an indie folk song by Phoebe Bridgers.
Across the four songs, the average vector distance was 0.69. Since the maximum possible distance on the chart is its diagonal (2√2 ≈ 2.83), this corresponds to an agreement of roughly 76% between the AI's emotion detection and the human responses. This is quite impressive for the AI.
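Worked out explicitly (the 0.69 figure is the survey's average distance from above):

```python
import math

avg_distance = 0.69              # average listener-vs-AI vector distance
max_distance = 2 * math.sqrt(2)  # diagonal of the [-1, 1] x [-1, 1] chart

agreement = 1 - avg_distance / max_distance
print(f"{agreement:.1%}")  # 75.6%
```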
However, when we look closely at the AI's results for each song, we can see its weaknesses. The AI has difficulty detecting the 'relaxed' emotion: high valence but low arousal. For the first song, where most people placed their answer in the bottom-right quadrant, the AI placed it in the bottom left. This suggests the AI tends to conflate low arousal with low valence. Additionally, the AI is also less likely than humans to choose high-energy emotions.
The human results of emotion detection were also ambiguous depending on the song. For the alternative rock/metal song, 4 participants placed it in the high-energy, low-valence quadrant while 3 others placed it in the high-energy, high-valence quadrant.
Taking a closer look at each color the participants have chosen, we can see interesting patterns and disparities.
The 2nd song, which conventionally sounds rather aggressive, drew a fairly unanimous choice of deep, rich reddish colors. This suggests that high-energy aggression is more likely to be associated with darker, redder colors. On the 3rd song, many participants picked a bright pink, which could be tied to the future funk genre's heavy use of pink visual themes. On the 4th song, there is an uncanny resemblance between the color palette of the album cover and the colors the participants picked. Because this song is quite famous, the audience may already have been familiar with the album artwork and subconsciously picked related colors. Therefore, if an AI model were developed to translate emotion detection into visuals, it would be important to incorporate online data on the colors of the artwork and visuals associated with each genre.
The Delta E 94 comparison shows that the difference between the AI-generated colors and the human-picked colors was smallest for the 4th song and largest for the 2nd song.
To summarize, this experiment was a good introduction to investigating the relationship between emotion, color, music, and AI. For further improvement, I would need to survey more participants and use a wider range of music. Next time, I should also gather more background on the participants: where they are from, the genres they listen to, and how they feel that day. This could help me detect further patterns linking the data to the emotion and color responses. Finally, I would like to feed the data on the songs, people's perceptions of them, and this additional personal information into creating a visual performance.