Upside Down & Inside Out: How does music convey emotion?

Kira Peck
music-perception-and-cognition
7 min read · May 14, 2022

Emotional Expression via Music Cues

You don’t exactly have to be well-versed in music psychology to suspect that music and emotions are linked. However, the questions of why music evokes emotion, and which aspects of music matter most to that process, are ones we continue to try to answer. In Western musical tradition, we associate minor keys with sad or melancholic feelings, and major modes with joy and triumph, but why is that, specifically? What about other emotions? How do we distinguish angry from sad? And what cues arise in other cultures? This study by Grimaud and Eerola sought to answer these questions further.

Many studies have shown that emotions can be conveyed through music, and recognized by listeners. Music can even convey narratives that are recognized across cultural divides. Not only can it convey emotion, it can also impact an individual’s emotional response, which you have likely experienced yourself (perhaps you recognize the tug in your chest when sweet, high strings swell in a romantic or sentimental movie scene, or the jolt of fear cued by classic piercing screeches in a horror movie).

While not the most obvious example, this song perfectly conveys the earnest tenderness and quiet desperation of the scene, and (I’m not ashamed to admit) makes me emotional even when listening outside of the film.

Grimaud and Eerola make a distinction between two emotional processes that can be cued by music:

  • Perceived emotion: the listener’s perception of the emotional expression the music intends to convey.
  • Felt/induced emotion: the emotional response the listener has to the music.

There is a pretty fine line between the two, and they often overlap and intersect. However, they are considered distinct modes of emotional responses, and results can reflect differences between them. This study focused on how music communicates perceived emotional expressions to the listener.

Musical cues can also be separated into two distinct categories. There is some flexibility/ambiguity between these categories (dynamics, for instance, can be manipulated by composers and performers), but this paper’s specific definitions are highlighted below.

Structural cues:

Properties of the music that relate to the score (tempo, mode, other notated/structural aspects of the piece)

  • In this study: tempo, mode, pitch, dynamics

Expressive cues:

Musical features that performers use (articulation, timbre, etc.)

  • In this study: articulation, brightness

(These musical cues are defined below.)

A number of past studies have aimed to understand which aspects of music influence perceived emotion, but those studies have significant limitations. Some looked at structural and expressive cues separately, but the two work together and interact in key ways, so Grimaud and Eerola investigated them simultaneously. Other studies tend to focus on very basic emotions (sadness, happiness, anger, fear, etc.), which misses much of the nuance people actually experience. This study set out to explore more complex emotions that have been perceived in music, as well as the basic ones, for a total of 9: joy, sadness, calmness, anger, fear, surprise, love, longing, and power.

Image source: Pixar’s Inside Out, via TechCrunch

Previous studies looked at particular musical cues (like tempo, volume, and pitch) in isolation or small groups, or tested all cues, but only had limited or arbitrary levels for each cue. Grimaud and Eerola approached this by having participants manipulate the stimuli to represent given emotions, and then having another set of participants evaluate the manipulated tracks.

Rather than following suit and using existing music as stimuli, which introduces a familiarity bias, Grimaud and Eerola had original stimuli composed. They also wanted flexible, polyphonic stimuli that participants could manipulate in many ways (not just melodic).

First, they had composers create 3 short, polyphonic piano pieces for each emotion (28 total). In Experiment 1, participants completed an online study in which the 28 stimuli were played in random order. Participants then rated each stimulus on emotional scales, indicating how strongly they thought the piece expressed each emotion (1 being not at all, and 5 being a lot). The results of that experiment are below:

This chart shows the mean ratings of emotions perceived by participants within the piece’s intended emotion category. The excerpts row indicates pieces from each category that were rated highest for their intended emotion (when the participants “got it right”).

Consistency and agreement between participant evaluations were analyzed, and mean scores calculated. Overall, calmness, fear, joy, power, sadness, and surprise were rated highest for their intended emotions, but the anger, longing, and love excerpts were rated higher for other emotions (possibly because love, longing, and calmness share many musical features, like soft dynamics and slower tempos). Of the 28 pieces, 16 were correctly identified as expressing their intended emotion, suggesting that listeners can correctly identify the intended emotion of a novel, unfamiliar musical piece. The excerpt rated most accurately for each emotion (out of the three for each) was selected to be used in the next experiment.
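The rating analysis can be pictured with a small sketch. This is illustrative Python with made-up numbers (the ratings below are hypothetical, not the study’s data): average each emotion’s 1–5 ratings for an excerpt, then check whether the intended emotion comes out on top.

```python
# Illustrative sketch, not the study's code or data: given listener ratings
# on a 1-5 scale, average them per emotion and check whether an excerpt's
# intended emotion receives the highest mean rating.
from statistics import mean

# Hypothetical ratings for one excerpt composed to express "fear":
# {emotion: [ratings from individual participants]}
ratings = {
    "fear":     [4, 5, 4, 3, 5],
    "anger":    [3, 2, 3, 2, 2],
    "surprise": [2, 3, 2, 2, 1],
    "calmness": [1, 1, 2, 1, 1],
}

mean_ratings = {emotion: mean(scores) for emotion, scores in ratings.items()}
perceived = max(mean_ratings, key=mean_ratings.get)

print(mean_ratings["fear"])  # 4.2
print(perceived == "fear")   # True: the intended emotion was rated highest
```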

Below is the highest-rated excerpt for joy:

And, for reference, here’s the highest-rated excerpt for fear:

You can listen to all of the original excerpts here!

In Experiment 2, a new set of participants was given the stimuli selected in Experiment 1 and instructed to manipulate them via a computer interface to change the emotion the music conveyed. The participants came from a range of musical backgrounds, but the majority were not musically trained. They were split into groups, and each participant had to adjust 7 pieces to reflect 3 emotions (21 manipulations per participant). Below are the structural and expressive musical cues the participants could manipulate, the manipulations they could make, and common choices participants made:

Tempo: Participants could adjust a slider on a scale from 50 bpm (beats per minute) to 160 bpm

  • Participants used slower tempos to portray calmness and sadness, power and fear had moderately fast tempos, joy and anger had faster tempos, and surprise was the fastest.

Articulation: Via another slider, participants shifted the articulation between legato (longest note duration), détaché, and staccato (shortest note duration)

  • Legato: sadness and calmness; détaché: fear and power; staccato: anger, surprise, and joy

Pitch: A slider allowed participants to shift the track ± 2 semitones away from the initial pitch

  • Fear, surprise, and joy had very similar pitch values. Power had the highest pitch, closely followed by anger. Lower pitches were used for sadness and calmness.
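As background on the pitch slider’s range: in equal temperament, shifting by n semitones multiplies a pitch’s frequency by 2^(n/12), so a 2-semitone shift changes frequency by roughly 11–12%. A minimal sketch using standard tuning values (this is a general music fact, not the study’s implementation):

```python
# Equal-tempered pitch shifting: moving a pitch by n semitones multiplies
# its frequency by 2 ** (n / 12). Values below are standard tuning,
# not taken from the study's interface.
def shift(freq_hz: float, semitones: int) -> float:
    return freq_hz * 2 ** (semitones / 12)

a4 = 440.0                       # A4 in standard tuning
print(round(shift(a4, 2), 2))    # up 2 semitones -> B4, ~493.88 Hz
print(round(shift(a4, -2), 2))   # down 2 semitones -> G4, ~392.0 Hz
print(shift(a4, 12))             # a full octave doubles the frequency: 880.0
```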

Dynamics: Slider adjusts the MIDI instrument’s output volume (but not overall volume)

  • Participants generally chose softer dynamics overall; anger and sadness were softest, while surprise and joy were loud.

Brightness: This slider manipulates how many harmonics sound (smaller value = fewer high frequencies pass through = darker sound)

  • The brightest timbres were surprise and joy, and the darkest was sadness
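The “fewer harmonics = darker” idea can be sketched directly: build a tone from its harmonic series and cap the number of partials included. This is a toy synthesis example and assumes nothing about the study’s actual filter:

```python
# Toy sketch of the brightness cue (not the study's filter): a sawtooth-like
# tone is a sum of harmonics; capping how many are included removes
# high-frequency content, which we hear as a "darker" timbre.
import math

def tone_sample(t, f0=220.0, n_harmonics=8):
    """One sample of a sawtooth-like tone built from n_harmonics partials."""
    return sum(math.sin(2 * math.pi * f0 * k * t) / k
               for k in range(1, n_harmonics + 1))

SR = 8000  # sample rate in Hz
bright = [tone_sample(t / SR, n_harmonics=8) for t in range(SR)]  # 8 partials
dark   = [tone_sample(t / SR, n_harmonics=2) for t in range(SR)]  # 2 partials

# Crude brightness proxy: energy in sample-to-sample differences,
# which weights high frequencies more heavily than low ones.
def hf_energy(x):
    return sum((a - b) ** 2 for a, b in zip(x[1:], x))

print(hf_energy(bright) > hf_energy(dark))  # True: more partials, brighter sound
```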

Mode: A toggle button allowed participants to switch the stimuli between major and minor mode (by flattening the third and sixth degrees of the scale)

  • Major: power, calmness, joy, surprise (although participants were less certain of surprise and power)
  • Minor: sadness, fear, anger
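That flattening operation is easy to express in MIDI terms. A rough sketch under assumed details (MIDI note numbers, C as the tonic; the study’s interface surely differs): lower every third and sixth scale degree by one semitone to turn a major-mode passage minor.

```python
# Rough sketch of a major-to-minor toggle (assumptions: MIDI note numbers,
# tonic of C). Flatten the 3rd and 6th scale degrees by one semitone.
MAJOR_THIRD = 4   # semitones above the tonic
MAJOR_SIXTH = 9

def to_minor(midi_notes, tonic=60):  # 60 = middle C
    out = []
    for note in midi_notes:
        degree = (note - tonic) % 12  # pitch class relative to the tonic
        if degree in (MAJOR_THIRD, MAJOR_SIXTH):
            note -= 1                 # flatten the 3rd or 6th
        out.append(note)
    return out

# C major arpeggio plus the sixth: C E G A -> C Eb G Ab
print(to_minor([60, 64, 67, 69]))  # [60, 63, 67, 68]
```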

For an example of modal manipulation, here’s a clip of “YMCA,” but manipulated to be in minor mode:

Warning: it’s quite strange!

On a less off-putting but still strange note, here’s R.E.M.’s “Losing My Religion,” switched from minor to major mode:

Of course the original is best, but I’m not mad at this version, it’s kind of fun!

For Experiment 3, a new group of participants was given 14 stimuli (the 7 selected in Experiment 1 and used in Experiment 2, plus 7 that had been manipulated by Experiment 2 participants) and asked to rate how effectively each portrayed the emotions. The 7 pieces from Experiment 1 were once again rated highest for their intended emotions, and some pieces from Experiment 2 (calmness, joy, fear, sadness) were also successful. By comparing participants’ assessments of the two stimuli groups, the researchers got a better picture of which specific cue values and combinations indicate particular emotions.

Here are the cue characteristics that appeared in the successfully-identified stimuli from both groups (the original compositions and those influenced by Experiment 2 participants):

  • Calmness: Major mode, slow tempo, legato articulation
  • Fear: Minor mode, moderately fast tempo (contrasting articulation and pitch)
  • Joy: Major mode, fast tempo, staccato articulation
  • Sadness: Minor mode, slow tempo, legato articulation, lower pitch

[This short clip shows the difference between legato and staccato, for reference.]

Stimuli from Experiment 1 were generally rated more accurately than Experiment 2 stimuli, although the use of piano in Experiment 1 and strings in Experiment 2 could have affected participants’ perception of the stimuli (particularly for fear, since the Experiment 2 stimuli were rated higher, and strings can be associated with horror movies). Similarly, the composers of the Experiment 1 stimuli had more cues (and more musical background) at their disposal than the Experiment 2 participants, which could be why those stimuli were rated higher.

Either way, the study showed that despite varying musical skills and a wide range of emotions and cue possibilities, cue combinations are used in consistent ways to convey specific emotions, and those emotions can be detected by audiences. Moreover, this study showed that people can recognize not only basic emotions but more complex, nuanced ones as well (the most successfully identified stimulus was, in fact, calmness). Musical cues can be used to portray emotion to audiences regardless of musical background or previous familiarity with the songs.

Sources

Juslin, P. N. (1997a). Emotional communication in music performance: A functionalist perspective and some data. Music Perception, 14(4), 383–418. https://doi.org/10.2307/40285731

McAuley, J. D., Wong, P. C. M., Mamidipaka, A., Phillips, N., & Margulis, E. H. (2021). Do you hear what I hear? Perceived narrative constitutes a semantic dimension for music. Cognition, 212, 104712. https://doi.org/10.1016/j.cognition.2021.104712

Micallef Grimaud, A., & Eerola, T. (2022). An Interactive Approach to Emotional Expression Through Musical Cues. Music & Science. https://doi.org/10.1177/20592043211061745
