Sounding Emotional: How Timbre Choices Affect Emotion in Music

Published in The Sound of AI · 9 min read · Feb 21, 2019


by Christian Tronhjem

A spectrogram made from cats.

From the moment you listen to music, you encounter a wave of emotion. Whether it’s a gut-wrenching movie or video game sequence, an annoyingly memorable advertising jingle, or a blast from your childhood past, you felt (and probably still feel) a certain way when hearing it. But what would happen if the instrument sounds of your favourite hits were suddenly swapped? It’s widely (yet informally) known that major keys make happier melodies than minor keys (Jingle Bells in minor might sound like there’s nothing in your stocking this year). Adjusting the song’s scales or performance can shift the mood, depending on the extent of the change. But there’s something about the sound of a low legato cello that speaks more to sadness than a high jumpy marimba melody. To understand how sounds stir up emotions, let’s examine what they’re built upon.

The truth is in the timbre

We call our subjective understanding of a sound’s frequency spectrum timbre. Think of timbre as the sonic fingerprint, or quality of the sound. It’s what makes you recognise a piccolo over a piano, regardless of the note being played.

Each note played on an instrument has an identifiable pitch or tone. This is the fundamental frequency, the central or most salient pitch of the note. On top of this ‘fundamental’ sits a series of harmonics or ‘partials’ in a sequence. Take the note ‘A’, tuned to a frequency of 440Hz. The first partial above the fundamental would be the octave, 440Hz times two, which is 880Hz. The next partial would be the fundamental times three, 1320Hz, which is an octave plus a ‘fifth’ above the fundamental, in musical terms. The next partial is then the fundamental times four, two octaves above, and so on. What makes up the characteristic timbre of an instrument is the strength of the individual partials in relation to each other. Many different factors shape the strength of the partials, such as the physical material the instrument is made from, how that affects a resonating column of air, particular string vibration characteristics, or the loudness at which an instrument is played.
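To make the arithmetic concrete, here is a minimal Python sketch of the harmonic series on A (440Hz). The function names and the two ‘strength profiles’ are made up purely for illustration; they aren’t measured from any real instrument.

```python
import math

def harmonic_series(fundamental_hz, count):
    """Return the first `count` harmonics (integer multiples) of a fundamental."""
    return [fundamental_hz * n for n in range(1, count + 1)]

def semitones_above(freq_hz, reference_hz):
    """Interval between two frequencies, in equal-tempered semitones."""
    return 12 * math.log2(freq_hz / reference_hz)

def additive_tone(fundamental_hz, strengths, duration_s=0.01, sample_rate=44100):
    """Sum sine partials; strengths[k] is the assumed amplitude of harmonic k + 1."""
    n_samples = int(duration_s * sample_rate)
    return [
        sum(a * math.sin(2 * math.pi * fundamental_hz * (k + 1) * i / sample_rate)
            for k, a in enumerate(strengths))
        for i in range(n_samples)
    ]

harmonics = harmonic_series(440.0, 4)  # 440, 880, 1320, 1760 Hz
for f in harmonics:
    print(f"{f:7.1f} Hz is {semitones_above(f, 440.0):5.2f} semitones above A")

# Two invented strength profiles: same pitch, different timbre.
mellow = additive_tone(440.0, [1.0, 0.3, 0.1, 0.05])
bright = additive_tone(440.0, [1.0, 0.8, 0.7, 0.6])
```

The third harmonic lands about 19 semitones above the fundamental, which is an octave (12) plus a fifth (7). Both tones share the same 440Hz pitch; only the relative strength of the partials differs, and that difference is what we hear as timbre.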

Sound frequencies also develop over time. Plucking a string with a plastic guitar pick creates a short burst of noise, containing harmonic and inharmonic partials, which quickly fades out (within the first 30 ms), leaving room for the pure partials of the vibrating string. These also fade over time, from the highest frequency downwards. This whole ‘envelope’ is what gives a guitar string the initial ‘plucky’ attack, and why the sound softens over time; as its volume fades, so does its frequency spectrum. You’ll see this in the spectrogram below, generated from a guitar string. A spectrogram shows the distribution of a sound’s frequency spectrum over time. Frequency is mapped on the vertical axis, and time in seconds on the horizontal. The brighter the color, the higher the energy or amplitude of the frequency.

A spectrogram showing the frequency of that sweet guitar pluck.
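A rough way to see this envelope in numbers is to build a toy ‘pluck’ and measure how strong each partial is early on and half a second later. This is only a sketch under stated assumptions: the 1/n starting amplitudes, the decay rate, and the rule that partial n fades n times faster are invented for illustration, not a physical model of a guitar string. The `magnitude_at` helper computes a single frequency bin over a short window, which is roughly what one pixel of a spectrogram represents.

```python
import math

SAMPLE_RATE = 8000  # Hz; deliberately low so the pure-Python loops stay quick

def pluck(fundamental_hz=220.0, n_partials=5, duration_s=1.0, decay=1.5):
    """Toy plucked string: partial n starts at amplitude 1/n and fades n times faster."""
    samples = []
    for i in range(int(duration_s * SAMPLE_RATE)):
        t = i / SAMPLE_RATE
        samples.append(sum(
            (1.0 / n) * math.exp(-decay * n * t)
            * math.sin(2 * math.pi * fundamental_hz * n * t)
            for n in range(1, n_partials + 1)
        ))
    return samples

def magnitude_at(signal, freq_hz, start_s, window_s=0.05):
    """Strength of one frequency inside a short window (a single DFT bin)."""
    start = int(start_s * SAMPLE_RATE)
    chunk = signal[start:start + int(window_s * SAMPLE_RATE)]
    re = sum(x * math.cos(2 * math.pi * freq_hz * k / SAMPLE_RATE) for k, x in enumerate(chunk))
    im = sum(x * math.sin(2 * math.pi * freq_hz * k / SAMPLE_RATE) for k, x in enumerate(chunk))
    return 2 * math.hypot(re, im) / len(chunk)

signal = pluck()
remaining = {}
for n in (1, 3, 5):
    early = magnitude_at(signal, 220.0 * n, start_s=0.0)
    late = magnitude_at(signal, 220.0 * n, start_s=0.5)
    remaining[n] = late / early
    print(f"partial {n}: {remaining[n]:.1%} of its initial strength left after 0.5 s")
```

The higher the partial, the smaller the fraction that survives, so the sound gets duller as it gets quieter.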

However, this isn’t the case with all instruments. Wind instruments, for example, don’t fade in timbre and volume over time in the same way, since they constantly need air blown through them to produce sound.

Non-tonal sounds have much more complex timbre, but follow the same composition principles. We get vast amounts of information daily from simple object sounds, and can often derive the material, weight and/or size of an item just by listening. From experience, we’ve learned to recognise a pencil hitting a wooden floor, or a finger tapping an empty plastic bottle by its hollow roundedness. When describing timbres, we compare the sound’s characteristics against objects we’ve previously encountered in the world. Describing something as thin could refer to a sound ‘not being full’, or lacking a body that resonates with lower frequencies. Real world bass frequencies often come from ‘big’ materials, or larger resonating chambers. ‘Thin’ therefore refers to a smaller object, rather than a larger one with a ‘bassy’, ‘deep’ or ‘full’ sound.

The first set of descriptors for different timbre groups was introduced by Hermann von Helmholtz back in 1877 (see the figure below, from Howard and Tyrell). We can see that different examples of grouped acoustic instruments are tied to specific descriptors, and how some of the descriptors are linked to an emotion.

Helmholtz’s first set of timbre descriptors.

Striations are frequencies above the 7th harmonic, where the spacing between the individual partials becomes harder for our ears to perceive. They’re therefore perceived as noisy or harsher if they dominate.
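One way to see why the upper partials crowd together is to measure the musical interval between each harmonic and the next. The sketch below only does that arithmetic; it makes no claim about exactly where the ear stops resolving individual partials.

```python
import math

def interval_to_next_harmonic(n):
    """Semitones between harmonic n and harmonic n + 1 of the same fundamental."""
    return 12 * math.log2((n + 1) / n)

for n in range(1, 11):
    print(f"harmonic {n:2d} to {n + 1:2d}: {interval_to_next_harmonic(n):5.2f} semitones")
```

The first gap is a full octave (12 semitones), but by the 7th harmonic the next partial is only a little over two semitones away, and the gaps keep shrinking from there.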

The human ear is very sensitive to transients, the ‘attack’ of a sound. The ability to hear rapid, sudden changes might’ve granted us an evolutionary edge, as they’re often associated with danger or a need to respond. Smaller or ‘thinner’ sounds might appear less threatening, because smaller things and animals usually aren’t as dangerous (unless it’s a Golden Poison Dart Frog), whereas a large bass-heavy object might be reminiscent of an incoming avalanche, which could be lethal. ‘Pure’ timbre tones, where ‘even harmonics’ dominate, provide a rounder tone. For most people this might indicate friendliness, warmth or happiness, whereas odd harmonics with noisy content might be harsher on our ears and thus appear less friendly. Indeed, many of the above descriptors do somewhat describe the sensation of those timbres when we hear something as ‘harsh’ or ‘round’.

Emotional response to timbre could also originate from how humans express and perceive emotions through timbral changes in the sounds of voices. When we’re angry we might shout, becoming louder and distorting our vocals, with more partials, whereas tender emotion is more easily expressed in a quieter, softer and rounder sounding voice. Even the pitch of our voice reflects emotion — sadness is generally lower compared to a higher-pitched, happier voice.

Various bodies of research show that different musical instruments evoke certain emotions. The reason for this is contested. The emotive connotations of instruments might be something we’ve been ‘taught’ culturally by theatre or opera, and subsequently in films and games. However, there seem to be broader commonalities between emotions and the specific categories of timbre, which may apply cross-culturally. For example, slower attack and lower-register instruments with more striations, such as the cello and bassoon, are perceived as sadder compared to short-attack instruments, like a brighter-sounding xylophone.

Picking timbre by instrumentation

Now that we’ve looked at how different timbres can nudge sounds in specific emotional directions, let’s look at how a musician can do this. Sergei Prokofiev, one of the world’s greatest composers, created ‘Peter and the Wolf’ (1936), a stunning example of characters represented by specific instruments to reflect their real-world sound. The specific timbre also hints at each character’s unique personality, vocal characteristics, and role in the story. The bird is represented by the flute, the duck by the oboe, the wolf by the French horns, and Peter, the protagonist, by strings.

There’s a more thorough technical explanation for this. The slightly more nasal quality of the oboe resembles a duck quacking, while the thin, bright flute sound is similar to birds twittering and chirping. Then you have the sneaky timbre of the clarinet, representing the cat. Finally, the howling, round, deep sound of three French horns and the slight dissonance between them create a sensation of impending, ominous evil, representing the wolf. It’s quite easy to hear the difference between a silver flute and a real bird chirping in the woods, but it’s the feelings and associations we form when hearing them that matter. Of course, harmonic progressions and melodies also influence and evoke emotions, but an instrument’s pure sound and the way it’s played clearly demonstrate how we connect timbre to emotion.

A classic masterclass on character-based instrumentation.

Another well-known example that illustrates the intrinsic link between music, emotion and character, within the context of less ‘traditional’ harmony, is the unsettling yet action-packed ‘Joker’s theme’ by Hans Zimmer from ‘The Dark Knight’. The piece places far more focus on the eeriness and dissonance of the instruments than Prokofiev’s Peter and the Wolf. The main sound consists of one ‘object’: a layer of distorted guitars, cellos and recorded material, such as piano strings played using dull razor blades.

The strings, partly distorted by effect processing and performance, keep the listener in an uneasy state of tense expectancy. Most people recognise how a string instrument is played, and can identify how the bow forces the tension on the strings. This is coupled with tremolo, a performative way of adding tension. These techniques present even more partials and harsh timbre, as we hear the instrument continuously pushed while the sound seemingly stretches. The harshness of metal against metal from the aforementioned razor blades, together with the distortion processing, enhances this raking feeling until we’re sonically and mentally transported into the throes of a system that is working beyond its limits.

This is the perfect instance of synchronisation of performance and effect processing to achieve appropriate emotion-evoking audio — the desired noisy, inharmonic timbre. Each sound is carefully layered together to create one artificial instrument, where beating frequencies, noises, and combinations of timbre create a complex unpleasantness, iconic of the tension and ‘fear’ central to the characters in the story.

‘Why so sonically serious?’

Designing emotional sounds

Sound designers have far more options at their disposal than which instrument to play. In every sound struggle we can call on our trusty equaliser to sculpt a sound by boosting or reducing certain parts of the timbre. We can also add more harmonics through different types of distortion, chorus, phasers or moving filter effects, and create space and movement with reverb and delays. There’s also a whole range of different synthesis techniques to generate creatively and emotionally fascinating timbres that are beyond the realm of physical instruments.

To craft a sound’s overall emotional direction or aesthetic, we should consider its timbral qualities and how they apply to instruments and synthesizers. If we want a real downer of an instrument sound, we could lean on the hollow, noisier shrill of a cello (nothing against cellos). If we regard the instrument as the whole sound or object we can consider parameters like the room it’s playing in, and what effects to use. Adding a longer reverb to a cello could emphasise the overawing feeling of smallness in a large space, like a spaceship in a remote galaxy, or the loneliness of the sound.

If we have two cellos playing simultaneously that are slightly detuned, and we add extra distorted harmonics, the harmonics from beating and distortion might contribute to a more eerie and uncertain feeling, like with Zimmer’s ‘Joker’s Theme’. At the same time, loudness and harmonics could make the difference between uneasiness and aggression, much like how a person’s voice changes. In the end, it’s not about perfectly recreating the sound of a cello. However, since we know it’s apt for sad melodies, we can draw inspiration from its timbral qualities.
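Both ingredients can be sketched with plain sine waves standing in for the cellos. Everything specific here is an assumption chosen for illustration: the 110Hz and 112Hz pitches, the `drive` amount, and the use of a tanh curve as the distortion (one of many possible distortion shapes, and a symmetric one, so it adds odd harmonics only).

```python
import math

SAMPLE_RATE = 8000  # Hz
N = SAMPLE_RATE     # one second of audio

def sine(freq_hz, n_samples=N):
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n_samples)]

def rms(signal, start_s, window_s=0.02):
    """Loudness estimate (root mean square) of a short window."""
    start = int(start_s * SAMPLE_RATE)
    chunk = signal[start:start + int(window_s * SAMPLE_RATE)]
    return math.sqrt(sum(x * x for x in chunk) / len(chunk))

def amplitude_at(signal, freq_hz):
    """Amplitude of one frequency across the whole signal (a single DFT bin)."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i, x in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

def soft_clip(signal, drive=5.0):
    """tanh distortion: squashes the peaks of the waveform."""
    return [math.tanh(drive * x) for x in signal]

# 1) Beating: two detuned notes swell and dip |f1 - f2| times per second.
f1, f2 = 110.0, 112.0
beat_hz = abs(f1 - f2)
mix = [a + b for a, b in zip(sine(f1), sine(f2))]
loud = rms(mix, start_s=0.49)   # window centred on 0.5 s: the notes reinforce each other
quiet = rms(mix, start_s=0.24)  # window centred on 0.25 s: the notes nearly cancel

# 2) Distortion: a pure 110 Hz sine gains a third harmonic at 330 Hz.
clean = sine(110.0)
dirty = soft_clip(clean)
third_before = amplitude_at(clean, 330.0)
third_after = amplitude_at(dirty, 330.0)

print(f"beats per second: {beat_hz}")
print(f"loudness at the swell: {loud:.3f}, at the dip: {quiet:.3f}")
print(f"330 Hz amplitude before: {third_before:.6f}, after: {third_after:.3f}")
```

The slow swell and dip is the ‘beating’ we hear between detuned instruments, and the new energy at 330Hz is a harmonic that wasn’t in the clean tone at all. Turning up `drive` or widening the detune are the kinds of small parameter changes that push a sound from uneasy towards aggressive.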

Sound design isn’t simply designing a single instrument or collection of sounds. It involves examining how different roles play together. Minor changes in each instrument can, when stacked up, make a sizable difference to the overall sound. The interplay between timbres and the way each one affects emotions is vital. Some might see this as mixing in a more traditional sense, but in actual fact it’s sculpting sounds or timbre on a macro level, rather than a micro level.

To create deeper, emotion-specific sound, we should consider instruments as multifaceted. This means acknowledging multiple sound sources across different layers, adding multiple effects, and understanding the way these play together in creating a whole, rather than a single instrument. That’s how we can broaden the timbral palette, and learn to design instruments and sounds with specific emotional qualities. Since there’s no blueprint for how timbre affects emotions, we may have to rely on trial and error, personal aesthetic judgement or previous conventions to find what to add to or subtract from the sound. With this in mind, we’ll be able to think further about how to evoke deeper, more emotionally immersive sonic experiences, which can aid innovation in other musical features, such as harmony and performance.