Contemporary Popular Music that doesn’t use equal temperament

Emmanuel Deruty
11 min read · Jul 27, 2022


Today’s music is generally considered to be based on the 12-semitone equal temperament. For instance, according to [Lindley, 2001], “[e]qual temperament, in which the octave is divided into 12 uniform semitones, is the standard Western temperament today except among specialists in early music”. Does contemporary Popular Music (in the sense of [Deruty et al., 2022]) always use equal temperament?

British singer and guitarist Nick Drake routinely used custom temperaments.

In the field of Music Information Retrieval, the goal of automatic music transcription is to convert music signals, including contemporary Popular Music, into music notation. As illustrated in Figure 1 below, a common output for automatic music transcription is the piano roll. In such an output, pitch is expressed as “MIDI pitch” and quantized to MIDI notes. An underlying assumption is that equal temperament provides a suitable approximation of the transcribed music. A related assumption is that the music to be transcribed was originally based on equal temperament. In this document, we provide three examples of contemporary Popular Music that challenge this assumption.

Figure 1. Automatic music transcription: piano roll output, from [Benetos et al., 2018]

Example 1: guitar solo from Pink Floyd, “Shine On You Crazy Diamond”, Part 1

Shine On You Crazy Diamond is a 26-minute composition by British band Pink Floyd. It appears on their 1975 album Wish You Were Here. The composition is divided into nine parts. The present example focuses on an extract from Part 1, which lasts 3'54. It starts with layered synths playing a G minor chord (0'00–0'25). From 0'25 to 2'10, a solo keyboard plays on top of the continuing G minor chord. From 2'10 to 3'00, a solo guitar plays over changing chords (Gm, Dm, Gm, Dm, Cm, Dm, Gm). A coda returns to the initial layered synths playing a G minor chord. For this example, we focus on the guitar between 2'10 and 2'33 and show that it does not follow 12-semitone equal temperament.

Video 1 below shows the chroma similarity matrix for Part 1 of Shine On. The alternation of dark and light squares along the diagonal shows the chord changes. The blue squares mark the sections described above; the yellow square corresponds to the studied part.

Video 1 — Shine On You Crazy Diamond, Part 1.
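For reference, a chroma self-similarity matrix of the kind shown in Video 1 can be sketched as follows. This is a minimal sketch assuming librosa and numpy; the filename is hypothetical, and analysis parameters (window, normalization) will differ from those used for the video.

```python
# Minimal chroma self-similarity sketch (hypothetical filename).
import numpy as np
import librosa

y, sr = librosa.load("shine_on_part1.wav", sr=None, mono=True)
chroma = librosa.feature.chroma_stft(y=y, sr=sr)          # shape (12, n_frames)
chroma = librosa.util.normalize(chroma, norm=2, axis=0)   # unit-norm frames
similarity = chroma.T @ chroma                            # cosine similarity, frames x frames
```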

Let’s track the guitar pitch between 2'10 and 2'33. Figures 2 through 5 are based on the STFT of the audio between 2'10 and 2'33, divided into four consecutive extracts for better readability. Video 2 compiles the four figures along with the corresponding audio. In all four images, the thick white horizontal line marks the layered synth’s G6; at 1578Hz, it corresponds to a 442.8Hz tuning. The dark horizontal lines correspond to the perceived guitar notes for the 442.8Hz tuning, along with their harmonics. The coloured dots show the detected frequencies for each harmonic of the guitar’s sound. Both the fundamental frequencies and the harmonics consistently deviate from the 442.8Hz tuning.

Figure 2 — Shine On, extract 1.
Figure 3 — Shine On, extract 2.
Figure 4 — Shine On, extract 3.
Figure 5 — Shine On, extract 4.
Video 2 — Shine On, extracts 1–4.
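The harmonic measurements behind Figures 2 through 5 can be approximated with a simple peak search in the STFT. The sketch below assumes librosa and numpy; the filename, frame index, and target note are hypothetical, and the actual analysis may differ.

```python
# Sketch: find the spectral peak closest to each expected harmonic of a note.
import numpy as np
import librosa

y, sr = librosa.load("shine_on_guitar_2m10_2m33.wav", sr=None)   # hypothetical file
S = np.abs(librosa.stft(y, n_fft=8192, hop_length=1024))
freqs = librosa.fft_frequencies(sr=sr, n_fft=8192)

def peak_near(frame, target_hz, tol_hz=15.0):
    """Frequency of the strongest bin within tol_hz of target_hz."""
    idx = np.flatnonzero(np.abs(freqs - target_hz) <= tol_hz)
    return freqs[idx[np.argmax(frame[idx])]]

# Example: measure harmonics 1-6 of a D5 (587.33Hz at A4 = 440Hz) in frame 100.
measured = [peak_near(S[:, 100], 587.33 * k) for k in range(1, 7)]
```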

Figure 6 shows the frequency distribution of the guitar’s fundamental frequency as analyzed above (the distribution corresponding to the red circles when a note is played). The vertical lines show the note frequencies in equal temperament. The observed interval between C and D is significantly and consistently smaller than two semitones, with a higher C and a lower D. F and Bb are also slightly higher. In the context of G minor, the scale can be written as follows: G, A, Bb+, C++, D-, F+, G. The temperament used in the extract does not correspond to any traditional Western temperament, as in all of them the fifth (here, D) remains close to a 3/2 frequency ratio.

Figure 6 — Pitch distribution.
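The deviations plotted in Figure 6 amount to converting each measured frequency to a fractional MIDI pitch and keeping the offset from the nearest equal-tempered note. A minimal sketch, using the 442.8Hz reference observed above:

```python
# Signed deviation (in cents) of a frequency from 12-TET, given a reference A4.
import numpy as np

def cents_from_equal_temperament(f_hz, a4_hz=442.8):
    midi = 69 + 12 * np.log2(f_hz / a4_hz)   # fractional MIDI pitch
    return 100 * (midi - np.round(midi))      # offset from the nearest ET note

print(cents_from_equal_temperament(1578.0))   # the synth's G6: ~0 cents
```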

Example 2: guitar tuning in Nick Drake, “River Man”

An acoustic beat is an interference pattern between two sounds of slightly different frequencies, perceived as a periodic variation in volume whose rate is the difference between the two frequencies. Video 3 shows and plays 1) a 440Hz sine wave; 2) a 444Hz sine wave; and 3) the sum of the two sine waves, which generates acoustic beats.

Video 3, acoustic beats.
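The signal in Video 3 is straightforward to reproduce. A minimal sketch, assuming numpy and soundfile; the output filename is arbitrary:

```python
# Two sine waves at 440Hz and 444Hz; their sum beats at 444 - 440 = 4Hz.
import numpy as np
import soundfile as sf

sr = 44100
t = np.arange(5 * sr) / sr                  # 5 seconds
a = np.sin(2 * np.pi * 440 * t)
b = np.sin(2 * np.pi * 444 * t)
sf.write("beats.wav", 0.5 * (a + b), sr)    # audible 4Hz tremolo
```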

The difference in frequency generates the beating: the volume varies as in a tremolo. As the two tones gradually approach unison, the beating slows down and may become so slow as to be imperceptible. The phenomenon is illustrated in Figure 7. A sine wave of frequency 110Hz is summed with another sine wave whose frequency is shown on the y-axis. Each horizontal line shows the spectrum of the RMS envelope of the resulting signal: the higher the luminance, the stronger the volume envelope at the frequency shown on the x-axis. Figure 7 illustrates that 1) the acoustic beat is harmonic, with a fundamental frequency and harmonics; 2) the larger the difference between the two sine wave frequencies, the faster the beating; and 3) as the two frequencies get close to each other, the beating slows down and its frequencies approach zero.

Figure 7 — RMS spectrum for the sum of sine waves of neighbouring frequencies.
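For a single pair of frequencies, the analysis behind Figure 7 can be sketched as follows: compute the frame-wise RMS of the summed sines (the volume envelope), then take the spectrum of that envelope. The frame size and durations below are assumptions, not the values used for the figure.

```python
# RMS-envelope spectrum of 110Hz + 112Hz: the beat at 2Hz shows up as a peak.
import numpy as np

sr, dur = 44100, 10.0
t = np.arange(int(sr * dur)) / sr
x = np.sin(2 * np.pi * 110 * t) + np.sin(2 * np.pi * 112 * t)

frame = 1024                                   # ~43Hz envelope sample rate
n = len(x) // frame * frame
rms = np.sqrt(np.mean(x[:n].reshape(-1, frame) ** 2, axis=1))
spectrum = np.abs(np.fft.rfft(rms - rms.mean()))
env_freqs = np.fft.rfftfreq(len(rms), d=frame / sr)
print(env_freqs[np.argmax(spectrum)])          # ~2Hz, with harmonics further up
```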

Minimization of the acoustic beat is one motivation for tuning an instrument. The tuning may consist in making harmonics coincide to reduce the beating. “As different notes are played together, their harmonics can beat against each other. This typically sounds unpleasant in music, and the desire to avoid beating is one of the main factors that led to the various musical scales discussed in this article” [Durfee and Colton, 2015].

Nick Drake (1948–1974) was an English singer-songwriter whose songs largely rely on vocals and acoustic guitar. We analyze the guitar audio in River Man, a song from Drake’s 1969 album Five Leaves Left, and show that the guitar is tuned so that neighbouring harmonics coincide, presumably to minimize beating. To do so, we study segments of the guitar part from the beginning of the song. Video 4 shows the segments (in blue) in context; Video 5 shows the notes corresponding to each segment (transcribed by ear). Each segment corresponds to a note combination.

Video 4 — River Man.
Video 5 — River Man, guitar.

Throughout the nine segments, five notes are played: G2, C3, G3, E4, and G4. Figure 8 corresponds to segment 2, during which three notes are played: G2, C3, and E4. The image is divided into six horizontal bands. The position of the played notes in the lowest band corresponds to their respective theoretical fundamental frequency. Each of the five upper bands corresponds to one of the five notes played across the nine segments; in these bands, the vertical lines mark the harmonics of the corresponding note, including the fundamental. The blue line is the spectrum; the x-axis is frequency (linear scale) and the y-axis amplitude (logarithmic scale). If the fundamental is referred to as ‘harmonic 1’, one can observe that the guitar has been tuned so that G2’s harmonic 4 coincides with C3’s harmonic 3.

Figure 8 — River Man, segment 2.
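The arithmetic behind this coincidence is simple: if C3 sits a pure fourth (ratio 4/3) above G2, harmonic 4 of G2 and harmonic 3 of C3 are identical, whereas in equal temperament they differ slightly and beat. A small illustration with nominal frequencies (not values measured from the recording):

```python
# Harmonic coincidence for a pure fourth vs. an equal-tempered fourth.
g2 = 98.0                      # nominal G2 (A4 = 440Hz)
c3_pure = g2 * 4 / 3           # pure fourth above G2
c3_et = g2 * 2 ** (5 / 12)     # equal-tempered fourth above G2

print(4 * g2, 3 * c3_pure)     # identical: the harmonics coincide, no beating
print(4 * g2, 3 * c3_et)       # ~0.4Hz apart: slow beating in equal temperament
```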

In segment 5, four notes are played: G2, C3, G3, and E4. One can observe that three close harmonics coincide: G2’s harmonic 4, C3’s harmonic 3, and G3’s harmonic 2.

Figure 9 — River Man, segment 5.

In segment 8, four other notes are played: G2, C3, E4, and G4. One can observe that again, three close harmonics coincide: G2’s harmonic 4, C3’s harmonic 3, and G4’s harmonic 1.

Figure 10 — River Man, segment 8.

Tuning the guitar so that harmonics coincide moves the fundamentals away from 12-semitone equal temperament. The deviations of C3, G3, E4, and G4 from their theoretical intervals to G2 are 0.40, 0.37, 0.34, and 0.39 semitones, respectively: the four top notes are significantly higher than they would be relative to the bottom note in equal temperament. This observation may be consistent with the fact that, as shown in Figure 11, G2 is inharmonic: the frequencies of the harmonics above the fundamental are higher than the corresponding multiples of the fundamental frequency. Strings above the one playing G2 have to be tuned higher to coincide with G2’s harmonics.

Figure 11 — River Man, segment 1.
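The inharmonicity observed in Figure 11 is consistent with the standard stiff-string model, in which partial n lies above n times the fundamental. The sketch below uses that textbook model with an illustrative inharmonicity coefficient, not a value measured from the recording:

```python
# Stiff-string partials: f_n = n * f0 * sqrt(1 + B * n^2), with B > 0.
import numpy as np

def partial_frequencies(f0, n_partials=8, B=4e-4):   # B is illustrative only
    n = np.arange(1, n_partials + 1)
    return n * f0 * np.sqrt(1 + B * n ** 2)

f0 = 98.0                                             # nominal G2
print(partial_frequencies(f0) / (np.arange(1, 9) * f0))   # ratios > 1: sharp partials
```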

Tuning as described above is fine-tuning. Nick Drake also used coarser tuning, in other words, alternate guitar tunings. Different songs involve different tunings, and in live shows Nick Drake had to retune his guitar between songs. The standard tuning for a guitar is E-A-D-G-B-E; Nick Drake might use, for example, C-G-C-F-C-E. Such a tuning favours purer intervals, which “[…] are defined as those formed by small[er] integer ratios; such intervals have a maximum overlap of harmonic partials of these instruments and hence tend to minimize beating” [Sankey and Sethares, 1997]. Nick Drake’s use of purer intervals in the string tuning provides an additional indication that his tuning practices (both coarse and fine) aimed at minimizing acoustic beating.

Example 3: detuned samples in Angel Haze, “Weight”

Rap music often involves the manipulation of samples from existing music. Figure 12 shows all sample sources from Kanye West’s 2016 track “Famous”.

Figure 12 — sample sources in Kanye West’s “Famous” (from [Van Balen, 2016]).

Sample manipulation may include pitch-shifting. Figure 13 shows Ableton Live’s sample settings. “Detune” is an easily accessible sample setting next to “volume”. It is expressed in cents (1/100 of a semitone), not in musical pitch, which makes it very easy to rapidly tune a sample to any pitch without reference to a scale.

Figure 13 — Live’s sample settings.
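Detune in cents maps to a frequency (or playback-rate) ratio through 2^(cents/1200). This is generic pitch arithmetic rather than anything specific to Live:

```python
# Convert a detune value in cents to a frequency / playback-rate ratio.
def detune_ratio(cents):
    return 2.0 ** (cents / 1200.0)

print(detune_ratio(100))    # one semitone up: ~1.0595
print(detune_ratio(-30))    # 30 cents down: ~0.9828
```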

Angel Haze is an American rapper and singer. Listening to her 2021 track Weight, one can hear many notes in the instrumental that don’t sound in tune. We study pitch in an extract of Weight spanning from 0'32 to 0'36. Audio 1 provides the context for the extract.

Audio 1. Weight, 0'25 to 0'40.

The vocals are removed by summing the left channel and the phase-inverted right channel. Audio 2 is the studied content, looped four times.

Audio 2. Weight, vocals removed, 0'32 to 0'36, looped.
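A minimal sketch of this centre-channel removal, assuming soundfile and a hypothetical filename:

```python
# Cancel centre-panned material (typically the lead vocal): L + (-R).
import soundfile as sf

stereo, sr = sf.read("weight_0m32_0m36.wav")        # shape (n_samples, 2)
instrumental = stereo[:, 0] - stereo[:, 1]          # left minus right
sf.write("weight_novocals.wav", 0.5 * instrumental, sr)
```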

The studied extract is divided into 16 segments of ca. 0.25s each (one eighth-note), which is the smallest observed pitch duration. There is no pitch change inside a segment. For each segment, we perform a Fourier transform. The Fourier transform is compared with the audible pitch by overlaying the harmonics of each heard note on the spectrum. The note’s frequency is taken as the frequency of the first harmonic (f0) as derived from the spectrum.

Figure 14 illustrates this process. The harmonics, including the f0, drawn as solid vertical black lines, correspond to the heard notes in segments 1 through 8. As far as segments 9 through 16 are concerned, three difficulties appear. (1) For the lowest heard note, harmonics f5 and above seem to be missing. In Figure 14, these harmonics are represented as vertical dashed lines. (2) For the highest heard note, f0 seems to be missing. In this case, we derive an f0 value from the frequency whose harmonics are closest to the observed ones. In Figure 14, this f0 is represented as a vertical dashed line. (3) Three harmonics (f0, f1, and f2) correspond to a note that is barely audible. In Figure 14, those three harmonics are represented as vertical grey dashed lines.

Figure 14 — manual pitch analysis.
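A rough equivalent of this manual procedure is to score candidate f0 values by the spectral magnitude at their harmonics and keep the best-scoring one. The sketch below is a generic harmonic-comb search, not the exact procedure used for Figure 14; segment boundaries and the initial guess are assumptions.

```python
# Refine a heard pitch: search around an f0 guess for the comb of harmonics
# that best matches the segment's spectrum.
import numpy as np

def harmonic_score(spectrum, freqs, f0, n_harmonics=6):
    idx = [np.argmin(np.abs(freqs - k * f0)) for k in range(1, n_harmonics + 1)]
    return spectrum[idx].sum()

def refine_f0(segment, sr, f0_guess, search_cents=50, steps=101):
    spectrum = np.abs(np.fft.rfft(segment * np.hanning(len(segment))))
    freqs = np.fft.rfftfreq(len(segment), d=1 / sr)
    candidates = f0_guess * 2 ** (np.linspace(-search_cents, search_cents, steps) / 1200)
    scores = [harmonic_score(spectrum, freqs, f) for f in candidates]
    return candidates[int(np.argmax(scores))]
```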

Figure 15 shows the observed pitch for the studied extract. The f0s for the notes that can be heard are represented as solid black lines. The f0s for the notes that can’t be heard are represented as dashed grey lines. Two observations can be made. (1) Several frequency values deviate from the standard 440Hz / equal temperament values. (2) For a single note (for instance G), different deviations from the standard pitch can be observed.

Figure 15 — observed pitch.

Figure 16, top, confirms observation (2). The octaves are discarded, and the same note at different octaves is represented on the same line. There appears to be no single frequency value for a given note. Figure 16, bottom, represents the pitch deviation from equal temperament. Although the pitch is generally higher than the standard 440Hz tuning, a variety of deviations can be observed. The extract doesn’t follow a particular temperament.

Figure 16 — folded pitch and deviation from equal temperament.

Conclusion

The three examples illustrate how contemporary Popular Music can deviate from equal temperament. In the first example, Shine On You Crazy Diamond (1975), pitch follows a temperament in which the distance between the fourth and fifth degrees is particularly small. In the second example, River Man (1969), minimizing acoustic beating leads to a guitar tuning in which the higher notes are higher than they would be in equal temperament. In the third example, Weight (2021), no fixed scale seems to be used: different frequency values can correspond to the same heard note.

Naturally, three examples are not enough to generalize. Still, they suggest that the assumption according to which “[e]qual temperament, in which the octave is divided into 12 uniform semitones, is the standard Western temperament today except among specialists in early music” [Lindley, 2001] should be questioned. Early music is not the only Western trend for which deviation from equal temperament is significant.

References

[Benetos et al., 2018] Benetos, Emmanouil, et al. “Automatic music transcription: An overview.” IEEE Signal Processing Magazine 36.1 (2018): 20–30.

[Deruty et al., 2022] Deruty, Emmanuel, et al. “On the Development and Practice of AI Technology for Contemporary Popular Music Production.” Transactions of the International Society for Music Information Retrieval 5.1 (2022).

[Durfee and Colton, 2015] Durfee, Dallin S., and John S. Colton. “The physics of musical scales: Theory and experiment.” American Journal of Physics 83.10 (2015): 835–842.

[Lindley, 2001] Lindley, Mark. “Temperaments.” Grove Music Online. Oxford University Press. Date of access 11 Jun. 2022. https://www.oxfordmusiconline.com/grovemusic/view/10.1093/gmo/9781561592630.001.0001/omo-9781561592630-e-0000027643

[Sankey and Sethares, 1997] Sankey, John, and William A. Sethares. “A consonance-based approach to the harpsichord tuning of Domenico Scarlatti.” The Journal of the Acoustical Society of America 101.4 (1997): 2332–2337.

[Van Balen, 2016] Van Balen, Jan, Ethan Hein, and Dan Brown. “Why hip-hop is interesting.” Tutorials of the International Society for Music Information Retrieval Conference (ISMIR), New York City, USA. 2016.

Our team’s page: Sony CSL Music — expanding creativity with A.I.
