Notes On Music (Part 1)

Charles Hinshaw
15 min read · Apr 21, 2019


This is the first in a series of posts inspired by notes that I made for myself when researching music theory. My original intention was to keep them as personal notes, but I’m sharing in case anyone else finds them useful… and also so that any experts who spot problems can speak up and help me correct my understanding. In this first post, we’ll start with the very basics — with the fundamental nature of sound waves — and use that to build a musical scale.

Vibrations

Sound consists of longitudinal pressure waves that come from objects vibrating. These waves push our eardrums back-and-forth — a process that results in impulses being sent through the cochlear nerve to the brain. This is, in super simplified terms, how we hear.

As a vibrating object moves in one direction, it pushes the molecules of air in front of it, compressing them into an area of high pressure. As the object moves back the other way, it creates a drop in air pressure that pulls the molecules back in (a rarefaction):

Vibrations compress air particles into areas of high pressure

These aren’t transverse waves — the displacement of the medium (the molecules of air) is in the same direction as the wave’s propagation. The individual molecules don’t travel the length of the wave; they oscillate back and forth about their individual equilibrium positions.

These particles are displaced as waves in the direction that the wave propagates

So why do we often draw them (or think of them) as sinusoidal?

If you were to sample the pressure fluctuations at a single point over time by drawing a continuous line that is at the top when the pressure is highest and at the bottom when the pressure is lowest, you would see that the pressure-time fluctuation has a sinusoidal nature:

Graphing pressure over time reveals our sine wave

That is actually something that early experiments with sound waves did ― scratching sine waves into smoked glass. And there are some real advantages to thinking of our graphs as representing pressure-time fluctuation: a lot of the properties that come to mind when we think of sound are quite intuitive when we view it from this perspective.

Of course, this isn’t the only thing that we might be attempting to illustrate by drawing sound as a sine wave. Depending on circumstances, we could also be depicting the transverse displacement of a fixed-fixed string (as we will do soon) or a number of other potentially ambiguous depictions. The important thing, I think, is to remember that sound isn’t a transverse wave, so when we draw a sound wave as a sine, we are not drawing the wave itself.

Loudness

The pressure of a sound wave relates to our perception of sound’s loudness. Compare the following two sounds, which would be perceived as identical except for the first being louder than the second:

These two waves sound the same — the first is just much louder

You can see that we are just changing the amplitude of the wave. That makes sense when you consider sound pressure (measured in pascals, Pa) to be the force (N) of sound on a surface area (m²): Pa = N/m².

For convenience, we express sound pressure levels on a logarithmic scale where 0 is roughly the quietest sound that a human can hear. The decibel (dB) is a unit used to express the ratio between two physical quantities on a logarithmic scale, so it makes sense to talk about the pressure levels (amplitude) of sound in terms of dB.

So, given a pressure (p) and the reference sound pressure (p_ref = 2 × 10⁻⁵ Pa), we can calculate the decibel level as 10 log₁₀((p / p_ref)²), which is the same as 20 log₁₀(p / p_ref). As a consequence of this formula, sound pressure level increases by about 6 dB every time the pressure doubles (10 log₁₀((2/1)²) ≈ 6.02). Likewise, as a sound wave propagates away from its source, the magnitude of the pressure decreases. For a source radiating freely in all directions, pressure is inversely proportional to distance, so there is a decrease of about 6 dB every time the distance doubles. This also makes sense, both from the above formula and intuitively.
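To make the decibel arithmetic concrete, here is a minimal Python sketch of the formula above. The names `P_REF` and `sound_pressure_level` are my own, chosen for illustration, and the pressures passed in are arbitrary example values.

```python
import math

P_REF = 2e-5  # reference sound pressure in pascals (assumed name)

def sound_pressure_level(pressure_pa):
    """Return the level in dB for a sound pressure given in pascals."""
    # 10 * log10((p / p_ref) ** 2) is the same as 20 * log10(p / p_ref)
    return 20 * math.log10(pressure_pa / P_REF)

# Doubling the pressure adds about 6 dB, whatever the starting pressure.
print(round(sound_pressure_level(0.02) - sound_pressure_level(0.01), 2))  # 6.02

# The reference pressure itself sits at 0 dB.
print(sound_pressure_level(P_REF))  # 0.0
```

Halving the pressure (which is what doubling the distance does under the free-radiation assumption) gives the same 6 dB step in the other direction.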

Do you need to hand-calculate sound pressure levels or dB? No, probably not. But it is nice to understand not only what it means to say “the average speaking voice is 70 dB” in terms of relative loudness to other sounds, but in terms of what that means for the underlying sound wave’s amplitude.

Pitch

When we talk about waves, the other thing that we consider is frequency ― how often the particles of a medium vibrate when a wave passes through it. In the case of sound waves, this would mean how many times per second a given point goes from high pressure to low pressure and back to high pressure (or low to high to low).

We measure this frequency in hertz (Hz), which is our unit of measure for “cycles per second”. We perceive this frequency as pitch. The higher the frequency, the higher we perceive the pitch. Compare the following two sounds. We would perceive the second sound as having a higher pitch than the first:

These two waves are equally loud, but the second is higher pitched
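As a rough sketch of how amplitude and frequency show up in the pressure-over-time picture, the following Python samples a sine wave and counts its cycles. The function names, the 44100 samples-per-second rate, and the 0.1 second duration are all assumptions made for this example.

```python
import math

def sine_wave(frequency_hz, amplitude=1.0, duration_s=0.1, sample_rate=44100):
    """Sample amplitude * sin(2*pi*f*t) at evenly spaced points in time."""
    n_samples = int(duration_s * sample_rate)
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate)
        for n in range(n_samples)
    ]

def upward_crossings(samples):
    """Count how often the signal rises through zero (about once per cycle)."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)

low = sine_wave(440)
high = sine_wave(880)
loud = sine_wave(440, amplitude=2.0)

# Higher pitch: roughly twice as many cycles fit in the same stretch of time.
print(upward_crossings(low), upward_crossings(high))

# Louder: same number of cycles, but twice the peak value.
print(max(low), max(loud))
```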

We base our (modern, western) music around a frequency of 440 Hz, which is known as the Stuttgart pitch and is standardized as ISO 16. Why 440 Hz? It hasn’t always been this way. Before it was standardized, many organizations used 435 Hz. At one point, there was a bit of an arms race with different groups cranking up their reference pitch in order to make their music more epic.

In scientific pitch notation, 440Hz is called A4. That name is based on where it occurs on a piano keyboard ― A4 is the A note after C4 (middle C: the C in the middle of a standard piano.) But that is getting ahead of ourselves.

Making Waves

Let’s talk about how to create waves of a particular frequency or pitch. We’ll start by adding waves together. When we add simple waves together, we create a complex tone. The waves that add together to create it are called the partials of that complex tone.

Remember that when two waves add together, we just add the values at every point along the wave. This is simple, but the results are profound and they lead to two kinds of interference: destructive and constructive.

In destructive interference, two waves can cancel each other out. If the amplitude for one wave is 1 at a point, and for the other is −1, the resulting wave will have an amplitude of 0 at that point. It is just 1 − 1 = 0:

Two waves add together to create destructive interference

In constructive interference, we have 1 + 1 = 2. The resulting wave has a greater amplitude than either wave that created it ― the second wave reinforces the first.

Two waves add together to create constructive interference
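Adding waves point by point is easy to try out. This is a small Python sketch under my own assumptions: the helper names are hypothetical, the waves are short lists of samples, and the "opposite" wave is made by simply flipping the sign of every sample.

```python
import math

def sine_wave(n_samples=16, cycles=2):
    """Return a short sampled sine wave with peak amplitude 1."""
    return [math.sin(2 * math.pi * cycles * n / n_samples) for n in range(n_samples)]

def add_waves(first, second):
    """Add two waves by adding their values at every sample point."""
    return [a + b for a, b in zip(first, second)]

wave = sine_wave()
inverted = [-value for value in wave]  # the same wave, flipped upside down

destructive = add_waves(wave, inverted)  # 1 + (-1) = 0 at every point
constructive = add_waves(wave, wave)     # 1 + 1 = 2 at every point

print(max(abs(v) for v in destructive))  # 0.0
print(max(constructive) / max(wave))     # 2.0
```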

The results of this interference mean that if we create a bunch of random sound waves at the same time, we get noise — reinforcing in places and cancelling in others:

Interference turns a bunch of waves into noise

Moving Beyond Noise

To create a tone instead of noise, we need a standing wave ― a wave which oscillates in time but whose peak amplitude profile does not move in space. The idea is to have a wave that reflects back and perfectly reinforces itself.

This is easiest to imagine with a vibrating string that is bound at both ends — a fixed-fixed string:

Imagine a string bound at both ends

This string that we’re imagining can vibrate like this, for example:

This works because the ends are both bound

But it cannot vibrate like this with the ends bound:

This would be impossible (given we’re binding both ends)

This string can only vibrate in so many ways. This limited number of vibration options means that when the vibration reaches the end of the string, it will reflect back and reinforce itself. This is the idea behind a standing wave.

So how many ways can a string vibrate? Well, the most obvious one is for the entire string to vibrate as a single segment (strictly speaking, the string’s length is then half of the wavelength):

The whole string can have a single vibration

We call this whole vibration the fundamental or first harmonic. This is what determines the pitch of the sound that the string makes. The other possible wavelengths are all integer divisions of the fundamental’s wavelength: halves, thirds, fourths, etc.

We can only vibrate it in whole number divisions

These are all harmonic partials. We call these additional partials overtones. The second harmonic (the first overtone) has half the wavelength and twice the frequency of the fundamental. The third harmonic (the second overtone) has a third the wavelength and three times the frequency of the first, etc.

If, for example, our fundamental was 440 Hz, the 2nd harmonic would be 880Hz, the 3rd would be 1320 Hz, and on and on.
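The harmonic series is just multiplication and division by whole numbers, which a few lines of Python can show. The function name `harmonic_series` is an assumption for this sketch, and wavelengths are given relative to the fundamental's wavelength rather than in real units.

```python
from fractions import Fraction

def harmonic_series(fundamental_hz, count):
    """Return (harmonic number, frequency, wavelength relative to the fundamental)."""
    return [(n, fundamental_hz * n, Fraction(1, n)) for n in range(1, count + 1)]

for number, frequency, wavelength in harmonic_series(440, 4):
    print(number, frequency, wavelength)
```

The second entry is the first overtone: twice the frequency and half the wavelength of the fundamental.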

When our string vibrates, it creates a complex tone containing the fundamental and many overtones. The presence of these overtones adds complexity and richness to the sound. The relative strengths of each of the overtones is what gives each instrument a unique timbre.

As an interesting aside: the human brain processes overtones and can construct the perception of a fundamental that isn’t there. For example, kettle drums are constructed and tuned to produce near-harmonic overtones to an implied missing fundamental. Likewise, some pipe organs make use of this phenomenon to allow smaller bass pipes to produce very low-pitched sounds.

From Harmonics to Octaves

Our ears tend to “like” simple ratios (where the overtone pattern and the fundamental are very similar) and the 2:1 ratio between the second harmonic and the fundamental is as simple a ratio as we can create.

This simplicity makes the second harmonic quite special in music. In fact, if you double or halve any frequency, we consider that new tone to sound or feel very similar. 440 Hz sounds like 880 Hz. We can hear the difference and would say that 880 Hz is a higher pitch, but they sound alike in an interesting way — they aren’t distinct notes.

That range between the fundamental and the second harmonic ― between 440 Hz and 880 Hz, for example ― is called an octave. The idea that two notes an octave apart sound “the same” is called octave equivalency. There is actually a biological basis to it: apparently there is an octave mapping of neurons in the auditory thalamus of the mammalian brain.

This octave equivalency logically means that all distinct notes or pitches must appear within that doubling of frequency:

The area between the fundamental and second harmonic — an octave

Octave equivalency also underpins a really cool auditory illusion that you’ve heard everywhere from Batman’s motorcycle to the soundtrack for the movie Dunkirk: the Shepard tone. The Shepard tone sounds like a tone that continually ascends or descends in pitch, yet never actually seems to get higher or lower. It creates the feel of a never-ending increase or decrease in tension.

Dividing Our Octave

We’re doubling values, so if we want to divide our octave into a number of evenly spaced tones (which is dividing in equal temperament), we need to take into account that our scale is logarithmic and find the ratio between pitches.

In western music, we divide into 12 (twelve-tone equal temperament), so the ratio between successive pitches is ¹²√2, the twelfth root of two. That is 1.0594631 etc. It sounds ridiculous, but there are some nice mathematical properties hidden in this ratio. Starting at 440 Hz, we can multiply subsequent values by ¹²√2 until we get 880, e.g.:

 0   Fundamental        = 440.00 Hz
 1   440.00 × ¹²√2      = 466.16 Hz
 2   466.16 × ¹²√2      = 493.88 Hz
 3   493.88 × ¹²√2      = 523.25 Hz
 4   523.25 × ¹²√2      = 554.37 Hz
 5   554.37 × ¹²√2      = 587.33 Hz
 6   587.33 × ¹²√2      = 622.25 Hz
 7   622.25 × ¹²√2      = 659.26 Hz
 8   659.26 × ¹²√2      = 698.46 Hz
 9   698.46 × ¹²√2      = 739.99 Hz
10   739.99 × ¹²√2      = 783.99 Hz
11   783.99 × ¹²√2      = 830.61 Hz
12   Second Harmonic    = 880.00 Hz
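The table above can be reproduced with a short Python sketch. The function name `equal_tempered` is my own. Rather than multiplying step by step, it raises 2 to the power n/12 directly, which gives the same values without accumulating rounding error.

```python
def equal_tempered(base_hz, steps=12):
    """Return the frequencies of `steps` equal divisions of the octave above base_hz."""
    # Step n is base * (twelfth root of two) ** n, i.e. base * 2 ** (n / 12).
    return [base_hz * 2 ** (n / steps) for n in range(steps + 1)]

for n, frequency in enumerate(equal_tempered(440.0)):
    print(f"{n:2d}  {frequency:.2f} Hz")
```

The last line printed is exactly 880.00 Hz, the octave.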

The ratio between any two of these pitches is called an interval, and an interval of one twelfth root of two (between two pitches next to each other) is called a semitone. Two semitones is a tone. These are also sometimes called half-steps and whole-steps.

So, we would say that the interval between 2 and 3 (above) is 1 — a semitone or half-step. The interval between 3 and 5 is 2 — a tone or whole-step.

Working with pitches of “698.46 Hz” is a hassle, so we label these to refer to them nicely ― these labels get us back to our note names like “A4” and “C3”.

It is worth mentioning here that anything vibrating at 440 Hz is going to be that A4 note. We’ve talked about vibrating strings, but a saw whose teeth hit wood 440 times per second will sound like that note. It will probably have mostly inharmonic overtones, but the fundamental will be recognizable.

Simple Ratios

As I mentioned, ¹²√2 has some nice properties mathematically. When you take each of these notes and look at how they relate to the first, there are seven that are almost simple ratios:

 0   440.00/440   = 1.000   = 1:1
 2   493.88/440   = 1.122   ≈ 9:8    (1.125)
 4   554.37/440   = 1.260   ≈ 5:4    (1.250)
 5   587.33/440   = 1.335   ≈ 4:3    (1.333)
 7   659.26/440   = 1.498   ≈ 3:2    (1.500)
 9   739.99/440   = 1.682   ≈ 5:3    (1.667)
11   830.61/440   = 1.888   ≈ 17:9   (1.889)
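To see how close "almost" is, this Python sketch compares each equal-tempered step with the simple ratio listed in the table. The names `SIMPLE_RATIOS` and `deviation_percent` are assumptions of mine, and the ratios are just the ones from the table above.

```python
from fractions import Fraction

# Semitone step -> simple ratio, copied from the table above.
SIMPLE_RATIOS = {
    0: Fraction(1, 1),
    2: Fraction(9, 8),
    4: Fraction(5, 4),
    5: Fraction(4, 3),
    7: Fraction(3, 2),
    9: Fraction(5, 3),
    11: Fraction(17, 9),
}

def deviation_percent(step, ratio):
    """How far the equal-tempered step is from the simple ratio, in percent."""
    tempered = 2 ** (step / 12)
    return (tempered / float(ratio) - 1) * 100

for step, ratio in SIMPLE_RATIOS.items():
    print(f"{step:2d}  {ratio}  {deviation_percent(step, ratio):+.3f}%")
```

Every one of the seven steps lands within one percent of its simple ratio.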

Why do these simple ratios matter?

Philosophers and mathematicians have theorized about this for thousands of years. D’Alembert theorized that every fundamental tone heard in nature is accompanied by its second harmonic (the octave) and by its third harmonic (the twelfth). The interval between the octave and the twelfth is a fifth, so he argued that the fifth was the interval most consonant with the scheme of nature.

Helmholtz developed a theory of consonance and dissonance in terms of beats (like those heard when tuning forks of slightly different pitch are struck together). He argued that notes sound good or bad based on the beats produced between their harmonics: notes sound good together when they have harmonics in common, and so produce few beats.

Creating a Scale

With these seven notes, we’ve created a scale. Well, any division of the octave space into a number of tones is a scale, but we’ve created a very specific scale: a Major scale. We call this a heptatonic scale because of those seven notes.

A scale starts on a note, which we call the tonic. Based on the frequencies, we created an “A Major” scale, so we would start with the tonic A. If this was the scale “C Major”, we would start with the tonic C:

 0   C   1:1    (unison)
 2   D   9:8    (major second)
 4   E   5:4    (major third)
 5   F   4:3    (perfect fourth)
 7   G   3:2    (perfect fifth)
 9   A   5:3    (major sixth)
11   B   17:9   (major seventh)

I’ve also added names for the intervals between each note and the tonic. In C major, we could say that F is “a perfect fourth” from C because that is the interval between the tonic C and that note. Likewise, we would refer to G as a perfect fifth from C.

What about the other five notes? Those notes aren’t in our scale, but they still have names:

 1   C♯ (or D♭)
 3   D♯ (or E♭)
 6   F♯ (or G♭)
 8   G♯ (or A♭)
10   A♯ (or B♭)

Those symbols are just descriptive: we use sharp (♯) to mean “one semitone above” and flat (♭) to mean “one semitone below.” Which do we use? Is it C♯ or D♭? Well, in modern twelve-tone equal temperament (like the scale that we’ve created) C♯ and D♭ are enharmonic and equivalent, so their frequencies are the same. They do, however, have different roles in harmony and chord progressions and the choice of one or another can dramatically improve the readability of a line of music.
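Here is a hedged Python sketch that turns a note name into a frequency and shows that C♯ and D♭ come out identical in equal temperament. The function and table names are my own, and it assumes scientific pitch notation with A4 = 440 Hz and octave numbers that change at C.

```python
NOTE_STEPS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
ACCIDENTALS = {"": 0, "♯": 1, "♭": -1}  # sharp raises, flat lowers by a semitone

def note_frequency(name, octave, a4_hz=440.0):
    """Return the equal-tempered frequency of a note such as ("C♯", 4)."""
    letter, accidental = name[0], name[1:]
    steps_from_c = NOTE_STEPS[letter] + ACCIDENTALS[accidental]
    # A is 9 semitones above C in the same octave.
    steps_from_a4 = steps_from_c - 9 + 12 * (octave - 4)
    return a4_hz * 2 ** (steps_from_a4 / 12)

print(round(note_frequency("C", 4), 2))                         # 261.63 (middle C)
print(note_frequency("C♯", 4) == note_frequency("D♭", 4))       # True
```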

Returning to our scale (C major), what would it look like as musical notation? Well, we can put our notes on a staff (the lattice that the notes are drawn on), but we need to agree which lines mean which notes. In this case, the Treble Clef (the fancy-looking swirly symbol) tells us that middle C (C4) starts one line below the bottom of our staff:

C major: C D E F G A B

We can number each note in the scale sequentially. This number is the scale degree.

Nothing tricky here… just counting.

Scale degree is just the counted number of the note in the scale (written in the circle on the diagram). We already mentioned tonic. You might also see them named, as follows:

1   Tonic
2   Supertonic
3   Mediant
4   Subdominant
5   Dominant
6   Submediant
7   Leading Tone

So in C major, G is the dominant. It is a perfect fifth from the tonic C.

The relationship between tonic and dominant is important — we will find that perfect fifth interval all over the place. Why? Well, it has quite a few things going for it ― it is the interval between the second and third harmonics, which are the two overtones with the lowest simple ratios. It is also 1.5 times the frequency of the tonic ― exactly half way between the tonic and the octave.

Keep in mind that things loop at the octave. This means, for example, that the dominant and subdominant are the same distance from the tonic by scale degree, but only if you go in different directions.

More Scales

Looking at the piano for C Major you can see that all of the notes in the scale are the white keys — the black keys correspond to the remaining 5 notes that aren’t part of the scale. Counting those black keys, you would see that the intervals between white keys are: tone, tone, semitone, tone, tone, tone, semitone:

Read just the “white” keys to see the pattern.

These intervals hold true across all the Major scales (not just C Major). If our scale was D Major, for example, we would start with the Tonic D and follow the same pattern (tone, tone, semitone, tone, tone, tone, semitone):

1   D    (Tonic)
2   E    (1 + tone)
3   F♯   (2 + tone)
4   G    (3 + semitone)
5   A    (4 + tone)
6   B    (5 + tone)
7   C♯   (6 + tone)
8   D    (Octave: 7 + semitone)
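The same walk through the pattern can be sketched in Python. The names `CHROMATIC`, `MAJOR_PATTERN`, and `build_scale` are assumptions for this example. To keep it short, it spells every black key as a sharp, so keys that are normally written with flats will come out with their enharmonic sharp names.

```python
CHROMATIC = ["C", "C♯", "D", "D♯", "E", "F", "F♯", "G", "G♯", "A", "A♯", "B"]
MAJOR_PATTERN = [2, 2, 1, 2, 2, 2, 1]  # tone = 2 semitones, semitone = 1

def build_scale(tonic, pattern):
    """Walk up from the tonic using the given pattern of semitone steps."""
    index = CHROMATIC.index(tonic)
    scale = [tonic]
    for step in pattern:
        index = (index + step) % 12  # wrap around at the octave
        scale.append(CHROMATIC[index])
    return scale

print(build_scale("D", MAJOR_PATTERN))
# ['D', 'E', 'F♯', 'G', 'A', 'B', 'C♯', 'D']
```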

This process can be used to derive any Major Scale, but there are other kinds of scales.

For each major scale, there is a minor scale that begins on the 6th degree (the submediant) of the major scale and proceeds for an octave. For example, the A Minor scale can be constructed from the C Major scale:

  • A is the submediant (6th degree) on our major scale
  • Because of this, “A minor” is called the relative minor of C major.
  • We can also say that C major is the relative major of A minor.

The natural minor scale’s intervals are: tone, semitone, tone, tone, semitone, tone, tone. (There is nothing tricky about this ― we’re continuing intervals from where we diverge from the relative major).

This means that the A minor scale is A, B, C, D, E, F, G, A. If we compared A minor to A major, we would see that the 3rd, 6th, and 7th note positions are lowered by a semitone.
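Starting a major scale on its 6th degree is just a rotation, which this small Python sketch illustrates. The helper name `rotate_to_degree` is hypothetical; the pattern and notes are the ones from C major above.

```python
MAJOR_PATTERN = [2, 2, 1, 2, 2, 2, 1]  # tone = 2 semitones, semitone = 1
C_MAJOR = ["C", "D", "E", "F", "G", "A", "B"]

def rotate_to_degree(items, degree):
    """Start the sequence on the given scale degree (1-based) and wrap around."""
    start = degree - 1
    return items[start:] + items[:start]

minor_pattern = rotate_to_degree(MAJOR_PATTERN, 6)
a_minor = rotate_to_degree(C_MAJOR, 6)
a_minor.append(a_minor[0])  # close the octave

print(minor_pattern)  # [2, 1, 2, 2, 1, 2, 2]
print(a_minor)        # ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'A']
```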

When we say “minor scale”, we probably mean the natural minor scale that we just described, but there are actually other minor scales that you might find if you are searching for a reference or a scale chart:

  • There is the harmonic minor scale, which is a minor scale where the 7th degree is raised by one semitone. This creates what is called an “augmented second” between the 6th and 7th scale degrees. The harmonic minor scale is called this because it is a common foundation for harmonies (chords) in minor keys.
  • There is also a melodic minor scale, because that augmented second can be awkward when not dealing with harmonies (especially in vocal performances). It is based on the harmonic minor scale, but is different depending on if you are going up the scale (ascending) or going down (descending). When ascending, the Melodic Minor scale raises the 6th degree by a semitone so that it works better with the 7th. When descending, the Melodic Minor scale lowers the 7th degree by a semitone (making it identical to a natural Minor scale.)
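The three minor scales described above differ only in their 6th and 7th degrees, which is easy to see if each scale is written as semitone offsets from the tonic. This is a sketch under that assumption; the names `NATURAL_MINOR` and `raise_degree` are my own.

```python
# Natural minor as semitone offsets from the tonic
# (tone, semitone, tone, tone, semitone, tone, tone).
NATURAL_MINOR = [0, 2, 3, 5, 7, 8, 10]

def raise_degree(offsets, degree, semitones=1):
    """Return a copy of the scale with one degree (1-based) raised."""
    raised = list(offsets)
    raised[degree - 1] += semitones
    return raised

harmonic_minor = raise_degree(NATURAL_MINOR, 7)
melodic_ascending = raise_degree(harmonic_minor, 6)
melodic_descending = list(NATURAL_MINOR)  # same notes as the natural minor

print(harmonic_minor)                         # [0, 2, 3, 5, 7, 8, 11]
print(harmonic_minor[6] - harmonic_minor[5])  # 3 semitones: the augmented second
print(melodic_ascending)                      # [0, 2, 3, 5, 7, 9, 11]
```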

There are many other scales, and you should look them up. Pentatonic scales are a good place to start since they can be derived from our major/minor scales (e.g. by omitting the 4th and 7th from the major or the 2nd and 6th from the minor). The minor pentatonic scale can be considered a gapped blues scale. Keep digging, and eventually you’ll be reading about just intonation diatonic scales and double harmonic major scales.
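Deriving the pentatonic scales by omission can be sketched in a few lines of Python. The helper name `omit_degrees` is an assumption, and C major and A minor are used only because they are the examples from earlier in this post.

```python
C_MAJOR = ["C", "D", "E", "F", "G", "A", "B"]
A_MINOR = ["A", "B", "C", "D", "E", "F", "G"]

def omit_degrees(scale, degrees):
    """Drop the given scale degrees (1-based) from a seven-note scale."""
    return [note for degree, note in enumerate(scale, start=1) if degree not in degrees]

major_pentatonic = omit_degrees(C_MAJOR, {4, 7})
minor_pentatonic = omit_degrees(A_MINOR, {2, 6})

print(major_pentatonic)  # ['C', 'D', 'E', 'G', 'A']
print(minor_pentatonic)  # ['A', 'C', 'D', 'E', 'G']
```

Just like the relative major and minor, these two pentatonic scales contain the same five notes and differ only in where they start.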

So, we’ve learned to build a musical scale out of tones created by sound waves. But what do we do with the tones in that scale? In Part 2, we’ll look at how we can combine them to create chords.


Charles Hinshaw

Designer / developer in Copenhagen. Interested in the nature of user/tool co-adaptation and how intelligent tools can empower creative people.