Sound Synthesis in Python

Additive synthesis from the basics using NumPy and SciPy.

Jan 12, 2023

Waveform — Photo by Pawel Czerwinski on Unsplash

This is the first in a multi-part series on digital sound, starting with sound synthesis. Sound synthesis is a broad subject that could take a long time to cover, so we will begin with the basics of digital synthesis and then move on to other types of sound processing. If you need any further help, I used this tutorial while learning, and it might be useful for you as well.

First off, synthesis is the act of creating a sound using various methods. In this article, we will focus on additive synthesis, which builds a sound by adding up simpler signals. We will start by generating sound from raw numerical values and then move on to more complex topics.

The libraries we will be using are NumPy and SciPy. NumPy is a library for numerical computation, and we will use SciPy for signal processing and for turning our data into sound files. We will also install Matplotlib to plot the signals and pydub to play back audio. To install these packages, we will create a virtual environment and use pip. I am using Python 3.10 for this article, so adjust the instructions for your Python installation. In the terminal, first make a directory, then create a virtual environment and install the required packages.

mkdir python-sound
cd python-sound
python -m venv .env
source .env/bin/activate
python -m pip install numpy
python -m pip install scipy
python -m pip install matplotlib
python -m pip install pydub

Now that the environment is set up, create a new file called sound.py and open it in a text editor. This will be the main file we use to synthesize sound. First, let’s add the imports we need.

import numpy as np
from scipy.io.wavfile import write
from scipy import signal
import matplotlib.pyplot as plt

What is Sound?

Let’s go over what sound actually is. Sound is a pressure wave that travels through a medium such as air. It is a longitudinal wave, meaning the particles of the medium oscillate parallel to the direction in which the wave propagates (in a transverse wave, by contrast, they move perpendicular to it). The Processing Foundation has a great introduction to sound, and there are other great introductions to the physics as well.

Sound can be represented as a signal, a time-varying series of numbers. Discrete signals are usually written with square brackets, as in x[n], where n is the sample index. We can set a signal to always equal 1, for example x[n] = 1, or create a linear signal such as x[n] = n. For our purposes, we will be dealing with periodic signals like the sine wave, which takes the form x[n] = A sin(2πfn/fs), where A is the amplitude, f is the frequency in hertz, and fs is the sampling rate (so n/fs is the time of sample n in seconds). When a signal is plotted, the y-axis shows its amplitude and the x-axis shows time.

In NumPy, these signals can be represented as arrays of floating-point values between -1.0 and 1.0, where ±1.0 is the largest amplitude the audio output can represent. We will use the linspace function to generate a series of evenly spaced time values and then use the sin function to compute the signal at each of them. Each value in the array is called a sample.
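As a minimal sketch of these definitions, the snippet below builds the three example signals directly from an integer sample index n, converting n to seconds by dividing by the sampling rate. It assumes a sampling rate fs of 44,100 samples per second and an amplitude of 0.5; the variable names are illustrative only.

```python
import numpy as np

fs = 44100          # assumed sampling rate in samples per second
A, f = 0.5, 440.0   # assumed amplitude and frequency (Hz) of the example sine

n = np.arange(fs)   # sample indices 0 .. fs-1, i.e. one second of audio

constant = np.ones(n.shape)                 # x[n] = 1
ramp = n.astype(np.float64)                 # x[n] = n
sine = A * np.sin(2 * np.pi * f * n / fs)   # x[n] = A sin(2*pi*f*n/fs)

# Each element is one sample; audio samples are kept within [-1.0, 1.0].
print(len(sine), sine.dtype, sine.min(), sine.max())
```

Only the sine is a usable audio signal here; the constant and the ramp are just there to show how a signal is defined sample by sample.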

Basic Waveforms

Let’s try generating a sine wave using the following equation, where t is the time in seconds:

y(t) = A sin(2πft)

A is the amplitude, and f is the frequency of the signal. Frequency is the number of complete cycles per second, measured in hertz (Hz). It determines the pitch of the sound: lower frequencies sound deeper (more bass), and higher frequencies sound brighter (more treble). The code to generate a WAV file for this sound looks like this.

AUDIO_RATE = 44100
freq = 440
length = 1

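# create time values: length * AUDIO_RATE samples spaced 1 / AUDIO_RATE apart
# (endpoint=False stops one step short of t = length)
t = np.linspace(0, length, length * AUDIO_RATE, endpoint=False, dtype=np.float32)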
# generate y values for signal
y = np.sin(2 * np.pi * freq * t)
# save to wave file
write("sine.wav", AUDIO_RATE, y)

We first define the audio rate constant, also called the sampling rate: the number of samples per second that the sound card converts to or from an analog signal, whether it is recording from a microphone or playing back through speakers. Here it is 44,100 Hz, the CD-audio standard and a common default on sound cards. Hertz (Hz) is the unit of frequency, meaning "per second"; for a sampling rate, it counts samples per second. This value comes from the limits of human hearing and the Nyquist sampling theorem, which states that to represent a sound containing frequencies up to f, we need a sampling rate greater than 2f. Human hearing extends up to roughly 20,000 Hz, so the sampling rate needs to be above 40,000 Hz.

That means we need to generate 44,100 samples for every second of audio. In this case, we want a one-second clip, so the number of samples is simply AUDIO_RATE multiplied by 1. At this rate, the generated sounds can span the full range of human hearing.
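To see why the sampling rate must exceed twice the highest frequency, the sketch below samples a 30,000 Hz sine, which is above the 22,050 Hz limit for a 44,100 Hz rate, and finds its strongest frequency with NumPy's FFT. The tone folds back (aliases) to 44,100 − 30,000 = 14,100 Hz, so it would be heard as a completely different pitch.

```python
import numpy as np

AUDIO_RATE = 44100
freq = 30000                        # deliberately above AUDIO_RATE / 2 = 22050 Hz

n = np.arange(AUDIO_RATE)           # one second of sample indices
t = n / AUDIO_RATE                  # sample times in seconds
y = np.sin(2 * np.pi * freq * t)

# Magnitude spectrum; with exactly one second of audio each bin is 1 Hz wide.
spectrum = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(len(y), d=1 / AUDIO_RATE)
peak = freqs[np.argmax(spectrum)]
print(f"strongest frequency: {peak:.1f} Hz")   # about 14100 Hz, not 30000 Hz
```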

We generate our time values with linspace, spanning one second with length * AUDIO_RATE = 44,100 samples. The signal is then calculated by applying sin to each time value. The frequency is 440 Hz, the A above middle C (A4), which is the standard tuning pitch. Finally, we write the samples to a WAV file using SciPy’s write function with the same AUDIO_RATE. Because y is a float32 array, write produces a 32-bit floating-point WAV file.
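One detail worth checking: np.linspace includes the end point by default, so 44,100 points from 0 to 1 are spaced 1/44,099 s apart instead of 1/44,100 s. Played back at 44,100 samples per second, that raises the pitch very slightly (440 Hz becomes roughly 440.01 Hz). The sketch below compares the spacing with and without endpoint=False; np.arange(num_samples) / AUDIO_RATE is an equivalent alternative.

```python
import numpy as np

AUDIO_RATE = 44100
length = 1
num_samples = length * AUDIO_RATE

t_default = np.linspace(0, length, num_samples)                 # includes t = 1.0
t_open = np.linspace(0, length, num_samples, endpoint=False)    # stops one step short of 1.0

print(t_default[1] - t_default[0])   # step of 1/44099 s
print(t_open[1] - t_open[0])         # step of 1/44100 s

# The open interval matches integer sample indices divided by the rate.
print(np.allclose(t_open, np.arange(num_samples) / AUDIO_RATE))
```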

When you play sine.wav, you should hear a one-second beep: a single pure tone. If you can’t hear it, check that the code is correct and that your audio settings are correct as well; your output format may need to be set to 44,100 Hz.

Let’s plot the signal by using the plot function in matplotlib.


def plot(ts, ys, title, num_samples):
    # Plot the first num_samples samples against their time values.
    plt.xlabel("t (s)")
    plt.ylabel("y")
    plt.title(title)
    plt.plot(ts[:num_samples], ys[:num_samples])
    plt.show()

plot(t, y, "Sine Signal", 512)

If we plot the first 512 samples (about 11.6 ms, or roughly five cycles of the 440 Hz wave), this is the plot we get.

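The square wave is similar, but instead of following a smooth sine curve it jumps between two levels, -1 and 1. One way to build it is to pass a sine wave through a Heaviside step function, which outputs 1 where its input is non-negative and 0 elsewhere; multiplying the result by 2 and subtracting 1 maps those values to 1 and -1.

y = 2 * np.heaviside(np.sin(2 * np.pi * freq * t), 1.0) - 1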

We can achieve the same thing using Scipy’s signal.square function.

t = np.linspace(0, length, length * AUDIO_RATE, endpoint=False, dtype=np.float32)

# generate y values for signal
y = signal.square(2 * np.pi * freq * t)
# save to wave file
write("square.wav", AUDIO_RATE, y)

plot(t, y, "Square Signal", 512)
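As a quick sanity check on the two square-wave constructions, the sketch below compares the rescaled Heaviside version, 2·H(sin x) − 1, with scipy.signal.square over one second of samples. They should agree everywhere except possibly at a handful of samples that land within rounding error of a zero crossing.

```python
import numpy as np
from scipy import signal

AUDIO_RATE = 44100
freq = 440
t = np.arange(AUDIO_RATE) / AUDIO_RATE    # one second of sample times
phase = 2 * np.pi * freq * t

# Heaviside maps sin >= 0 to 1 and sin < 0 to 0; rescale to the range -1..1.
step_square = 2 * np.heaviside(np.sin(phase), 1.0) - 1
scipy_square = signal.square(phase)       # +1 for the first half of each period, -1 for the second

agreement = np.mean(step_square == scipy_square)
print(f"{agreement:.4%} of samples match")
```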

The square wave generates a buzzing sound that is harsher than the sine. That’s because a sine wave contains a single frequency, while a square wave also contains overtones. These overtones are also called harmonics: components at integer multiples of the fundamental frequency that add higher-frequency content to the sound. A square wave contains only the odd harmonics of the fundamental, with the amplitude of the nth harmonic proportional to 1/n. In our case, those are 440 Hz, 1320 Hz, 2200 Hz, …

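We could replicate this by adding sine waves at the odd harmonics, each scaled by the reciprocal of its harmonic number: sin(2π·440t) + (1/3) sin(2π·1320t) + (1/5) sin(2π·2200t) + … Multiplying the whole sum by 4/π gives a square wave that swings between -1 and 1.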
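Here is a minimal additive sketch of that idea. It sums the first few odd harmonics using the Fourier-series coefficients of a square wave, 4/(πk) for odd k, and measures how far the result is from scipy.signal.square. The helper square_from_harmonics is an illustrative name, not part of any library. Adding more harmonics shrinks the error, although a small overshoot (the Gibbs phenomenon) always remains next to each jump.

```python
import numpy as np
from scipy import signal

AUDIO_RATE = 44100
freq = 440
t = np.arange(AUDIO_RATE) / AUDIO_RATE   # one second of sample times


def square_from_harmonics(freq, t, num_harmonics):
    """Illustrative helper: approximate a +/-1 square wave from its first odd harmonics."""
    y = np.zeros_like(t)
    for i in range(num_harmonics):
        k = 2 * i + 1                    # odd harmonic numbers: 1, 3, 5, ...
        y += (4 / (np.pi * k)) * np.sin(2 * np.pi * k * freq * t)
    return y


target = signal.square(2 * np.pi * freq * t)
for num_harmonics in (1, 5, 20):
    approx = square_from_harmonics(freq, t, num_harmonics)
    error = np.mean(np.abs(approx - target))
    print(f"{num_harmonics:2d} harmonics: mean abs error {error:.3f}")
```

With 20 odd harmonics the highest component is 39 × 440 = 17,160 Hz, still below the 22,050 Hz limit, so no aliasing is introduced.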

Another waveform is the sawtooth. Unlike the square wave, it contains both the even and odd harmonics of the fundamental, again with the amplitude of the nth harmonic proportional to 1/n.

t = np.linspace(0, length, length * AUDIO_RATE, endpoint=False, dtype=np.float32)

# generate y values for signal
y = signal.sawtooth(2 * np.pi * freq * t)
# save to wave file
write("sawtooth.wav", AUDIO_RATE, y)

plot(t, y, "Sawtooth Signal", 512)

This generates a buzzing sound similar to the square wave, but because it also contains the even harmonics, it sounds even harsher and buzzier.
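To check these claims about harmonic content, the sketch below takes the FFT of one second of each waveform and reports the first four harmonics relative to the fundamental. For the square wave, the even harmonics (880 Hz and 1760 Hz) should be close to zero and the third should be close to 1/3, while the sawtooth should show every harmonic at roughly 1/n. The helper harmonic_levels is a hypothetical name chosen for this example.

```python
import numpy as np
from scipy import signal

AUDIO_RATE = 44100
freq = 440
t = np.arange(AUDIO_RATE) / AUDIO_RATE   # exactly one second, so FFT bin k is k Hz


def harmonic_levels(y, freq, num_harmonics=4):
    """Hypothetical helper: magnitudes of harmonics 1..num_harmonics relative to the fundamental."""
    spectrum = np.abs(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(len(y), d=1 / AUDIO_RATE)
    levels = [spectrum[np.argmin(np.abs(freqs - k * freq))] for k in range(1, num_harmonics + 1)]
    return np.array(levels) / levels[0]


phase = 2 * np.pi * freq * t
for name, y in [("square", signal.square(phase)), ("sawtooth", signal.sawtooth(phase))]:
    print(name, np.round(harmonic_levels(y, freq), 3))
```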

Sine, square, and sawtooth waves form the basis of most electronic sound synthesis, and many different types of sounds can be created from them. Another useful signal is the chirp, a cosine wave whose frequency sweeps from one value to another over time.

t = np.linspace(0, length, length * AUDIO_RATE, endpoint=False, dtype=np.float32)

# generate y values for signal: sweep from f0 = 440 Hz to f1 = 880 Hz, reached at t1 = 1 s
y = signal.chirp(t, f0=440, t1=1, f1=880)
# save to wave file
write("chirp.wav", AUDIO_RATE, y)

plot(t, y, "Chirp Signal", 1024)

In this example, the frequency sweeps linearly from 440 Hz to 880 Hz (one octave) over one second, which sounds like a rising lilt in pitch.
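To make the idea of a sweep concrete: for a linear chirp, the instantaneous frequency rises as f(t) = f0 + (f1 − f0)·t/t1, and the phase is the integral of 2πf(t). The sketch below builds the 440 to 880 Hz chirp by hand from that phase and compares it with scipy.signal.chirp, whose default method is 'linear'.

```python
import numpy as np
from scipy import signal

AUDIO_RATE = 44100
f0, f1, t1 = 440.0, 880.0, 1.0            # start frequency, end frequency, time to reach f1
t = np.arange(int(t1 * AUDIO_RATE)) / AUDIO_RATE

# Integrate 2*pi*f(t) with f(t) = f0 + (f1 - f0) * t / t1 to get the phase.
phase = 2 * np.pi * (f0 * t + 0.5 * (f1 - f0) / t1 * t**2)
manual = np.cos(phase)

reference = signal.chirp(t, f0=f0, t1=t1, f1=f1)   # method='linear' by default
print(np.allclose(manual, reference))
```

Integrating the frequency, rather than plugging f(t) directly into sin(2πf(t)t), is what keeps the pitch sweep smooth and avoids doubling the effective sweep rate.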

Signal Class

To make things easier, let’s create a Signal class and the corresponding subclasses to handle signal synthesis. Create a new file called signals.py with the following code. The class contains several helper methods and overloads the arithmetic operators, so we can add, multiply, and subtract signals and then plot the resulting waveform.
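The original signals.py is not reproduced here, so the following is only a minimal sketch of how such a module could be structured, written under our own assumptions: a Signal base class holding ts and ys arrays, arithmetic operators that combine the ys values, and Sine, Square, and Sawtooth subclasses that default to one second at 44,100 Hz. The method names to_wav and plot mirror the ones used in this article, but these implementations (including the hypothetical amp parameter and the choice to rescale only clipping signals in to_wav) are illustrative; play, plot_fft, plot_spectrogram, Chirp, and from_wav are left out.

```python
"""signals.py: a minimal, hypothetical sketch of a Signal class hierarchy."""
import numpy as np
from scipy import signal as sps
from scipy.io.wavfile import write

AUDIO_RATE = 44100


class Signal:
    def __init__(self, ts, ys, rate=AUDIO_RATE):
        self.ts = np.asarray(ts, dtype=np.float64)   # sample times in seconds
        self.ys = np.asarray(ys, dtype=np.float64)   # sample values
        self.rate = rate

    def _combine(self, other, op):
        # Combine with another Signal (same length required) or with a plain number.
        if isinstance(other, Signal):
            if len(other.ys) != len(self.ys):
                raise ValueError("signals must have the same number of samples")
            return Signal(self.ts, op(self.ys, other.ys), self.rate)
        return Signal(self.ts, op(self.ys, other), self.rate)

    def __add__(self, other):
        return self._combine(other, np.add)

    def __sub__(self, other):
        return self._combine(other, np.subtract)

    def __mul__(self, other):
        return self._combine(other, np.multiply)

    __radd__ = __add__    # allows sum([...]) over signals
    __rmul__ = __mul__    # allows 0.5 * Sine(440)

    def to_wav(self, filename):
        # Assumption of this sketch: rescale only if the signal would clip outside [-1, 1].
        peak = np.max(np.abs(self.ys))
        ys = self.ys / peak if peak > 1.0 else self.ys
        write(filename, self.rate, ys.astype(np.float32))

    def plot(self, num_samples=512):
        import matplotlib.pyplot as plt   # imported lazily so plotting stays optional
        plt.plot(self.ts[:num_samples], self.ys[:num_samples])
        plt.xlabel("t (s)")
        plt.ylabel("y")
        plt.show()


class _Periodic(Signal):
    """Shared constructor: evaluate a 2*pi-periodic waveform at each sample time."""

    def __init__(self, freq, length=1, amp=1.0, rate=AUDIO_RATE):
        ts = np.arange(int(length * rate)) / rate
        super().__init__(ts, amp * self.waveform(2 * np.pi * freq * ts), rate)


class Sine(_Periodic):
    waveform = staticmethod(np.sin)


class Square(_Periodic):
    waveform = staticmethod(sps.square)


class Sawtooth(_Periodic):
    waveform = staticmethod(sps.sawtooth)
```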

The base class Signal takes a ts parameter holding the time steps and a ys parameter holding the signal values. The subclasses (Sine, Square, Sawtooth, and Chirp) inherit from it and generate their own ts and ys values. Let’s try using the new Signal class.

from signals import Signal, Sine, Square, Sawtooth, Chirp

sig = Sine(440, length=2)
sig.to_wav("sine.wav")

This creates a two-second-long sine wave at 440 Hz. Now let’s add some sine waves together at integer multiples of a fundamental frequency, with the amplitude of each falling off as 1/n. Notice how the result starts to look like a sawtooth wave.

fund = 220

# harmonic n of the fundamental, with amplitude 1 / (2 * pi * n)
sig = (
    1 / (2 * np.pi) * Sine(fund, length=2)
    + 1 / (4 * np.pi) * Sine(2 * fund, length=2)
    + 1 / (6 * np.pi) * Sine(3 * fund, length=2)
    + 1 / (8 * np.pi) * Sine(4 * fund, length=2)
)
sig.to_wav("harmonics.wav")
sig.plot(512)

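Let’s try playing the sound with the class’s play method, which uses pydub for playback (pydub needs a playback backend such as simpleaudio, PyAudio, or ffplay to be available).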

sig.play()

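We can use the plot_fft method to plot the spectrum of the signal. The FFT (fast Fourier transform) is an efficient algorithm for converting a time-domain signal into its frequency components. In effect, it undoes the addition we just performed: the spectrum shows a peak at each frequency we added (220, 440, 660, and 880 Hz), with heights proportional to their amplitudes.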

sig.plot_fft()
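Outside the class, the same idea can be sketched directly with NumPy: build the four-sine sum from above, take np.fft.rfft, and convert bin magnitudes back to amplitudes with 2|X[k]|/N. That conversion is valid here because every component completes a whole number of cycles in the two-second clip. The peaks should appear at 220, 440, 660, and 880 Hz with amplitudes 1/(2π), 1/(4π), 1/(6π), and 1/(8π).

```python
import numpy as np

AUDIO_RATE = 44100
fund, length = 220, 2
t = np.arange(length * AUDIO_RATE) / AUDIO_RATE

# Same mix as above: harmonic n of the fundamental with amplitude 1 / (2 * pi * n).
y = sum(1 / (2 * np.pi * n) * np.sin(2 * np.pi * n * fund * t) for n in range(1, 5))

spectrum = np.fft.rfft(y)
freqs = np.fft.rfftfreq(len(y), d=1 / AUDIO_RATE)
amplitudes = 2 * np.abs(spectrum) / len(y)          # convert bin magnitude to sine amplitude

peak_bins = np.sort(np.argsort(amplitudes)[-4:])    # the four strongest bins, in frequency order
for k in peak_bins:
    print(f"{freqs[k]:6.1f} Hz  amplitude {amplitudes[k]:.4f}")
```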

We can also plot the spectrogram of the signal using plot_spectrogram. A spectrogram is a 2D representation of how a signal’s frequency content changes over time, with the x-axis representing time and the y-axis representing frequency. The brighter the color, the higher the amplitude at that time and frequency.

sig.plot_spectrogram()
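A spectrogram is easiest to appreciate on a signal whose frequency changes. As a sketch using scipy.signal.spectrogram directly (not the class’s plot_spectrogram method), the code below analyzes the 440 to 880 Hz chirp and reports the strongest frequency in each time slice. The segment length of 4096 samples is an arbitrary choice that trades time resolution for frequency resolution (about 10.8 Hz per bin here).

```python
import numpy as np
from scipy import signal

AUDIO_RATE = 44100
t = np.arange(AUDIO_RATE) / AUDIO_RATE
y = signal.chirp(t, f0=440, t1=1, f1=880)

# Sxx has one row per frequency bin and one column per time segment.
freqs, times, Sxx = signal.spectrogram(y, fs=AUDIO_RATE, nperseg=4096)
dominant = freqs[np.argmax(Sxx, axis=0)]   # strongest frequency in each segment

for time, f in zip(times, dominant):
    print(f"t = {time:.2f} s  peak = {f:.0f} Hz")
```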

Let’s try plotting the FFT of a square wave.

sig = Square(440)
sig.to_wav("square.wav")
sig.plot_fft()

The spectrum of the square wave contains peaks only at odd multiples of the 440 Hz fundamental (440 Hz, 1320 Hz, 2200 Hz, …), with heights falling off as 1/n. Many different waveforms can be constructed from sine waves in this way, producing interesting sounds purely through additive synthesis.

Exercises

Try adding different waveforms like Sine, Square, and Sawtooth.
Use the from_wav method to load a wave file and plot the fft.
Try reconstructing a square wave from a series of summed sine waves with odd harmonics.

Conclusion

Additive synthesis is a complex topic, and I have barely scratched the surface. You can use it to construct sounds ranging from the simple to the complex. Next time, I will focus on subtractive synthesis using filters and go into more detail on other types of synthesis, including FM synthesis.