How I built a music synthesiser using SQL
This may be a world first. As is usually the case with things that share this distinction, there’s a good reason why: it’s a ridiculous idea. Even if you were trying to develop your own digital synthesiser from scratch, you should look at any language but SQL.
That being said, exploring ridiculous ideas is a great way to look at everyday things in a different light. It gives us the chance to push the limits. And I think SQL needs that, as it’s often regarded as a second-class language that can’t do much beyond basic data analysis.
I’ll be using PostgreSQL, because its series generation and byte array capabilities will turn out to be useful. As hinted by the title, the goal is to use only declarative SQL. That is, no cheating in the form of language extensions allowing imperative programming (e.g. conditional blocks or loops).
The process can be broken down as follows:
- Calculating frequencies
- Feeding the melody to the synthesiser
- Synthesising audio
- Exporting a WAV file
Calculating frequencies
Sounds are audible vibrations. The speed of such vibrations determines a sound’s frequency, measured in hertz. When sounds are arranged into melody and rhythm, they become music. To have melody, we need musical notes.
Musical notes prescribe mathematical relationships between different frequencies. Although they’re not tied to specific frequencies, the starting point is usually set to 440hz, corresponding to A₄ (A above middle C). Using the formula below, it’s possible to get the frequency for any note n half-tones above or below 440hz.
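In equal temperament, with n counted in half-tones away from A₄, that relationship is:

f(n) = 440 × 2^(n/12)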
Note how doubling a frequency gives us the same note, an octave (twelve half-tones) above. For example, 880hz gives us A₅, an octave higher than A₄ at 440hz.
Based on the math above, the following code creates the view v_musical_notes, containing the note name (e.g. A), octave (e.g. 4) and frequency in hertz (e.g. 440).
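A minimal sketch of how such a view could be built, assuming the notes are laid out as half-tone offsets from A₄ (the exact bookkeeping and the octave range here are illustrative):

```sql
-- A sketch: note names as half-tone positions within an octave, cross joined
-- with octave numbers, with frequencies derived from A4 = 440hz.
create view v_musical_notes as
with notes(note_position, note_name) as (
    values (0, 'C'), (1, 'C#'), (2, 'D'), (3, 'D#'), (4, 'E'), (5, 'F'),
           (6, 'F#'), (7, 'G'), (8, 'G#'), (9, 'A'), (10, 'A#'), (11, 'B')
)
select
    n.note_name,
    o.octave,
    -- half-tones away from A4 ('A' is position 9, octave 4)
    440 * pow(2, ((o.octave - 4) * 12 + n.note_position - 9) / 12.0) as frequency
from notes n
cross join generate_series(0, 8) as o(octave);
```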
Feeding the melody to the synthesiser
The SQL synthesiser will need to be fed a melody. The query below creates the table music_sheet with the values for Twinkle Twinkle Little Star.
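The full melody lives in the repository; here is a sketch of the idea, with assumed column names and only the opening phrase:

```sql
-- A sketch of the melody table; column names are illustrative.
create table music_sheet (
    id         serial primary key, -- playback order
    note_name  text,               -- e.g. 'C'
    octave     int,                -- e.g. 4
    note_value numeric             -- relative duration, e.g. 0.25 for a quarter note
);

-- Opening phrase only (C C G G A A G, with the last G held twice as long):
insert into music_sheet (note_name, octave, note_value) values
    ('C', 4, 0.25), ('C', 4, 0.25), ('G', 4, 0.25), ('G', 4, 0.25),
    ('A', 4, 0.25), ('A', 4, 0.25), ('G', 4, 0.5);
```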
The melody is now stored in the database, but in a human-readable notation that cannot be easily synthesised. To make the melody easily translatable into sound, I’ll create the view v_technical_sheet. This view joins v_musical_notes to get the frequency for each note, and it also calculates the number of samples for which each note will be played.
Samples are data points used to store sound, a continuous signal, as discrete values. In a way, they’re similar to pixels in an image: tiny points that, when put together, can reproduce a signal with varying levels of fidelity.
Similarly to pixels, which can originate from a photograph or from computer graphics software, samples can be obtained by recording an existing sound wave (sampling it) or by synthesising a completely new one. Here I’m going for the latter. But how many samples per note will be used? This depends on a few things:
- The relative duration of the note. For example, notes with a value of 0.5 will last twice as long, and use twice as many samples, as those with a value of 0.25.
- The sample rate. This determines how many samples are needed per fixed unit of time. I’ll be using the standard 44100hz. In other words, if I want to play a note for a second, I’ll need to allocate 44100 samples to it.
- The time signature and tempo. I’ll go for 4/4 and a tempo of 60bpm, which makes a quarter note (0.25) last one second. The calculation that follows from these choices is spelled out below.
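Putting those three factors together: at 60bpm each beat lasts one second, so a full 4/4 bar lasts four seconds, and a note needs note_value × 4 × 44100 samples. A quarter note (0.25) gets 44100 samples, while a half note (0.5) gets 88200.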
Below, the code for v_technical_sheet.
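Again as a sketch rather than the exact code, reusing the columns assumed in the earlier sketches:

```sql
-- A sketch: join each note to its frequency and work out how many samples it needs.
-- At 60bpm in 4/4, a whole bar lasts 4 seconds, so samples = note_value * 4 * 44100.
create view v_technical_sheet as
select
    ms.id,
    mn.frequency,
    (ms.note_value * 4 * 44100)::int as total_samples
from music_sheet ms
join v_musical_notes mn
  on mn.note_name = ms.note_name
 and mn.octave    = ms.octave;
```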
It’s getting close. All the building blocks are in place. We know which frequencies to play and for how many samples. But how is sound actually synthesised?
Synthesising audio
Sound files, such as WAVs, don’t store frequencies. They store samples. Let’s say I want to play the 440hz frequency for 44100 samples (one second). How can this be done?
To achieve this, I’ll need a wave generator. I’ll use a popular and straightforward option, the sine function. By numbering each sample (i.e. 1, 2, 3, 4, …), I can generate a sine wave using the Postgres expression sin(note_sample_id). Since the sine function repeats itself every 2π, this will generate a wave with a frequency of ~7019hz. This is because the sine function will repeat itself 44100/2π, or ~7019, times in the span of a second.
Based on the above, the following expression can be used to generate a sine wave with a frequency of 440hz: sin(440*note_sample_id*2*pi()/44100). The resulting wave will have the desired frequency, but it will oscillate between -1 and 1. This range determines the amplitude of the wave, or the “volume” in more practical terms. Since I chose to go for 8-bit audio, I’ll need 8-bit unsigned values (integers between 0 and 255) rather than values between -1 and 1. With this in mind, I can adapt the expression to 127*sin(440*note_sample_id*2*pi()/44100)+128.
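As a quick standalone illustration (not part of the final views), a query like this would produce one second’s worth of 8-bit samples for A₄:

```sql
-- One second of a 440hz sine wave as 8-bit unsigned samples (values 1 to 255).
select (127 * sin(440 * s * 2 * pi() / 44100) + 128)::int as sample
from generate_series(1, 44100) as s;
```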
There’s one small issue with that formula: it doesn’t apply any envelope, so notes will run into each other. For example, two Cs lasting 1 second each will be indistinguishable from a single C lasting 2 seconds. To fix this, I’ll replace the 127 constant at the start of the expression with least(127, note_sample_id/10000.0*127). This will gradually increase the amplitude of the wave during the first 10000 samples of each note.
Below, the code for v_synthesised.
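Here is a sketch of the general shape: each note is expanded into numbered samples, and the expression above is applied with the note’s own frequency (column names carried over from the earlier sketches):

```sql
-- A sketch: one row per sample, with a linear fade-in over the first 10000
-- samples of each note so that consecutive notes stay distinguishable.
create view v_synthesised as
select
    ts.id,
    s.note_sample_id,
    (least(127, s.note_sample_id / 10000.0 * 127)
        * sin(ts.frequency * s.note_sample_id * 2 * pi() / 44100)
        + 128)::int as sample
from v_technical_sheet ts
cross join lateral generate_series(1, ts.total_samples) as s(note_sample_id);
```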
At this point, I’ve synthesised the melody. Each row is an 8-bit sample of a single channel (mono audio) at a sample rate of 44100hz.
Although the audio has been synthesised, there’s no way to play a Postgres view directly from a media player.
Exporting a WAV file
To play the audio I synthesised, I’ll generate a WAV audio file, which will work on most media players. WAV files are relatively easy to generate. On one hand, they contain uncompressed audio, so I’ll be able to simply dump the contents of the view into the file. On the other hand, they’re binary files, requiring extra care and some bitwise operations.
WAV files consist of a header containing metadata (such as the sample rate, number of channels and bits per sample) and a data section. The code below generates the binary header and concatenates the contents of v_synthesised. The result is a byte array that fully conforms to the WAV specification.
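The exact query is in the repository; the sketch below shows the general approach: each header field is written as little-endian bytes using set_byte and bitwise shifts, and the output column is named file so it matches the export command further down.

```sql
-- A sketch of v_wav: a 44-byte PCM header (mono, 8-bit, 44100hz) followed by the samples.
create view v_wav as
with samples as (
    -- every 8-bit sample concatenated, in playback order, into one byte array
    select string_agg(set_byte('\x00'::bytea, 0, sample), ''::bytea
                      order by id, note_sample_id) as data
    from v_synthesised
),
sizes as (
    select octet_length(data) as data_size from samples
)
select
    'RIFF'::bytea
    -- chunk size = 36 + data size, as a 32-bit little-endian integer
    || set_byte(set_byte(set_byte(set_byte('\x00000000'::bytea,
           0,  (36 + data_size)        & 255),
           1, ((36 + data_size) >> 8)  & 255),
           2, ((36 + data_size) >> 16) & 255),
           3, ((36 + data_size) >> 24) & 255)
    || 'WAVEfmt '::bytea
    || '\x10000000'::bytea -- fmt chunk size: 16
    || '\x0100'::bytea     -- audio format: 1 (PCM)
    || '\x0100'::bytea     -- channels: 1 (mono)
    || '\x44ac0000'::bytea -- sample rate: 44100
    || '\x44ac0000'::bytea -- byte rate: 44100 * 1 channel * 1 byte per sample
    || '\x0100'::bytea     -- block align: 1
    || '\x0800'::bytea     -- bits per sample: 8
    || 'data'::bytea
    -- data size as a 32-bit little-endian integer
    || set_byte(set_byte(set_byte(set_byte('\x00000000'::bytea,
           0,  data_size        & 255),
           1, (data_size >> 8)  & 255),
           2, (data_size >> 16) & 255),
           3, (data_size >> 24) & 255)
    || data as file
from samples, sizes;
```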
With a single line in the terminal, the WAV is exported to /tmp.
psql -U postgres -qAt -c "select encode(file,'base64') from v_wav limit 1" | base64 -d > /tmp/twinkletwinkle.wav
The code for the SQL Music Synthesiser is available on GitHub.
Interested in hearing what it sounds like, but don’t want to spin up an environment and clone a repository? Check out the video below.