Coding a basic reverb algorithm - Part 1: An introduction to audio programming

Rishikesh Daoo
The Seeker’s Project
7 min read · Sep 2, 2018

Reverb is a very powerful effect that accentuates audio and is widely used in music production. I am sure you have given yourself some great performances in the bathroom and noticed how particularly awesome you sounded there as opposed to any other room of your house. Well, the credit for that goes to reverb. The idea behind reverb is simple.

Reverb is characterized as random, blended repetitions of a sound occurring within thirty milliseconds after the sound is made.

I started working on this problem with no prior experience in audio programming and only a very basic understanding of how digital audio works. As I made my way through the problem, I scoured the internet to understand how to achieve this. My motive here is to make an account of my learning, with references to the resources I used at every step, so that it serves as a comprehensive reference post for anyone who is getting started with audio programming.

Problem statement:

Given an audio file, apply reverb to the audio.

Assumptions:

  1. The audio file would be a WAV file (8-bit and 16-bit PCM encoding)
  2. The parameters of reverb that can be controlled would be: Delay, Decay factor, Dry/ Wet mix percentage

Programming language used: Java

The complete code can be accessed from my Github: https://github.com/Rishikeshdaoo/Reverberator

Why WAV file only?

Basically to simplify the problem.

The reason behind limiting the reverb to only WAV files has to do with how audio files are encoded. The various audio formats that we know of, such as mp3, wav, ogg, and flac, are basically different ways in which audio is stored digitally.

Digitally, audio is stored in the form of ‘samples’.

Sound is a continuous-time function and digital devices work only on discrete events. Hence we need some way to represent sounds as a discrete-time function.

Samples can be visualized as values extracted at periodic intervals from a continuously variable analog waveform. Sampling converts the analog waveform into a discrete series of samples (numbers) representing the value of the waveform at precise times. Sampling converts the waveform from the analog domain into the digital domain, which is better suited for processing by computers.

WAV (and specifically 8-bit and 16-bit PCM) files follow an encoding called Pulse Code Modulation. You can read about it in detail here and here.

In gist, what it means from a programming point of view is that each sample in the file takes up a fixed number of bits, and the samples are grouped into fixed-size 'frames' (one sample per channel). Eg: in an 8-bit PCM encoded mono file, every 8 bits will make up one sample.

This makes our job easy, as we can directly read a WAV file into a Byte array, with all the sample frames lined up in order inside the array.

For other formats, such as mp3, the encoding is far more complicated and would involve a lot of coding just to retrieve the samples from the audio file.

Step by step breakdown

The complete program, from reading the WAV file to playing back the reverb-processed file, can be broken down into the following steps as milestones:

  1. Read the WAV file into an input stream
  2. Copy the stream's data into a Byte array
  3. Convert the Byte array into an array of samples
  4. Apply the reverb algorithm over the samples array
  5. Convert the samples array back to a Byte array and play back the audio

Step 3 and Step 4 are the most important steps in the program.

Step 1 & 2: Read WAV file and copy into Byte array:

Step 1 and Step 2 are quite straightforward. The WAV file is read into an Input stream. This data is copied into a Byte array. Go through the code here for the implementation in Java.
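For a flavor of how this looks, here is a minimal sketch using the javax.sound.sampled API (the class and method names are mine, not from the repository):

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.File;

public class WavReader {

    // Read every sample frame of a WAV file into a byte array.
    public static byte[] readWavBytes(File wavFile) throws Exception {
        AudioInputStream in = AudioSystem.getAudioInputStream(wavFile);

        // The stream's AudioFormat tells us sample size in bits,
        // channel count, sample rate and endianness.
        AudioFormat format = in.getFormat();
        System.out.println("Format: " + format);

        byte[] audioBytes = in.readAllBytes(); // inherited from InputStream (Java 9+)
        in.close();
        return audioBytes;
    }
}
```

Note that readAllBytes() needs Java 9 or newer; on older JDKs you would loop over read() and collect the chunks into a ByteArrayOutputStream.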

Step 3: Convert Byte array into an array of samples:

In this step, there are two main goals that need to be achieved to convert the audio data stored in the byte array into an array of samples.

A great resource that I referred to for this is here; I will use paragraphs from it in my explanation. In the code, this conversion happens with a call to the unpack method in the SampleDataRetrieval class.

  1. Concatenating byte array:

The byte array contains the sample frames, split up and all in a line. Reassembling them is actually very straightforward, except for something called endianness, which is the ordering of the bytes in each packet.

Endianness:

As an example, consider a two-byte packet holding the decimal value 9999 (0x270F in hexadecimal). In big-endian order it is stored as the bytes 0x27 0x0F; in little-endian order, as 0x0F 0x27. Both hold the same binary value; only the byte order is reversed.

  • In big-endian, the most significant byte comes first, i.e. the MSB is at index '0'.
  • In little-endian, the least significant byte comes first, i.e. the MSB is at the highest index.

WAV files are stored in little-endian byte order, and AIFF files are stored in big-endian byte order. Endianness can be obtained from AudioFormat.isBigEndian() (AudioFormat is a Java class that provides audio format specifications).
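To make the byte order concrete, here is a small helper (the name combineLittleEndian is mine) that assembles one 16-bit little-endian sample from its two bytes:

```java
// Combine two bytes of a 16-bit little-endian sample frame. The byte
// stored first is the least significant one; masking with 0xFF makes
// Java treat each byte as unsigned before the bytes are combined.
static int combineLittleEndian(byte low, byte high) {
    return ((high & 0xFF) << 8) | (low & 0xFF);
}
```

For sample i of a mono 16-bit stream, the two bytes are audioBytes[2 * i] and audioBytes[2 * i + 1]. The result carries no sign information yet; that is exactly what the next step handles.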

2. Decoding Signed and Unsigned values:

For PCM signed system:

Computers represent signed numbers using a system called two's complement. This facilitates representing negative and positive numbers without any special logic for the two different types. Now, when we try to retrieve the actual numerical value, we need to sign-extend the two's complement value.

To get the two's complement negative notation of an integer, you write out the number in binary, invert the digits, and add one to the result. In effect, the MSB is '1' for a negative value and '0' for a positive value.

This means that if the most significant bit (MSB) of the sample is set to 1, we must fill all the bits above it with 1s. The arithmetic right-shift (>>) does the filling for us automatically when the sign bit is set.
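A small sketch of that trick, assuming the raw sample bits sit in the low end of an int (the helper name is mine):

```java
// Sign-extend a two's complement sample of 'bits' width held in an int.
// Shifting left aligns the sample's sign bit with the int's MSB; the
// arithmetic right-shift (>>) copies it back down, filling the upper
// bits with 1s for negative values and 0s for positive ones.
static int signExtend(int sample, int bits) {
    int shift = 32 - bits;
    return (sample << shift) >> shift;
}
```

For example, signExtend(0xFFFF, 16) returns -1, while signExtend(0x7FFF, 16) returns 32767.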

For PCM unsigned system:

The unsigned system is simply an offset representation of the numbers: every sample is stored shifted up by the full-scale offset (2^(bits-1)), so we subtract that offset to center the values around zero.
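A short sketch of that offset removal, together with a normalization to a float amplitude in [-1.0, 1.0) (my own convention; the repository may scale differently):

```java
// Remove the unsigned-PCM offset and normalize to a float amplitude.
// 8-bit unsigned PCM stores 0..255 with silence at 128 (the 'fullscale'
// offset, 2^(bits-1)); subtracting it re-centers the sample around zero.
static float decodeUnsigned(int rawSample, int bits) {
    int fullScale = 1 << (bits - 1);          // 128 for 8-bit audio
    int centered = rawSample - fullScale;     // -128..127 for 8-bit
    return centered / (float) fullScale;      // normalized to [-1.0, 1.0)
}
```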

Step 4: Apply Reverb algorithm over the samples array:

After step 3, we have an array of float that contains the actual amplitude values for each sample. This will now serve as the input signal to our reverb.

Step 4 is the design and the programmatic implementation of the reverb itself. This involves a lot of detailed discussion, which I have handled in a separate post here. Also, refer to the source code for the reverb implementation.
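The full design is discussed in Part 2, but purely to give a flavor of how the three parameters interact, here is a minimal single comb filter sketch. This is not the implementation from the repository; delaySamples is the delay converted to samples (delay in ms × sample rate / 1000):

```java
// A single feedback comb filter: the wet signal is the input plus a
// delayed, decayed copy of itself; the output blends dry and wet
// according to the wet-mix fraction (0.0 = all dry, 1.0 = all wet).
static float[] simpleReverb(float[] input, int delaySamples,
                            float decay, float wetMix) {
    float[] wet = new float[input.length];
    for (int n = 0; n < input.length; n++) {
        float delayed = (n >= delaySamples) ? wet[n - delaySamples] : 0f;
        wet[n] = input[n] + decay * delayed;   // feedback comb filter
    }
    float[] out = new float[input.length];
    for (int n = 0; n < input.length; n++) {
        out[n] = (1f - wetMix) * input[n] + wetMix * wet[n];
    }
    return out;
}
```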

At the end of step 4, we would have a float array of our ‘reverbed’ audio. Now we need to go back and write this into a WAV file so that we can play it back and listen to it.

Step 5: Convert samples array to Byte array and playback audio:

We have already handled the difficult part of the process. We are only left with packing the samples back to their original audio specifications and we are ready to listen to our reverb!

The process of packing the float array into a byte array, as you’d intuitively guess, is doing step 3 in reverse.

  1. For PCM unsigned, convert the samples back into their original representation. This is done by adding back the offset that we had subtracted earlier (the fullscale). PCM signed needs no alteration, since the sign-extended numbers cause no problems in the further computations.
  2. Convert the values above into bytes, according to the little-endian specification: mask the number with 0xffL to extract 8 bits from the number and store them in a byte array element. For a 16-bit representation, bit-shift the MSB by 8 bits and then apply the mask. A minimal sketch of this packing follows the list.
  3. Play back the audio using the JavaSound API.
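Here is a minimal sketch of that packing for 16-bit signed little-endian audio, using the same [-1.0, 1.0) scaling convention as the unpacking sketch above (the helper name is mine):

```java
// Pack normalized float samples back into 16-bit signed little-endian
// bytes: rescale to the 16-bit range, clamp to avoid wrap-around on
// overflow, then emit the least significant byte first.
static byte[] packSamples(float[] samples) {
    byte[] bytes = new byte[samples.length * 2];
    for (int i = 0; i < samples.length; i++) {
        int s = Math.round(samples[i] * 32768f);
        s = Math.max(-32768, Math.min(32767, s));     // clamp to 16-bit range
        bytes[2 * i]     = (byte) (s & 0xFF);         // low byte first
        bytes[2 * i + 1] = (byte) ((s >> 8) & 0xFF);  // then the high byte
    }
    return bytes;
}
```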

The API uses the metaphor of a mixing console for its components: a 'Mixer' is the notion of a sound device, and a 'Line' is one of the strips in a mixing console handling audio i/o. I have used a 'Clip' in this implementation, which stores the complete audio in memory before playback. You can read the details about the JavaSound API here. Please refer to the code comments for a step-by-step detailing of this implementation.
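A minimal playback sketch with a Clip, assuming we kept the AudioFormat from step 1 (drain() blocks until the buffered audio has finished playing):

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.Clip;

public class Player {

    // Load the processed bytes into an in-memory Clip and play them
    // back using the AudioFormat we kept from step 1.
    public static void play(byte[] audioBytes, AudioFormat format) throws Exception {
        Clip clip = AudioSystem.getClip();
        clip.open(format, audioBytes, 0, audioBytes.length);
        clip.start();  // playback is asynchronous
        clip.drain();  // block until the audio has been played out
        clip.close();
    }
}
```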

I hope this post helps out at least a few of you. Please let me know if there are any inaccuracies in the explanation or any improvements I could make. Also, please write to me if you have any topic I could write on or projects you would like to work on together.

Rishi
