Harmonizing Signals: Harnessing the Power of FFT to Decode Waveform Files

Working with Audio Binary Files in C

Nishant Aanjaney Jalan
CodeX
Jul 14, 2023


I have been programming for a very long time, but I had never had the chance to read from a binary file. I mostly worked with files that contained textual data. Last month, I decided to work with a particular binary file format to produce something interesting.

As part of a group, we had a front-end and a back-end to this mini-project.


Waveform Audio files

Waveform Audio files are binary files that store audio information. Unlike MP3 and other compressed audio formats, a .wav file typically stores its audio uncompressed, making it one of the easiest formats to work with. As a beginner dealing with non-textual file formats, I believed it was a great option to use in our project.

How is data stored?

As we already know, all information is stored as a combination of 0s and 1s. To an unfamiliar audience, this feels impossible to read. However, once you are accustomed to the “structure”, it is not as difficult as it is made out to be.

Many binary file formats, including .wav, have a “header” section and a “data” section. The header describes how the data section is laid out. In the case of a .wav file, the various header fields indicate important attributes of the file, such as the number of bits that constitute a sample, the length of the data, and the number of channels in the audio (usually 2 channels, left and right).

Reading headers from a binary file

My first job was to read this header information. To know which bytes indicate which attribute and its value, we refer to a table of the header layout such as the one below:

This information is taken from another blog post published by TrueLogic.

Note: This layout does not always hold. While debugging, I came across some .wav files whose header information exceeded 44 bytes. For now, let us assume that the header is always 44 bytes long.
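The longer headers come from the fact that a .wav file is a RIFF container: after the 12-byte RIFF header, the file is a sequence of chunks, each starting with a 4-byte ID and a 4-byte little-endian size, and extra chunks (for example “LIST”) can sit before the “data” chunk. The sketch below shows one way to locate a chunk by its ID instead of assuming a fixed offset. The helper name find_chunk is hypothetical and not part of the original project.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: scan the chunks of a RIFF/WAVE file for the one whose
// 4-byte ID equals `id`. On success it returns 0, stores the chunk size in
// *size_out and leaves the file positioned at the first byte of the payload.
// It returns -1 if the chunk is not found.
int find_chunk(FILE* f, const char* id, uint32_t* size_out) {
  unsigned char hdr[8];

  // Skip the 12-byte RIFF header: "RIFF", file size, "WAVE".
  if (fseek(f, 12, SEEK_SET) != 0) return -1;

  while (fread(hdr, 1, 8, f) == 8) {
    // Chunk sizes are little-endian; assemble them byte by byte so the
    // code does not depend on the host's endianness.
    uint32_t size = (uint32_t)hdr[4] | ((uint32_t)hdr[5] << 8) |
                    ((uint32_t)hdr[6] << 16) | ((uint32_t)hdr[7] << 24);
    if (memcmp(hdr, id, 4) == 0) {
      *size_out = size;
      return 0;
    }
    // Not the chunk we want: skip its payload. RIFF pads odd-sized
    // chunks with one extra byte.
    if (fseek(f, (long)(size + (size & 1)), SEEK_CUR) != 0) return -1;
  }
  return -1;
}
```

Calling find_chunk(wav_file, "data", &size) would then give the size of the audio data and leave the file ready for the fread that reads it, whatever sits between the format fields and the data.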

Based on the header table above, we can create a struct that looks like the following. For this article, I will be using C to demonstrate my code.

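#include <stdint.h>

// Canonical 44-byte WAV header; byte positions are 1-indexed
typedef struct {
  char riff[4];              // bytes 1-4: "RIFF"
  uint32_t file_size;        // bytes 5-8: size of the file minus 8 bytes
  char format[4];            // bytes 9-12: "WAVE"
  char riff_format[4];       // bytes 13-16: "fmt " (with a trailing space)
  uint32_t format_size;      // bytes 17-20: size of the format fields (16 for PCM)
  uint16_t audio_format;     // bytes 21-22: 1 means uncompressed PCM
  uint16_t num_channels;     // bytes 23-24
  uint32_t sample_rate;      // bytes 25-28: samples per second
  uint32_t byte_rate;        // bytes 29-32: bytes per second
  uint16_t block_align;      // bytes 33-34: bytes per sample across all channels
  uint16_t bits_per_sample;  // bytes 35-36
  char data[4];              // bytes 37-40: "data"
  uint32_t data_size;        // bytes 41-44: size of the data section in bytes
} wav_header_t;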

The header can then be read with a single function call:

wav_header_t* header = malloc(sizeof(wav_header_t));

// read one block of sizeof(wav_header_t) bytes and store it sequentially
// at the address held in header; fread returns the number of blocks read
if (fread(header, sizeof(wav_header_t), 1, wav_file) != 1) {
  // the file is shorter than a full header
}
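Since fread copies the bytes blindly, it helps to check that the fixed tags landed where the struct expects them before trusting any other field. The following is a minimal sketch of such a check. The function name wav_header_looks_valid is hypothetical, and the sketch assumes a little-endian machine and a struct that the compiler lays out in exactly 44 bytes with no padding.

```c
#include <stdint.h>
#include <string.h>

// Same layout as the wav_header_t struct above, repeated so that this
// sketch compiles on its own.
typedef struct {
  char riff[4];
  uint32_t file_size;
  char format[4];
  char riff_format[4];
  uint32_t format_size;
  uint16_t audio_format;
  uint16_t num_channels;
  uint32_t sample_rate;
  uint32_t byte_rate;
  uint16_t block_align;
  uint16_t bits_per_sample;
  char data[4];
  uint32_t data_size;
} wav_header_t;

// Hypothetical helper: returns 1 if the header looks like a canonical
// 44-byte PCM WAV header, 0 otherwise.
int wav_header_looks_valid(const wav_header_t* h) {
  // memcmp rather than strcmp: the tags are not NUL-terminated.
  return memcmp(h->riff, "RIFF", 4) == 0 &&
         memcmp(h->format, "WAVE", 4) == 0 &&
         memcmp(h->riff_format, "fmt ", 4) == 0 &&
         memcmp(h->data, "data", 4) == 0 &&
         h->audio_format == 1;  // 1 means uncompressed PCM
}
```

A failed check on the data tag is a hint that the file carries extra chunks and that the 44-byte assumption does not hold for it.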

Reading data from a binary file

Once we have read the header, we can use that information to read the data. From the table, note that bytes 37–40 should say “data”, which marks the beginning of the “data” section of the file. The following 4 bytes give the size of the data section. Using this size, I can read the data from the waveform file.

wav_header_t* header; // initialised and read
uint8_t* audio_data = malloc(header->data_size);

// read one block of data_size bytes into audio_data
fread(audio_data, header->data_size, 1, wav_file);
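The header fields also tell us how much audio those bytes represent. As a small sketch with hypothetical helper names: block_align is the size in bytes of one frame (one sample for every channel), so dividing data_size by it gives the number of frames, and dividing that by sample_rate gives the duration in seconds.

```c
#include <stdint.h>

// Number of sample frames (one sample per channel) in the data section.
// block_align is the size in bytes of one frame.
uint32_t wav_frame_count(uint32_t data_size, uint16_t block_align) {
  if (block_align == 0) return 0;  // avoid dividing by zero on a bad header
  return data_size / block_align;
}

// Duration of the audio in seconds.
double wav_duration_seconds(uint32_t data_size, uint16_t block_align,
                            uint32_t sample_rate) {
  if (sample_rate == 0) return 0.0;  // avoid dividing by zero on a bad header
  return (double)wav_frame_count(data_size, block_align) / (double)sample_rate;
}
```

For example, 176400 bytes of 16-bit stereo audio (block_align of 4) at 44100 samples per second is 44100 frames, which is exactly one second.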

What sense do the data bytes make?

In the header, we have a field bits_per_sample. Let’s say this is 24. Then every 24 bits (3 bytes) of the data section make up one sample. A sample is the amplitude of the audio signal at a single instant in time. The time between consecutive samples is 1 / sample_rate seconds, so you can calculate it from the header. Each subsequent group of 24 bits is the next sample in the audio file, and with more than one channel the samples of the channels are interleaved. This sequence of 24-bit numbers makes up a “wave”.
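C has no 24-bit integer type, so each sample has to be assembled by hand from its 3 bytes. Below is a minimal sketch, assuming the usual PCM convention for .wav files: samples are stored little-endian and 24-bit samples are signed. The function names are hypothetical.

```c
#include <stdint.h>

// Decode one 24-bit little-endian signed PCM sample from 3 bytes.
int32_t decode_s24le(const uint8_t* p) {
  // Assemble the 24-bit value, least significant byte first.
  uint32_t u = (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
  int32_t v = (int32_t)u;
  // If the sign bit (bit 23) is set, the value is negative: subtract 2^24.
  if (u & 0x800000u) v -= 0x1000000;
  return v;
}

// Scale a 24-bit sample to the range [-1.0, 1.0).
double s24_to_double(int32_t s) {
  return (double)s / 8388608.0;  // 8388608 is 2^23
}
```

The scaled values are convenient as input to a Fourier transform, since they no longer depend on the bit depth of the file.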


When visualized on a graph, the sequence of numbers looks more complex than the picture above.

The wave is a superposition of multiple waves of different frequencies and their corresponding amplitudes (volume).

We need to split this wave into the smaller waves that make it up and record their frequencies and amplitudes. But how do we take this wave and divide it into its frequencies? Fourier Transform!

Using the FFTW3 library

The Fourier Transform (FT) is precisely the tool we need to achieve our goal. The Fast Fourier Transform (FFT) is a famous algorithm in Computer Science that efficiently computes the Discrete Fourier Transform of an array of numbers. FFTW3 is a C library that implements it, and its documentation has amazing tutorials on how to use it.

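#include <fftw3.h>

// The parameters are illustrative: samples holds n real-valued audio samples.
// The caller releases the returned array with fftw_free().
fftw_complex* apply_fft(const double* samples, int n) {
  fftw_complex* in = fftw_malloc(sizeof(fftw_complex) * n);  // input array
  fftw_complex* out = fftw_malloc(sizeof(fftw_complex) * n); // output array

  // using a specific type of FT: 1 dimensional Discrete FT
  // (fftw_plan is already a pointer type, so it takes no extra *)
  fftw_plan plan = fftw_plan_dft_1d(n, in, out, FFTW_FORWARD, FFTW_ESTIMATE);

  // populate the "in" variable: real part is the sample, imaginary part is 0
  for (int i = 0; i < n; i++) {
    in[i][0] = samples[i];
    in[i][1] = 0.0;
  }
  fftw_execute(plan);

  // now the "out" variable has the result of FFT on "in"
  // cleanup the memory that is no longer in use
  fftw_destroy_plan(plan);
  fftw_free(in);
  return out;
}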
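The output array holds one complex number per frequency bin. For an n-point transform of audio sampled at sample_rate Hz, bin k corresponds to a frequency of k * sample_rate / n Hz, and the magnitude of the complex number tells how strongly that frequency is present. The sketch below shows one way to find the strongest frequency. The names are hypothetical, and a plain double[2] type stands in for fftw_complex (which has that layout by default) so that the sketch compiles without FFTW.

```c
#include <stddef.h>

// Stand-in for fftw_complex, which by default is a double[2] holding
// {real, imaginary}. Defined here so the sketch compiles without FFTW.
typedef double complex_pair_t[2];

// Frequency in Hz represented by bin k of an n-point transform.
double bin_frequency(size_t k, size_t n, double sample_rate) {
  return (double)k * sample_rate / (double)n;
}

// Index of the strongest bin in 1..n/2. Bin 0 is skipped because it only
// holds the constant (DC) offset, and for real-valued input the upper half
// of the spectrum mirrors the lower half. Returns 0 if n is less than 2.
size_t dominant_bin(complex_pair_t* out, size_t n) {
  size_t best = 0;
  double best_power = -1.0;
  for (size_t k = 1; k <= n / 2; k++) {
    // Squared magnitude: enough for comparing bins, and avoids a sqrt.
    double power = out[k][0] * out[k][0] + out[k][1] * out[k][1];
    if (power > best_power) {
      best_power = power;
      best = k;
    }
  }
  return best;
}
```

Passing the index returned by dominant_bin to bin_frequency, together with the sample_rate from the header, gives the dominant frequency of the analysed samples in Hz.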

Conclusion

Reading binary files is not that different from reading text files. What matters is finding accurate information about how the file is structured and how the data encoded in it should be interpreted. You only discover the real complexity of such a program by researching it and implementing it hands-on.

I hope you enjoyed reading my article and learned something. Thank you!