
Web Audio API and computer audio learnings

Intention of Article

This serves as a note to my future self on all my learnings about computer audio and the Web Audio API. I will try to update the article whenever I learn something new.

I will try to keep it as a Q and A, so the relevant questions I asked myself get answered (by myself) here.

General questions

How does the Web Audio API work?

The Web Audio API constructs a graph of nodes from a source (a sound input such as a microphone, a web video/audio tag, or an audio stream), through manipulation functions (audio nodes), and finally to the destination (e.g. the speakers).

Image from https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API/Using_Web_Audio_API
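To make the graph idea concrete, here is a minimal sketch (my own toy example, not taken from the MDN page above) that connects an oscillator source through a gain node to the speakers:

```javascript
// A minimal Web Audio graph: source -> manipulation node -> destination.
const audioCtx = new AudioContext();

const source = audioCtx.createOscillator(); // sound source (could also be a mic stream or <audio> tag)
const gainNode = audioCtx.createGain();     // a manipulation node in the middle

source.connect(gainNode);                   // source feeds into the gain node
gainNode.connect(audioCtx.destination);     // gain node feeds the destination (speakers)

gainNode.gain.value = 0.5;                  // halve the volume
source.frequency.value = 440;               // an A4 tone
source.start();
```

(Note that most browsers only let an AudioContext start after a user gesture, so in practice this would run inside something like a click handler.)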

Questions about the analyser

How does it work?

The function calls from the example are very straightforward: the upstream node passes the sound signal into the analyser node, from which we can get time-domain or frequency-domain data.
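For example, a minimal sketch (using an oscillator as the upstream source; a microphone stream or media element source would be wired the same way) looks like this:

```javascript
// Wire an AnalyserNode between a source and the destination, then read data from it.
const audioCtx = new AudioContext();
const source = audioCtx.createOscillator();
const analyser = audioCtx.createAnalyser();
analyser.fftSize = 2048;

source.connect(analyser);
analyser.connect(audioCtx.destination);
source.start();

const timeData = new Uint8Array(analyser.fftSize);           // raw waveform samples
const freqData = new Uint8Array(analyser.frequencyBinCount); // fftSize / 2 frequency bins
analyser.getByteTimeDomainData(timeData);
analyser.getByteFrequencyData(freqData);
```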

So what’s fftSize?

FFT is short for Fast Fourier Transform, and the size controls how many output points we get. The documentation explains the effect/result somewhat vaguely, as follows:

from https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API/Visualizations_with_Web_Audio_API

From my understanding, fftSize is the target number of bins (or should it be the number of sinusoidal wave frequencies?) that the Fast Fourier Transform (FFT) works with when applied to the signal.

And according to the source code of the Chromium implementation, the actual core code is in realtime_analyser.cc; the getFloatFrequencyData and getByteFrequencyData APIs map to functions of the same name in that C++ file.

The main call is DoFFTAnalysis(), whose logic is roughly as follows (a sketch in JavaScript follows the list):

1. Copy data from the input buffer into a temporary buffer for the FFT
2. Call ApplyWindow(buffer, fft_size) # this transforms the data with a Blackman window to facilitate the FFT
3. Call DoFFT(...) on the buffer
4. Take the FFT output, combine the real and imaginary parts into a magnitude, and scale it
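Below is my own rough JavaScript approximation of that pipeline, not the Chromium code; the fft() helper is hypothetical and stands in for whatever platform FFT library the browser uses:

```javascript
// Rough sketch of the DoFFTAnalysis() steps listed above (my own approximation).
function doFftAnalysis(inputSamples, fftSize) {
  // 1. Copy the most recent fftSize samples into a temporary buffer.
  const buffer = inputSamples.slice(-fftSize);

  // 2. Apply a Blackman window (standard textbook formula) to reduce spectral leakage.
  for (let n = 0; n < fftSize; n++) {
    const w =
      0.42 -
      0.5 * Math.cos((2 * Math.PI * n) / (fftSize - 1)) +
      0.08 * Math.cos((4 * Math.PI * n) / (fftSize - 1));
    buffer[n] *= w;
  }

  // 3. Run the FFT (hypothetical helper returning { real, imag } arrays).
  const { real, imag } = fft(buffer);

  // 4. Combine real and imaginary parts into magnitudes and scale by 1 / fftSize.
  const magnitudes = new Float32Array(fftSize / 2);
  for (let k = 0; k < fftSize / 2; k++) {
    magnitudes[k] = Math.hypot(real[k], imag[k]) / fftSize;
  }
  return magnitudes; // getFloatFrequencyData() then converts these to decibels
}
```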

So why is the final number of bins half of the fftSize?

I believe it’s due to the Nyquist limit: for a real-valued signal, anything beyond 1/2 of the sampling frequency is just a mirrored repeat (?) and therefore redundant (this might also be related to the FFT treating the input sample signal as an infinitely repeating wave).
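As a sanity check of that intuition, here is a tiny toy DFT (my own example, not from the Chromium code) showing that for a real-valued input, bin k and bin N - k are complex conjugates, so only half of the bins carry unique information:

```javascript
// Naive DFT, used only to demonstrate conjugate symmetry for real-valued signals.
function dft(x) {
  const N = x.length;
  const re = new Array(N).fill(0);
  const im = new Array(N).fill(0);
  for (let k = 0; k < N; k++) {
    for (let n = 0; n < N; n++) {
      const phi = (-2 * Math.PI * k * n) / N;
      re[k] += x[n] * Math.cos(phi);
      im[k] += x[n] * Math.sin(phi);
    }
  }
  return { re, im };
}

const signal = Array.from({ length: 8 }, () => Math.random() - 0.5);
const { re, im } = dft(signal);
console.log(re[1].toFixed(6), re[7].toFixed(6));    // equal real parts
console.log(im[1].toFixed(6), (-im[7]).toFixed(6)); // imaginary parts differ only by sign
```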

Why can’t we see the actual FFT implementation in the code?

I figured out that the line performing the actual FFT operation is platform specific and relies on other libraries.

This is the line calling the DoFFT function.

When we examine the folder, the different subfolders each have their own implementation details for the analysis_frame (I “believe” the core implementation is pffft, but I am not 100% sure).

Below is the pffft implementation, and we can see len = fft_size / 2 is implemented here:

What’s a Blackman window?

I believe these two videos (video1, video2) give a better explanation than I can. My limited understanding is that the window helps the FFT reduce leakage (and so produces a better result that captures the frequencies more precisely).

Note that the Blackman window is just one type of window among many.

screen capture from video: https://www.youtube.com/watch?v=Q8N8pZ8P3f4
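For reference, here are the standard textbook formulas for two of them (my own illustration, not taken from the videos); the Blackman window is simply the more aggressively tapered of the two:

```javascript
// Two common window functions; the analyser's Blackman window is one choice among many.
const hann = (n, N) => 0.5 * (1 - Math.cos((2 * Math.PI * n) / (N - 1)));
const blackman = (n, N) =>
  0.42 -
  0.5 * Math.cos((2 * Math.PI * n) / (N - 1)) +
  0.08 * Math.cos((4 * Math.PI * n) / (N - 1));

// Both taper the frame towards zero at its edges; the Blackman window tapers harder,
// which suppresses leakage further at the cost of a wider main lobe.
const N = 8;
console.log(Array.from({ length: N }, (_, n) => hann(n, N).toFixed(3)));
console.log(Array.from({ length: N }, (_, n) => blackman(n, N).toFixed(3)));
```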

What’s the difference between getByteFrequencyData() and getFloatFrequencyData()?

The byte function does the same operation as the float function, just adding a scaling (to 0–255) before returning.

Here I have a question for myself: when I inspect the values from the float version, they are all negative decibels, ranging from about -20 to -190. How does the byte conversion work, given that min_decibels_ defaults to -100 and max_decibels_ defaults to -30 (these defaults are written in realtime_analyser.h)?

Compared to the above, it just adds a few lines after the call to audio_utilities::LinearToDecibels(…);
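Based on the Web Audio spec’s scaling formula (my own paraphrase, not the Chromium code), the byte conversion maps the decibel range [minDecibels, maxDecibels] to 0–255 and clamps everything outside it, which is presumably why values like -190 dB still come back as valid bytes (they just clamp to 0):

```javascript
// Sketch of mapping a decibel value to the 0-255 range returned by getByteFrequencyData().
function dbToByte(db, minDecibels = -100, maxDecibels = -30) {
  const scaled = (255 / (maxDecibels - minDecibels)) * (db - minDecibels);
  return Math.max(0, Math.min(255, Math.floor(scaled))); // clamp out-of-range values
}

console.log(dbToByte(-190)); // 0   (below minDecibels, clamped)
console.log(dbToByte(-65));  // 127 (roughly mid-range)
console.log(dbToByte(-20));  // 255 (above maxDecibels, clamped)
```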

How big is a bin returned by the analyser, and what is its unit?

The number of bins, as mentioned above, is related to fftSize: if we take 2048 as the fftSize, we get back 1024 (2048 / 2) numbers (in decibels or bytes), and each of them corresponds to the amplitude of the signal over a particular range of frequencies; that range is related to our sampling rate.

If we are sampling the audio at 44100 Hz, then each bin / data point represents the amplitude of the signal over a frequency span of “sampling rate / fftSize”.

In our example that is 44100 / 2048, around 21.5 Hz, so the first value in the returned array corresponds to the amplitude of roughly 0–21.5 Hz, the second value to 21.5–43 Hz, and so on…
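A small hypothetical helper (my own, just restating the arithmetic above) makes that mapping explicit:

```javascript
// Map a bin index to the frequency range it covers, given fftSize and the sample rate.
function binRange(binIndex, fftSize = 2048, sampleRate = 44100) {
  const binWidth = sampleRate / fftSize; // ≈ 21.5 Hz for 44100 / 2048
  return [binIndex * binWidth, (binIndex + 1) * binWidth];
}

console.log(binRange(0)); // [0, 21.53...]       -> first bin covers roughly 0-21.5 Hz
console.log(binRange(1)); // [21.53..., 43.06...] -> second bin, and so on
```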

And how much time are we measuring / sampling the signal?

For this one I am not 100% sure, but as far as I understand, it’s related to the frame size and the sampling rate. The frame is the “analysis_frame” in the code.

And the frame size is defined by the fftSize provided.

So we might conclude that we are taking a frame of 2048 samples (in our example) at a sampling rate of 44100 Hz (samples per second), so we are analysing roughly 2048 / 44100 ≈ 0.046 seconds of sound.
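A quick bit of arithmetic (my own numbers, matching the reasoning above) shows the trade-off: a larger fftSize gives finer frequency bins but covers a longer stretch of audio per analysis frame:

```javascript
// Time/frequency resolution trade-off for a few fftSize values at 44100 Hz.
const sampleRate = 44100;
for (const fftSize of [256, 2048, 16384]) {
  const binWidthHz = sampleRate / fftSize;  // frequency resolution
  const frameSeconds = fftSize / sampleRate; // length of audio analysed per frame
  console.log(fftSize, binWidthHz.toFixed(2) + ' Hz/bin', frameSeconds.toFixed(4) + ' s');
}
// 256   -> 172.27 Hz/bin, 0.0058 s
// 2048  ->  21.53 Hz/bin, 0.0464 s
// 16384 ->   2.69 Hz/bin, 0.3715 s
```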

I am also referring to this online book section “Frequency and Time Resolution Trade-Off” (and following sections).

(More to go here…)

(Next item is visualization from MDN example)

References

Here are the references I read:

On coding and implementation

Official Web Audio API documentation — helpful up to a certain degree; you get the very basics, and with the example code they reference (on GitHub) you can already make something, but when I wanted to know why and how it’s implemented, I needed to find other sources.

Source code of Chromium audio processing (or the original repo for blink webaudio) — each browser implements its Web Audio API support differently underneath; this one is for Chromium (the basis of Chrome), and from it I got to know how some of the APIs are implemented, which answered some of my “why” questions.

Video by Neil McCallion [I Play JavaScript: Making a Web Audio Synthesizer (Neil McCallion)] — a fun-to-watch video covering both coding and some music theory; the step-by-step explanation helps fill in the gaps left by the MDN documentation.

Video on building a nice analyser visualization [JavaScript Audio CRASH COURSE For Beginners] — good for beginner level; the Web Audio API coding doesn’t go much beyond the MDN documentation, but it’s nice to see how the visualization ties into the analyser node output.

Video on making an interactive melody player [Melody Maker app using vanilla JavaScript, HTML Canvas and Web Audio API] — this one is so much fun; another great example using the Web Audio API.

Audio Deep Learning Made Simple series — even though it is more focused on deep learning, the first and second articles explain how audio data is processed and understood, and cleared up a lot of my basic questions.

On theory

WolfSound — good coverage of audio programming concepts and math; it also has a YouTube channel if you prefer video.

Music and Computers book — a book with good coverage of computer music, with sound examples and tiny web programs to help illustrate the ideas (still reading).

Mark Newman’s course on the Fourier Transform — his explanation is way better than my university professors’ back then; it’s worth at least watching the free videos.

Physics of music notes from MTU — course notes for their class; might be useful (I haven’t finished reading them yet).
