Recording Audio on the Browser

Good ol’ Audio

Browsers are used for all sorts of things these days, from simple web pages that render a news site to complicated gaming web apps that use WebGL under the hood to render graphics. Audio recording in the browser shouldn’t be rocket science, right? Well, it turns out that rocket science and magic combined can solve this.

Oooh Dulitha, what’s this magic you speak of?

First, I am going to assume that you don’t know the basics of audio (sorry about that). I will simplify things so that everyone can follow. We need to talk about analog signals and digital signals: A & D, so to speak.

Analog Signals

When the mic captures sound, it converts sound energy into electrical energy, which produces an analog signal. The problem is that computers are not that great with analog signals. Computers ❤ digital signals: signals that are either 1 or 0.

Analog signals that get captured by the Mic

Inside the computer there is a piece of hardware called an ADC (analog-to-digital converter) that converts analog signals to digital ones. After this process, we can access the digital bytes of the sound.
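
To make the analog-to-digital step a bit more concrete, here is a minimal sketch that mimics it in software: it samples a sine wave at a fixed rate and rounds each value to a 16-bit integer. The function name `sampleSine` and the 440 Hz / 44100 Hz values are assumptions picked for illustration; a real ADC does this in hardware.

```javascript
// Illustrative only: mimic an ADC by sampling a sine wave and
// quantizing each sample to a 16-bit integer.
function sampleSine(frequencyHz, sampleRate, durationSeconds) {
  const sampleCount = Math.floor(sampleRate * durationSeconds);
  const samples = new Int16Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    const t = i / sampleRate; // time of sample i, in seconds
    const amplitude = Math.sin(2 * Math.PI * frequencyHz * t); // "analog" value in [-1, 1]
    samples[i] = Math.round(amplitude * 32767); // quantize to 16 bits
  }
  return samples;
}

const oneSecond = sampleSine(440, 44100, 1);
console.log(oneSecond.length); // 44100
```

One second of sound at 44100 samples per second becomes 44100 numbers, and those numbers are the bytes we get to play with.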

I follow you, so now you have the bytes. What do you do after?

The next step is to capture the byte arrays as they arrive and stitch them together into a single track of sound.
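
The stitching itself can be sketched in a few lines. This assumes the chunks arrive as `Float32Array`s (which is what the Web Audio API hands out) and that we kept a running total of their lengths; `mergeBuffers` is a hypothetical helper name.

```javascript
// Concatenate recorded chunks into one continuous Float32Array.
// `buffers` is an array of Float32Array chunks, `bufferLength` is the
// total number of samples across all chunks.
function mergeBuffers(buffers, bufferLength) {
  const merged = new Float32Array(bufferLength);
  let offset = 0;
  for (const chunk of buffers) {
    merged.set(chunk, offset); // copy the chunk in at the current position
    offset += chunk.length;
  }
  return merged;
}
```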

Browser market share

Let’s take a quick look at the browser market share right now. Cause obviously not everyone is on the latest version of Chrome and some poor souls still live in the dark ages.

Why so many Chrome versions! Credit: Netmarket Share

Dulitha, why would browser market share matter?

The browser provides the APIs we need to access the byte arrays. The Web Audio API and the WebRTC API are the APIs for the job. Get it, like The Italian Job?

Dulitha, I am walking here!

So which browsers support Web Audio? Chrome 14+, Edge, Firefox 23+, Opera 15+, and Safari 6+. But no love for Internet Explorer.
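
Since support varies, it is worth checking before showing a record button. Below is a minimal sketch of such a check. The function name `detectAudioSupport` is made up for this post, and it only looks at the standard `AudioContext` (plus the older `webkitAudioContext` prefix) and `navigator.mediaDevices.getUserMedia`.

```javascript
// Feature-detect what we need for recording.
// Pass in `window` when running in a page.
function detectAudioSupport(globalObject) {
  const AudioContextClass =
    globalObject.AudioContext || globalObject.webkitAudioContext || null;
  const mediaDevices =
    globalObject.navigator && globalObject.navigator.mediaDevices;
  const canCapture = Boolean(
    mediaDevices && typeof mediaDevices.getUserMedia === 'function'
  );
  return {
    AudioContextClass, // constructor to use, or null if unsupported
    canRecord: Boolean(AudioContextClass) && canCapture,
  };
}

// In a page: if (!detectAudioSupport(window).canRecord) { /* show a fallback */ }
```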

Web Audio API

Remember the pipe pattern that recently got famous, largely thanks to Connect middleware? The Web Audio API uses pipes as well (it calls them audio nodes). Each pipe can do something interesting to its input (increase the volume, decrease the pitch) and pass the result on to the next pipe.

Web Audio in block diagrams

Now to see some actual code doing what I preached earlier.
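
Here is a minimal sketch of such a pipeline, not necessarily the exact code ShortKast runs. It assumes the mic stream comes from `navigator.mediaDevices.getUserMedia` and uses a `ScriptProcessorNode` to get at the samples (that node is deprecated in favour of `AudioWorklet` these days, but it keeps the sketch short). The helper name `startCapture` and the 4096 block size are my own picks.

```javascript
// Wire up: mic source -> gain -> script processor -> destination,
// and collect every block of samples that flows through.
function startCapture(audioContext, recordingStream) {
  const state = { buffers: [], bufferLength: 0 };

  const source = audioContext.createMediaStreamSource(recordingStream);
  const gain = audioContext.createGain(); // a pipe that could change the volume
  // 4096-sample blocks, 1 input channel, 1 output channel.
  const processor = audioContext.createScriptProcessor(4096, 1, 1);

  processor.onaudioprocess = (event) => {
    // Copy the data: the browser reuses the underlying buffer between callbacks.
    const chunk = new Float32Array(event.inputBuffer.getChannelData(0));
    state.buffers.push(chunk);
    state.bufferLength += chunk.length;
  };

  source.connect(gain);
  gain.connect(processor);
  // The processor has to be connected onward for callbacks to fire in some browsers.
  processor.connect(audioContext.destination);
  return state;
}

// Browser usage (not run here):
// const recordingStream = await navigator.mediaDevices.getUserMedia({ audio: true });
// const audioContext = new AudioContext();
// const recording = startCapture(audioContext, recordingStream);
```

When the recording stops, `recording.buffers` and `recording.bufferLength` hold everything needed to stitch the track together.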

WAV and a little about lossless

Your audiophile friend keeps mentioning the AAC file format and lossless music, which you have no clue about. The code above captures lossless audio as byte arrays. For us to listen to it in the browser, we first need to convert the byte arrays to WAV encoding. WAV encoding tells the browser how to play back the captured bytes. The WAV encoding code is a bit complex to follow, but below is how we use it.

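// Stop the microphone track, then encode the captured samples as WAV.
recordingStream.getTracks()[0].stop();
const wavData = encodeWAV(buffers, bufferLength, audioContext.sampleRate);
// Wrapping in a Blob works whether encodeWAV returns a Blob, a DataView, or an ArrayBuffer.
const wavBlob = new Blob([wavData], { type: 'audio/wav' });
const audio = document.createElement('audio');
audio.src = window.URL.createObjectURL(wavBlob);
audio.play();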
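
For the curious, a WAV encoder boils down to a 44-byte header followed by the raw samples. The sketch below is not the `encodeWAV` used above; it is a simplified stand-in (hypothetically named `encodeWavSketch`) that assumes mono, 16-bit PCM, and samples that are already merged into one `Float32Array`.

```javascript
// Minimal mono, 16-bit PCM WAV encoder. Returns a DataView over the file bytes.
function encodeWavSketch(samples, sampleRate) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const view = new DataView(new ArrayBuffer(44 + dataSize));

  const writeString = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeString(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // number of channels
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // bytes per second
  view.setUint16(32, bytesPerSample, true); // bytes per sample frame
  view.setUint16(34, 16, true); // bits per sample
  writeString(36, 'data');
  view.setUint32(40, dataSize, true);

  // Web Audio samples are floats in [-1, 1]; scale them to 16-bit integers.
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(44 + i * bytesPerSample, scaled, true);
  }
  return view;
}
```

The header is what lets the browser know the sample rate, channel count, and bit depth of the bytes that follow.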

Dulitha, that wasn’t that complex. Where is the rocket science?

Waveform

Well, now comes the interesting part. A 1 minute WAV file comes to roughly 10 MB or more. Yes, you heard it right: when we first built the web recorder for ShortKast, a five minute clip was over 50 MB.
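
Those numbers fall straight out of the arithmetic. The sketch below assumes CD-style settings (44100 samples per second, 16 bits per sample, 2 channels); the actual settings depend on the recorder, and `pcmSizeBytes` is just a helper name for this post.

```javascript
// Size of uncompressed PCM audio, ignoring the 44-byte WAV header.
function pcmSizeBytes(sampleRate, bitsPerSample, channels, seconds) {
  return sampleRate * (bitsPerSample / 8) * channels * seconds;
}

console.log(pcmSizeBytes(44100, 16, 2, 60)); // 10584000 (about 10.6 MB)
console.log(pcmSizeBytes(44100, 16, 2, 300)); // 52920000 (about 52.9 MB)
```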

MP3 to the rescue

So who came to rescue us from making users upload a 50 MB file to our server? MP3 encoding! We can compress the sound in the WAV file and create a much lighter MP3 file. A 5 minute MP3 file comes to about 2 to 5 MB. Isn’t this ideal?
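
The MP3 figure follows from the bitrate rather than the sample rate. As a rough sketch, assuming constant bitrates of 64 and 128 kbps (my assumption, the post does not say which bitrate was used):

```javascript
// Approximate size of a constant-bitrate MP3, ignoring headers and tags.
function mp3SizeBytes(bitrateKbps, seconds) {
  return (bitrateKbps * 1000 / 8) * seconds;
}

console.log(mp3SizeBytes(64, 300)); // 2400000 (about 2.4 MB)
console.log(mp3SizeBytes(128, 300)); // 4800000 (about 4.8 MB)
```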

Now you are going to tell me about another problem you hit, right?

Yep. The way to transcode to MP3 in the browser is to use a LAME MP3 encoder. There is a GitHub project written completely in JavaScript that does the conversion. What’s the catch? It’s slow. It takes about 2 minutes to convert a 5 minute file.
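
To give a feel for what that conversion looks like, here is a rough sketch. It assumes an encoder object shaped like lamejs’s `Mp3Encoder`, with `encodeBuffer(int16Samples)` and `flush()` as shown in that project’s README. The helpers `floatTo16BitPcm` and `encodeMp3` and the 1152-sample block size are my own choices for illustration, not ShortKast’s code.

```javascript
// Web Audio gives floats in [-1, 1]; the encoder wants 16-bit integers.
function floatTo16BitPcm(floatSamples) {
  const pcm = new Int16Array(floatSamples.length);
  for (let i = 0; i < floatSamples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, floatSamples[i]));
    pcm[i] = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
  }
  return pcm;
}

// Feed the samples to the encoder block by block and collect the MP3 chunks.
// `encoder` is assumed to expose encodeBuffer() and flush().
function encodeMp3(encoder, floatSamples, blockSize = 1152) {
  const pcm = floatTo16BitPcm(floatSamples);
  const mp3Chunks = [];
  for (let i = 0; i < pcm.length; i += blockSize) {
    const encoded = encoder.encodeBuffer(pcm.subarray(i, i + blockSize));
    if (encoded.length > 0) mp3Chunks.push(encoded);
  }
  const tail = encoder.flush(); // whatever is still buffered inside the encoder
  if (tail.length > 0) mp3Chunks.push(tail);
  return mp3Chunks;
}

// Browser usage, assuming lamejs is loaded on the page (not run here):
// const encoder = new lamejs.Mp3Encoder(1, audioContext.sampleRate, 128);
// const mp3Blob = new Blob(encodeMp3(encoder, samples), { type: 'audio/mp3' });
```

Every call to `encodeBuffer` runs on the same thread as everything else, which is where the slowness starts to hurt.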

Users these days can’t wait for 2 minutes.

The transcoding had to happen while we record. That is again a complex operation, because JavaScript in the browser runs on a single thread. If we transcode in real time, we are going to miss a lot of samples from the mic while our thread is blocked inside LameJS.

Next week, you’ll get to hear about how I overcame this issue. If you like this post, please spread the love. Also, do check out ShortKast, a mini-podcasting social network that uses the above tech to record audio.

Like what you read? Give Dulitha Wijewantha a round of applause.
