OK full disclosure, the title should probably be “A failed Web Audio experiment…”. Never mind, what follows below is a journey through learning the Web Audio API and attempting to make a nifty little app. I succeeded on all counts with the exception of producing the app.
Last things first, the app in all its shoddy glory. The objective was to detect which piano key is being played.
The original plan was to make a webapp for practising sight reading sheet music. That’s where you look at a piece of music and can immediately pump out da tune on a keyboard. The app would show a note, you would play the note, and the app would tell you if you got it right. Or play sad-trombone if you got it wrong.
This started with attempting to detect the pitch of a note, and finished abruptly because I got a job and had to stop aimlessly messing around with APIs all day.
Step one was to get my hands on the sounds coming from the user’s microphone. I thought this was going to be some giant pain and involve learning some complex API. I was delighted to find that it’s a piece of cake.
The audio context
You may have worked with <canvas> before and know that the ‘context’ is the reference you use to pass instructions to the canvas. Well, it’s the same with audio. There is an ‘audio context’ that sits at the center of everything you do with the audio you’re working with.
Naturally, the next step was to create an audio context. Then I could tell the context that its source of audio would be the stream from the user’s mic. Then I’d connect a thing called an ‘analyser’ which would give me some data about the audio. Sounds complex, right?
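Something like this, give or take (the function name is mine, and I’ve used the promise-based `navigator.mediaDevices.getUserMedia`; nothing here runs until a browser grants mic access):

```javascript
// Create an audio context, point it at the microphone stream, and
// connect an analyser to it. Browser-only, so it's wrapped in a
// function rather than run at the top level.
function setupAudio() {
  const audioContext = new AudioContext();
  const analyser = audioContext.createAnalyser();

  return navigator.mediaDevices.getUserMedia({ audio: true })
    .then((stream) => {
      // The context's source of audio is the mic stream...
      const source = audioContext.createMediaStreamSource(stream);
      // ...and the analyser is connected to that source.
      source.connect(analyser);
      return analyser;
    });
}
```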
That’s pretty cool. It almost reads like English. “Hello audio context, I’d like to create a media stream source for you to listen to, here’s the stream. I’d then like to connect an analyser to that stream.”
Next I needed to turn that rather abstract ‘stream’ into something more useful, like an array of numbers.
The analyser is my little helper, listening all the time to the audio coming in from the mic, and I can ask it to give me a snapshot of what it’s hearing whenever I like. But first I ask the analyser how much data it will give me each time I request it (analyser.frequencyBinCount) and create an empty array of that length.
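Roughly like so (names are mine; `getByteTimeDomainData` is the real method that fills the array with the current waveform):

```javascript
// Ask the analyser how many values it gives per snapshot, create an
// array of that length, and reuse it for every request.
function makeSampler(analyser) {
  const dataArray = new Uint8Array(analyser.frequencyBinCount);
  return function takeSnapshot() {
    // Fills dataArray with the current waveform, one byte per sample.
    analyser.getByteTimeDomainData(dataArray);
    return dataArray;
  };
}
```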
Then, I wait…
…for 300 milliseconds; just to let the page warm up.
At this point I had no idea what the analyser would actually give me. So, faced with either reading the documentation or dumping it to the console, I did what any sane person would do.
This shocks a lot of people when I mention it, but I understand things better if I can visualise them. So I’ll loop through the array and spit it out onto a canvas.
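The drawing loop would look something like this (the canvas id and scaling are my guesses):

```javascript
// Spit the snapshot out onto a canvas as a connected line:
// sample index maps to x, sample value (0-255) maps to y.
function drawWave(dataArray) {
  const canvas = document.getElementById('scope'); // hypothetical id
  const ctx = canvas.getContext('2d');
  ctx.clearRect(0, 0, canvas.width, canvas.height);
  ctx.beginPath();
  for (let i = 0; i < dataArray.length; i++) {
    const x = (i / dataArray.length) * canvas.width;
    const y = (dataArray[i] / 255) * canvas.height;
    if (i === 0) ctx.moveTo(x, y);
    else ctx.lineTo(x, y);
  }
  ctx.stroke();
}
```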
The output is this squiggly little fellow:
If you’re not familiar with audio stuff, it’s actually quite simple. Here are three bullet points:
- The Web Audio API is recording at 44.1 kHz. That means it’s recording 44,100 points of data every second. (Hz just means “times per second”)
- When I ask my friendly analyser for some data, it gives me an array of 1024 numbers (about 23 milliseconds’ worth).
- The ‘pitch’ of a note is defined by how many waves there are in one second.
Let’s do some math! Here I have measured the length of one wobble.
It’s 87 whatevers long. There are 44,100 whatevers in a second. Which means there would be 507 of these wobbles in one second (44100 / 87). In other words, the pitch of this sound is 507 Hz. All quite simple.
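As a sanity check, here’s that arithmetic in code:

```javascript
// pitch (Hz) = samples per second / samples per wave
const SAMPLE_RATE = 44100;

function pitchFromWaveLength(samplesPerWave) {
  return SAMPLE_RATE / samplesPerWave;
}

Math.round(pitchFromWaveLength(87)); // → 507
```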
I Googled ‘piano key frequencies’, looked up the chart, and saw that 507 Hz is close enough to a B4. I then promptly fell off my chair, so rarely do I get things right on the first attempt.
Truth be told the note was a C sharp, not a B, but three notes off ain’t bad and my threshold for chair-disembarkment is quite low.
But rest on my laurels I must not! I didn’t want to measure just one wobble, I wanted to sample a whole lotta wobbles. Then I would periodically work out what the average pitch had been and render that somehow (actually, the mode proved far more useful than the average — averages are for schmucks).
As it turned out, when measuring the length of a wobble, it was better to measure the point where it crossed below the midpoint than to try to measure a ‘peak’. There were frequently double peaks (and double troughs), but rarely double crossings of the line.
The values that the analyser provides are between 0 and 255, so that midpoint is 128; finding where the signal crosses below that line is the crux of it.
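A sketch of that crossing detection (variable names are mine): note each point where consecutive samples fall from at-or-above 128 to below it, then divide the sample rate by the average gap between crossings.

```javascript
const SAMPLE_RATE = 44100;
const MIDPOINT = 128; // silence reads as 128 in byte time-domain data

// Find each point where the wave crosses below the midpoint and use
// the average distance between crossings as the wave length.
function detectPitch(samples) {
  const crossings = [];
  for (let i = 1; i < samples.length; i++) {
    if (samples[i - 1] >= MIDPOINT && samples[i] < MIDPOINT) {
      crossings.push(i);
    }
  }
  if (crossings.length < 2) return null; // not enough signal to measure
  const waves = crossings.length - 1;
  const averageWaveLength =
    (crossings[crossings.length - 1] - crossings[0]) / waves;
  return SAMPLE_RATE / averageWaveLength;
}
```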
At the end you can see I’m rendering a key four times a second.
This involves taking the mode (the most common value) from the pitch samples (and emptying the array). It’s not at all relevant to the Web Audio API, but for the record, I didn’t use a plain array like in the code above, I used what I have arrogantly called SmartArray.
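SmartArray itself isn’t shown here, but the mode-taking bit boils down to something like this (in practice the pitches would want rounding first, so near-identical samples count as the same value):

```javascript
// Return the most common value in an array of pitch samples.
function mode(samples) {
  const counts = new Map();
  let best = null;
  let bestCount = 0;
  for (const value of samples) {
    const count = (counts.get(value) || 0) + 1;
    counts.set(value, count);
    if (count > bestCount) {
      best = value;
      bestCount = count;
    }
  }
  return best;
}

mode([507, 508, 507, 506, 507]); // → 507
```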
When I had a pitch (e.g. 507), I looked it up in an array of pitches that maps to piano keys (a copy/paste from the Wikipedia article linked earlier).
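I can’t reproduce the pasted chart here, but the same lookup can be computed with the standard equal-temperament relationship, taking A4 (440 Hz) as key number 49:

```javascript
// Map a frequency in Hz to the nearest piano key name, using
// 12-tone equal temperament with A4 (440 Hz) as key 49.
const NOTE_NAMES = ['A', 'A#', 'B', 'C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#'];

function nearestKey(frequency) {
  // Each doubling of frequency is 12 keys up the keyboard.
  const keyNumber = Math.round(12 * Math.log2(frequency / 440)) + 49;
  const name = NOTE_NAMES[(keyNumber - 1) % 12];
  const octave = Math.floor((keyNumber + 8) / 12); // key 1 is A0, key 88 is C8
  return name + octave;
}

nearestKey(507); // → 'B4'
```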
So then I would know the key, but I still needed a piano to draw it on. I did this with SVG because SVG is great. It’s rendered by looping over the array of keys, shifting left or right a little each time, and making the key black or white.
The result of all this is a cute little keyboard (I had never noticed that the black keys aren’t right in the middle before I tried to make one).
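The layout loop amounts to something like this (the widths are invented): advance x for each white key, and hang each black key over the boundary behind it, slightly off-center, just like on a real piano.

```javascript
// Compute positions for an 88-key keyboard, starting at A0.
const NOTE_NAMES = ['A', 'A#', 'B', 'C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#'];
const WHITE_WIDTH = 20; // hypothetical pixel sizes
const BLACK_WIDTH = 12;

function keyboardLayout() {
  const keys = [];
  let x = 0;
  for (let n = 1; n <= 88; n++) {
    const name = NOTE_NAMES[(n - 1) % 12];
    const isBlack = name.includes('#');
    if (isBlack) {
      // Straddle the boundary between the two neighbouring white keys.
      keys.push({ name, isBlack, x: x - BLACK_WIDTH / 2 });
    } else {
      keys.push({ name, isBlack, x });
      x += WHITE_WIDTH;
    }
  }
  return keys;
}
```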
Each of those keys has its own reference so to make a key light up I just do…
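The original one-liner isn’t reproduced here, but the gist, with a made-up highlight colour, is setting the fill attribute on the stored element reference:

```javascript
// Light up a key by setting the fill on its stored SVG element.
function lightUp(keyElement) {
  keyElement.setAttribute('fill', 'gold'); // hypothetical colour
}
```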
I’ve left out a few details here and there; if you’re keen to see some pretty sloppy code, the full source is on GitHub. Beware that there’s some new stuff being used and I haven’t bothered with vendor prefixes, so I think you’ll need Chrome 53 (Canary as at August 21) or Chrome Dev on Android if you want to see it in action: sight-reader.herokuapp.com
And that, as they say in the sandwich biz, is a wrap.