Just in time for Christmas!
spacet.me Labs is proud to present…
Carols in Harmony, a web-based application, lets you practice singing famous Christmas hymns in a four-part choir of Soprano, Alto, Tenor, and Bass voices. Using Google Chrome or Firefox on your desktop, select the song you want to sing…
The notes will be displayed on-screen. You can choose which part to listen to or practice. Click on the blank area to play, and a synthesizer will start playing the notes.
In listen mode, the application will play your part. In practice mode, the application will play the other parts and let you sing your part.
If you allow the app to access your microphone, it will also listen to your voice and show whether you sang the correct note. If you sang correctly, the note turns green.
Unlike some other singing games like The Voice, in Carols in Harmony you also have to sing the correct octave.
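In terms of MIDI note numbers, where each octave spans 12 semitones, the difference is between comparing the full note number and comparing only the pitch class. A sketch of the two checks, assuming notes are compared as MIDI numbers (the function names are hypothetical):

```javascript
// Strict check: the MIDI note numbers must match, so the octave matters.
function sangCorrectNote(detectedMidi, expectedMidi) {
  return detectedMidi === expectedMidi;
}

// Looser check that ignores the octave and compares pitch classes only
// (MIDI numbers 12 apart are the same note name, one octave apart).
function samePitchClass(detectedMidi, expectedMidi) {
  return detectedMidi % 12 === expectedMidi % 12;
}

// Singing C4 (60) when C3 (48) is expected:
console.log(sangCorrectNote(60, 48), samePitchClass(60, 48)); // false true
```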
As with other experiments of spacet.me Labs, this application is free and open source. You can check out its source code on GitHub. The rest of this article will describe how it works and how this application is built.
The Human Voice
When we sing, we emit sound waves, which the microphone picks up. Here, I sang the Bass C note (C3) into the microphone:
Above is a sample of what my voice looks like over a period of 0.0464 seconds. Fortunately, just that is enough to determine which note and which octave I am singing!
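Both numbers can be checked with a little arithmetic. This is a minimal sketch assuming equal temperament with A4 = 440 Hz and MIDI numbering (C3 = 48, A4 = 69). The 0.0464 s window is presumably a 2048-sample buffer at 44.1 kHz; the article does not state the buffer size, so that is an assumption:

```javascript
// Frequency (Hz) of a MIDI note number in 12-tone equal temperament,
// assuming the common A4 = 440 Hz tuning reference (MIDI note 69).
function noteFrequency(midiNote) {
  return 440 * Math.pow(2, (midiNote - 69) / 12);
}

// Duration (seconds) covered by a buffer of `samples` samples.
function windowDuration(samples, sampleRate) {
  return samples / sampleRate;
}

console.log(noteFrequency(48).toFixed(2));           // prints 130.81 (C3)
console.log(windowDuration(2048, 44100).toFixed(4)); // prints 0.0464 (assumed buffer)
```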
I’m not a math expert, but speaking mathematically, you can say that it’s a function of time — f(t).
One amazing fact is that this kind of wave function can be decomposed into a sum of multiple sine waves of different frequencies and amplitudes, effectively resulting in a function of frequency, f̂(ω). This GIF explains it best:
Using a Fast Fourier Transform (FFT) algorithm, I can convert my voice into the frequency domain. Here’s what it looks like:
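The Web Audio API computes this for us, but the transform itself is short enough to write out. The sketch below is the plain O(N²) discrete Fourier transform, not an FFT (an FFT produces the same result faster), and the function names are illustrative. Bin k of an N-sample buffer corresponds to the frequency k × sampleRate / N.

```javascript
// Magnitude spectrum of a real signal via the naive DFT:
// X[k] = sum over n of x[n] * e^(-2*pi*i*k*n/N), for bins k = 0 .. N/2.
function dftMagnitudes(samples) {
  const N = samples.length;
  const magnitudes = [];
  for (let k = 0; k <= N / 2; k++) {
    let re = 0;
    let im = 0;
    for (let n = 0; n < N; n++) {
      const angle = (-2 * Math.PI * k * n) / N;
      re += samples[n] * Math.cos(angle);
      im += samples[n] * Math.sin(angle);
    }
    magnitudes.push(Math.sqrt(re * re + im * im));
  }
  return magnitudes;
}

// Frequency (Hz) represented by bin k.
function binFrequency(k, sampleRate, N) {
  return (k * sampleRate) / N;
}

// Example: 64 samples of a sine wave that completes exactly 5 cycles.
const N = 64;
const sine = Array.from({ length: N }, (_, n) => Math.sin((2 * Math.PI * 5 * n) / N));
const mags = dftMagnitudes(sine);
const peak = mags.indexOf(Math.max(...mags));
console.log(peak, binFrequency(peak, 8000, N)); // 5 625
```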
Usually, a sound is made up of a fundamental frequency, which is the lowest one, plus higher frequencies. You can see above that the fundamental frequency is near the C3 note.
Those higher frequencies are harmonics. The fundamental itself counts as the first harmonic; the second harmonic is 2 times the fundamental frequency, the third is 3 times, the fourth is 4 times, and so on.
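To see where those harmonics land musically, the sketch below lists the first five harmonics of C3 and names the nearest equal-tempered note for each. It assumes A4 = 440 Hz and MIDI numbering (C3 = 48); the helper names are illustrative.

```javascript
const NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"];

// Nearest MIDI note number for a frequency, assuming A4 = 440 Hz = MIDI 69.
function frequencyToMidi(freq) {
  return Math.round(69 + 12 * Math.log2(freq / 440));
}

// Scientific pitch name for a MIDI note number, e.g. 48 -> "C3".
function midiToName(midi) {
  return NOTE_NAMES[midi % 12] + (Math.floor(midi / 12) - 1);
}

// Harmonic h (starting at 1 for the fundamental) is h times the fundamental.
function harmonics(fundamental, count) {
  return Array.from({ length: count }, (_, i) => fundamental * (i + 1));
}

const c3 = 440 * Math.pow(2, (48 - 69) / 12); // about 130.81 Hz
for (const f of harmonics(c3, 5)) {
  console.log(f.toFixed(2), midiToName(frequencyToMidi(f)));
}
// 130.81 C3
// 261.63 C4
// 392.44 G4
// 523.25 C5
// 654.06 E5
```

The fourth harmonic of C3 lands on C5, which is why the C3 and C5 recordings share some peaks.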
For comparison, here’s me singing the Treble C note (C5, two octaves above the Bass C):
To implement this in a web application, we have to use a combination of:
- getUserMedia API to gain access to the microphone.
- Web Audio API to perform sound analysis. It is already capable of doing FFT analysis using an AnalyserNode.
So all we have to do is wire them together and find an algorithm that detects the fundamental frequency from the FFT result:
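The wiring can be sketched as follows. This is a minimal sketch, not the code in microphone.js: it uses the promise-based navigator.mediaDevices.getUserMedia (newer than the prefixed API browsers offered when this was written), the fftSize of 2048 is an assumption, and the function name is hypothetical. The AudioContext and the getUserMedia function are passed in so the wiring can be exercised outside a browser.

```javascript
// Connect a microphone stream to an AnalyserNode and return a function that
// reads the current magnitude spectrum (one 0-255 value per bin) on demand.
// In a browser, getUserMedia = (c) => navigator.mediaDevices.getUserMedia(c).
async function connectMicrophone(audioContext, getUserMedia, fftSize = 2048) {
  const stream = await getUserMedia({ audio: true });
  const source = audioContext.createMediaStreamSource(stream);
  const analyser = audioContext.createAnalyser();
  analyser.fftSize = fftSize; // frequencyBinCount becomes fftSize / 2
  source.connect(analyser);   // not routed to the speakers, so no feedback
  const spectrum = new Uint8Array(analyser.frequencyBinCount);
  return function readSpectrum() {
    analyser.getByteFrequencyData(spectrum); // fills `spectrum` in place
    return spectrum;
  };
}

// Browser usage (start it from a click so the AudioContext may run):
// const read = await connectMicrophone(new AudioContext(),
//   (c) => navigator.mediaDevices.getUserMedia(c));
// const spectrum = read(); // hand this to a pitch detector
```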
The algorithm I used is simple and naive: I iterate over the notes from A0 to C7, adding up the amplitudes at each note’s fundamental frequency and at some of its harmonics. Then I choose the note with the maximum combined amplitude. The code is in microphone.js.
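A minimal sketch of that naive approach, assuming the spectrum is the 0-255 array from AnalyserNode.getByteFrequencyData and that five harmonics are summed with equal weight (the actual count and any weighting in microphone.js may differ). A0 and C7 are MIDI notes 21 and 96; the function name is illustrative.

```javascript
// Naive harmonic-sum pitch detection over MIDI notes A0 (21) to C7 (96).
// spectrum: non-negative amplitude per FFT bin, e.g. from getByteFrequencyData.
// Bin k covers the frequency k * sampleRate / fftSize.
function detectNote(spectrum, sampleRate, fftSize, harmonicCount = 5) {
  let bestNote = -1;
  let bestScore = 0;
  for (let note = 21; note <= 96; note++) {
    const fundamental = 440 * Math.pow(2, (note - 69) / 12); // A4 = 440 Hz
    let score = 0;
    for (let h = 1; h <= harmonicCount; h++) {
      const bin = Math.round((h * fundamental * fftSize) / sampleRate);
      if (bin < spectrum.length) score += spectrum[bin];
    }
    if (score > bestScore) {
      bestScore = score;
      bestNote = note;
    }
  }
  return bestNote; // -1 means the spectrum was silent
}

// Synthetic spectrum: equal peaks at the first five harmonics of C3.
const fftSize = 2048;
const sampleRate = 44100;
const spectrum = new Uint8Array(fftSize / 2);
const c3 = 440 * Math.pow(2, (48 - 69) / 12);
for (let h = 1; h <= 5; h++) {
  spectrum[Math.round((h * c3 * fftSize) / sampleRate)] = 100;
}
console.log(detectNote(spectrum, sampleRate, fftSize)); // 48 (C3)
```

Summing over harmonics, rather than picking the single loudest bin, helps keep the octave right when an overtone is louder than the fundamental. It also matters for low notes: at these settings each bin is about 21.5 Hz wide, so B2, C3 and C#3 all map to the same bin for their fundamental, and the higher harmonics do most of the work of telling them apart.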
The notes to sing come from a MIDI file, loaded via Ajax. The MIDI file contains the notes for each voice on its own channel, from Channel 1 (Soprano) to Channel 4 (Bass).
The MIDIFile library decodes a MIDI file and extracts its events. These events are used to construct a list of notes, which is displayed on-screen and played through a sequencer.
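Turning note-on and note-off events into a list of notes with start times and durations can be sketched as below. The event shape ({ time, channel, type, note, velocity }, with time in seconds and channels numbered 0 to 3) is a hypothetical simplification, not the MIDIFile library’s actual event format. It follows the MIDI convention that a note-on with velocity 0 acts as a note-off.

```javascript
const VOICE_NAMES = ["Soprano", "Alto", "Tenor", "Bass"]; // MIDI channels 1-4 as indices 0-3

// Pair each note-on with the next note-off for the same channel and pitch.
// A note-on with velocity 0 is treated as a note-off, as in standard MIDI.
function eventsToNotes(events) {
  const pending = new Map(); // "channel:note" -> start time
  const notes = [];
  for (const e of events) {
    const key = e.channel + ":" + e.note;
    const isOn = e.type === "noteOn" && e.velocity > 0;
    const isOff = e.type === "noteOff" || (e.type === "noteOn" && e.velocity === 0);
    if (isOn) {
      pending.set(key, e.time);
    } else if (isOff && pending.has(key)) {
      const start = pending.get(key);
      pending.delete(key);
      notes.push({ voice: VOICE_NAMES[e.channel], note: e.note, start, duration: e.time - start });
    }
  }
  return notes.sort((a, b) => a.start - b.start);
}

const notes = eventsToNotes([
  { time: 0, channel: 0, type: "noteOn", note: 72, velocity: 80 },   // Soprano C5
  { time: 0.5, channel: 3, type: "noteOn", note: 48, velocity: 80 }, // Bass C3
  { time: 1.5, channel: 3, type: "noteOn", note: 48, velocity: 0 },  // Bass release
  { time: 2, channel: 0, type: "noteOff", note: 72, velocity: 0 },   // Soprano release
]);
console.log(notes.map((n) => `${n.voice} ${n.note} ${n.duration}`).join(", "));
// Soprano 72 2, Bass 48 1
```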
When playback starts, a Sequencer is created with the events to play: “Bass, stop! Bass, start playing C#3! Alto, stop! Alto, start playing G#3! …” The sequencer commands the Voice objects as the song progresses.
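A sequencer of this kind can be sketched as a time-sorted list of commands plus a cursor: on each tick it dispatches every command whose time has passed. The class shape, the { time, voice, action, note } command format, and the voice interface (start(note) and stop()) are assumptions for illustration. In a browser, tick would typically be driven by requestAnimationFrame or a timer, using audioContext.currentTime as the clock.

```javascript
// Dispatches timed start/stop commands to Voice-like objects in time order.
class Sequencer {
  constructor(commands, voices) {
    // At the same instant, run "stop" before "start" so a note can be re-triggered.
    const rank = (c) => (c.action === "stop" ? 0 : 1);
    this.commands = [...commands].sort((a, b) => a.time - b.time || rank(a) - rank(b));
    this.voices = voices; // e.g. { Soprano: voice, Alto: voice, Tenor: voice, Bass: voice }
    this.cursor = 0;      // index of the next command to run
  }

  // Run every command scheduled at or before `now` (seconds since playback began).
  tick(now) {
    while (this.cursor < this.commands.length && this.commands[this.cursor].time <= now) {
      const { voice, action, note } = this.commands[this.cursor++];
      if (action === "start") this.voices[voice].start(note);
      else this.voices[voice].stop();
    }
  }

  get finished() {
    return this.cursor >= this.commands.length;
  }
}

// Usage with a stand-in voice that just logs what it is told to do.
const log = [];
const bass = { start: (n) => log.push("start " + n), stop: () => log.push("stop") };
const seq = new Sequencer(
  [
    { time: 1, voice: "Bass", action: "start", note: 49 }, // C#3
    { time: 0, voice: "Bass", action: "start", note: 48 }, // C3
    { time: 1, voice: "Bass", action: "stop" },
  ],
  { Bass: bass }
);
seq.tick(0.5); // start 48
seq.tick(1);   // stop, then start 49
```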
Each Voice object in turn receives start and stop commands and controls a simple synthesizer (just simple oscillators) that plays sound on the audio device. For simplicity, two GainNodes are used: the first automates the volume when a note starts and stops, and the second controls the voice’s volume when switching modes.
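A minimal sketch of such a Voice, assuming one continuously running oscillator per part. The method names, the "triangle" waveform, and the 10 ms time constant are assumptions; only the two-GainNode layout (an envelope gain for note on/off, followed by a mode gain for the listen/practice volume) comes from the description above. The AudioContext is passed in.

```javascript
// One choir part: oscillator -> envelope gain -> mode gain -> destination.
class Voice {
  constructor(audioContext, destination = audioContext.destination) {
    this.ctx = audioContext;
    this.osc = audioContext.createOscillator();
    this.osc.type = "triangle";                // assumed waveform
    this.envelope = audioContext.createGain(); // GainNode 1: note on/off
    this.mode = audioContext.createGain();     // GainNode 2: listen/practice volume
    this.envelope.gain.value = 0;              // silent until a note starts
    this.osc.connect(this.envelope);
    this.envelope.connect(this.mode);
    this.mode.connect(destination);
    this.osc.start();                          // runs continuously; the gains gate it
  }

  // Drop automation scheduled from now on, then glide toward `level` to avoid clicks.
  rampEnvelope(level) {
    const t = this.ctx.currentTime;
    this.envelope.gain.cancelScheduledValues(t);
    this.envelope.gain.setTargetAtTime(level, t, 0.01);
  }

  start(midiNote) {
    const freq = 440 * Math.pow(2, (midiNote - 69) / 12); // A4 = 440 Hz
    this.osc.frequency.setValueAtTime(freq, this.ctx.currentTime);
    this.rampEnvelope(1);
  }

  stop() {
    this.rampEnvelope(0);
  }

  // For example, full volume for the parts being played, low or 0 for the part being sung.
  setModeVolume(volume) {
    this.mode.gain.setValueAtTime(volume, this.ctx.currentTime);
  }
}
```

Keeping the oscillator running and gating it with gain avoids creating a new OscillatorNode for every note, since an OscillatorNode can only be started once.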
The User Interface
The welcome page is built with jQuery, since it is a very simple view, and does not need complex data binding.
The lyrics view is built with React, because its user interface is more complex: it needs to switch between languages and verses. With React, I just read the state of the application (available languages and verses, current language, current verse, lyric texts) and transform it into Virtual DOM nodes. React then updates the DOM on the page efficiently to match the given Virtual DOM.
The on-screen notes display was first built with React, but React was too slow for this use case. I then replaced it with my own lightweight DOM rendering library, custom built so that React components can be ported to it easily. I hope to write more about it someday.
Package Management and Build Tool
In another project, I tried webpack, a module bundler. It integrates nicely with npm modules and ES6 (using 6to5-loader). The script must be pre-compiled into a bundle before it can be used, but webpack has very good debugging support and comes with a very nice build tool and development server. So now I like webpack better.
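For readers unfamiliar with webpack, a configuration of the kind described (webpack 1.x, current at the time, with 6to5-loader handling ES6) might look roughly like this. The entry and output paths are hypothetical, and this is not the configuration of either project:

```javascript
// webpack.config.js for webpack 1.x; paths are hypothetical.
module.exports = {
  entry: "./src/main.js",
  output: {
    path: __dirname + "/dist",
    filename: "bundle.js",
  },
  module: {
    loaders: [
      // Compile ES6 to ES5 with 6to5 (later renamed Babel).
      { test: /\.js$/, exclude: /node_modules/, loader: "6to5-loader" },
    ],
  },
  devtool: "source-map", // map bundle code back to the original source files
};
```

Running `webpack` builds the bundle once, while `webpack-dev-server` serves it and rebuilds whenever a source file changes.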
Building this project not only helped me train my voice for this year’s Christmas Caroling, but also let me explore sound analysis and taught me how to build ES6 applications that run in today’s browsers.
This project is probably the last experiment of spacet.me Labs in 2014. There will be a big project coming up next year.
Have a Merry Christmas, and a Happy New Year. May God bless you and give you hope throughout 2015!