Recording audio with React for Amazon Lex

Matty Williams
5 min read · Oct 28, 2018


Recording audio in the browser and processing it with services such as Amazon Lex can be tricky. Audio formats differ between services, and it's good practice to support as many browsers as possible and fail gracefully on the rest (looking at you, IE).

Just interested in the final code? You can find it here.

This article is an implementation of an Amazon tutorial in React.

Cover image: audio waveform animation (https://www.videoblocks.com/video/audio-waveform-animation-simple-black-and-white-sound-wave-as-motion-background-4renwtog_lijee5bmg)

Recording Audio

The first thing we'll do is use create-react-app to set up our React playground, and use RecorderJS as a nice wrapper around the Web Audio API.

Once we’ve cleaned everything up a bit, we can begin on our Recorder component. This component needs to store access to the user's microphone (more on this later), whether or not we are currently recording, and an instance of the RecorderJS class.

import React, { Component } from 'react';
import RecorderJS from 'recorder-js'; // assuming the recorder-js npm package

class Recorder extends Component {
  constructor(props) {
    super(props);
    this.state = {
      stream: null,
      recording: false,
      recorder: null
    };
  }
}

When our component mounts, we immediately want to request access to the user's microphone in their browser. We'll use our getAudioStream() method for this. Don't worry about its implementation yet; just know that it asks the user for access to their microphone and returns a stream from that device.

async componentDidMount() {
  let stream;

  try {
    // We will implement this later.
    stream = await getAudioStream();
  } catch (error) {
    // User's browser doesn't support audio.
    // Add your handler here.
    console.log(error);
  }

  this.setState({ stream });
}

In our render method we want to show one of three things: nothing (if the user's browser doesn't support recording audio), a button to start recording, or a button to stop it.

render() {
  const { recording, stream } = this.state;

  // Don't show the record button if the browser doesn't support it.
  if (!stream) {
    return null;
  }

  return (
    <button
      onClick={() => {
        recording ? this.stopRecord() : this.startRecord();
      }}
    >
      {recording ? 'Stop Recording' : 'Start Recording'}
    </button>
  );
}

As shown on our button, we need to implement both the startRecord() and stopRecord() methods. Let's start with the first.

startRecord() {
  const { stream } = this.state;

  const audioContext = new (window.AudioContext || window.webkitAudioContext)();
  const recorder = new RecorderJS(audioContext);
  recorder.init(stream);

  this.setState(
    {
      recorder,
      recording: true
    },
    () => {
      recorder.start();
    }
  );
}

When a user hits start, we create a new instance of RecorderJS, initialise it with our stream to the user's microphone, set state accordingly, and then begin recording.

async stopRecord() {
  const { recorder } = this.state;

  const { buffer } = await recorder.stop();
  const audio = exportBuffer(buffer[0]);

  // Do your audio processing here.
  console.log(audio);

  this.setState({
    recording: false
  });
}

When a user hits stop, we stop recording and hand the raw buffer to exportBuffer(), which encodes it as a WAV file in the format Lex expects.
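
With the component in place, we can render it from App.js. This is only a minimal sketch, and it assumes the Recorder class above is the default export of src/Recorder.js:

import React from 'react';
import Recorder from './Recorder';

function App() {
  return <Recorder />;
}

export default App;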

Encoding our audio

The audio we record in the browser is not directly compatible with APIs such as Amazon Lex and Google Cloud. As the Amazon Lex API docs show, Lex only accepts certain audio types with specific sample rates and sizes.

So now we have our Recorder component, let's implement the two methods it relies on: getAudioStream(), which gets access to the user's microphone, and exportBuffer(), which converts our raw audio buffer, recorded at a 44,100 Hz sample rate, into one downsampled to 16,000 Hz and encoded as PCM (pulse-code modulation).

I like storing my non-React code in a different folder, so let's create a folder called utilities and add audio.js to it. This is where we'll do our audio processing.

/**
 * Get access to the user's microphone through the browser.
 *
 * https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices/getUserMedia#Using_the_new_API_in_older_browsers
 */
function getAudioStream() {
  // Older browsers might not implement mediaDevices at all, so we set an empty object first.
  if (navigator.mediaDevices === undefined) {
    navigator.mediaDevices = {};
  }

  // Some browsers partially implement mediaDevices. We can't just assign an object
  // with getUserMedia as it would overwrite existing properties.
  // Here, we will just add the getUserMedia property if it's missing.
  if (navigator.mediaDevices.getUserMedia === undefined) {
    navigator.mediaDevices.getUserMedia = function(constraints) {
      // First get ahold of the legacy getUserMedia, if present.
      var getUserMedia =
        navigator.webkitGetUserMedia || navigator.mozGetUserMedia;

      // Some browsers just don't implement it - return a rejected promise with an error
      // to keep a consistent interface.
      if (!getUserMedia) {
        return Promise.reject(
          new Error('getUserMedia is not implemented in this browser')
        );
      }

      // Otherwise, wrap the call to the old navigator.getUserMedia with a Promise.
      return new Promise(function(resolve, reject) {
        getUserMedia.call(navigator, constraints, resolve, reject);
      });
    };
  }

  const params = { audio: true, video: false };

  return navigator.mediaDevices.getUserMedia(params);
}

This code snippet (taken from here) requests the user's media devices and returns a promise that resolves with a stream to the microphone. It uses the modern mediaDevices API where available and falls back to the legacy, prefixed versions for older browsers.

Now onto encoding our audio into the correct format.

// The browser records at 44,100 Hz; Amazon Lex expects 16,000 Hz.
// (Some browsers/devices capture at 48,000 Hz; adjust recordSampleRate if needed.)
const recordSampleRate = 44100;
const exportSampleRate = 16000;

/**
 * Samples the buffer at 16 kHz.
 * Encodes the buffer as a WAV file.
 * Returns the encoded audio as a Blob.
 */
function exportBuffer(recBuffer) {
  const downsampledBuffer = downsampleBuffer(recBuffer, exportSampleRate);
  const encodedWav = encodeWAV(downsampledBuffer);
  const audioBlob = new Blob([encodedWav], {
    type: 'application/octet-stream'
  });

  return audioBlob;
}

To encode the buffer, we first downsample it from the 44,100 Hz rate the browser records at to 16,000 Hz. Then we encode the new buffer as a PCM WAV audio file and wrap the result in a Blob.

We downsample the audio…

/**
 * Samples the buffer at 16 kHz.
 */
function downsampleBuffer(buffer, exportSampleRate) {
  const sampleRateRatio = recordSampleRate / exportSampleRate;
  const newLength = Math.round(buffer.length / sampleRateRatio);
  const result = new Float32Array(newLength);

  let offsetResult = 0;
  let offsetBuffer = 0;

  while (offsetResult < result.length) {
    const nextOffsetBuffer = Math.round((offsetResult + 1) * sampleRateRatio);

    // Average all of the original samples that fall into this output slot.
    let accum = 0;
    let count = 0;
    for (let i = offsetBuffer; i < nextOffsetBuffer && i < buffer.length; i++) {
      accum += buffer[i];
      count++;
    }

    result[offsetResult] = accum / count;
    offsetResult++;
    offsetBuffer = nextOffsetBuffer;
  }

  return result;
}

And encode the wav in PCM format…

function floatTo16BitPCM(output, offset, input) {
  for (let i = 0; i < input.length; i++, offset += 2) {
    // Clamp each float sample to [-1, 1] and scale it to a signed 16-bit integer.
    const s = Math.max(-1, Math.min(1, input[i]));
    output.setInt16(offset, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
}

function writeString(view, offset, string) {
  for (let i = 0; i < string.length; i++) {
    view.setUint8(offset + i, string.charCodeAt(i));
  }
}

/**
 * Encodes the buffer as a WAV file.
 */
function encodeWAV(samples) {
  const buffer = new ArrayBuffer(44 + samples.length * 2);
  const view = new DataView(buffer);

  // RIFF/WAVE header.
  writeString(view, 0, 'RIFF');
  view.setUint32(4, 32 + samples.length * 2, true);
  writeString(view, 8, 'WAVE');

  // "fmt " chunk: PCM, mono, 16 bits per sample.
  writeString(view, 12, 'fmt ');
  view.setUint32(16, 16, true);
  view.setUint16(20, 1, true);
  view.setUint16(22, 1, true);
  // The data has already been downsampled, so the header declares the export rate.
  view.setUint32(24, exportSampleRate, true);
  view.setUint32(28, exportSampleRate * 2, true);
  view.setUint16(32, 2, true);
  view.setUint16(34, 16, true);

  // "data" chunk with the PCM samples.
  writeString(view, 36, 'data');
  view.setUint32(40, samples.length * 2, true);
  floatTo16BitPCM(view, 44, samples);

  return view;
}

And that’s that: we can now use our newly encoded audio with popular Amazon and Google speech-to-text services.
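
As a rough sketch of what that last step might look like, here is how the blob from stopRecord() could be posted to Amazon Lex's PostContent operation using the AWS SDK for JavaScript. The bot name, alias, user ID, and region are placeholders, credentials are assumed to be configured elsewhere (for example via Amazon Cognito), and you should check the Lex docs for the exact content types your bot accepts:

import AWS from 'aws-sdk';

const lexRuntime = new AWS.LexRuntime({ region: 'us-east-1' }); // placeholder region

async function sendToLex(audioBlob) {
  const response = await lexRuntime
    .postContent({
      botName: 'YourBotName',    // placeholder
      botAlias: 'YourBotAlias',  // placeholder
      userId: 'demo-user',       // placeholder
      contentType: 'audio/x-l16; sample-rate=16000',
      accept: 'text/plain; charset=utf-8',
      inputStream: audioBlob
    })
    .promise();

  // The transcript and the bot's reply come back on the response.
  console.log(response.inputTranscript, response.message);
  return response;
}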

Want me to continue this series and show how to build a real-time speech-to-text conversation in React? Let me know!
