Speech recognition and translation in the browser

Kim T
Creative Technology Concepts & Code
4 min read · Jan 15, 2019
Speech translation is possible!

Many people dream of a world where we can all talk together seamlessly, without language barriers. Several companies are working on their own approaches, which involve detecting a language, translating in real time, and then speaking the result back in the new language. With sentences of different lengths, and words taken out of context, it’s an extremely difficult task.

Here I’ve created an approach that works in Chrome browsers today. Although it is not real-time, it feels seamless and natural, and it works very well inside chat conversations!

1. Transcribe your voice

Using the SpeechRecognition functionality we can detect the words being spoken in any of a number of supported languages, and store those words in a variable or output them to the page.

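// Chrome exposes the constructor with a webkit prefix
var speech = window['SpeechRecognition'] || window['webkitSpeechRecognition'];
var recognition = new speech();
var recognizing = false;
recognition.lang = 'en-US'; // the spoken language must be set before starting
recognition.continuous = true; // keep listening after each phrase
recognition.interimResults = true; // also report partial results while speaking
recognition.onstart = function() {
  recognizing = true;
  console.log('onstart');
};
recognition.onerror = function(event) {
  console.log('onerror', event);
};
recognition.onend = function() {
  recognizing = false;
  console.log('onend');
};
recognition.onresult = function(event) {
  // Only log results the recognizer has marked as final
  for (var i = event.resultIndex; i < event.results.length; ++i) {
    if (event.results[i].isFinal) {
      console.log('onresult', event.results[i][0].transcript);
    }
  }
};
recognition.start(); // prompts the user for microphone access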

What’s great is that it works in real time, constantly revising the transcript as more of the sentence is spoken. The main limitations are that only a few languages are supported, the user must grant microphone access, and they must select the language beforehand. The result is a text string matching what was spoken.
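
To make the interim-versus-final distinction concrete, here is a minimal sketch that splits a result list into the text that is already final and the text still being revised. The helper name `collectTranscripts` is my own for illustration and is not part of the Web Speech API; it only assumes the result shape used in the snippet above (an `isFinal` flag and a `transcript` on the first alternative).

```javascript
// Split a SpeechRecognition-style result list into final and interim text.
// `results` is array-like: each entry has an `isFinal` flag and, at index 0,
// the top alternative with a `transcript` string.
function collectTranscripts(results, startIndex) {
  var finalText = '';
  var interimText = '';
  for (var i = startIndex; i < results.length; ++i) {
    var transcript = results[i][0].transcript;
    if (results[i].isFinal) {
      finalText += transcript;
    } else {
      interimText += transcript;
    }
  }
  return { finalText: finalText, interimText: interimText };
}

// Hypothetical wiring in the browser:
// recognition.onresult = function(event) {
//   var text = collectTranscripts(event.results, event.resultIndex);
//   console.log('final:', text.finalText, 'interim:', text.interimText);
// };
```

Showing the interim text in a lighter colour while the user is still speaking, and only passing the final text on for translation, is one way to use the two values.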

2. Translate to another language

We can use the Google Cloud Translation API to translate the text to another language. Be sure to replace key=X with your own API key from Google Cloud Console.

var url = 'https://translation.googleapis.com/language/translate/v2/';

// Load a URL and pass the parsed JSON response to the callback
function load(url, callback) {
  var xmlHttp = new XMLHttpRequest();
  xmlHttp.onreadystatechange = function() {
    if (xmlHttp.readyState === 4 && xmlHttp.status === 200) {
      callback(JSON.parse(xmlHttp.responseText));
    }
  };
  xmlHttp.open('GET', url, true);
  xmlHttp.send(null);
}

function translate(text, langInput, langOutput, callback) {
  // encodeURIComponent also escapes &, ? and = inside the spoken text
  var params = `?q=${window.encodeURIComponent(text)}&source=${langInput}&target=${langOutput}&key=X`;
  load(url + params, function(response) {
    callback(response.data.translations[0].translatedText);
  });
}

// The API's supported-language list uses short codes such as 'en' and 'fr'
translate('Hello', 'en', 'fr', function(result) {
  console.log(result);
});

The downside here is that you need to know the input and output languages beforehand, and requests are rate limited unless you pay for more. The result is a translated text string in your desired foreign language.
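
Since requests are rate limited, it can help to avoid sending the same phrase twice. The sketch below is one possible approach, not part of the original app: `buildTranslateUrl` and `withCache` are hypothetical helper names, and the cache is a plain in-memory object that is lost on page reload. It assumes a translate function with the same `(text, langInput, langOutput, callback)` signature as above.

```javascript
var BASE_URL = 'https://translation.googleapis.com/language/translate/v2/';

// Build the request URL; URLSearchParams encodes each value for us.
function buildTranslateUrl(text, langInput, langOutput, apiKey) {
  var params = new URLSearchParams({
    q: text,
    source: langInput,
    target: langOutput,
    key: apiKey
  });
  return BASE_URL + '?' + params.toString();
}

// Wrap a translate(text, langInput, langOutput, callback) function with an
// in-memory cache so repeated phrases do not trigger a new request.
function withCache(translateFn) {
  var cache = {};
  return function(text, langInput, langOutput, callback) {
    var cacheKey = [langInput, langOutput, text].join('|');
    if (Object.prototype.hasOwnProperty.call(cache, cacheKey)) {
      callback(cache[cacheKey]);
      return;
    }
    translateFn(text, langInput, langOutput, function(result) {
      cache[cacheKey] = result;
      callback(result);
    });
  };
}

// Hypothetical usage: var cachedTranslate = withCache(translate);
```

Common chat phrases such as greetings tend to repeat, so even a small cache like this can cut down on the number of requests.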

3. Dictate the translation

We can use the browser’s built-in SpeechSynthesis functionality to read out the text through your computer’s speakers. It supports multiple voice types and languages!

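var synth = window.speechSynthesis;
var voices = [];
var currentVoice = 0; // index into the voices list
// Voices load asynchronously, so refresh the list whenever it changes
synth.onvoiceschanged = function() {
  voices = synth.getVoices();
};
function speak(text) {
  console.log('speak', text);
  var utterThis = new SpeechSynthesisUtterance(text);
  // If the voices have not loaded yet, null falls back to the default voice
  utterThis.voice = voices[currentVoice] || null;
  utterThis.pitch = 1;
  utterThis.rate = 1;
  synth.speak(utterThis);
}
speak('Hello, my name is Kim');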

One drawback is that you need to select the voice and language beforehand, and it does struggle with some particular words. The result here is audio from the speaker: a voice saying the sentence in the foreign language.
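
Picking a voice by index only works if you already know the order of the list. As a rough sketch, a voice could instead be chosen by matching its `lang` property against the output language. `pickVoice` is a hypothetical helper, not a browser API; it assumes each voice object has a `lang` string such as 'fr-FR', as the objects returned by `getVoices()` do.

```javascript
// Pick a voice for the requested language tag.
// Prefers an exact match ('fr-FR'), then any voice sharing the primary
// language ('fr'), and returns null when nothing matches.
function pickVoice(voices, lang) {
  // Normalise case and underscores in case a platform reports 'en_US'
  var wanted = lang.toLowerCase().replace('_', '-');
  var primary = wanted.split('-')[0];
  var exact = null;
  var loose = null;
  for (var i = 0; i < voices.length; ++i) {
    var voiceLang = voices[i].lang.toLowerCase().replace('_', '-');
    if (!exact && voiceLang === wanted) {
      exact = voices[i];
    }
    if (!loose && voiceLang.split('-')[0] === primary) {
      loose = voices[i];
    }
  }
  return exact || loose || null;
}

// Hypothetical usage inside speak():
// utterThis.voice = pickVoice(voices, 'fr-FR');
```

Returning null when there is no match lets the browser fall back to its default voice rather than failing.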

When you combine these steps, you have a nice little speech app that lets you speak in one language and, after a small pause of around 1 second, speaks the translation back to you.
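
One way the three steps might be wired together is sketched below. This is an illustration under assumptions rather than the code from the demo: `createSpeechTranslator` is a hypothetical name, and the `translate` and `speak` options are expected to have the same signatures as the functions defined in steps 2 and 3.

```javascript
// Build a handler that sends every final transcript through
// translate(text, langInput, langOutput, callback) and then speak(text).
function createSpeechTranslator(options) {
  return function handleResults(results, startIndex) {
    for (var i = startIndex; i < results.length; ++i) {
      if (!results[i].isFinal) {
        continue; // wait until the recognizer has settled on the phrase
      }
      var text = results[i][0].transcript.trim();
      if (text) {
        options.translate(text, options.langInput, options.langOutput, options.speak);
      }
    }
  };
}

// Hypothetical wiring in the browser, reusing the earlier snippets:
// var handle = createSpeechTranslator({
//   langInput: 'en', langOutput: 'fr', translate: translate, speak: speak
// });
// recognition.onresult = function(event) {
//   handle(event.results, event.resultIndex);
// };
```

Passing the functions in as options keeps the handler independent of the browser APIs, which makes it easier to try out with stand-in functions.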

View a working demo here:
https://kimturley.co.uk/speech-translate/src/index.html

You can grab the full source here:
https://github.com/kmturley/speech-translate

If you combine this approach with a real-time chat messaging application, you can do some even cooler things.

Everyone in the chat can speak in their own language. All text is transcribed, translated to English, and output into the text chat. Each user can then hear the text read back to them in their own language!
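
As a rough sketch of the chat idea, the function below works out which translations a single message would need: one into English for the shared text chat, and one per distinct listener language for speech. The name `planTranslations` and the returned shape are assumptions made for illustration; the actual chat integration is not shown in this article.

```javascript
// Work out which translations one chat message needs.
// sourceLang: language code of the speaker, e.g. 'fr'
// listenerLangs: language codes of everyone in the chat
function planTranslations(sourceLang, listenerLangs) {
  var speechTargets = [];
  listenerLangs.forEach(function(lang) {
    // Skip the speaker's own language and avoid duplicate requests
    if (lang !== sourceLang && speechTargets.indexOf(lang) === -1) {
      speechTargets.push(lang);
    }
  });
  return {
    // The shared text chat is in English; no translation needed if already English
    chatTarget: sourceLang === 'en' ? null : 'en',
    speechTargets: speechTargets
  };
}

// Example: a French speaker in a chat with English, French and German users
// planTranslations('fr', ['en', 'fr', 'de', 'de']);
```

Collapsing duplicate languages matters here because each extra target language is another rate-limited translation request.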

Kim T
Creative Technology Concepts & Code

Creative Technologist, coder, music producer, and bike fanatic. I find creative uses for technology.