4.9% WER

Pichai of Google boasted at Google I/O about Google’s speech recognition performance of 4.9% WER (word error rate) for general language. This is definitely a very big feat. It may look less impressive than Microsoft’s announcement in September of surpassing human transcription WER, but that test was based on a constrained vocabulary. Google’s number is much more representative of real interaction.
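To make the number concrete, WER is conventionally computed as the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the recognizer's output, divided by the number of words in the reference. The sketch below is only an illustration of that standard definition; the function name and the simple lowercase, whitespace-based tokenization are assumptions, and this is not how Google or Microsoft score their systems.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    # Assumption: naive normalization (lowercase, split on whitespace).
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missed)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word in a four-word message is already a 25% WER.
    print(word_error_rate("see you at nine", "see you at wine"))
```

At 4.9% WER, roughly one word in twenty is wrong on average, so a short text message will often still contain an error.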

However, even with this great WER, voice dictation for emails and text messages is still not reliable. In fact, it can sometimes be more of a distraction, as people need to proofread the transcription and go through an awkward correction exercise. If voice is to be used as an aid while driving, correcting mistranscriptions can be terrifyingly dangerous.

A few years back, we experimented with an app called UbiSPEAK (and later SpeakChat). It ran STT (speech-to-text) on a voice message and shipped the transcription together with a link to the original recording. This is similar to how some voice messaging services work today. The beauty was that even if the transcription was nonsensical, the recipient could still listen to the original recording to decipher it.

This feature should be the standard for all transcribed or voice-typed messages. Google would do a great service if it appended a link to the recording to the dictated text (or at least provided that option). We could then tolerate a less-than-perfect WER.
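As a rough sketch of the idea, the outgoing message could simply carry the transcription followed by a link to the audio. Everything here is hypothetical: the function name, the message layout and the example URL are assumptions for illustration, not how UbiSPEAK, SpeakChat or any Google product formats messages.

```python
from typing import Optional


def compose_message(transcript: str, recording_url: Optional[str] = None) -> str:
    """Return the dictated text, optionally followed by a link to the audio."""
    text = transcript.strip()
    if recording_url is None:
        # The sender opted out of attaching the recording.
        return text
    # Hypothetical layout: transcription first, then the recording link.
    return f"{text}\n\n[Original recording: {recording_url}]"


if __name__ == "__main__":
    # example.com is a placeholder host, not a real recording service.
    print(compose_message("see you at wine", "https://example.com/rec/123"))
```

Even when the transcription is wrong, as in the usage example, the recipient can fall back on the recording.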
