because human brains solve the problem
Speech recognition isn’t just a sound-to-text problem. When we talk, our brain recognizes the sound, sure, but it also recognizes the words, parts of words, phrases and other relationships in context.
And yet today’s speech recognition systems treat the problem as a sound problem first, then a word selection problem and finally a word sequence selection problem.
When several candidates match the sound about equally well, the system picks the one whose words and word sequence are most probable. More probable sequences win over ones that would make more sense, because meaning and knowledge are factored out of that (formal) science, as discussed previously.
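To make that selection step concrete, here is a minimal sketch of picking between two transcripts that sound alike. It is a toy, not how any particular product works: the candidate phrases, the acoustic scores, the bigram table, the floor value and the function names are all invented for illustration.

```python
import math

# Invented acoustic scores (log-probabilities) for illustration only.
# Both candidates fit the sound almost equally well.
ACOUSTIC_LOGP = {
    "recognize speech": -12.0,
    "wreck a nice beach": -11.5,  # slightly better match to the sound
}

# Invented bigram probabilities; a real system estimates these from text.
BIGRAM_P = {
    ("<s>", "recognize"): 0.002,
    ("recognize", "speech"): 0.05,
    ("<s>", "wreck"): 0.0005,
    ("wreck", "a"): 0.1,
    ("a", "nice"): 0.01,
    ("nice", "beach"): 0.005,
}

UNSEEN_P = 1e-6  # assumed floor for word pairs missing from the table


def language_logp(sentence):
    """Log-probability of the word sequence under the toy bigram table."""
    words = ["<s>"] + sentence.split()
    return sum(
        math.log(BIGRAM_P.get(pair, UNSEEN_P))
        for pair in zip(words, words[1:])
    )


def pick_transcript(candidates):
    """Choose the candidate with the best combined sound + sequence score."""
    return max(candidates, key=lambda s: ACOUSTIC_LOGP[s] + language_logp(s))


best = pick_transcript(list(ACOUSTIC_LOGP))
print(best)
```

Nothing in the scoring looks at what the speaker is actually talking about. With these made-up numbers the more probable word sequence wins even though the other candidate fits the sound slightly better, and it would still win if the conversation were about a day at the seaside.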
Isn’t Speech Recognition a solved problem?
Speech recognition, also known as Automatic Speech Recognition (ASR) or Speech to Text (STT), isn’t a problem of sound recognition, but of language recognition. John R. Pierce of Bell Labs put it beautifully:
“… a general phonetic typewriter is simply impossible unless the typewriter has an intelligence and a knowledge of language comparable to those of a native speaker of English.”
In other words: dictation software needs language understanding.