Why Your Speech Recognition Team Needs More Linguists

Krzysztof E. Borowski
Published in Voice Tech Podcast
3 min read · Nov 14, 2019

In the #VoiceFirst era, you can’t go wrong with more linguists on your team.

Photo by JR Korpa on Unsplash

What is the last frontier in automatic speech recognition (ASR)? Better hardware, faster computers, more data? According to NYU’s Michael Picheny, it is none of these things. Instead, speech recognition systems struggle to handle multiple languages, registers, and accents. This is where linguists, with their intimate understanding of how language works, are especially useful. Their training and expertise uniquely position them to help build better products and improve the overall customer experience.

Not long ago, I was in a conversation with a language technology professional. At the time, she was working with Russian data and wanted to know whether Polish, like Russian, has grammatical cases (cases are different forms of the same word that vary depending on the grammatical context in which they occur). I said yes and added that while Polish theoretically has seven cases, in practice the number goes down to six. Intrigued, my interlocutor wanted to learn more. I explained that the use of the enigmatic seventh case, the vocative, is socially conditioned and depends on the type of situation in which speakers are involved. In a nutshell, Polish speakers use the vocative form to address people, animals, etc., and in casual speech the nominative often takes its place.

In the field of natural language processing (NLP), this immediately leads to the question of when we should train speech recognition systems to use such forms. The simple answer is: it depends. In essence, this is not an ASR question but a user experience (UX) question. If your goal is to build a Polish-language NLP model with a component of formality baked into it, then I recommend you include vocative forms in the training data. However, if you want to establish an emotional connection between customers and their voice assistant, for example, it’s a good idea to avoid the vocative altogether. In this second scenario, adding some informal language to your model can go a long way toward creating a smoother human-computer interaction (HCI).
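To make the formal versus informal choice concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the tiny VOCATIVE lookup, the function names, and the sample utterances are hypothetical, and a real system would rely on a proper morphological analyzer rather than a hand-written table.

```python
# Toy lookup (assumption): nominative -> vocative for a few Polish first names.
VOCATIVE = {
    "Anna": "Anno",
    "Piotr": "Piotrze",
    "Marek": "Marku",
    "Kasia": "Kasiu",
}


def address_form(name: str, formal: bool) -> str:
    """Return the form used to address `name` in the chosen register.

    Formal register uses the vocative when the toy table knows it;
    informal register keeps the nominative.
    """
    if formal:
        return VOCATIVE.get(name, name)
    return name


def contains_vocative(utterance: str) -> bool:
    """Check whether any known vocative form appears as a token."""
    tokens = {token.strip(",.!?") for token in utterance.split()}
    return any(form in tokens for form in VOCATIVE.values())


def select_training_data(utterances, formal: bool):
    """Keep all utterances for a formal model; drop vocative ones otherwise."""
    if formal:
        return list(utterances)
    return [u for u in utterances if not contains_vocative(u)]


samples = ["Piotrze, włącz światło.", "Piotr, włącz światło."]
print(address_form("Piotr", formal=True))
print(select_training_data(samples, formal=False))
```

The point of the sketch is that the register is a product decision passed in as a flag, not something the recognizer can decide on its own.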


In conversations like this, it is normal for industry professionals to look for ways to streamline the product development process by evaluating the complexity of the future language model. For linguists like myself, it is natural to consider the complexity of language in general, which necessarily makes us think about language variation and context-dependent usage. This also means that we’re naturally wired to identify and deal with cross-linguistic reasoning or language ambiguity, which machines cannot readily grasp yet.

To process large quantities of multilingual data, your NLP models must move beyond English.

Each language is a rule-based system with a considerable degree of variation, yet communication also relies heavily on paralinguistic signals beyond the main message. As they grow up and socialize, humans learn to interpret these signals, extract the intended meaning, and translate communication into action. This is precisely where the human mind has a major advantage over the most efficient computer systems. As of now, computers still cannot process natural language as effectively as the human brain.

Humans are good at language identification and interpretation, but machines can process more data.

As linguists, we spend hundreds of hours analyzing what people say, why they say it, and what the implications of their speech are. We also understand how language as a system operates on the most abstract level. Therefore, we can theorize and explain which linguistic forms best fit your speech recognition or voice assistant model. Whether you’re working on closing the human-computer gap in natural language understanding (NLU) or designing your next big product, consider adding a linguist (or two) to your team. With our aptitude for system-wide thinking, comfort with abstraction, and strong attention to detail, we’re ideally equipped to take your language technology to the next level. And this is why you should #HireLinguists.

Linguists, and particularly sociolinguists, have a lot to offer in the language technology sector.

Let’s talk! Are you a linguist or language specialist working in the tech industry? Or perhaps you’re an NLP/NLU/ASR professional who had to tackle some of the problems mentioned in the post? I would love to hear your story. Join the conversation now.


Krzysztof E. Borowski

Lecturer in Polish Studies at UW–Madison. Scholar of political discourse, contemporary Poland, Silesian identity, online communities, and Polish television.