4 Applications of Artificial Intelligence (AI) for Voice Transcription

Karl Utermohlen
3 min read · May 23, 2018

Voice transcription is a craft that takes years of experience to master, because transcribers must capture the nuances of different accents and speech quirks. Moreover, languages with many dialects often require native speakers from a particular region to transcribe the words into plain English or another language. Artificial intelligence (AI) is now catching up to this craft by assisting with transcription work.

While the technology will never replace humans, it can take on part of the work by producing draft transcripts that we then review and correct. Recent advances in natural language processing (NLP) have made it easier for devices to transcribe spoken-word audio clips, because they can detect the distinctive characteristics of a language across the regions where it is spoken, whether those regions are large or small.

Intelligent automation company WorkFusion offers a robotic process automation (RPA) platform called RPA Express that can support voice transcription by automating transcription tasks.

Here are four applications of AI in voice transcription that are currently making waves:

1) Dictate and Voice Typing

Microsoft’s Artificial Intelligence and Research team developed an AI system that transcribes speech with a 5.9% word error rate, the same rate that professional transcribers achieve. The company has rolled out the technology and integrated it into Cortana and Xbox consoles. Microsoft also created Dictate, an add-in that lets users type in Outlook, Word and PowerPoint by speaking. Voice Typing is a comparable feature in Google Docs, built by Google rather than Microsoft, that transcribes speech into a text document in real time.
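To make the error-rate figure concrete, here is a minimal Python sketch of how a word error rate is commonly computed: the number of word substitutions, deletions and insertions needed to turn the machine transcript into the reference transcript, divided by the number of words in the reference. The function name and the sample sentences are made up for illustration, and this is not necessarily the exact scoring procedure Microsoft used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one wrong word and one dropped word out of nine.
    human = "the quick brown fox jumps over the lazy dog"
    machine = "the quick brown fox jumped over lazy dog"
    print(f"{word_error_rate(human, machine):.1%}")  # prints 22.2%
```

By this measure, a 5.9% rate works out to roughly six word-level mistakes for every 100 words in the reference transcript.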

2) Amazon Transcribe

Amazon is best known as an e-commerce giant, but its foray into the tech world has been a success, and Amazon Transcribe is making headway as a voice transcription service. The product is an API available through Amazon Web Services (AWS) that can transcribe English or Spanish audio. It attaches a timestamp to each word, which makes it easier to check the transcript against the recording, and it also works with phone-quality audio. The service will soon support multiple speakers, and it is affordable at $0.0004 per second.
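As a rough illustration, the Python sketch below does two things: it estimates the bill for a recording at the per-second price quoted in this article, and it uses word-level timestamps to point a human editor at the words worth re-checking. The sample result is invented, and its field names are an assumption modeled on the JSON that Amazon Transcribe returns, so verify both the shape and the current price against the AWS documentation before relying on either.

```python
# Price quoted in this article (2018); an assumption, check current AWS pricing.
PRICE_PER_SECOND_USD = 0.0004


def estimate_cost(duration_seconds: float,
                  price_per_second: float = PRICE_PER_SECOND_USD) -> float:
    """Estimated cost in US dollars for a recording of the given length."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return round(duration_seconds * price_per_second, 4)


def words_to_review(result: dict, min_confidence: float = 0.9) -> list:
    """Return (word, start, end, confidence) for words below the threshold."""
    flagged = []
    for item in result["results"]["items"]:
        if item.get("type") != "pronunciation":
            continue  # punctuation items carry no timestamps
        best = item["alternatives"][0]
        confidence = float(best["confidence"])
        if confidence < min_confidence:
            flagged.append((best["content"],
                            float(item["start_time"]),
                            float(item["end_time"]),
                            confidence))
    return flagged


# Hypothetical result, shaped like the service's output (assumed field names).
SAMPLE_RESULT = {
    "results": {
        "items": [
            {"type": "pronunciation", "start_time": "0.0", "end_time": "0.42",
             "alternatives": [{"content": "Hello", "confidence": "0.99"}]},
            {"type": "pronunciation", "start_time": "0.45", "end_time": "0.9",
             "alternatives": [{"content": "wurld", "confidence": "0.61"}]},
            {"type": "punctuation",
             "alternatives": [{"content": ".", "confidence": "0.0"}]},
        ]
    }
}

if __name__ == "__main__":
    print(estimate_cost(3600))             # one hour of audio: 1.44
    print(words_to_review(SAMPLE_RESULT))  # [('wurld', 0.45, 0.9, 0.61)]
```

At the quoted rate, an hour of audio costs about $1.44, and the flagged list tells an editor exactly where in the recording to listen again.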

3) Otter

AISense, a company founded by former employees of Google and speech-recognition giant Nuance, developed an app called Otter. Otter uses AI to transcribe speech on the go, it is one of the most accurate transcription services out there, and it is free to use. The app relies on speech recognition algorithms similar to the ones behind digital voice assistants such as Siri and Alexa. The user interface is intuitive: it asks you to make a short or long recording by tapping the app’s mic icon, and the app learns the nuances of your voice so it can identify you in the recordings you make. This makes it easier to know who is speaking in each recording, but the app is far from perfect, as it has punctuation issues.

4) Deep Speech

Chinese tech company Baidu has also been developing AI solutions for voice transcription with its Deep Speech project. Most world-class speech recognition systems depend on user data from third-party providers or on graduates of the world’s top speech and language technology programs. Baidu Research developed a speech recognition system that can be built, debugged and improved by a team with little to no experience in the field. The system is simple and easy to use, and it offers high-quality transcription in a number of languages.

Karl Utermohlen

Tech writer focusing on AI, ML, apps and cybersecurity. MFA in Creative Writing from the U of Idaho. Writes for PSafe, Upwork, First Page Sage, WeContent, IP.