Human Transcription Services vs. Automatic Speech Recognition (ASR)

Published in takenote, Apr 2, 2019

There are a vast number of transcription services on the market, and they don’t all have the same costs, accuracy rates or guarantees. Some of these differences are subtle, but others are huge.

The introduction of technology into the transcription process has split the category of ‘transcription services’ in two, and understanding that split is critical if you are going to get an outcome suited to your needs.

Technology has become so powerful that it is sometimes hard to imagine that humans are still better at anything. When it comes to transcripts, though, they are.

Automatic speech recognition (ASR) software has enabled the delivery of unimaginably low prices for transcription services. However, there is still a long way to go when it comes to quality assurances and reliable outcomes.

This is your guide to understanding the difference between ASR and human transcription services, enabling you to make the right choice when it comes to your transcription needs.

How Does ASR Work, Anyway?

When it comes to using software to convert speech into text, there are more steps involved than you might think. Your voice creates vibrations when you speak. A microphone turns these into an electrical signal, which an analogue-to-digital converter (ADC) turns into digital data by sampling the sound, taking detailed measurements of the sound wave at regular intervals. A filter then separates the relevant sounds and frequencies from the rest.

Next comes a little science. The signal is segmented into slices lasting hundredths or even thousandths of a second, and the sound measurements in each slice are matched to templates for the roughly 40 phonemes (the basic units of sound) in the English language. Each phoneme is then evaluated in the context of the phonemes around it, and the system runs the resulting network of phonemes through a mathematical model that compares it against known words, phrases and sentences.

The system then produces text based on its best guess at what the person said, presented as standard text characters such as ASCII.
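The sampling and segmentation steps described above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular ASR product works: the 16,000 samples-per-second rate, the 10-millisecond frame length, the synthetic 200 Hz tone standing in for a voice, and all function names are assumptions made for the example. Real systems typically use overlapping frames and much richer measurements than the single energy value computed here.

```python
import math

SAMPLE_RATE = 16_000  # samples per second (assumed for this example)
FRAME_MS = 10         # frame length: one hundredth of a second


def sample_tone(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Stand-in for the ADC: measure a pure tone at regular intervals."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def split_into_frames(samples, sample_rate=SAMPLE_RATE, frame_ms=FRAME_MS):
    """Cut the sample stream into consecutive, non-overlapping frames."""
    frame_len = sample_rate * frame_ms // 1000
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


def frame_energy(frame):
    """Mean squared amplitude: one simple measurement taken per frame."""
    return sum(x * x for x in frame) / len(frame)


samples = sample_tone(freq_hz=200, duration_s=0.5)
frames = split_into_frames(samples)
energies = [frame_energy(f) for f in frames]

print(f"{len(samples)} samples split into {len(frames)} frames "
      f"of {len(frames[0])} samples each")
```

Half a second of audio at this rate gives 8,000 measurements, which the sketch groups into 50 frames of 160 samples. A recogniser would compare the measurements from each frame against its phoneme templates.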

Automatic Speech Recognition (ASR) Allows for Much Cheaper Transcription Rates

The reason that companies are striving to deliver effective speech-to-text software is cost. Cutting the human transcriber out of the loop allows for a much cheaper service. Currently, ASR transcription services offer at least a five-fold cost saving over human transcription services, and realistically, you will often be paying ten to twenty times less when you opt for an ASR solution. Not to mention that several large ASR services are entirely free.

Paid ASR services give you access to slightly more robust results and features like encrypted storage that are important to those with sensitive files. Some services charge per minute, while others charge a set fee that usually caps the total amount of audio you are allowed to upload each month.

You can expect to pay in the region of £0.07 to £0.10 per minute of audio for an ASR service. Human transcription services charge in the range of £0.50 to £2.00 per audio minute.
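To make the comparison concrete, the per-minute rates quoted above can be turned into a quick calculation. The sketch below is illustrative only: the one-hour recording length and the function names are assumptions, and actual prices will vary by provider.

```python
# Per-minute price ranges in GBP, taken from the figures quoted above.
ASR_RATE = (0.07, 0.10)
HUMAN_RATE = (0.50, 2.00)


def cost_range(rate_range, minutes):
    """Return the (lowest, highest) cost in GBP for a recording of this length."""
    low, high = rate_range
    return round(low * minutes, 2), round(high * minutes, 2)


def saving_factor(cheap, expensive):
    """Return the smallest and largest ratio between the two price ranges."""
    smallest = expensive[0] / cheap[1]  # cheapest human rate vs. priciest ASR rate
    largest = expensive[1] / cheap[0]   # priciest human rate vs. cheapest ASR rate
    return smallest, largest


minutes = 60  # hypothetical one-hour recording
asr_cost = cost_range(ASR_RATE, minutes)
human_cost = cost_range(HUMAN_RATE, minutes)
smallest, largest = saving_factor(ASR_RATE, HUMAN_RATE)

print(f"ASR:   £{asr_cost[0]:.2f} to £{asr_cost[1]:.2f}")
print(f"Human: £{human_cost[0]:.2f} to £{human_cost[1]:.2f}")
print(f"Human transcription costs {smallest:.0f}x to {largest:.0f}x more")
```

For an hour of audio, these rates work out to roughly £4 to £6 for ASR against £30 to £120 for a human transcriber, so the gap is at least five-fold and can be considerably wider.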

Automatic Speech Recognition (ASR) Falls Down When it Comes to Accuracy and Quality Control

ASR has some advantages; there is no denying that. But it still cannot come close to matching the quality of human transcription services. This might change in the future, but ASR currently struggles to achieve accuracy rates higher than 80%, even under ideal circumstances. That figure only gets worse in the presence of background noise, accents, poor-quality audio files or even just multiple speakers. It is not unheard of for an ASR solution to produce completely unintelligible output when faced with a recording that, to a human, would be entirely understandable.
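If you want to check an accuracy figure for yourself, one common approach is to compare a transcript against a trusted reference and count the word-level errors. The sketch below computes the word error rate (WER) and treats accuracy as one minus that rate. This is an assumption on our part, since accuracy figures like the one above are not always measured the same way, and the sentences and function name are made up for the example.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from the transcript
            insertion = curr[j - 1] + 1  # extra word added by the transcript
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one wrong word and one missing word out of ten.
reference = "please send the signed report to the client by friday"
asr_output = "please send the signed report to the clients friday"

wer = word_error_rate(reference, asr_output)
print(f"WER: {wer:.0%}, accuracy: {1 - wer:.0%}")
```

Two errors in a ten-word sentence already bring accuracy down to 80%, which gives a feel for how much clean-up a transcript at that level needs.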

The bottom line is that when using ASR, you need to count on spending some of your own time editing the transcript and be okay with losing some of the detail and meaning to errors made by the program. Think about every misunderstanding you have had speaking with Siri, Alexa or Google Assistant. Now imagine that you don’t have the chance to try again. The technology is good, but it still isn’t seamless.

You will also have to accept that you get no options when it comes to formatting or the level of detail captured, which brings us to the next reason people use human transcription services even though they are more expensive.

Human Transcription Services Also Deliver Transcription Options

ASR can only deliver a best effort at a verbatim transcription. As we have discussed, an ASR transcript will not fully match a recording, which makes it insufficient when every detail matters. Luckily, every single word is often not that important. In that case, the problem with ASR is that it tries to capture too much detail, and whatever is missing is dictated by the nature of the audio file rather than by strategic omissions a transcriber makes to improve readability.

Human transcription services offer three main ‘level of detail’ options that allow you to get the best transcript for your needs.

Word For Word Transcripts

We all pause and stutter while we speak, especially in dictation or meetings when we are trying to get a point across. A text that is transcribed strictly verbatim will include each stumble, each cough and each pause on the file. Our brains are good at filtering a lot of this out while listening to someone speak, but on the page it can be very hard to read. Word for word transcripts, also known as ‘intelligent verbatim’, edit out those ‘umms’, stutters and repetitions to deliver a transcript that still contains all the important details, with close to verbatim accuracy, in an easier-to-read format.
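The kind of clean-up involved can be illustrated with a small Python sketch. It is a toy version of the idea, not how a human transcriber or any particular service works: the filler list and the function names are assumptions, and the rule for dropping repeated words is deliberately crude (it would also remove a legitimate repeat such as "had had").

```python
import re

# Illustrative filler list; a real style guide would be longer and more nuanced.
FILLERS = {"um", "umm", "uh", "er", "erm"}


def bare(word):
    """Lower-case a word and strip punctuation so it can be compared."""
    return re.sub(r"[^\w']", "", word).lower()


def intelligent_verbatim(text):
    """Drop filler words and immediate repetitions from a verbatim transcript."""
    cleaned = []
    for word in text.split():
        if bare(word) in FILLERS:
            continue  # skip 'um', 'uh' and similar
        if cleaned and bare(cleaned[-1]) == bare(word):
            continue  # skip a stuttered repeat of the previous word
        cleaned.append(word)
    return " ".join(cleaned)


verbatim = "Um, I I think we we should uh move the the deadline."
print(intelligent_verbatim(verbatim))
```

Run on the sample sentence, this yields "I think we should move the deadline." A human transcriber makes the same kind of edit, but with judgement about which repetitions and hesitations actually carry meaning.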

Verbatim Transcripts

Verbatim transcripts from human transcription services deliver what one would hope ASR transcripts will eventually be capable of delivering: every single detail. Human transcriptionists can actually go even further, providing notes on tone, laughter and pauses. Compared to intelligent verbatim, however, this level of detail does often cost more.

Summary

Lastly, there are summary transcription services. These do not capture every word or detail of the audio file but simply give you the important points of a conversation. They are cheaper per minute of audio, which could be fantastic on a budget, but they are only useful if details don’t matter. Sometimes, however, that is exactly what you need if you are simply trying to get through a load of information quickly.

ASR Has Its Place, But Human Transcription Services Are Still The Quality Choice

The best transcript for you will depend on why you need something transcribed, your budget, how much time you have to spend editing the transcript and the nature of your recording.

High-quality recordings are a must with ASR. If you want to use ASR transcription services, it is worth investing in the highest-quality recording possible. That means purchasing a dedicated, good-quality dictaphone, recording in a quiet room and minimising how much people talk over each other. Even if you can achieve this, expect to spend some time cleaning up the transcript.

There is a lot of investment pouring into automatic speech recognition. It is entirely possible that within the decade, technological advances will open the door to high-quality, automatic transcription services, massively reducing baseline costs in the industry. However, for now, if you want a quality outcome, you need human transcription services. It will cost more, but you will also get choices on the level of detail included in the transcript and, ultimately, get a transcript that you can count on to meet your transcription needs.

You have been reading about how human transcription services compare to automatic speech recognition (ASR) software. If you want to learn more about how to pick the right transcription service, including the intricacies of security-conscious and industry-specific needs like legal transcription, medical transcription and market research focus groups (along with how to get the best deal possible on human transcription services!), we have written an Ultimate Guide to Transcription Services just for you.

Originally published at info.takenotetyping.com.