How to choose the best audio transcription service: 5 questions to answer

SpeechText AI
Jun 22, 2020 · 3 min read

Have you ever tried to convert an audio or video file into text? Doing it manually is time-consuming, and if you need transcripts regularly, keeping up on your own can be a challenge. A transcription service can help, but selecting one is more complicated than you might think. There are many AI-powered transcription services on the market, and they offer many distinct options. So how do you know which service is the most suitable for you?

Based on our experience of using Artificial Intelligence and Machine Learning technologies for speech recognition, we’ve collected five main questions to help you choose the right audio transcription service.

1. What type of speech recognition algorithms do they use?

Recent advances in deep learning have driven the rapid development of speech recognition solutions. State-of-the-art neural network architectures approach human accuracy on clean audio and have been reported to outperform humans on some benchmark tasks. Today, it is difficult to build an accurate speech recognition system without deep learning, so a high-quality transcription service will most likely be based on deep learning techniques.

2. What level of accuracy do they guarantee?

Automatic speech recognition has made significant progress in the last few years, and, as a result, the quality of speech-to-text conversion has improved dramatically. The accuracy of modern transcription software on clean audio should be at least 90%. To verify that, take a short audio file for which you have a human-made transcript, transcribe it with the selected service, and calculate the word accuracy of the result: the number of correctly recognized words divided by the total number of words in the human transcript.
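The word accuracy check described above can be sketched in a few lines of Python. This is a minimal illustration, not a standard scoring tool: the function names are our own, and it makes two assumptions the article does not spell out. Words are compared after lowercasing and stripping punctuation, and "correctly recognized words" are counted as the longest run of reference words that also appear, in the same order, in the automatic transcript.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_accuracy(reference, hypothesis):
    """Correctly recognized words divided by the words in the human transcript.

    Assumption: a word counts as correct when it is part of the longest
    common subsequence of the two word sequences (order is preserved).
    """
    ref = tokenize(reference)
    hyp = tokenize(hypothesis)
    if not ref:
        raise ValueError("the human transcript contains no words")

    # Dynamic programming over one row at a time:
    # prev[j] = number of matched words using hyp[:j] and the reference so far.
    prev = [0] * (len(hyp) + 1)
    for ref_word in ref:
        curr = [0]
        for j, hyp_word in enumerate(hyp, start=1):
            if ref_word == hyp_word:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1] / len(ref)


# Made-up sample sentences, for illustration only.
human = "The quick brown fox jumps over the lazy dog."
automatic = "the quick brown box jumps over a lazy dog"
print(f"Word accuracy: {word_accuracy(human, automatic):.1%}")
```

Note that this measure ignores extra words inserted by the service, so a transcript padded with spurious words can still score well. If that matters for your audio, compare the length of the two transcripts as well.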

3. What languages and non-native speaker accents are supported?

If a transcription service supports multiple languages, you don’t need to pay for a second service to reach a global audience. It is also crucial for a speech recognition service to handle accented speech. If a service doesn’t cover the major accents and dialects, transcripts of audio files featuring non-native speakers are likely to be inaccurate.

4. What is the audio transcription price per minute?

No matter how large your budget is, everyone cares about money. But what is a fair price for automatic transcription services?

The transcription price consists of two main parts:

  • Speech recognition technology: computing resources to train and deploy speech recognition models.
  • Transcription-as-a-service platform: user interfaces, built-in editors to correct speech-to-text conversion results, file storage for user audio/video files, etc.

If you have a limited budget, you should focus on transcription software built on its own speech recognition technology. Many online transcription services resell Google, IBM, Amazon, or Microsoft speech-to-text services, and most of these products are overpriced: one hour of transcription starts at $7–$10. From our perspective, a service that costs more than $3–$4 per hour overcharges its customers.
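Since services quote prices per minute or per hour, it helps to convert everything to the same unit before comparing. The short sketch below does that arithmetic. The hourly rates are the ones mentioned above; the monthly volume of 20 hours and the function names are hypothetical and only serve the example.

```python
def price_per_minute(price_per_hour):
    """Convert an hourly transcription price into a per-minute price."""
    return price_per_hour / 60


def monthly_cost(price_per_hour, audio_minutes):
    """Cost of transcribing the given number of audio minutes."""
    return price_per_hour * audio_minutes / 60


# Hypothetical workload: 20 hours of audio per month.
MONTHLY_AUDIO_MINUTES = 20 * 60

# Hourly rates taken from the figures quoted in the article.
rates = [
    ("reseller, low end", 7.0),
    ("reseller, high end", 10.0),
    ("fair price, upper bound", 4.0),
]

for label, hourly in rates:
    print(
        f"{label}: ${price_per_minute(hourly):.3f} per minute, "
        f"${monthly_cost(hourly, MONTHLY_AUDIO_MINUTES):.2f} per month"
    )
```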

5. What other essential features do they offer?

Proofreading interface. The best service for audio transcription should have editing tools to correct automatic transcription results.

Transcription features. It matters whether the service can detect different speakers and whether its transcripts include punctuation marks (commas, full stops, etc.).

Industry-specific models. To correctly recognize industry terms, transcription software should offer domain-specific speech recognition models. If the service doesn’t have a model for your industry, the accuracy of your transcriptions is likely to suffer.

By using these five questions to guide your search for the best automated transcription service, you can realize the benefits of Artificial Intelligence and avoid the common mistakes that are easy to make.

SpeechText.AI offers an accurate and reasonably priced transcription service with lots of online editing options and industry-specific speech recognition models. It supports more than 30 languages and non-native speaker accents.

