Speech-to-Text platforms face a new competitor in OpenAI’s Whisper: the battle for accuracy and efficiency heats up!

izwe.ai
Apr 11, 2023


Written by the izwe.ai team

Speech-to-text platforms have been around for quite some time, and with the rise of Artificial Intelligence (AI) and Natural Language Processing (NLP) they have become even more sophisticated. In September 2022, shortly before it launched ChatGPT in November 2022, OpenAI released a speech recognition system called Whisper. The advent of Whisper means that speech-to-text startups must keep up with the latest advancements in this field, and those that do not adopt these new technologies risk falling behind.

Whisper’s speech recognition system uses cutting-edge natural language processing models for its transcriptions, but it still does not compare with South African speech-to-text platforms such as izwe.ai, which produce accurate transcriptions and translations for African languages. The accuracy of AI transcription systems, Whisper included, is not perfect and can be affected by factors such as background noise or accents, whereas human transcribers can capture nuances in speech that AI may miss.

(Learn more about Whisper: https://openai.com/research/whisper)

OpenAI’s Whisper vs Speech-to-Text platforms

Image by OpenAI

OpenAI’s Whisper and transcription platforms are two different approaches to transcribing audio content. While both methods serve a similar purpose, there are distinct advantages and disadvantages to each approach:

1. Domain and Audio Quality

Once we get language out of the way, domain and audio quality become very important. OpenAI’s Whisper uses artificial intelligence to transcribe audio content, an approach that is faster and more cost-effective than transcribing by hand. Izwe.ai caters to various industries such as the legal, finance, call center and business sectors. Most of these businesses use specialised jargon that a general-purpose model will miss. The audio quality in some settings, a call center for example, can also be very poor and full of artefacts, so the model needs to adjust for that as well.

2. Language Support

Whisper is an innovative speech recognition system that utilises cutting-edge neural network models to transcribe spoken language accurately, even in noisy environments. The system can also adapt to individual speaking styles. However, when it comes to language support, especially for African languages, most of its breakthroughs should be prefaced with "for English", "for Europe" or "for the West". Whisper does support Afrikaans and Kiswahili, but both sit at the lower end of its performance spectrum.

This is why speech-to-text platforms like izwe.ai are important. Izwe.ai is an African AI company, and its core objective is to cater to African languages such as Afrikaans, isiZulu and Kiswahili. According to statistics published by Business Tech online, isiZulu is the most spoken language in South Africa, and Kiswahili is even more widely spoken across Africa. Not only is it interesting to see how the continent’s indigenous languages compare to European languages, it also means that localised speech-to-text applications that provide precise transcriptions and translations of these languages are essential and needed in the market.

Whisper’s word error rates, from OpenAI’s GitHub repository: https://github.com/openai/whisper
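For readers new to the metric, word error rate (WER) is the number of word substitutions, deletions and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. Lower is better. The Python sketch below is a minimal illustration only: the function name and the example sentences are hypothetical, and it assumes very simple normalisation (lowercasing and splitting on whitespace), whereas published evaluations usually normalise text more carefully.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the transcript drops one word out of six.
reference = "the cat sat on the mat"
hypothesis = "the cat sat on mat"
print(f"WER: {word_error_rate(reference, hypothesis):.2%}")  # WER: 16.67%
```

Because insertions count as errors, WER can exceed 100% when a transcript contains many extra words.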

3. Diarisation

Finally, something that is often left out is diarisation: knowing who is speaking, and when. For podcasts and agent-to-customer interactions this can be very useful.
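To make the idea concrete, here is a minimal Python sketch of one common way to attach speakers to a transcript. It assumes a diarisation step has already produced timed speaker turns and a transcription step has produced timed text segments, and it labels each segment with the speaker whose turn overlaps it the most. All names and data here are hypothetical and are not taken from izwe.ai or Whisper.

```python
from typing import List, NamedTuple, Tuple


class Turn(NamedTuple):
    start: float  # seconds
    end: float
    speaker: str


class Segment(NamedTuple):
    start: float  # seconds
    end: float
    text: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments: List[Segment], turns: List[Turn]) -> List[Tuple[str, str]]:
    """Pair each transcript segment with the speaker who overlaps it most."""
    labelled = []
    for seg in segments:
        best = max(
            turns,
            key=lambda t: overlap(seg.start, seg.end, t.start, t.end),
            default=None,
        )
        # Fall back to "unknown" when no turn overlaps the segment at all.
        if best is None or overlap(seg.start, seg.end, best.start, best.end) == 0.0:
            labelled.append(("unknown", seg.text))
        else:
            labelled.append((best.speaker, seg.text))
    return labelled


# Hypothetical call-center exchange.
turns = [Turn(0.0, 4.0, "agent"), Turn(4.0, 9.0, "customer")]
segments = [
    Segment(0.5, 3.5, "Good morning, how can I help?"),
    Segment(3.8, 8.0, "I have a question about my invoice."),
]
for speaker, text in label_segments(segments, turns):
    print(f"{speaker}: {text}")
```

Overlap-based matching is simple, but it will mislabel segments where two people talk over each other, so it should be read as an illustration rather than a production approach.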

Conclusion

Speech-to-text platforms can benefit from using OpenAI’s Whisper in several ways. Whisper provides a state-of-the-art model that can transcribe speech into text with high precision and low error rates, so platforms can leverage it to improve the accuracy of their transcription services. At the same time, transcription platforms have been around for years and offer a reliable, straightforward solution to audio transcription. The solutions provided by platforms like izwe.ai will remain relevant for a long time, especially where accuracy and quality in the African market are required.
