Transcribe audio files with OpenAI’s Whisper
OpenAI recently open-sourced a neural network called Whisper. It lets you transcribe audio files such as mp3, including large ones, offline. OpenAI claims that Whisper approaches human-level robustness and accuracy in English speech recognition.
Since open-source models and packages such as Vosk or NVIDIA NeMo already exist, I wondered how well Whisper transcribes audio files by comparison.
This article shows you how to make use of Whisper and compares its performance with Vosk, another offline open-source speech recognition toolkit.
tl;dr
- Whisper is an open-source, multilingual, general-purpose speech recognition model by OpenAI.
- It needs only three lines of code to transcribe an (mp3) audio file.
- A quick comparison with Vosk (another open-source toolkit) showed that Whisper transcribes the audio of a podcast excerpt slightly better. The main difference is that Whisper adds punctuation, which makes the transcription easier to read.
- Scroll down to “Whisper” or click here (Gist) if you are interested in the code only.
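To give a feel for the "three lines of code" point above, here is a minimal sketch using the `openai-whisper` Python package. The wrapper function `transcribe_file`, the `"base"` model size and the file name in the usage note are assumptions chosen for illustration; the three essential lines are the import, `whisper.load_model(...)` and `model.transcribe(...)`. The package also needs `ffmpeg` installed on your system.

```python
from pathlib import Path


def transcribe_file(audio_path: str, model_name: str = "base") -> str:
    """Transcribe an audio file with Whisper and return the text.

    `transcribe_file` is a hypothetical helper for this sketch, and
    "base" is just one of the available model sizes.
    """
    path = Path(audio_path)
    if not path.is_file():
        raise FileNotFoundError(f"Audio file not found: {path}")

    # Imported here so the check above works even without Whisper installed.
    import whisper  # pip install openai-whisper

    model = whisper.load_model(model_name)  # downloads the weights on first use
    result = model.transcribe(str(path))    # dict with "text", "segments", "language"
    return result["text"]
```

Calling `print(transcribe_file("podcast_excerpt.mp3"))` (with a file name of your own) would then print the transcript. Larger model sizes tend to be more accurate but slower, so `"base"` is only a starting point.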