TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.


Transcribe audio files with OpenAI’s Whisper

7 min read · Sep 26, 2022


Photo by Will Francis on Unsplash.

OpenAI recently open-sourced a neural network called Whisper. It lets you transcribe audio files, including large ones in formats such as MP3, entirely offline. OpenAI claims that Whisper approaches human-level robustness and accuracy in English speech recognition.

Since open-source models and packages such as Vosk and NVIDIA NeMo already exist, I wanted to find out how well Whisper transcribes audio files.

This article shows you how to use Whisper and compares its performance with Vosk, another offline, open-source speech recognition toolkit.

tl;dr

  • Whisper is an open-source, multilingual, general-purpose speech recognition model by OpenAI.
  • It takes only three lines of code to transcribe an audio file (for example, an MP3).
  • A quick comparison with Vosk (another open-source toolkit) showed that Whisper transcribes a podcast excerpt slightly better. The main difference is that Whisper adds punctuation, which makes the transcription easier to read.
  • Scroll down to “Whisper” or click here (Gist) if you are interested in the code only.
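
To make the "three lines of code" point concrete, here is a minimal sketch of what such a transcription call can look like. It assumes the `whisper` Python package (currently published on PyPI as `openai-whisper`) and `ffmpeg` are installed. The wrapper function `transcribe_file`, the default model size `"base"`, and the file-existence check are illustrative choices, not necessarily what the article's Gist does.

```python
from pathlib import Path


def transcribe_file(audio_path, model_name="base"):
    """Transcribe an audio file with Whisper and return the recognized text.

    `transcribe_file` is a hypothetical helper for this sketch; the three
    Whisper-related lines are marked (1) to (3) below.
    """
    path = Path(audio_path)
    if not path.is_file():
        raise FileNotFoundError(f"Audio file not found: {path}")

    # (1) Imported lazily so this module still loads where Whisper is missing.
    import whisper

    # (2) Load a pretrained model; the weights are downloaded on first use.
    model = whisper.load_model(model_name)

    # (3) Run the transcription; the result is a dict that includes "text".
    result = model.transcribe(str(path))
    return result["text"]
```

Calling `transcribe_file("podcast_excerpt.mp3")` (a placeholder file name) would return the transcript as a single string. Larger model sizes such as `"medium"` generally trade speed for accuracy.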

Prerequisites
