Heroes of Deep Learning: Coursera

Transcribing “Heroes of Deep Learning” with OpenAI’s Whisper

Whisper by OpenAI is a state-of-the-art speech-to-text model that employs deep learning.

Ahmad Anis

3 min read · Jun 1, 2023

Transcribing the Heroes of Deep Learning series was an idea I first had while doing the Deep Learning Specialization by Andrew Ng. Out of the whole specialization, these were the videos I looked forward to the most. I got to know many deep learning heroes on a deeper level, such as Andrej Karpathy, Yann LeCun, and Geoffrey Hinton. I always wanted this series in a textual form or in an audio-only format. This project aims to do both.

For transcribing it to text, I am using OpenAI’s Whisper. Recently, Whisper stormed the world of speech-to-text models. It is trained on a huge dataset (680,000 hours of audio) and is a general-purpose speech recognition model that works in many languages. The results are very promising, and it is a game-changer in this field.

Step 1: Scraping the Interview Videos from YouTube

DeepLearning.ai has all the videos of its courses on YouTube. This playlist on their YouTube channel has all the videos in this series. We are going to use pytube to download the videos in audio format.

$ pip install pytube

We are going to use tqdm to show a progress bar for the loop.

$ pip install tqdm

First, let’s create a mapping from each video link to the interviewee’s name. The format is {link1: name1, link2: name2}.
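As a minimal sketch, the mapping could look like this (the URLs below are placeholders, not the actual playlist links, so fill in the real ones):

interviews_details = {
    "https://www.youtube.com/watch?v=PLACEHOLDER_1": "Geoffrey Hinton",  # placeholder URL
    "https://www.youtube.com/watch?v=PLACEHOLDER_2": "Andrej Karpathy",  # placeholder URL
    # ... one entry per interview in the playlist
}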

We can download all of these videos simply by looping over the mapping. The code below also creates the interviews directory if it does not already exist.

import os
import pytube
from tqdm import tqdm

os.makedirs("interviews", exist_ok=True)  # make sure the output directory exists
for link, name in tqdm(interviews_details.items()):
    yt = pytube.YouTube(link)
    # take the first 160kbps audio-only stream and save it under the interviewee's name
    yt.streams.filter(abr="160kbps", progressive=False).first().download(filename=f"interviews/{name}.mp3")

Now you will have all the interviews, with the interviewee's name as the filename.

Step 2: Transcribing Videos

To install Whisper, you can simply do

$ pip install git+https://github.com/openai/whisper.git

Whisper decodes audio with ffmpeg under the hood, so make sure ffmpeg is installed on your system as well. Now we simply have to load each audio file and transcribe it.

import os
import whisper
from tqdm import tqdm

all_interviews_results = []
model = whisper.load_model("base")  # the base model is a good fit for English audio
for downloaded_audio in tqdm(os.listdir("interviews")):
    if downloaded_audio.endswith(".mp3"):
        result = model.transcribe(f"interviews/{downloaded_audio}")
        all_interviews_results.append(result)

We now have the results of all the interviews appended in a list. Any single result is a dictionary with three keys:

all_interviews_results[0].keys()
>>> dict_keys(['text', 'segments', 'language'])

The text key holds the entire transcription as one string. The segments key holds the same transcription split into timestamped chunks, so it tells you which sentences were spoken at which point in time. The language key holds the detected language.
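For example, with the results from the previous step, you can peek at the first few segments; each segment is a dictionary whose start and end fields are timestamps in seconds:

first_result = all_interviews_results[0]
for segment in first_result["segments"][:3]:  # first three timestamped chunks
    print(f"[{segment['start']:.1f}s -> {segment['end']:.1f}s] {segment['text']}")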

Step 3: Transcribed Audio to PDFs

We have appended all the interviews into a single list, so its length is 9, one entry per interview. Each interview result has a segments portion, which holds the text split up by timestamps. We are going to treat each timestamped segment as a new sentence, which gives our PDF its structure. If you think about it for a while, the simplest logic (which can be improved) is a nested loop: the outer loop goes over the interviews, and the inner loop goes over each interview's segments.

We are going to use fpdf to create the PDFs.
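$ pip install fpdf

The code below is a minimal sketch of that nested loop. It assumes the all_interviews_results list from Step 2 plus a hypothetical interview_names list holding one name per result, in the same order; classic fpdf only supports latin-1 text, so unsupported characters are replaced.

from fpdf import FPDF

# interview_names is a hypothetical helper: one name per entry in
# all_interviews_results, in the same order.
for name, result in zip(interview_names, all_interviews_results):
    pdf = FPDF()
    pdf.add_page()
    pdf.set_font("Arial", size=12)
    for segment in result["segments"]:
        # write each timestamped segment as its own line; fpdf is latin-1 only
        text = segment["text"].strip().encode("latin-1", "replace").decode("latin-1")
        pdf.multi_cell(0, 10, text)
    pdf.output(f"interviews/{name}.pdf")  # e.g. interviews/Geoffrey Hinton.pdf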

That’s it. You now have a separate PDF for each of the interviews, and you also have the interviews in audio format, which you can listen to while driving or commuting.

You can find the complete, organized code on my GitHub (don’t forget to star it). You can run it on Google Colab with just two lines of code. (It takes approximately 10 minutes to transcribe all the interviews.)

Run the complete code on Google Colab

Learning Outcomes:

  • You have learned how to download any YouTube video as an audio file.
  • You have learned how to use OpenAI’s Whisper to transcribe any audio.
  • You have learned how to write those transcriptions into a PDF file.

Ahmad Mustafa Anis is a Machine Learning Engineer at Red Buffer. You can reach out to Ahmad on Twitter and LinkedIn.
