Transcribing an audio file using the Speech Recognition library and Python

Jul 29, 2022

What is transcription?

Transcription is the process of converting speech from an audio or video file into text. It involves more than just listening to a recording: the content must be understood and nothing left out. Audio and video files can be transcribed manually or programmatically. Python has a library called Speech Recognition that can be used for transcription.

TRANSCRIBE AN AUDIO FILE FROM THE OPEN SPEECH REPOSITORY

Step 1: Create a new folder and add a .wav file

Download a .wav file of your choice from the Open Speech Repository and save it in the new folder.
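Before moving on, it can be worth confirming that the downloaded file really is a readable WAV file. The sketch below uses only Python's standard wave module; describe_wav is a hypothetical helper name chosen for this example and is not part of the Speech Recognition library.

```python
import wave


def describe_wav(path):
    """Return basic properties of an uncompressed (PCM) WAV file."""
    # wave.open raises wave.Error if the file is not a WAV file it can read.
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }
```

Calling describe_wav("harvard.wav"), where harvard.wav is the file name used later in Step 5, returns a dictionary with the channel count, sample width, sample rate, and duration in seconds.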

Step 2: Create a virtual environment

Create a virtual environment with Python's built-in venv module. The command below creates one named virtual inside the project folder.

python -m venv virtual

Step 3: Activate the virtual environment

On Windows:

virtual\Scripts\activate

On macOS or Linux:

source virtual/bin/activate
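If you are unsure whether activation worked, Python itself can tell you. The short sketch below compares sys.prefix with sys.base_prefix, which differ when the interpreter runs inside an environment created by venv; in_virtual_environment is a hypothetical helper name used only for illustration.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print(in_virtual_environment())
```

Run with the environment active, this should print True; in a plain system interpreter it prints False.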

Step 4: Install the Speech Recognition library in the virtual environment

pip install SpeechRecognition
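To confirm that the library ended up in the active environment, you can ask Python for the installed version. This is a minimal sketch assuming Python 3.8 or later (for importlib.metadata) and assuming the package is published under the name SpeechRecognition; installed_version is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(distribution="SpeechRecognition"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


print(installed_version())
```

A printed version number means the installation succeeded; None suggests the package was installed into a different environment or not at all.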

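Step 5: Create a Python file

Create a Python file in the same folder as the .wav file and add the following code. The script expects the audio file to be named harvard.wav, so change the file name in the code if yours differs.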

import speech_recognition as sr

# Initialize the recognizer
r = sr.Recognizer()

# Open the audio file
audio_file = sr.AudioFile("harvard.wav")

# Read the whole file into an audio data object
with audio_file as source:
    audio = r.record(source)

# Transcribe with the Google Web Speech API (needs an internet connection)
result = r.recognize_google(audio)

print(result)
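The call to recognize_google sends the audio to an online service, so it can fail when the speech is unintelligible or when the service cannot be reached. The library signals these cases with UnknownValueError and RequestError. The sketch below shows one possible way to handle both; transcribe is a hypothetical helper name, and returning None for unintelligible audio is a design choice of this example rather than something the library prescribes.

```python
def transcribe(path):
    """Return the transcript of an audio file, or None if no speech was understood."""
    # Imported inside the function so that a missing installation only
    # fails when a transcription is actually attempted.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the entire file

    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        # The service could not make out any speech in the audio.
        return None
    except sr.RequestError as exc:
        # The service was unreachable or rejected the request.
        raise RuntimeError(f"Speech service request failed: {exc}") from exc
```

Calling transcribe("harvard.wav") then returns the text on success, returns None when nothing could be recognized, and raises a RuntimeError with the underlying cause when the request itself fails.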

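Step 6: Run the Python file and obtain the transcribed text

With the virtual environment active, run the file from the project folder. For example, if the file is named main.py:

python main.py

The transcribed text is printed to the terminal.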