How to set up OpenAI’s Whisper model on Windows 11 for speech recognition
Installation with a basic demo for speech-to-text transcription
--
Installing Whisper on Windows 10/11
1. I recommend installing Anaconda and creating a new virtual environment from the Anaconda Prompt to set up Whisper. After you install Anaconda, you can open the Anaconda Prompt from the Start menu search bar.
2. Then install the required package in that virtual environment with the pip command below.
pip install -U openai-whisper
3. Whisper needs ffmpeg to run. Installing it on Windows can be a little tricky.
4. Download the latest ffmpeg-master-latest-win64-gpl.zip build of ffmpeg from https://github.com/BtbN/FFmpeg-Builds/releases and extract it anywhere on your system. Add the bin folder inside the extracted folder to the system PATH. You may follow similar instructions as given in this post. Restart your system.
5. To test that it is installed correctly, open any command prompt and type ffmpeg -version.
6. Also run the commands below in the Anaconda Prompt, inside the virtual environment where you installed Whisper. (Remember: to activate a virtual environment you created, run conda activate env_name before you run these other commands.)
pip install ffmpeg-python
conda install -c conda-forge ffmpeg
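If you would rather confirm the setup from Python than from the command prompt, a small check like the one below can help. It is only a sketch using the standard library: the helper names ffmpeg_status and whisper_installed are my own, and it simply looks for ffmpeg on the PATH and for the whisper package in the active environment.

```python
import importlib.util
import shutil
import subprocess


def ffmpeg_status():
    """Return (path, first version line) for ffmpeg, or (None, None) if it is not on PATH."""
    path = shutil.which("ffmpeg")
    if path is None:
        return None, None
    # Same check as typing "ffmpeg -version" in a command prompt.
    out = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    lines = out.stdout.splitlines()
    return path, (lines[0] if lines else "")


def whisper_installed():
    """Return True if the whisper package can be found in the active environment."""
    return importlib.util.find_spec("whisper") is not None


if __name__ == "__main__":
    path, version = ffmpeg_status()
    print("ffmpeg:", path if path else "NOT FOUND on PATH")
    if version:
        print(version)
    print("whisper installed:", whisper_installed())
```

Run it with the virtual environment activated. If ffmpeg shows as not found right after you edited the PATH, try opening a new prompt or restarting first.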
Using Whisper to transcribe audio
All it takes is five simple lines of code! You can create a simple program like the one below and run it. You may use VS Code, Jupyter, or any code editor; just set its Python environment to the one you created above.
import whisper
# whisper has multiple models that you can load as per size and requirements
model = whisper.load_model("small.en")
# path to the audio file you want to transcribe
PATH = "audio.mp3"
result = model.transcribe(PATH)
print(result["text"])
The output will be printed as a text string.
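Besides the full text, the dictionary returned by transcribe also has a "segments" list, where each segment carries "start" and "end" times in seconds along with its "text". If you want timestamps next to each line, something like the following sketch could work. The helper names format_timestamp and format_segments are my own, not part of Whisper.

```python
def format_timestamp(seconds):
    """Convert a time in seconds to HH:MM:SS (fractions of a second are dropped)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(segments):
    """Turn a list of segment dicts into '[start - end] text' lines."""
    lines = []
    for segment in segments:
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f"[{start} - {end}] {segment['text'].strip()}")
    return lines


# Usage with the program above (requires whisper and an audio file):
# result = model.transcribe(PATH)
# print("\n".join(format_segments(result["segments"])))
```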
Common issues you might encounter while installing and running whisper
- AttributeError: module ‘ffmpeg’ has no attribute [xyz]
This happens when there is an issue with the ffmpeg installation on your system. Run pip uninstall ffmpeg or pip uninstall ffmpeg-python, then reinstall everything from the start and try again. Don’t forget to restart the system once. The issue should then be resolved.
- FileNotFoundError
This can also be caused by a problem with the ffmpeg installation, and you can try the same steps as above to fix it.
Another possible cause is an improperly formatted path to your audio file. When you copy a file path in Windows, the string contains backslashes (\), which Python treats as the start of an escape sequence in an ordinary string; this can also raise a Unicode error. Use forward slashes (/) in the path instead, or write the path as a raw string (r"...").
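Here is a small sketch of the path fix. The path itself is made up for illustration, and to_forward_slashes is a helper name of my own; it relies only on the standard library's pathlib.

```python
from pathlib import PureWindowsPath

# A raw string (r"...") keeps backslashes literal, so the "\U" in "C:\Users"
# is not parsed as the start of a Unicode escape sequence.
copied = r"C:\Users\me\Music\audio.mp3"  # hypothetical path copied from Explorer


def to_forward_slashes(windows_path):
    """Rewrite a Windows-style path so that it uses forward slashes."""
    return PureWindowsPath(windows_path).as_posix()


PATH = to_forward_slashes(copied)
print(PATH)  # C:/Users/me/Music/audio.mp3
```

Either form, the raw string or the forward-slash version, can then be passed to model.transcribe.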
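Once Whisper is installed locally and the model weights have been downloaded (this happens the first time you load a model), you don’t need internet access to use it. That means unlimited transcriptions, and you can automate whatever you want to do with them!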
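As one idea for automation, the sketch below transcribes every matching file in a folder and writes the text next to it. It is only an illustration under my own assumptions: transcribe_folder is a hypothetical helper, and it expects any callable that takes a file path and returns a dictionary with a "text" key, which is what model.transcribe does.

```python
from pathlib import Path


def transcribe_folder(transcribe, folder, pattern="*.mp3"):
    """Transcribe each matching audio file and save the text as a .txt file beside it.

    transcribe: a callable taking a path string and returning a dict with a "text" key.
    Returns the list of .txt files that were written.
    """
    written = []
    for audio in sorted(Path(folder).glob(pattern)):
        result = transcribe(str(audio))
        out_file = audio.with_suffix(".txt")
        out_file.write_text(result["text"].strip() + "\n", encoding="utf-8")
        written.append(out_file)
    return written
```

With the model loaded as in the program above, this could be called as transcribe_folder(model.transcribe, "recordings"), where "recordings" stands for whatever folder holds your audio files.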