How to install Whisper, OpenAI’s amazing speech-to-text recognition system, on Mac

Egor Menyaylo
Published in GIMZ
Feb 21, 2023
image generated by DALL·E 2

We’ve already told you how to use Whisper in your browser. It’s a neural network-based system that can transcribe audio into text in 99 languages. Now there’s a guide for Mac owners.

An enthusiast has rebuilt Whisper in C/C++ for macOS: Intel and Apple Silicon (M-series) processors are supported, and you can even install it on iOS if you want. It’s essentially an unofficial version of Whisper optimised for Apple processors, but it works just as well.

Compared to Google Colab, the Mac version is about three times as fast. On a MacBook Pro 16 (M1 Pro, 16 GB RAM), a fifty-minute recording was transcribed in:

  • via Google Colab — 53 minutes;
  • via whisper.cpp — 18 minutes.

Install whisper.cpp on Mac

1. Download the code.

Simple option: click on Code → Download ZIP

More complicated option, via git: copy the link to the code, then run git clone {link} in Terminal.

2. Move the downloaded whisper.cpp folder to a location where it will be safely stored, for example /Users/Name/Files.
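If you go the git route, steps 1 and 2 can be done in one go by cloning straight into the folder where the code will live. Here is a minimal sketch; fetch_whisper is a made-up helper name, and the link is whatever you copied from the repository page.

```shell
# Hypothetical helper: clone the code directly into its permanent location.
# $1 = link copied from the repository page
# $2 = parent folder, e.g. /Users/Name/Files
fetch_whisper() {
  [ "$#" -eq 2 ] || { echo "usage: fetch_whisper LINK FOLDER" >&2; return 1; }
  # Create the parent folder if needed, then clone into FOLDER/whisper.cpp.
  mkdir -p "$2" && git clone "$1" "$2/whisper.cpp"
}

# Example (replace {link} with the copied link):
# fetch_whisper {link} /Users/Name/Files
```

With this route the folder is named whisper.cpp rather than whisper.cpp-master, so adjust the paths in the commands that follow accordingly.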

3. You will interact with Whisper via Terminal, so first navigate to the whisper.cpp folder.

To do this, type cd followed by a space in Terminal, then drag the whisper.cpp folder into the window (its path will be filled in automatically). Press Enter.

4. Build Whisper.

Run the make command. If this doesn’t work, the required packages are missing and you need to install them.

To do this, install Homebrew (brew). Paste and run this line in Terminal:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Once the process has finished, do not close Terminal — brew will prompt you to run two more commands. Execute one at a time: the first, then the second.

Run make again. This time it should work.

5. Run ./main.

The output is a help sheet listing the additional parameters that Whisper supports. More about them later.

6. Download a model.

The following models are available in whisper.cpp (the larger the model, the better the quality and the longer the decoding time):

tiny.en
tiny
base.en
base
small.en
small
medium.en
medium
large-v1
large

The most interesting one is large. That’s what we download. Run the command:

make large
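The commands later in this guide expect the model at models/ggml-large.bin inside the whisper.cpp folder. If you want to confirm the download finished before running anything, a small check like the one below may help. It is only a sketch: model_file is a made-up helper name, and it assumes the other models are stored under the same models/ggml-NAME.bin pattern as large.

```shell
# Hypothetical helper: print the path of a downloaded model, or fail if it is missing.
# Assumes models are stored as models/ggml-NAME.bin (as with models/ggml-large.bin).
# Run it from inside the whisper.cpp folder.
model_file() {
  model="models/ggml-$1.bin"
  if [ -f "$model" ]; then
    echo "$model"
  else
    echo "model '$1' not found at $model" >&2
    return 1
  fi
}

# Example:
# ./main -m "$(model_file large)" -f {file path}
```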

7. Run Whisper and do a test.

Download an audio sample, type the command and drag the sample into Terminal to fill in its path. It will look something like this:

./main -m models/ggml-large.bin -f /Users/Name/Files/whisper.cpp-master/samples_jfk.wav

The transcript is printed in Terminal, together with timecodes.

Nice! Everything works. You can now do transcripts of your own files.

Transcribe your audio

The main difficulty is that only .wav files with a sampling rate of 16 kHz are currently supported. No big deal.

1. Convert the file.

You can do this any way you want, even directly in Terminal. To do this, install ffmpeg with the command:

brew install ffmpeg

Then use the cd command to navigate to the folder with your file (put it in the whisper.cpp folder for convenience) and run the desired command — you can convert both video and audio sources. For example, let’s take this video and download it in two formats: .mp4 and .mp3.

Command for audio file:

ffmpeg -i polyglot.mp3 -ac 1 -ar 16000 polyglot.wav

Command for video file:

ffmpeg -i /Users/Name/Files/whisper.cpp-master/polyglot.mp4 -ac 1 -ar 16000 polyglotvideo.wav
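If you have several files to convert, the same ffmpeg flags can be wrapped in a short shell function. This is a sketch rather than part of whisper.cpp or ffmpeg; to_wav16k is a made-up name, and each output is saved next to its source with a .wav extension.

```shell
# Hypothetical helper: convert audio or video files to mono 16 kHz .wav,
# using the same ffmpeg flags as the commands above.
to_wav16k() {
  for src in "$@"; do
    # Swap the source extension for .wav (polyglot.mp3 -> polyglot.wav).
    out="${src%.*}.wav"
    if [ "$src" = "$out" ]; then
      echo "skipping $src: the output would overwrite the source, rename it first" >&2
      continue
    fi
    ffmpeg -i "$src" -ac 1 -ar 16000 "$out"
  done
}

# Example:
# to_wav16k polyglot.mp3 interview.mp4
```

Note that two sources with the same base name (polyglot.mp3 and polyglot.mp4) would map to the same .wav, which is why the video command above uses a different output name.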

2. Run Whisper.

Don’t forget to go to the whisper.cpp folder!

Write the most basic command to start Whisper:

./main -m models/ggml-large.bin -f /Users/Name/Files/whisper.cpp-master/affordable.wav

Done. You can copy the text from Terminal and do whatever you like with it.

The process can be made more convenient, though, with a few additional parameters.

Advanced parameters for Whisper

Go to the whisper.cpp folder (via cd) and type the command ./main. It prints the help sheet with all available parameters.

Any of these parameters can be used in the run command. We suggest this minimum set to get the text without timecodes and in a separate file:

./main -l en -m models/ggml-large.bin -nt -otxt -f /Users/Name/Files/whisper.cpp-master/affordable.wav

What these parameters mean:

./main: command to start Whisper.
-l: language of the source file. Without it, Whisper can sometimes translate speech into English straight away; add this option to keep the original language.
-m: transcription model, which you have previously downloaded.
-nt: remove timecodes from the text, so you don’t have to delete them manually later.
-otxt: export the text to a .txt file once the transcription is complete (it will be in the same folder as the audio file).
-f: path to the .wav file to be transcribed.
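To avoid retyping this set every time, it can be wrapped in a small shell function. The sketch below is one possible way to do it: transcribe and WHISPER_BIN are made-up names, while the flags are exactly the ones listed above.

```shell
# Hypothetical wrapper around the minimum set of parameters.
# Run it from the whisper.cpp folder.
# $1 = path to a 16 kHz .wav file
# $2 = language code of the recording (defaults to en)
# WHISPER_BIN can point at another binary; it defaults to ./main.
transcribe() {
  "${WHISPER_BIN:-./main}" -l "${2:-en}" -m models/ggml-large.bin -nt -otxt -f "$1"
}

# Example:
# transcribe /Users/Name/Files/whisper.cpp-master/affordable.wav
```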

The result in Terminal comes without timecodes, and a text file appears in the folder with the source audio file.

Try experimenting with other parameters as well. Some other useful ones:

-osrt: creates an .srt file with subtitles for the video.
-tr: translates the transcript into English immediately.
-su: speeds up the audio 2x (accuracy may drop).

Three templates

The parameters you need are already entered here; all you have to do is specify the path to a .wav file in 16 kHz format.

1. Transcribe English speech. No timecodes, export .txt.

./main -l en -m models/ggml-large.bin -nt -otxt -f {file path}

2. Transcribe speech in any language and translate into English. No timecodes, export .txt.

./main -m models/ggml-large.bin -tr -nt -otxt -f {file path}

3. Make subtitles for the English video. Export .srt.

./main -l en -m models/ggml-large.bin -osrt -f {file path}
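The conversion step and a template can also be chained, so that a video goes in and subtitles come out. Below is a rough sketch for template 3; make_subtitles and WHISPER_BIN are made-up names, and the -16k suffix on the intermediate file is just an illustrative choice.

```shell
# Hypothetical helper: convert a source file to 16 kHz .wav, then run template 3.
# Run it from the whisper.cpp folder. $1 = path to the audio or video file.
make_subtitles() {
  # Intermediate file, e.g. talk.mp4 -> talk-16k.wav
  wav="${1%.*}-16k.wav"
  # Only start Whisper if the conversion succeeded.
  ffmpeg -i "$1" -ac 1 -ar 16000 "$wav" &&
    "${WHISPER_BIN:-./main}" -l en -m models/ggml-large.bin -osrt -f "$wav"
}

# Example:
# make_subtitles /Users/Name/Files/whisper.cpp-master/polyglot.mp4
```

The suffix keeps the converted file from colliding with an existing .wav of the same name.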
