Offline Foreign Speech Recognition

How to set up Python libraries for free and offline foreign (non-English) speech recognition

Dmytro Nikolaiev (Dimid)
6 min read · Oct 1, 2021
Foreign Speech Recognition. Image by Author

In this tutorial, I will show you how to set up Python speech recognition libraries (vosk, SpeechRecognition and Pocketsphinx) to work offline with foreign (non-English) languages.

Attention! Read the following carefully to find out which library you need to set up:

  • If online speech recognition is enough for you (you have access to the Internet), use the SpeechRecognition library with the Google API. Go to the Online Speech Recognition with SpeechRecognition and Google API section.
  • If you need offline English speech recognition, you can install either Vosk or Pocketsphinx. And finally, if you want to recognize a foreign (non-English) language offline, you can use Vosk or Pocketsphinx with a foreign model. Go to the Offline Speech Recognition with Vosk section or the Offline Speech Recognition with SpeechRecognition and Pocketsphinx section.
How to select a speech recognition library? Image by Author

My subjective experiments suggest the following quality ranking: the Google API is the best, vosk is slightly worse, and pocketsphinx is worse still. Pocketsphinx is also slower than the others and harder to install. But I repeat: this is my personal opinion.

So as not to waste your time, this tutorial describes only the installation process, with minor examples. At the end of each section, I will provide links where you can learn more about a particular library and see a script example.

Online Speech Recognition with SpeechRecognition and Google API

This is the easiest way. If you have access to the Internet, you can simply use the Google API. It also lets you work with different languages just by setting the language parameter.

All you need to do is install the SpeechRecognition library with pip install SpeechRecognition. Then you can recognize audio with a few lines of code.
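
A minimal sketch of such a call, assuming the audio is a WAV file on disk. The function name recognize_google_file and the file name audio.wav are placeholders chosen for this example, not part of the library. Note that recognize_google sends the audio to Google's web API, so it only works with a network connection.

```python
def recognize_google_file(path, language="en-US"):
    """Transcribe an audio file (WAV/AIFF/FLAC) with the Google Web Speech API."""
    # Imported inside the function so that it can be defined even before
    # the library is installed (pip install SpeechRecognition).
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory

    try:
        # `language` is a tag such as "ru-RU" or "it-IT"
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return ""  # the speech could not be understood
    except sr.RequestError as exc:
        raise RuntimeError(f"Google API request failed: {exc}") from exc


# Example (needs an internet connection):
# print(recognize_google_file("audio.wav", language="ru-RU"))
```

Switching the language is just a matter of passing a different tag in the language argument.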

Pros:

  • Easy installation
  • Easy to use
  • Good quality

Cons:

  • Works online so requires an internet connection

More information:

Offline Speech Recognition with Vosk

Vosk is an offline speech recognition tool that is easy to set up. First, install vosk with pip: pip install vosk. If you have trouble installing it, upgrade your pip or Python (see the Installation section on the vosk site).

Then you have to download the model by simply clicking on it. If your model is not downloading, copy the link and open it in a new window. Also, you can try to download it using another browser.

Download the vosk model. Screenshot of a public web page

The recognition language depends on the model you download. Unpack the model into a folder and that’s all: you can use it! Vosk models output results in JSON format. This can be confusing for beginners, but it lets you do speech recognition with timestamps. See the examples on GitHub for more code comments.
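
To make the JSON output less confusing, here is a rough sketch of how the results could be read, assuming a mono 16-bit PCM WAV file and an unpacked model folder. The function names transcribe and words_with_timestamps are made up for this example. The per-word "result" list only appears in the JSON when word output is switched on with SetWords(True).

```python
import json
import wave


def words_with_timestamps(result_json):
    """Turn one Vosk JSON result into a list of (word, start, end) tuples."""
    result = json.loads(result_json)
    # The "result" key is only present when word-level output is enabled.
    return [(w["word"], w["start"], w["end"]) for w in result.get("result", [])]


def transcribe(wav_path, model_dir):
    """Recognize a mono 16-bit PCM WAV file with the Vosk model in model_dir."""
    from vosk import Model, KaldiRecognizer  # pip install vosk

    words = []
    with wave.open(wav_path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected a mono 16-bit PCM WAV file")
        rec = KaldiRecognizer(Model(model_dir), wf.getframerate())
        rec.SetWords(True)  # ask for per-word timestamps
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            if rec.AcceptWaveform(data):  # True when an utterance is complete
                words += words_with_timestamps(rec.Result())
        words += words_with_timestamps(rec.FinalResult())
    return words
```

The language is never passed in code here: it is determined entirely by the model folder given as model_dir.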

Pros:

  • Easy installation
  • Good quality
  • Allows speech recognition with timestamps

More information:

Offline Speech Recognition with SpeechRecognition and Pocketsphinx

This is the most difficult way. At least, this is where I ran into the most errors. However, maybe vosk doesn’t support your language, or you have your own reasons.

To use the offline recognize_sphinx() method of the SpeechRecognition library, you have to install Pocketsphinx. The official pocketsphinx documentation says to run two commands:

  • python -m pip install --upgrade pip setuptools wheel
  • pip install --upgrade pocketsphinx

I got an error on the second one, so I first had to install SWIG for Windows, and here’s how I did this (note that if you use a virtual environment, the Python path will differ).

Then I had to update Microsoft C++ Build Tools, and here’s how I did this.

After that, you should be able to use the offline recognize_sphinx() method, but only with the English language. So now you have to download and set up a foreign pocketsphinx model.

You can download foreign models for pocketsphinx here. There are 15 languages available now.

For some languages there are several model variants; for others, only one. Click on the selected model and after a few seconds it will start downloading. Then unpack it. For the tar and gz formats, the free 7-Zip archiver can help. If a model is downloaded in the model.tar.gz format, unpack it twice: first from model.tar.gz to model.tar, and then from model.tar to model.
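
If you prefer not to install an archiver, the unpacking can also be sketched with Python's standard tarfile module, which handles the gz and tar layers in one step. The function name unpack_model and the paths in the usage line are placeholders for this example.

```python
import tarfile
from pathlib import Path


def unpack_model(archive_path, dest_dir):
    """Extract a .tar.gz (or plain .tar) archive into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression, so the gzip and tar
    # layers are handled in a single pass.
    with tarfile.open(archive_path, "r:*") as archive:
        archive.extractall(dest)  # only use this on archives you trust
    # Return the names of the top-level entries that were created.
    return sorted(p.name for p in dest.iterdir())


# Example: unpack_model("model.tar.gz", "model")
```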

As a result, you should get the folder with the following files:

  • .lm file,
  • .dic file,
  • and other files with and without extensions.

Then go to the folder where pocketsphinx models are located. In my case (I created virtual environment ‘venv’ with Anaconda) it is C:\Users\USERNAME\anaconda3\envs\venv\Lib\site-packages\speech_recognition\pocketsphinx-data\.

There you should see one folder, en-US. Create a folder named after the language: ru-RU for Russian, it-IT for Italian, etc. See other language codes here.

Into your folder, copy the .lm file and rename it to language-model.lm.bin, then copy the .dic file and rename it to pronounciation-dictionary.dict (the spelling “pronounciation” is intentional: it is the file name SpeechRecognition looks for).

Then create the acoustic-model folder and copy all other files there (feat.params, mdef, means, mixture_weights, noisedict, sendump, transition_matrices, variances).

The final folder structure for the Russian language is:

├───pocketsphinx-data
│   ├───en-US
│   │   ...
│   │   └───acoustic-model
│   └───ru-RU
│       ├───language-model.lm.bin
│       ├───pronounciation-dictionary.dict
│       └───acoustic-model
│           ├───feat.params
│           ├───mdef
│           ├───means
│           ├───mixture_weights
│           ├───noisedict
│           ├───sendump
│           ├───transition_matrices
│           └───variances
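
The copy-and-rename steps above could be automated roughly as follows. This is only a sketch under assumptions: the helper names (default_data_dir, install_model) are invented for this example, the unpacked model is assumed to contain exactly one .lm and one .dic file somewhere under its folder, and acoustic files that are missing from the model are silently skipped.

```python
import shutil
from pathlib import Path

ACOUSTIC_FILES = (
    "feat.params", "mdef", "means", "mixture_weights",
    "noisedict", "sendump", "transition_matrices", "variances",
)


def default_data_dir():
    """Locate pocketsphinx-data inside the installed SpeechRecognition package."""
    import speech_recognition  # pip install SpeechRecognition
    return Path(speech_recognition.__file__).parent / "pocketsphinx-data"


def _first(folder, pattern):
    """Return the first file under folder that matches pattern."""
    matches = sorted(folder.rglob(pattern))
    if not matches:
        raise FileNotFoundError(f"no {pattern} file under {folder}")
    return matches[0]


def install_model(model_dir, data_dir, language):
    """Copy an unpacked model into data_dir/<language> using the expected names."""
    model_dir = Path(model_dir)
    target = Path(data_dir) / language
    acoustic = target / "acoustic-model"
    acoustic.mkdir(parents=True, exist_ok=True)

    shutil.copy(_first(model_dir, "*.lm"), target / "language-model.lm.bin")
    # The misspelled file name is the one used in the steps above.
    shutil.copy(_first(model_dir, "*.dic"), target / "pronounciation-dictionary.dict")
    for name in ACOUSTIC_FILES:
        for found in sorted(model_dir.rglob(name))[:1]:  # skip files that are absent
            shutil.copy(found, acoustic / name)
    return target


# Example: install_model("model", default_data_dir(), "ru-RU")
```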

If you did everything right, you can now use the recognize_sphinx() method just by setting the language parameter.
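
A short sketch of that call, together with a small check of the folder layout. The function names missing_model_parts and recognize_sphinx_file, and the file name audio.wav, are placeholders for this example. The check is a convenience of my own and only looks for the three entries described above.

```python
from pathlib import Path


def missing_model_parts(data_dir, language):
    """Return the expected entries that are absent from data_dir/<language>."""
    folder = Path(data_dir) / language
    expected = (
        "language-model.lm.bin",
        "pronounciation-dictionary.dict",
        "acoustic-model",
    )
    return [name for name in expected if not (folder / name).exists()]


def recognize_sphinx_file(path, language="en-US"):
    """Transcribe an audio file offline with Pocketsphinx."""
    import speech_recognition as sr  # pocketsphinx must be installed as well

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)
    try:
        # `language` must match the folder name, e.g. "ru-RU"
        return recognizer.recognize_sphinx(audio, language=language)
    except sr.UnknownValueError:
        return ""  # nothing could be recognized


# Example: print(recognize_sphinx_file("audio.wav", language="ru-RU"))
```

If missing_model_parts returns a non-empty list for your language folder, it is worth rechecking the file names before debugging anything else.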

Pros:

  • Easy to use

Cons:

  • Hard to install
  • Bad quality

More information:

Practical Use

You can find all the code on this GitHub repo.

Among other things, there are four code files:

  • speech_recognition_python.ipynb - overview Jupyter notebook with examples of all methods
  • script_online_sr.py - script to recognize English text from .wav file with Google API
  • script_vosk.py - script to recognize English text from .wav file with vosk
  • script_offline_sr.py - script to recognize English text from .wav file with SpeechRecognition and Pocketsphinx

You can use any of these three scripts like any other Python script. Each of them takes two parameters:

  • first (required): name of the audio file to recognize (audio.wav)
  • second (optional): name of the text file to write the recognized text to (audio_output.txt). If not specified, the name of the first parameter with a .txt extension is used (audio.txt)

For example:

The command python foreign_speech_recognition.py audio.wav audio_output.txt will recognize the audio.wav file from the current folder and write the recognized text into the audio_output.txt file.
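
The two-parameter behavior described above could be implemented along these lines. This is an illustration with argparse, not the code from the repo; the function name parse_args is an assumption.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the audio file name and the optional output file name."""
    parser = argparse.ArgumentParser(description="Recognize speech from a .wav file")
    parser.add_argument("audio", help="audio file to recognize, e.g. audio.wav")
    parser.add_argument("output", nargs="?", help="text file for the recognized text")
    args = parser.parse_args(argv)
    if args.output is None:
        # Default: same name as the audio file, with a .txt extension.
        args.output = str(Path(args.audio).with_suffix(".txt"))
    return args


# Example: parse_args(["audio.wav"]).output gives "audio.txt"
```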

Conclusions

At the end of the Jupyter notebook, you can listen to the audio in which I read a fragment of the Wikipedia article on speech recognition. The table below shows the recognition results for this audio file. It is worth saying that the quality strongly depends on pronunciation: it can be very good if you are a native speaker, and it can be awful because of an accent (maybe that’s my situation).

Comparison of three speech recognition methods. Image by Author

I don’t want to say anything bad about the pocketsphinx library. I’m sure that its authors have done a great job, and I was very pleased to work with it. But the advice I can give you is this: use SpeechRecognition with the Google API, or vosk.

It was the number of difficulties that prompted me to write this article. You may not encounter these problems or encounter others.

Thank you for reading!

  • I hope these materials were useful to you. Follow me on Medium to get more articles like this.
  • If you have any questions or comments, I will be glad to get any feedback. Ask me in the comments, or connect via LinkedIn or Twitter.
