Simple Guide To “KALDI” — an efficient open source speech recognition tool for Extreme Beginners — by a beginner!

For those who are completely new to speech recognition and exhausted from searching the net for open source tools, this is a great place to easily learn how to use the powerful tool “KALDI” with clearly defined steps.

Note: This tutorial is for Ubuntu 16.04. For ease, it uses the TED-LIUM English dataset, along with Docker and GStreamer.

To be eligible to read this story, make sure these points apply to you :

1. Make sure you at least know the names of the available open source tools and roughly how accurate they are. For lazy ones like me, here are a few popular free speech recognition tools :

a. Kaldi

b. CMU Sphinx

c. Deep Speech

d. HTK

e. Simon

These tools are ordered according to their popularity, efficiency and complexity ;).

2. Be ready to work hard and to witness many, many errors! A few errors can be resolved by a simple Stack Overflow answer, and a few make you re-install Kaldi altogether!

3. Before you start with the procedure below, I highly recommend that you struggle for at least 2 days with the traditional way of using Kaldi. If that works well (no pun intended ;)..), you don't need to follow this one! Below I mention links which you are advised to follow step by step for using Kaldi.

a. Downloading and installation :

If you feel content after running the yesno example in the above link, wait for the monster to eat your brain away!

b. Starting with your own dataset :

If you succeed in executing the file, congrats! You have finished reading my story!! If it just lands you in errors and more errors, proceed further :P .

c. Repeat the above tutorial until you are extremely tired and give up. The next alternative you land on is running the available examples in the “egs” folder.

Running examples :

If at least this works (or you have an LDC subscription), you are free to leave this story!

d. After exploring the “rm” data as used in the official link, you realize that freely available datasets would serve you better. Below are a few links to free datasets.

voxforge :


librispeech :

If you have a good GPU, a fast internet connection and 20–25 GB of space for a dataset, you can go with these links.

If all these steps bring you back here, congrats! You are completely qualified to read this tutorial.

Kaldi made easy, steps start here :

step 1 : Before you start with Kaldi, learn the foundations of Docker with this simple video tutorial :

step 2 : installation of docker :

That was a great intro to Docker. I advise you to learn Docker usage from the official documentation :

As already said, lazy ones can skip the learning.

step 3 : downloading tedlium dataset :

This file must be stored in the /media/kaldi_models directory.

To access media, go to Computer > media.

Open a terminal there and make the directory by running this command.

sudo mkdir kaldi_models

Move the downloaded file, after extraction, from your Downloads folder to this directory with this command.

sudo mv ~/Downloads/english /media/kaldi_models/

This dataset is 1.4 GB, which is neither too big nor too small!

step 4 : pulling the docker image from dockerHub :

sudo docker pull jcsilva/docker-kaldi-gstreamer-server

Run this command to download the image.

step 5 : yaml file download :

From the above link, download the YAML file and name it nnet2.yaml.

Store the file in the english directory.

Important : open the file in a text editor and replace test/models with /opt/models.

This will be explained later.

In the file, comment out the line

“full-post-processor: ./”

as you won't have that file. This won't affect the functionality of the YAML file.
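The two edits in this step can also be scripted. Here is a minimal Python sketch, assuming the YAML file contains the literal prefix test/models and a line starting with full-post-processor, and assuming the new prefix should be the absolute container path /opt/models from the docker run command. The function name patch_yaml_text and the sample text are made up for illustration; the sample is not the real file.

```python
def patch_yaml_text(text, old_prefix="test/models", new_prefix="/opt/models"):
    """Return text with model paths repointed and the post-processor line commented out."""
    patched_lines = []
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("full-post-processor:"):
            # Keep the indentation and put '#' in front of the key to disable it.
            indent = line[: len(line) - len(stripped)]
            line = indent + "#" + stripped
        else:
            # Repoint any model path on this line to the container directory.
            line = line.replace(old_prefix, new_prefix)
        patched_lines.append(line)
    return "\n".join(patched_lines) + "\n"


# Hypothetical stand-in for a few lines of nnet2.yaml (not the real file).
sample = (
    "decoder:\n"
    "    model: test/models/english/example/final.mdl\n"
    "full-post-processor: ./example-script\n"
)
print(patch_yaml_text(sample))
```

To apply it to the real file, read nnet2.yaml into a string, pass it through the function and write the result back. Keep a backup copy first.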

For further information about YAML, follow this great video :

step 6: sudo docker container ls

This command lists the running containers. If you find a container with the image jcsilva/docker-kaldi-gstreamer-server:latest, your container is up, with host port 8080 mapped to container port 80. (It only shows up in the list once it has been started with the docker run command from step 7.)

step 7 : getting inside the container :

docker run -it -p 8080:80 -v /media/kaldi_models:/opt/models jcsilva/docker-kaldi-gstreamer-server:latest /bin/bash

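This gets you inside the container, which can be used almost like a normal Linux terminal. Docker gives the container its own filesystem, and the -v option makes /media/kaldi_models on your machine visible as /opt/models inside the container, which is why the paths in the YAML file had to be changed. Place the YAML file in /opt/models.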
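To make the pieces of that long command easier to see, here is a small Python sketch that assembles the same argument list from its parts. The helper name build_docker_run_args is made up for illustration; the values are the ones used in this step.

```python
import shlex


def build_docker_run_args(host_port, container_port, host_dir, container_dir, image):
    """Assemble the 'docker run' argument list used in this step."""
    return [
        "docker", "run", "-it",
        "-p", f"{host_port}:{container_port}",  # host port : container port
        "-v", f"{host_dir}:{container_dir}",    # host directory : container directory
        image,
        "/bin/bash",                            # start a shell inside the container
    ]


args = build_docker_run_args(
    8080, 80, "/media/kaldi_models", "/opt/models",
    "jcsilva/docker-kaldi-gstreamer-server:latest",
)
print(shlex.join(args))
```

The printed line is the command from this step. In both -p and -v, the host side comes first and the container side second.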

step 8 : starting master and worker by docker :

./ -y /opt/models/nnet2.yaml

This will create master.log and worker.log in /opt.

step 9 : run ls -l to see the available items in the container.

if the list contains :

gst-kaldi-nnet2-online, kaldi, kaldi-gstreamer-server, master.log, models,,, worker.log

then your ./start was executed properly.

step 10 : run cat worker.log to find out whether the worker is working.

You will probably encounter an error saying that a path was not found, which can be fixed by modifying the conf files.

ls models/english/tedlium_nnet_ms_sp_online/conf/

This lists all the files in conf.

vi models/english/tedlium_nnet_ms_sp_online/conf/<file>

Replace <file> with the name of a file in the conf folder. Wherever you find the test/models path, change it to /opt/models. Go through each file in the conf folder by running the same command.

Run cat worker.log again to check that it works. This time the error should be gone and a message saying that the worker is available should be displayed.
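Instead of opening every conf file by hand, you can first find out which ones still mention the old path. Here is a minimal Python sketch of that check. The function name files_mentioning and the demo file names are made up for illustration, and the demo runs on a temporary folder rather than on the real conf folder.

```python
import tempfile
from pathlib import Path


def files_mentioning(conf_dir, needle="test/models"):
    """Return sorted names of files directly inside conf_dir whose text contains needle."""
    hits = []
    for path in sorted(Path(conf_dir).iterdir()):
        if path.is_file() and needle in path.read_text(errors="ignore"):
            hits.append(path.name)
    return hits


# Demo on a throwaway folder with two hypothetical conf files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.conf").write_text("--some-option=test/models/english/x\n")
    Path(tmp, "b.conf").write_text("--other-option=1\n")
    print(files_mentioning(tmp))
```

Inside the container you would point it at models/english/tedlium_nnet_ms_sp_online/conf and edit only the files it lists.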

step 11 : websocket url :

Open the web page. In the location field, enter ws://localhost:8080/client/ws/status and press connect. On connecting, if you find the message RECEIVED: {"num_workers_available": 1, "num_requests_processed": 9} in the log, then the connection is working.
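If you would rather check that message in a script than by eye, here is a minimal Python sketch. It assumes the part after RECEIVED: is plain JSON with straight quotes, as in the message above; the function name workers_available is made up for illustration.

```python
import json


def workers_available(log_line, prefix="RECEIVED: "):
    """Return the num_workers_available count from a status log line."""
    # Drop the log prefix if present, then parse the remaining JSON object.
    payload = log_line[len(prefix):] if log_line.startswith(prefix) else log_line
    status = json.loads(payload)
    return int(status["num_workers_available"])


line = 'RECEIVED: {"num_workers_available": 1, "num_requests_processed": 9}'
print(workers_available(line) > 0)
```

A count of at least 1 means a worker is ready to take a request. A count of 0 means the worker from step 10 is not running properly yet.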

Congratulations!! Now you have a working Kaldi speech recognizer (English) with GStreamer and Docker.

The fun part — showing it off!! :

To show off “your” working Kaldi, follow these steps.

step 1 : Install the free konele app on any of your Android devices from the Play Store. Go to settings and choose the mode konele (fast recognition) in recognition services.

step 2 : For the konele app, you need a public link to your local Kaldi. To get a public URL, it must be hosted on a server. If no server is available, use a free tunneling service that exposes your local port to the internet. A good one I found is ngrok.

Become an ngrok member : sign up at the above link and follow the installation guide.

Download and unzip. Run the command :

./ngrok authtoken <token value>

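Replace <token value> with your token. This saves your auth token to ngrok's .yml configuration file.

Run the command below to get your public link. The port is 8080 because that is the host port mapped to the container in step 7.

./ngrok http 8080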

step 3 : In the fast recognition mode, replace the WebSocket URL with the ngrok one.

For example, if the terminal shows :

forwarding ->localhost : 8080

enter : ws:// in the WebSocket URL section of konele.
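Turning the ngrok address into a WebSocket address is only a change of scheme: http becomes ws and https becomes wss. Here is a small Python sketch of that conversion. The function name to_websocket_url and the host example.invalid are made up for illustration, and the path shown is the status path from step 11, used only as an example; use whatever path your server expects.

```python
from urllib.parse import urlsplit, urlunsplit


def to_websocket_url(forwarding_url, path):
    """Map an http(s) forwarding URL to the matching ws(s) URL with the given path."""
    parts = urlsplit(forwarding_url)
    # http -> ws, https -> wss; any other scheme raises KeyError.
    scheme = {"http": "ws", "https": "wss"}[parts.scheme]
    return urlunsplit((scheme, parts.netloc, path, "", ""))


# example.invalid is a placeholder host, not a real ngrok address.
print(to_websocket_url("https://example.invalid", "/client/ws/status"))
```

Replace the placeholder with the forwarding address that ngrok prints in your terminal.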

step 4 : Close the settings and start konele. Tap and speak. Let it transcribe and look at the results!

Finally, this tutorial has a happy ending!!