‘He’ just recognized what I said!

Well, the title says it all. The computer just recognized what I said in my Mother Tongue! A major step in the right the direction.

For this to happen, I had to complete the Acoustic Model training. So then!

What is Acoustic Model!

Well it is a set of statistical representational parameters used to learn the language by representing the relation between audio signal and corresponding linguistic features that make up that speech or audio ( phoneme, and transcription! ).

To produce these we need to set up a database structure as documented by the CMU SphinxTrain team. Some of these files were common to the Language Model preparation like the phoneme transcription file. After setting up the database it should look like this irrespective of the language!

The training is straight forward if you get the database error free which was not my case! Thank you! ( ** if you get it error free on the first run, you are probably doing it wrong! ** )

I had to solve two issues ( 1 and 2 ) before I could run the training without any hiccups! It took a day to make the patch works in the files. The documentation didn’t mention that the phone set should contain a maximum of 255 phones due to practical limitation though theoretically it had no problems ( found out myself from the CMU help forums. ). That was the Issue : Reduce phoneset to a max of 255 #31. I successfully reduced to what is found in the current repository version.

Update — July 27

Acoustic Model is ready for testing!

$ sphinxtrain -t ml setup

This command will setup the ‘etc’ and ‘wav’ folder as mentioned above. Now we need to setup the sphinx_train.cfg config file which is excellently documented by the team.

Once that is out of the way, run the training.

$ cd ml


$ sphinxtrain run





still waiting!!



Finally its done! Took quite a lot of time!

Not only that, my Zenbook finally started showing heating and fan noise. That sixth gen Intel needed some extra air! ( ** nice! ** ).

Update — July 29

Well, this means, the GSoC 2016 aim have been achieved which was to develop the Language Model and Acoustic Model. Only thing left is to keep testing it.

The discussion with Deepa mam helped in bringing out a possibility in improving the accuracy which am working on as a branch in parallel to the testing.

With that in mind for the coming week, that’s it for this week

puts "until then ciao!"