GSoC: A journey just started

Pankaj Baranwal
Aug 24, 2017

For the past three months, I have been working with CMUSphinx to build a ROS package for pocketsphinx as part of Google Summer of Code. These three months have been some of the most productive of my life as a student developer. You can find the code I have written here:

I have also made a brief demonstration video of my work, which can be found here:

I have mainly worked on three features:

The theory behind each of these has already been covered in my previous articles, and I have attached their links above. If you liked the video and would like to run the package on your own PC, this post gives detailed instructions on how to modify the existing codebase.

  • Step 1: Clone the repository in the catkin workspace of ROS.
    If you do not know much about ROS or catkin workspace, you can go through the following article:
    http://wiki.ros.org/ROS/Tutorials/InstallingandConfiguringROSEnvironment
  • Step 2: Go to the demo/ folder. Open the asr_spk.gram file in a text editor.
  • Step 3: Add the commands you need the engine to recognize by modifying the following line:
 public <rule> = GO TO MY WORKSPACE |…
  • Step 4: Modify keywords_spk_verification.dic so that it includes every word used in the grammar rule, along with its phonetic transcription.
  • Step 5: If you need to change the keyphrases the system recognizes at the start, you will also need to edit keywords_spk_verification.kwlist and add each keyphrase along with its threshold.
  • Step 6: Now that the basic files are ready, you need to add an acoustic model trained on your target speaker. You can prepare one as described in my previous blog post. If, for some reason, you would like to disable speaker verification, simply add the following argument to your terminal command:
sp_verif:=false
  • Step 7: All that is left to do is map the phrases you added to the grammar to tasks on your computer! The target file can be found in examples/execute_commands.py.
    If you go to handleOutput() function within this file, you will see a lot of if…elif conditions. Just copy-paste the code from one of these and change the terminal commands written within:
os.system("<your_command>")
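The idea behind Step 7 can also be written as a lookup table instead of a chain of if…elif branches. This is only a rough sketch and not the contents of execute_commands.py: the names COMMANDS, command_for and handle_output are made up for illustration, and the shell command is a placeholder you would replace with your own.

```python
import os

# Hypothetical phrase-to-command table. The phrase comes from the grammar
# example above; the shell command is a placeholder, not the real mapping.
COMMANDS = {
    "GO TO MY WORKSPACE": "echo opening workspace",
}


def command_for(phrase, table=COMMANDS):
    """Return the shell command mapped to a recognized phrase, or None."""
    return table.get(phrase.strip().upper())


def handle_output(phrase, run=os.system):
    """Run the command for a phrase; return False if the phrase is unknown."""
    cmd = command_for(phrase)
    if cmd is None:
        return False
    run(cmd)  # same call the if/elif branches make
    return True
```

With a table like this, supporting a new phrase means adding one entry rather than copying a whole branch.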

That’s it! You are good to go!
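Before launching, it can be worth checking that every word in the grammar rule (Step 3) really has an entry in the dictionary (Step 4). The sketch below is one way to do that, under a few assumptions: the rule sits on a single line, the dictionary uses the usual "WORD PH ON EMES" layout with alternates written as WORD(2), and words must match exactly, including case. The function names are hypothetical and not part of the package.

```python
import re


def grammar_words(rule_line):
    """Collect the plain words on the right-hand side of a JSGF rule."""
    rhs = rule_line.split("=", 1)[1]
    rhs = re.sub(r"<[^>]*>", " ", rhs)  # drop references to other rules
    return set(re.findall(r"[A-Za-z']+", rhs))


def dictionary_words(dic_text):
    """Collect the entry words of a pocketsphinx-style .dic file."""
    words = set()
    for line in dic_text.splitlines():
        if line.strip():
            head = line.split()[0]
            words.add(re.sub(r"\(\d+\)$", "", head))  # WORD(2) -> WORD
    return words


def missing_words(rule_line, dic_text):
    """Return grammar words that have no dictionary entry, sorted."""
    have = dictionary_words(dic_text)
    return sorted(w for w in grammar_words(rule_line) if w not in have)
```

To use it, read the rule line out of asr_spk.gram and the full text of keywords_spk_verification.dic, then pass both to missing_words(). An empty list means the dictionary covers the grammar.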

Now simply run the following two commands in two separate terminal windows and watch the magic unfold!

roslaunch pocketsphinx kws_speaker_specific.launch spdict:=/home/pankaj/catkin_ws/src/pocketsphinx/demo/speaker_test.dic spkws:=/home/pankaj/catkin_ws/src/pocketsphinx/demo/speaker_test.kwlist sphmm:=/home/pankaj/catkin_ws/src/pocketsphinx/demo/speaker_verification/an4.ci_cont_adapt/ dict:=/home/pankaj/catkin_ws/src/pocketsphinx/demo/keywords_spk_verification.dic kws:=/home/pankaj/catkin_ws/src/pocketsphinx/demo/keywords_spk_verification.kwlist gram:=/home/pankaj/catkin_ws/src/pocketsphinx/demo/asr_spk grammar:=asr rule:=rule

and

rosrun pocketsphinx execute_commands.py

I know the commands look scary! But if you look carefully, you will see that most of the first one is just the absolute paths of the target files. The first command runs a set of nodes via a launch file, while the second maps the final outputs of the first to terminal commands.
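Since nearly every argument shares the same demo/ prefix, the long roslaunch line can be generated from a single directory. The helper below is a small sketch of that idea. build_launch_command is a hypothetical name, and the argument names and file names are simply copied from the command above, so adjust them if your files differ.

```python
import os


def build_launch_command(demo_dir):
    """Assemble the roslaunch line shown above from one demo directory."""
    def p(name):
        return os.path.join(demo_dir, name)

    args = [
        ("spdict", p("speaker_test.dic")),
        ("spkws", p("speaker_test.kwlist")),
        # The acoustic model argument is a directory, hence the trailing slash.
        ("sphmm", p("speaker_verification/an4.ci_cont_adapt") + "/"),
        ("dict", p("keywords_spk_verification.dic")),
        ("kws", p("keywords_spk_verification.kwlist")),
        ("gram", p("asr_spk")),
        ("grammar", "asr"),
        ("rule", "rule"),
    ]
    joined = " ".join("%s:=%s" % (key, value) for key, value in args)
    return "roslaunch pocketsphinx kws_speaker_specific.launch " + joined
```

Printing build_launch_command() with the path to your own demo/ folder gives a line you can paste straight into the terminal.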

This marks the end of the official coding period for Google Summer of Code. I hope to continue contributing to CMUSphinx, as there is so much more that needs to be done. It feels like I have only scratched the surface of what can be accomplished with this technology, and I am more excited than ever to dive deeper and try to improve it. As always, suggestions and reviews are welcome! :)

Happy coding!
