Pocketsphinx in ROS: Demo 1.0

Pankaj Baranwal
Jun 17, 2017

[Video: demonstration of kws mode using the TurtleBot simulation]

Embedded above is the video demonstration of kws mode (keyword spotting mode), a feature recently made available in the latest ROS package for pocketsphinx. This blog post should be considered an add-on for those who want an in-depth understanding of what exactly is going on in the video.

Pocketsphinx supports keyword spotting as one of its many modes. The advantage of this mode is that you can specify a threshold for each keyphrase, and the keyphrase can then be detected in continuous speech. Unlike grammar and n-gram modes, KWS attempts to filter out non-target words. When working with grammars (or n-grams), pocketsphinx will always decode some (wrong) hypothesis even if you said something completely irrelevant; rejecting such out-of-vocabulary speech is quite a challenging problem in state-of-the-art speech recognition.

The video starts with the mention of “TurtleBot simulation in Gazebo”. Gazebo is an essential tool in ROS (Robot Operating System) for robot simulation. It offers the ability to accurately and efficiently simulate populations of robots in complex indoor and outdoor environments. You can head to this link to learn how to install and start using Gazebo in ROS! TurtleBot, in turn, is a cute little robot that can drive around your house, see in 3D, and has enough horsepower for you to build exciting applications on top of it!

The video then moves on to the actual setup, starting with the ROS master, which handles the communication between different nodes from different packages in ROS. Please note that you need ROS installed on your system for any of this to work. The master is launched with the command-line tool roscore:
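
roscore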

Next, in a new terminal window, the following command is executed:

roslaunch turtlebot_gazebo turtlebot_world.launch

turtlebot_world.launch is a launch file in the turtlebot_gazebo package that brings up the required simulation in Gazebo. The video quickly shifts to the next terminal window, where a new command is entered:

rostopic echo /kws_data

The rostopic command displays debug information about ROS topics; you should go to the official docs for a better understanding of this command-line tool. We use it above to “echo”, i.e. print, the messages being sent over the topic /kws_data. This topic is published by a node in the pocketsphinx package, which will be launched next:

roslaunch pocketsphinx kws.launch dict:=voice_cmd.dic kws:=voice_cmd.kwlist

*Note*
Since the publishing of this post, the codebase has been updated to accommodate some new features. Please read the wiki here for the correct command; the one mentioned above will no longer work.

kws.launch is a launch file in the pocketsphinx package, and it is provided with two command-line arguments: dict and kws. dict contains the file name of the dictionary, which maps each word to its phonetic pronunciation. kws contains the file name of the keyword list, which holds the keyphrases and their threshold values. All these input files need to be present in the demo folder within the pocketsphinx package. To modify these files, you can check out this tutorial on language models.
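
For illustration, here is the rough shape these two files take (the entries below are illustrative, not the exact contents of the demo files). In the keyword list, each line pairs a keyphrase with a detection threshold between slashes; in the dictionary, each line maps a word to its CMU-style phoneme sequence:

voice_cmd.kwlist:

forward /1e-12/
back /1e-12/
left /1e-10/
right /1e-10/
stop /1e-16/

voice_cmd.dic:

back B AE K
forward F AO R W ER D
left L EH F T
right R AY T
stop S T AA P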

This starts all the necessary nodes for running kws mode in ROS, and a bunch of debug messages are printed. Once everything is good to go, you can simply speak some of the words present in the dictionary, and chances are the previous terminal window will recognize the keyword and print it on screen. If the engine is unable to recognize your voice commands, ensure that the threshold values have been properly set and that your accent is close to that of the acoustic model used. In the video, the accent was Indian while the model was for US English; so, even though the system worked well overall, the initial “left” commands were missed.
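
Assuming the keyword node publishes plain std_msgs/String messages on /kws_data (which is what this demo did at the time), the rostopic echo window prints something along these lines for each detected keyphrase:

data: forward
---
data: left
---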

Now that kws mode is set up and tested, we see one last node being launched using the command:

rosrun pocketsphinx voice_control_example.py

This starts a node that subscribes to the /kws_data topic and uses the keyphrases published on it to control the TurtleBot simulation in Gazebo. This works in the following manner:

The TurtleBot simulation subscribes to a topic: /mobile_base/commands/velocity. Our node publishes the required velocity and direction commands on this topic; Gazebo reads them, and the TurtleBot moves accordingly. If you want to work on some other simulation platform and don’t know which topic to publish the commands to, you can use another beautiful command-line tool provided by ROS: rqt_graph. It provides a GUI visualization of the ROS computation graph, similar to the one shown in the video:

Part of the output visualization produced by rqt_graph
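
To make the mapping from keywords to motion concrete, here is a minimal sketch of what such a control node can look like. This is not the exact voice_control_example.py from the package, just an illustration of the subscribe-and-publish pattern described above; the topic names are the ones used in the video, and the speed values are arbitrary:

#!/usr/bin/env python
# Minimal sketch: map keyphrases from /kws_data to velocity commands.
# Not the exact voice_control_example.py -- an illustration of the pattern.
import rospy
from std_msgs.msg import String
from geometry_msgs.msg import Twist

class VoiceController:
    def __init__(self):
        rospy.init_node('voice_control_sketch')
        # The TurtleBot simulation listens for velocity commands here.
        self.pub = rospy.Publisher('/mobile_base/commands/velocity',
                                   Twist, queue_size=10)
        rospy.Subscriber('/kws_data', String, self.on_keyword)
        self.twist = Twist()

    def on_keyword(self, msg):
        # msg.data holds the detected keyphrase, e.g. "forward" or "stop".
        keyword = msg.data.strip().lower()
        if 'forward' in keyword:
            self.twist.linear.x = 0.2   # illustrative speed (m/s)
            self.twist.angular.z = 0.0
        elif 'back' in keyword:
            self.twist.linear.x = -0.2
            self.twist.angular.z = 0.0
        elif 'left' in keyword:
            self.twist.angular.z = 0.5  # illustrative turn rate (rad/s)
        elif 'right' in keyword:
            self.twist.angular.z = -0.5
        elif 'stop' in keyword:
            self.twist = Twist()        # all zeros: halt the robot
        self.pub.publish(self.twist)

if __name__ == '__main__':
    VoiceController()
    rospy.spin()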

Now, you can simply use your voice commands to move the TurtleBot around! Congratulations if you were able to follow along! In the next video demonstration, we will try to include more human-friendly interactions. Nobody in their right mind is going to instruct a bot to move “forward” or “back” every time they need something done! If it could respond to commands like “Come here”, “Go to the kitchen”, or “Bring me coffee”, that would be cool! And that is exactly what we are hoping to achieve here. So stay tuned, and happy coding!
