Voice Recognition offline on Dragonboard with PocketSphinx

Spencer Fricke
Jan 3, 2018

Overview of how to set up and run PocketSphinx for offline voice recognition on your Qualcomm Dragonboard 410c.

tl;dr: Install commands and example code

Disclaimer: You don’t need a 3.5mm connection for the microphone. You can use any analog mic, but this article is designed for quick, off-the-shelf prototyping.

Why Offline?

Google Speech API, IBM Watson, and AWS Transcribe are all popular speech recognition platforms, but there are several reasons why you might not want to use the cloud for this task.

  • Privacy concerns.
  • Poor, unreliable, or no internet connection.
  • Avoiding the fees that cloud services charge after a certain number of uses.
  • Not wanting to spend time packaging your data into JSON and constructing an HTTP request (mostly an issue for programs written in C).

What you will need

The Dragonboard 410c, power adapter, and SD card for booting Linux are standard.

The microphone is a personal decision, but I have had great success with the earbuds that came with my phone.

To record through a 3.5mm jack, you have two options: wire up your own 3.5mm barrel jack connector or buy the audio mezzanine board. I chose the mezzanine board because it also offers level shifters to 3.3V and 5.0V for the GPIO pins and serial UART over USB, which is very useful for the low price.

The Dragonboard and Linux

I am going to assume you have already set up Linaro’s Debian Linux on your Dragonboard, as there is already great documentation on how to do that.

Update and add Linux packages

First, make sure your packages are up to date, then install the build dependencies:

~$ sudo apt update
~$ sudo apt upgrade
~$ sudo apt install autoconf libtool automake bison python-dev swig libasound2-dev

The main package to notice is libasound2-dev, as it provides the ALSA library headers we need.

Linux Audio Subsystems

“The installation process is not an issue if you understand the complexity of audio subsystems in Linux.” ~ PocketSphinx FAQ

If you are not familiar with how audio works in Linux, this is where you need to take a deep breath and bear with me for this section, as it will be useful in later steps.

ALSA vs PulseAudio

For the sake of time, here is what you need to know to get started. For anything more, I encourage you to do your own research.

  • ALSA - Low level drivers for fine control of audio options.
    Seen as amixer, aplay, arecord, libasound, or <alsa/asoundlib.h>
  • PulseAudio - Wrapper that uses ALSA but takes care of connecting multiple sources of audio I/O
    Seen as pacmd, pactl, pulseaudio, libpulse, or <pulse/simple.h>

There is nothing wrong with using PulseAudio, but for this tutorial we are using ALSA since the Dragonboard 410c has few audio inputs and outputs by default.

Understanding ALSA settings

You will see things such as plughw:0,2 or -c 0 in later commands, and I want to quickly explain where these settings come from. Running cat /proc/asound/cards lists the sound cards on the system.

On the Dragonboard this shows only one audio card, so every command directed at it, such as amixer, uses -c 0. Another command to check is cat /proc/asound/pcm, which lists the PCM devices on that card. Two of them matter here:

  • plughw:0,1 refers to the output speakers on your board.
  • plughw:0,2 refers to the input microphone we will be using.
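
As a rough illustration of where those names come from: each line of /proc/asound/pcm starts with a zero-padded card-device pair such as 00-02. The sketch below turns that prefix into a plughw name. pcm_to_plughw is a helper name I made up for this example, not an ALSA tool.

```shell
# Hypothetical helper: convert the "CC-DD" prefix of a /proc/asound/pcm line
# into an ALSA device name such as plughw:0,2.
pcm_to_plughw() {
    # Split on "-" and ":"; %d drops the zero padding ("02" -> 2).
    printf '%s\n' "$1" | awk -F'[-:]' '{ printf "plughw:%d,%d\n", $1, $2 }'
}

# List every PCM device on this machine, if the file is readable.
if [ -r /proc/asound/pcm ]; then
    while IFS= read -r line; do
        pcm_to_plughw "$line"
    done < /proc/asound/pcm
fi
```

On a board with a single card, every name printed this way starts with plughw:0, which matches the -c 0 used with amixer.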

Installing PocketSphinx

PocketSphinx is a great library designed for embedded devices such as the Dragonboard. The first step is to clone the two repos somewhere on your device.

~$ git clone https://github.com/cmusphinx/sphinxbase.git
~$ git clone https://github.com/cmusphinx/pocketsphinx.git

First we will set up Sphinxbase.

~$ cd sphinxbase
~$ ./autogen.sh

The main thing to check here is that the configure log shows Sphinxbase was configured to use ALSA.

Sphinx looks for the PulseAudio library headers first (note: having PulseAudio installed and having its development libraries installed are two different things). If it detected PulseAudio, we need to reconfigure it.

How to remove PulseAudio headers ( if needed )

There are two options. The first is to simply remove the headers with sudo apt purge libpulse-dev. The other is to remove the AC_CHECK_HEADER(pulse/pulseaudio.h section from the configure.ac file. Either way, run ./autogen.sh again afterwards.
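
If you want to check which headers are present before rerunning the configure step, something like the sketch below may help. audio_headers is a made-up helper name, and the default of /usr/include is my assumption about where Debian places these headers.

```shell
# Hypothetical helper: report which audio backend headers exist under an
# include directory (default /usr/include, an assumption for Debian).
audio_headers() {
    inc=${1:-/usr/include}
    if [ -f "$inc/pulse/pulseaudio.h" ]; then echo "pulseaudio"; fi
    if [ -f "$inc/alsa/asoundlib.h" ]; then echo "alsa"; fi
}

# Prints one line per backend found; "pulseaudio" here means the
# PulseAudio development headers are still installed.
audio_headers
```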

Finish installing Sphinxbase

To finish Sphinxbase we just need to make and install it.

~$ make
~$ sudo make install

Install PocketSphinx

This is the easy part.

~$ cd ../pocketsphinx
~$ ./autogen.sh
~$ make
~$ sudo make install

Add PocketSphinx path

You can either add these two commands to your .profile or .bashrc, or just run them each session.

~$ export LD_LIBRARY_PATH=/usr/local/lib
~$ export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

NOTE: If you see error while loading shared libraries: libpocketsphinx.so.3 then you forgot this step.
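
The two exports above overwrite whatever the variables already held. If you would rather keep existing entries, a variant along these lines should work. prepend_path is a helper name I made up for this sketch.

```shell
# Hypothetical helper: prepend a directory to a colon-separated list
# unless it is already in the list.
prepend_path() {
    dir=$1
    list=$2
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;              # already present
        *)          printf '%s\n' "${dir}${list:+:$list}" ;;
    esac
}

LD_LIBRARY_PATH=$(prepend_path /usr/local/lib "${LD_LIBRARY_PATH:-}")
PKG_CONFIG_PATH=$(prepend_path /usr/local/lib/pkgconfig "${PKG_CONFIG_PATH:-}")
export LD_LIBRARY_PATH PKG_CONFIG_PATH
```

Because the helper skips directories that are already listed, it is safe to leave in .bashrc even though that file runs for every new shell.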

Configure your microphone

The microphone goes through the audio codec inside the Dragonboard 410c, and there are only three commands you need to run to configure it.

~$ amixer -c 0 cset iface=MIXER,name='DEC1 MUX' 'ADC2'
~$ amixer -c 0 cset iface=MIXER,name='ADC2 MUX' 'INP2'
~$ amixer -c 0 cset iface=MIXER,name='ADC2 Volume' 8

  • 'DEC1 MUX' 'ADC2' tells the board to use ADC2 (ADC = analog-to-digital converter).
  • 'ADC2 MUX' 'INP2' tells the board to use MIC3_IN, which is what the 3.5mm jack is hooked up to.
  • 'ADC2 Volume' 8 sets the volume, which goes from 0 to 8, with 0 being 0dB and each step adding 6dB of gain.
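
To make the volume scale concrete, here is a tiny sketch of the arithmetic described above (step times 6dB). adc_gain_db is a made-up helper for illustration and does not talk to the codec.

```shell
# Hypothetical helper: gain in dB for an 'ADC2 Volume' step (0 to 8, 6dB each).
adc_gain_db() {
    case "$1" in
        [0-8]) echo $(( $1 * 6 )) ;;
        *)     echo "step must be between 0 and 8" >&2; return 1 ;;
    esac
}

adc_gain_db 8   # the step used in the amixer command above; prints 48
```

If your recordings clip, a lower step such as 5 or 6 is worth trying before changing anything else.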

To test your microphone you can use arecord -D plughw:0,2 -d 5 -r 48000 -f S16_LE test.wav to record your voice into a .wav file for 5 seconds.
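
A quick sanity check on the recording, as a sketch: WAV files start with the bytes RIFF and carry WAVE at byte offset 8, so the header can be inspected without playing the file. is_wav is a helper name I made up, and it only checks the header, not whether any sound was captured.

```shell
# Hypothetical helper: succeed if the file starts with a RIFF/WAVE header.
is_wav() {
    [ "$(head -c 4 "$1" 2>/dev/null)" = "RIFF" ] &&
        [ "$(head -c 12 "$1" 2>/dev/null | tail -c 4)" = "WAVE" ]
}

if is_wav test.wav; then
    echo "test.wav looks like a WAV file"
else
    echo "test.wav is missing or not a WAV file"
fi
```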

You can also test

Create a PocketSphinx Dictionary Set

You can now test PocketSphinx with their demo using pocketsphinx_continuous -adcdev plughw:0,2 -inmic yes, and you will find that the accuracy is very low. I am not sure why this is the case by default, but the way to get much higher accuracy is to create your own dictionary of words that can be recognized. PocketSphinx uses two files for its recognition: a .dic dictionary file and a .lm QuickLM language model file. I recommend creating a blank .txt file and writing on each line a single sentence or word you want added to your word bank. Then use the Sphinx LM Online Tool to convert that file into a .dic and .lm file for you.
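
As a sketch of what that text file might look like, the snippet below writes a small corpus with one phrase or word per line. The phrases and the corpus.txt file name are just examples, and write_corpus is a made-up helper. The upload to the online tool still happens by hand.

```shell
# Hypothetical helper: write a small corpus, one phrase or word per line.
write_corpus() {
    cat > "$1" <<'EOF'
what time is it
can you help me
red
green
blue
EOF
}

write_corpus corpus.txt

# Show the distinct words the recognizer would need to know.
tr ' ' '\n' < corpus.txt | sort -u
```

Keeping the corpus small and specific to your application is the whole point here: the fewer words the recognizer has to choose between, the less room it has to guess wrong.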

Running PocketSphinx

There are various ways to run PocketSphinx, and I have several demos that show how you can incorporate it in your C/C++ program.

Quick Demo

If you have set up all the settings above you can run this demo with no prep.

~$ wget https://raw.githubusercontent.com/sjfricke/Dragonboard-Voice-Recognition/master/Linux-ALSA-PocketSphinx/demo.dic
~$ wget https://raw.githubusercontent.com/sjfricke/Dragonboard-Voice-Recognition/master/Linux-ALSA-PocketSphinx/demo.lm
~$ pocketsphinx_continuous -adcdev plughw:0,2 -inmic yes -dict demo.dic -lm demo.lm

Once you run it you can say any of these words I added in the demo list and witness your new voice recognition enabled Dragonboard:

what, time, is, it, my, name, can, you, help, me, day, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday, color, red, green, blue, yellow, number, zero, one, two, three, four, five, six, seven, eight, nine
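
Once words are being recognized, the next step is usually to act on them. The sketch below maps a few phrases built from the demo word list to actions. The actions and the helper names handle_phrase and dispatch_stdin are made up for illustration, and it assumes recognized phrases arrive one per line on standard input.

```shell
# Hypothetical helper: map one recognized phrase to an action.
handle_phrase() {
    case "$1" in
        "what time is it")     date +%H:%M ;;
        "what day is it")      date +%A ;;
        red|green|blue|yellow) echo "color: $1" ;;
        "")                    ;;   # ignore empty lines
        *)                     echo "no action for: $1" ;;
    esac
}

# Read phrases from standard input, one per line, and dispatch each.
dispatch_stdin() {
    while IFS= read -r phrase; do
        handle_phrase "$phrase"
    done
}

# Simulated recognizer output, so the sketch runs without a microphone.
printf 'red\nfive\n' | dispatch_stdin
```

On the board, dispatch_stdin could sit at the end of a pipe from pocketsphinx_continuous, provided its log messages are kept off standard output. I have not checked the exact output format here, so treat that wiring as an assumption.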
