Prioritizing Privacy: Add Offline Speech Recognition to a Java Application

Ian Lavery
Picovoice
Mar 3, 2021

Integrating voice commands into a Java application has traditionally been a daunting task. While Java has a Speech API, it is unfortunately only an interface to a collection of outdated products and third-party cloud providers. Let’s leave all this behind and add some modern, offline speech recognition to our Java application. By keeping audio processing offline, we ensure our users’ data stays on-device, thereby prioritizing their privacy and security.

1 — Integrate Picovoice Java SDK

We’re going to use Picovoice’s Java SDK for this task. You can get the latest version from the Maven Central Repository and reference it like so:

ai.picovoice:picovoice-java:${version}
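For example, in a Maven pom.xml this coordinate becomes a standard dependency declaration (substitute the latest release for ${version}):

<dependency>
  <groupId>ai.picovoice</groupId>
  <artifactId>picovoice-java</artifactId>
  <version>${version}</version>
</dependency>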

2 — Initialize the Picovoice Platform

The Picovoice SDK provides a class that encapsulates both the Porcupine Wake Word engine and the Rhino Speech-to-Intent engine. These two engines working in concert allow you to say a wake word followed by a command, e.g.:

Jarvis, turn off the lights in the kitchen

In this case, “Jarvis” is the wake word detected by Porcupine, and the command is sent to Rhino to run inference on. Rhino does not transcribe the command like a Speech-to-Text engine would — it instead uses its context grammar to infer the overall intent of the phrase. Once it has a result, it returns a structured inference object that looks like this:

{
  isUnderstood: true,
  intent: 'changeLightState',
  slots: {
    state: 'off',
    location: 'kitchen'
  }
}
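In the Java SDK, this result arrives as a RhinoInference object. A minimal sketch of reading it, assuming the getIsUnderstood, getIntent, and getSlots accessors from ai.picovoice.rhino.RhinoInference:

import java.util.Map;
import ai.picovoice.rhino.RhinoInference;

void handleInference(RhinoInference inference) {
    if (inference.getIsUnderstood()) {
        String intent = inference.getIntent();            // e.g. "changeLightState"
        Map<String, String> slots = inference.getSlots(); // e.g. {state=off, location=kitchen}
        // act on the intent and slots here
    }
}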

To create an instance of the Picovoice class we need to provide a Porcupine wake word model and a Rhino context model. The wake word model is used to recognize a single wake word or phrase, while the context model is generated from a grammar that describes a set of commands and associated parameters.

Several pre-trained Porcupine and Rhino models are available for free, public use in the Picovoice GitHub repositories[1][2]. For this demo, we’re going to use the keyword Jarvis and the Smart Lighting context, which understands a collection of commands that change the color or state of lights in a smart home.

We also need a Picovoice AccessKey, which is obtained by signing up for a free account on the Picovoice Console. The Picovoice Console can also be used to create custom wake words and contexts.

Now that we have a Porcupine wake word model file (.ppn) and a Rhino context model file (.rhn), we can go ahead and initialize the Picovoice Platform in our Java application:
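A minimal initialization sketch, assuming the Picovoice.Builder API from the Java SDK; the model paths and ${ACCESS_KEY} are placeholders for your own files and key:

import ai.picovoice.picovoice.Picovoice;
import ai.picovoice.picovoice.PicovoiceException;

final String accessKey = "${ACCESS_KEY}"; // from the Picovoice Console

try {
    Picovoice picovoice = new Picovoice.Builder()
            .setAccessKey(accessKey)
            .setKeywordPath("path/to/jarvis.ppn")         // Porcupine wake word model
            .setWakeWordCallback(() -> System.out.println("Wake word detected!"))
            .setContextPath("path/to/smart_lighting.rhn") // Rhino context model
            .setInferenceCallback((inference) -> {
                // handle the inference result (see section 4)
            })
            .build();
} catch (PicovoiceException e) {
    // handle initialization failure (bad model paths, invalid AccessKey, etc.)
}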

You’ll notice we’ve also included a wakeWordCallback and an inferenceCallback in the Picovoice builder. These two callbacks are triggered when the wake word has been detected and when a user intent has been inferred, respectively.

3 — Read and Process Microphone Audio

For Picovoice to deliver results, it needs to process a stream of audio frames. Thankfully, the Java Sound API provides an easy, cross-platform way to access audio capture devices:
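For instance, a sketch that opens a capture line in the 16 kHz, 16-bit, mono, little-endian PCM format the Picovoice engines expect:

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;

// 16 kHz sample rate, 16-bit samples, 1 channel, signed, little-endian
AudioFormat format = new AudioFormat(16000f, 16, 1, true, false);
DataLine.Info dataLineInfo = new DataLine.Info(TargetDataLine.class, format);

try {
    TargetDataLine micDataLine = (TargetDataLine) AudioSystem.getLine(dataLineInfo);
    micDataLine.open(format);
    micDataLine.start(); // begin capturing audio
} catch (LineUnavailableException e) {
    // no capture device available in the requested format
}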

Now we just need to create a loop that reads microphone data and passes it to Picovoice in the format it requires:
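A sketch of that loop, assuming the micDataLine opened above, the picovoice instance from section 2, and a listening flag that your app toggles; getFrameLength() reports how many 16-bit samples each frame must contain:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

short[] picovoiceBuffer = new short[picovoice.getFrameLength()];
byte[] captureBuffer = new byte[picovoiceBuffer.length * 2]; // 2 bytes per 16-bit sample

try {
    while (listening) {
        // block until one frame's worth of bytes has been captured
        int numBytesRead = micDataLine.read(captureBuffer, 0, captureBuffer.length);
        if (numBytesRead != captureBuffer.length) {
            continue; // partial read; skip this frame
        }

        // convert the little-endian bytes into 16-bit samples
        ByteBuffer.wrap(captureBuffer)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asShortBuffer()
                .get(picovoiceBuffer);

        // hand the frame to Picovoice; the callbacks fire when something is detected
        picovoice.process(picovoiceBuffer);
    }
} catch (PicovoiceException e) {
    // handle processing error
}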

4 — Integrate the Voice Interface

Now that we have a functioning voice interface, we need to hook it up to our application. I’ve created a very simple Java GUI using Swing that I’m going to control with voice commands. The app will represent a visual control panel for our smart home, showing the status of the lights in each room.

To hook this GUI up to our voice interface, we’ll revisit the inferenceCallback from earlier and actually process the inference to determine what action to take in the application.
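A sketch of what that callback might look like for the Smart Lighting context; lightingPanel and its setLightState method are hypothetical stand-ins for your own Swing components, while the intent and slot names match the inference shown in section 2:

import java.util.Map;
import javax.swing.SwingUtilities;
import ai.picovoice.rhino.RhinoInference;

private void onInference(RhinoInference inference) {
    if (!inference.getIsUnderstood()) {
        return; // the command didn't match the context grammar
    }

    if ("changeLightState".equals(inference.getIntent())) {
        Map<String, String> slots = inference.getSlots();
        String location = slots.get("location");           // e.g. "kitchen"
        boolean lightsOn = "on".equals(slots.get("state"));

        // Swing components must only be updated on the Event Dispatch Thread
        SwingUtilities.invokeLater(() -> lightingPanel.setLightState(location, lightsOn));
    }
}

Passing this::onInference as the inferenceCallback in the Picovoice builder wires the voice interface to the GUI.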

And with that — a mere 200 lines of code later — we have a GUI, audio recording, and offline speech recognition running in a cross-platform Java application.

Feel free to browse, clone, and modify the full source code. For more information regarding Picovoice’s SDKs and products, visit the website and docs, or explore the GitHub repositories. If your project requires custom wake word or context models, check out the Picovoice Console.
