Add Voice Recognition to React Native Without Adding the Cloud

Ian Lavery
Published in Picovoice
4 min read · Mar 11, 2021

In the mobile space, cloud-based speech recognition has become the de facto standard. Even in applications where losing connectivity is expected, developers have often stuck with online speech recognition due to a lack of comfort with the alternatives. In recent years, however, developers have begun to rethink this paradigm. Even Google and Apple have started to move their own products away from cloud-based recognition. The fact is, not every application needs to be online, and some — for privacy, security, or reliability reasons — definitely should not be. Let’s see how powerful voice on the edge really is.

In the following tutorial, I’m going to show you how to add offline speech recognition to a simple React Native clock app.

1 — Install Picovoice for React Native

The Picovoice React Native SDK is going to give us the tools we need to add voice recognition on the edge. Simply install the following packages from npm to get started:

npm i @picovoice/react-native-voice-processor
npm i @picovoice/porcupine-react-native
npm i @picovoice/rhino-react-native
npm i @picovoice/picovoice-react-native

2 — Initialize the Speech Recognition Platform

The Picovoice platform encapsulates the Porcupine Wake Word engine and the Rhino Speech-to-Intent engine. The combination of these two recognition engines allows us to offer a familiar voice experience to our users: a wake word followed by a command that our app will execute — e.g.:

Pico Clock, set timer for 5 minutes

In this phrase, Porcupine identifies the wake word “Pico Clock” and Rhino infers the intent of the command that follows. Rhino uses an embedded grammar to determine the meaning of the command without transcribing it to text — it instead returns a RhinoInference object. For our example command, the Rhino inference result would look like this:

{
  isUnderstood: true,
  intent: 'setTimer',
  slots: {
    minutes: '5'
  }
}

To initialize Picovoice in our app, we’re going to need a Picovoice AccessKey, a Porcupine wake word model, and a Rhino context model. A Picovoice AccessKey can be obtained by signing up for a free account on the Picovoice Console. The wake word model is a file that allows Porcupine to trigger when a certain phrase is spoken, while the context model is the grammar that Rhino uses to determine the intent of a given command.

For our clock app, we’re going to use the trigger phrase Pico Clock and the Clock context, which will allow our app to execute several commands related to the time. Custom wake words and contexts can be trained via the Picovoice Console.

Once we have an AccessKey, a Porcupine model (.ppn file), and a Rhino model (.rhn file) we can initialize a PicovoiceManager in our React Native app.

Using PicovoiceManager allows us to avoid worrying about audio recording — it will record audio for us and pass the frames to Picovoice. You’ll notice we also include a wakeWordCallback and an inferenceCallback in our constructor. These functions will trigger when Porcupine and Rhino deliver results, respectively.
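As a minimal sketch, the initialization looks like this — the AccessKey placeholder and the model file names (`pico_clock.ppn`, `clock.rhn`) are stand-ins for your own Console AccessKey and the model files bundled with your app:

```
import {PicovoiceManager} from '@picovoice/picovoice-react-native';

// Placeholder — substitute the AccessKey from your Picovoice Console account.
const accessKey = '${YOUR_ACCESS_KEY}';

// Triggered when Porcupine detects the wake word.
const wakeWordCallback = () => {
  console.log('Wake word detected');
};

// Triggered when Rhino finishes inference on the command that follows.
const inferenceCallback = (inference) => {
  console.log(JSON.stringify(inference, null, 2));
};

// Model file names here are assumptions — use the paths of the .ppn and
// .rhn files shipped with your app.
const picovoiceManager = PicovoiceManager.create(
  accessKey,
  'pico_clock.ppn',
  wakeWordCallback,
  'clock.rhn',
  inferenceCallback,
);
```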

3 — Get Permission to Record Audio

To get permission to use the microphone on iOS, open your Info.plist and add the following entry:

<key>NSMicrophoneUsageDescription</key>
<string>[Permission explanation]</string>

On Android, open your AndroidManifest.xml and add the following line:

<uses-permission android:name="android.permission.RECORD_AUDIO" />

Finally, in your app code, be sure to check for permission before proceeding with audio capture:
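One way to gate `.start()` behind the permission check looks like this — `startListening` is a hypothetical helper, and on iOS the system prompt (with the usage description we added above) appears automatically when capture first starts:

```
import {PermissionsAndroid, Platform} from 'react-native';

async function startListening(picovoiceManager) {
  if (Platform.OS === 'android') {
    // Ask for the RECORD_AUDIO permission declared in AndroidManifest.xml.
    const granted = await PermissionsAndroid.request(
      PermissionsAndroid.PERMISSIONS.RECORD_AUDIO,
    );
    if (granted !== PermissionsAndroid.RESULTS.GRANTED) {
      console.warn('Microphone permission denied');
      return;
    }
  }
  // iOS shows the NSMicrophoneUsageDescription prompt on first capture.
  await picovoiceManager.start();
}
```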

Once .start() has been called, Picovoice is listening for our “Pico Clock” wake word and any commands that follow.

4 — Controlling Our App With Voice Commands

Now that we’ve created a voice interface for our app, it’s time to connect the wires and put it to work! For the purposes of this tutorial, I’ve created a simple clock app based on the built-in clock app that every mobile device comes with. It has three main components: a clock that shows local time, a timer, and a stopwatch.

To connect these three components to our voice interface, we’ll need to add some control code to the wakeWordCallback and the inferenceCallback that we passed to the PicovoiceManager.
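A sketch of that control code might route each inference to an app action. Only the `setTimer` intent appears in the example above; the other intent names here are assumptions about what a Clock context could contain:

```javascript
// Maps a RhinoInference result to an app action (or null if the
// command wasn't understood or isn't handled).
function routeInference(inference) {
  if (!inference.isUnderstood) {
    return null; // command didn't match the context grammar
  }
  switch (inference.intent) {
    case 'setTimer':
      return {action: 'setTimer', args: inference.slots};
    case 'startStopwatch':
      return {action: 'startStopwatch', args: {}};
    case 'resetStopwatch':
      return {action: 'resetStopwatch', args: {}};
    default:
      return null;
  }
}

// The callback passed to PicovoiceManager delegates to the router.
const inferenceCallback = (inference) => {
  const route = routeInference(inference);
  if (route !== null) {
    // invoke the matching app function, e.g. setTimer(route.args)
  }
};
```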

We connect each intent to a specific action taken in the app (e.g. setting the timer duration or starting the stopwatch) and we pass in the intent’s slots as arguments. Slots can be thought of as the parameters for each command — for instance, when we set the timer duration the slots are parsed like so:
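For instance, the `setTimer` slots can be folded into a single duration. The slot names `hours`/`minutes`/`seconds` are an assumption about the Clock context; for “set timer for 5 minutes” the slots are `{minutes: '5'}`:

```javascript
// Converts the slots of a 'setTimer' inference into a duration in seconds.
// Missing slots default to zero, since a command may mention only some units.
function timerSlotsToSeconds(slots) {
  const hours = parseInt(slots.hours ?? '0', 10);
  const minutes = parseInt(slots.minutes ?? '0', 10);
  const seconds = parseInt(slots.seconds ?? '0', 10);
  return hours * 3600 + minutes * 60 + seconds;
}

// timerSlotsToSeconds({minutes: '5'}) → 300
```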

Once we connect all the functions of the clock app with our voice interface, we have a cross-platform clock app that is entirely hands-free. Best of all — no audio data was transmitted off the device, ensuring your users’ privacy.

The full source code from this tutorial can be found here. If you’re interested in mobile dev in general, I also made this app in Flutter. For more information regarding Picovoice’s SDKs and products, visit the website and docs, or explore the GitHub repositories. If your project requires custom wake word or context models, sign up for the Picovoice Console.
