Offline Speech Recognition in Flutter: No Siri, No Google, and No, It’s Not Speech-To-Text

Ian Lavery
Published in Picovoice · 6 min read · Feb 18, 2021

I don’t have to tell you about the importance of hands-free mobile apps in the 21st century. What I do have to tell you is that it’s easier than you think to add hands-free voice control to your Flutter project. Picovoice recently released a series of Flutter packages that make adding offline voice commands to your mobile app a walk in the park. No, seriously — you’ll have time to take a walk in the park after doing it.

For this tutorial, I’ve taken some inspiration from last year’s ubiquitous #flutterclock challenge and created a simple clock app. A clock app is a perfect platform for domain-specific voice controls (i.e. it has a defined list of commands and parameters). We don’t need the clock to understand the scope of the English language to set an alarm for tomorrow, and we certainly don’t need to send microphone audio to the big computer in the sky to divine our intent — let’s keep it simple and private. A hands-free mode also happens to be an extremely useful addition to a clock app. You may want to set a timer in the kitchen when your hands are occupied or start a stopwatch during your workout.

Enough explanation — let’s make it!

Step 1 — Make a Simple Clock App

Though I used the Flutter Clock challenge as inspiration, I decided against using the actual project as a starting point because there was a bit too much code that I wasn’t going to use and a fair amount I was going to add. Instead, I started from scratch and came up with something reminiscent of the Android Clock app.

The UI is fairly simple: three different widgets that are toggled between using the bottom navigation bar. As for the code-behind, the backbone is a periodic function that updates our clock and checks on any active alarms and timers.
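To make this concrete, here’s a minimal sketch of what that periodic backbone might look like; the class and method names are illustrative rather than lifted from the finished project:

import 'dart:async';

class ClockEngine {
  late final Timer _ticker;
  DateTime now = DateTime.now();

  ClockEngine() {
    // Fire once per second to refresh the clock face and poll alarms/timers.
    _ticker = Timer.periodic(const Duration(seconds: 1), (_) => _onTick());
  }

  void _onTick() {
    now = DateTime.now();
    _checkAlarms(now);
    _checkTimers(now);
  }

  void _checkAlarms(DateTime t) { /* compare t against scheduled alarms */ }
  void _checkTimers(DateTime t) { /* count down any active timers */ }

  void dispose() => _ticker.cancel();
}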

Step 2 — Import Picovoice Flutter SDK

Now that we have a simple clock app, let’s go ahead and see how we add an offline voice AI to it. For this project, we’re going to need the app to understand a set of commands, but we’re also going to want a wake word to tell the app when to listen for a command. The Picovoice Platform Flutter SDK will give us everything we need, so let’s import it by adding it to the project’s pubspec.yaml file.

picovoice_flutter: ${LATEST_VERSION}

Next, we’re going to use the PicovoiceManager class to capture audio and pass it to the Picovoice platform.
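Here’s a sketch of that wiring, following the shape of the Picovoice Flutter SDK’s documented API; the keyword and context paths are placeholders we’ll fill in properly in Step 4:

import 'package:picovoice_flutter/picovoice_manager.dart';
import 'package:rhino_flutter/rhino.dart';

final String _accessKey = "{YOUR_ACCESS_KEY}"; // obtained from Picovoice Console

String keywordPath = "path/to/wake_word.ppn"; // placeholder until Step 4
String contextPath = "path/to/context.rhn";   // placeholder until Step 4

PicovoiceManager? _picovoiceManager;

Future<void> _initPicovoice() async {
  _picovoiceManager = PicovoiceManager.create(
      _accessKey,
      keywordPath,        // Porcupine wake word model (.ppn)
      _wakeWordCallback,
      contextPath,        // Rhino context model (.rhn)
      _inferenceCallback);
  await _picovoiceManager?.start(); // begins audio capture
}

void _wakeWordCallback() {
  print("wake word detected!");
}

void _inferenceCallback(RhinoInference inference) {
  print(inference);
}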

There are a couple of things to unpack here — in the constructor we’re providing the PicovoiceManager with an AccessKey, two files and two callbacks. The AccessKey can be obtained by signing up for a free account on the Picovoice Console. Both keywordPath and _wakeWordCallback relate to the Porcupine Wake Word engine — the part of the platform that is always listening for a specific trigger phrase. As for contextPath and _inferenceCallback, these relate to the Rhino Speech-to-Intent engine, which attempts to decipher the command that follows the wake word. If we try to launch the app at this point, Picovoice will fail to initialize due to missing keyword and context files — creating those is our next task, so let’s head over to the Picovoice Console to build a custom wake word and command context for our app.

Step 3 — Design a Custom Voice Interface

The Picovoice Console gives us all the tools we need to create voice interfaces that fit our needs and work on our desired platforms. In this tutorial, we are going to create a custom command context and train it for Android and iOS.

To create a custom context, we’ll click on the Rhino tab, create an empty context and use the built-in editor to design our command interface. Our goal here is to design a simple grammar of voice commands for controlling the clock app. The context will contain a set of intents and slots. Intents are essentially our command phrases — in our case, these will include items such as setTimer and setAlarm. Slots, on the other hand, you can think of as variables. For example, our context will contain a slot called day, which has the values “Monday” through “Sunday”, plus “today” and “tomorrow”. An example of an intent expression in our context would be something like:

set (an) alarm for $day:day at $pv.TwoDigitInteger:hour $amPM:amPm

Words in parentheses are optional, and tokens preceded by a dollar sign are slot references: day and amPM are slots we’ve created, while pv.TwoDigitInteger is one of Rhino’s built-in slots.
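For illustration, here’s roughly how a slice of such a context might look in the Console’s YAML editor; the intent names, expressions and slot values shown here are just the ones this app happens to need:

context:
  expressions:
    setAlarm:
      - "set (an) alarm for $day:day at $pv.TwoDigitInteger:hour $amPM:amPm"
    setTimer:
      - "set (a) timer for $pv.TwoDigitInteger:minutes minutes"
  slots:
    day:
      - "today"
      - "tomorrow"
      - "Monday"
      - "Tuesday"
      - "Wednesday"
      - "Thursday"
      - "Friday"
      - "Saturday"
      - "Sunday"
    amPM:
      - "a m"
      - "p m"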

As for our wake word, we can use any one of the built-in Porcupine models from the Porcupine GitHub repo or, if you’d like to create a custom wake word, you can train one on the Picovoice Console.

Now that we have the required files, we can bring them into our Flutter project and start voice-enabling our app.

Step 4 — Import Model Files as Flutter Assets

Earlier in this article, we saw that the constructor for PicovoiceManager takes a keywordPath and a contextPath — the keywordPath refers to the file path of the Porcupine model file (.ppn file extension), while the contextPath refers to the file path of the Rhino model file (.rhn file extension). Now that we have these required files, we’ll drop them into the asset folder of our Flutter project and add them to the pubspec.yaml file as assets.
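Since Porcupine and Rhino models are trained per platform, we’ll have a file pair for each OS. The pubspec.yaml entries might look like this (the file names here are placeholders):

flutter:
  assets:
    - assets/clock_android.ppn
    - assets/clock_ios.ppn
    - assets/clock_context_android.rhn
    - assets/clock_context_ios.rhn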

Traditionally, non-media assets have been difficult to load in a cross-platform app, but luckily Flutter has an asset bundle that we can read from. We’ll now add this file logic and some platform detection to our _initPicovoice function from earlier:
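(A sketch, assuming the asset names above and the path_provider package for a writable directory; each model is copied out of the asset bundle to a real file path that the native Picovoice layer can read.)

import 'dart:io';
import 'package:flutter/services.dart' show rootBundle;
import 'package:path_provider/path_provider.dart';

Future<String> _extractAsset(String assetPath) async {
  // Copy the bundled asset to the documents directory and return its path.
  final data = await rootBundle.load(assetPath);
  final bytes = data.buffer.asUint8List(data.offsetInBytes, data.lengthInBytes);
  final dir = await getApplicationDocumentsDirectory();
  final file = File('${dir.path}/${assetPath.split('/').last}');
  await file.writeAsBytes(bytes);
  return file.path;
}

Future<void> _initPicovoice() async {
  // Pick the model pair that matches the platform we're running on.
  final platform = Platform.isAndroid ? "android" : "ios";
  final keywordPath = await _extractAsset("assets/clock_$platform.ppn");
  final contextPath = await _extractAsset("assets/clock_context_$platform.rhn");

  _picovoiceManager = PicovoiceManager.create(
      _accessKey, keywordPath, _wakeWordCallback,
      contextPath, _inferenceCallback);
  await _picovoiceManager?.start();
}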

If you launch the app now, PicovoiceManager will initialize and start streaming audio. If you attempt to say the wake word, followed by any of the commands from the context, you’ll see them printed to the debug console. While this is all well and good, we need these voice controls to actually control the app now!

Step 5 — Integrate the Voice Controls

From the print statements in our _wakeWordCallback and _inferenceCallback, you can start to work out the code we’re going to add to parse the objects we receive from the PicovoiceManager. The good news is that _wakeWordCallback is simply called whenever the wake word is detected, while the inference passed to _inferenceCallback is an object with the following structure:

{
  isUnderstood: true,
  intent: 'setTimer',
  slots: {
    hours: 2,
    seconds: 31
  }
}

This is in stark contrast to the speech-to-text approach where we have to parse a completely unknown and unstructured string. After filling out these functions, we have callbacks that look like the following:
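(A sketch; the _listening flag and the _set* control methods are stand-ins for this app’s own state and logic.)

void _wakeWordCallback() {
  // Wake word heard: flip the UI into listening mode.
  setState(() => _listening = true);
}

void _inferenceCallback(RhinoInference inference) {
  setState(() => _listening = false);
  if (inference.isUnderstood ?? false) {
    final slots = inference.slots ?? <String, String>{};
    switch (inference.intent) {
      case 'setTimer':
        _setTimer(slots);
        break;
      case 'setAlarm':
        _setAlarm(slots);
        break;
      // ...one case per intent in the context
    }
  }
}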

As you can see, the wake word callback simply changes the UI state to listening; Picovoice manages the switch between Porcupine and Rhino internally. In our app’s case, this will turn the microphone blue, letting the user know they can issue a command to the app.

The inference callback requires some additional logic to translate the inference object into a command. First, we check whether the engine understood the spoken phrase as a command that exists within its context. If it was understood, we determine which intent to execute and pass the slots (i.e. the command variables) to the relevant app method. For example, the _setTimer function takes the slots and turns them into a timer duration.
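Here’s a sketch of that function; since Rhino delivers slot values as strings, we parse them before building a Duration (_startTimer stands in for the app’s existing timer logic):

void _setTimer(Map<String, String> slots) {
  // Missing slots simply default to zero.
  final hours = int.tryParse(slots['hours'] ?? '') ?? 0;
  final minutes = int.tryParse(slots['minutes'] ?? '') ?? 0;
  final seconds = int.tryParse(slots['seconds'] ?? '') ?? 0;

  final duration = Duration(hours: hours, minutes: minutes, seconds: seconds);
  if (duration > Duration.zero) {
    _startTimer(duration);
  }
}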

Once we’ve filled in all the control functions we should have a completely hands-free and cross-platform Flutter clock, as demonstrated in the following video:

Hope you enjoyed this tutorial! Feel free to browse and reuse the source code here. Learn more about Picovoice and our technology by visiting our website, GitHub, or developer documentation.
