Offline Speech Recognition in Flutter: No Siri, No Google, and No, It’s Not Speech-To-Text
I don’t have to tell you about the importance of hands-free mobile apps in the 21st century. What I do have to tell you is that it’s easier than you think to add it to your Flutter project. Picovoice recently released a series of Flutter packages that have made adding offline voice commands to your mobile app a walk in the park. No, seriously — you’ll have time to take a walk in the park after doing it.
For this tutorial, I’ve taken some inspiration from last year’s ubiquitous #flutterclock challenge and created a simple clock app. A clock app is a perfect platform for domain-specific voice controls (i.e. it has a defined list of commands and parameters). We don’t need the clock to understand the scope of the English language to set an alarm for tomorrow, and we certainly don’t need to send microphone audio to the big computer in the sky to divine our intent — let’s keep it simple and private. A hands-free mode also happens to be an extremely useful addition to a clock app. You may want to set a timer in the kitchen when your hands are occupied or start a stopwatch during your workout.
Enough explanation — let’s make it!
Step 1 — Make a Simple Clock App
Though I used the Flutter Clock challenge as inspiration, I decided against using the actual project as a starting point because there was a bit too much code that I wasn’t going to use and a fair amount I was going to add. Instead, I started from scratch and came up with something reminiscent of the Android Clock app.
The UI is fairly simple: three different widgets that are toggled between using the bottom navigation bar. As for the code-behind, the backbone is a periodic function that updates our clock and checks on any active alarms and timers.
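As a rough sketch (assuming this lives in the main page’s State class, with helper names like _checkAlarms and _tickTimers that are purely illustrative), that backbone might look something like this:

```dart
import 'dart:async';

// Inside the clock page's State class: a one-second periodic timer
// refreshes the displayed time and checks on alarms and timers.
late Timer _clockTimer;
DateTime _now = DateTime.now();

void _startClock() {
  _clockTimer = Timer.periodic(const Duration(seconds: 1), (_) {
    setState(() {
      _now = DateTime.now(); // redraw the clock face
      _checkAlarms(_now);    // fire any alarms that have come due
      _tickTimers();         // advance active timers and the stopwatch
    });
  });
}
```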
Step 2 — Import Picovoice Flutter SDK
Now that we have a simple clock app, let’s go ahead and see how we add an offline voice AI to it. For this project, we’re going to need the app to understand a set of commands, but we’re also going to want a wake word to tell the app when to listen for a command. The Picovoice Platform Flutter SDK will give us everything we need, so let’s import it by adding it to the project’s pubspec.yaml file:

```yaml
picovoice_flutter: ${LATEST_VERSION}
```
Next, we’re going to use the PicovoiceManager class to capture audio and pass it to the Picovoice platform.
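Here’s a minimal sketch of that setup, following the SDK’s create/start pattern. The AccessKey value is a placeholder, _keywordPath and _contextPath are filled in later (Step 4), and exact import paths and signatures may differ between SDK versions:

```dart
import 'package:picovoice_flutter/picovoice_manager.dart';
import 'package:picovoice_flutter/picovoice_error.dart';
import 'package:rhino_flutter/rhino.dart';

const String _accessKey = '{YOUR_ACCESS_KEY}'; // from Picovoice Console
late PicovoiceManager _picovoiceManager;

Future<void> _initPicovoice() async {
  try {
    _picovoiceManager = PicovoiceManager.create(
      _accessKey,
      _keywordPath,       // Porcupine wake word model (.ppn)
      _wakeWordCallback,  // invoked when the wake word is detected
      _contextPath,       // Rhino context model (.rhn)
      _inferenceCallback, // invoked with the decoded command
    );
    await _picovoiceManager.start(); // begin capturing microphone audio
  } on PicovoiceException catch (ex) {
    print(ex.message);
  }
}

void _wakeWordCallback() => print('wake word detected!');

void _inferenceCallback(RhinoInference inference) => print(inference);
```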
There are a couple of things to unpack here — in the constructor we’re providing the PicovoiceManager with an AccessKey, two files and two callbacks. The AccessKey can be obtained by signing up for a free account on the Picovoice Console. Both keywordPath and _wakeWordCallback relate to the Porcupine Wake Word engine — the part of the platform that is always listening for a specific trigger phrase. As for contextPath and _inferenceCallback, these relate to the Rhino Speech-to-Intent engine, which attempts to decipher the command that follows the wake word. If we try to launch the app at this point, Picovoice will fail to initialize because the keyword and context files don’t exist yet — creating them is our next task. Let’s head over to the Picovoice Console to create a custom wake word and command context for our app.
Step 3 — Design a Custom Voice Interface
The Picovoice Console gives us all the tools we need to create voice interfaces that fit our needs and work on our desired platforms. In this tutorial, we are going to create a custom command context and train it for Android and iOS.
To create a custom context, we’ll click on the Rhino tab, create an empty context and use the built-in editor to design our command interface. Our goal here is to create a simple grammar that can be used to control our clock app. The context will contain a set of intents and slots. Intents are essentially our command phrases — in our case, these will include items such as setTimer and setAlarm. Slots, on the other hand, you can think of as variables. For example, our context will contain a slot called day, which has the values “Monday” through “Sunday”, “today” and “tomorrow”. An example of an intent expression in our context would be something like:

```
set (an) alarm for $day:day at $pv.TwoDigitInteger:hour $amPM:amPm
```
Words in parentheses are optional, and tokens preceded by a dollar sign are the slots we’ve created.
As for our wake word, we can use any one of the built-in Porcupine models from the Porcupine GitHub repo or, if you’d like to create custom wake words, you can train them on the Picovoice Console.
Now that we have the required files, we can bring them into our Flutter project and start voice-enabling our app.
Step 4 — Import Model Files as Flutter Assets
Earlier in this article, we saw that the constructor for PicovoiceManager takes a keywordPath and a contextPath — the keywordPath refers to the file path of the Porcupine model file (.ppn file extension), while the contextPath refers to the file path of the Rhino model file (.rhn file extension). Now that we have these required files, we’ll drop them into the asset folder of our Flutter project and add them to the pubspec.yaml file as assets.
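In pubspec.yaml, that registration might look like the following (these file names are placeholders for whatever you exported from the Console and the Porcupine repo):

```yaml
flutter:
  assets:
    - assets/porcupine_android.ppn
    - assets/porcupine_ios.ppn
    - assets/clock_android.rhn
    - assets/clock_ios.rhn
```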
Traditionally, non-media assets have been difficult to load in a cross-platform app, but luckily Flutter has an asset bundle that we can read from. We’ll now add this file logic and some platform detection to our _initPicovoice function from earlier:
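A sketch of that logic, assuming the path_provider package and the placeholder asset names from above. Since Picovoice needs real file paths, each bundled model is copied out of the asset bundle onto the device’s storage first:

```dart
import 'dart:io';
import 'package:flutter/services.dart';
import 'package:path_provider/path_provider.dart';

// Copy a bundled asset to the device's file system and return its path.
Future<String> _extractAsset(String resourcePath) async {
  final String targetPath =
      '${(await getApplicationDocumentsDirectory()).path}/$resourcePath';
  final File targetFile = File(targetPath);
  if (!targetFile.existsSync()) {
    final data = await rootBundle.load(resourcePath);
    await targetFile.create(recursive: true);
    await targetFile.writeAsBytes(
        data.buffer.asUint8List(data.offsetInBytes, data.lengthInBytes));
  }
  return targetPath;
}

Future<void> _initPicovoice() async {
  // choose the models trained for the platform we're running on
  final String platform = Platform.isAndroid ? 'android' : 'ios';
  final String keywordPath =
      await _extractAsset('assets/porcupine_$platform.ppn');
  final String contextPath =
      await _extractAsset('assets/clock_$platform.rhn');
  // ...create and start the PicovoiceManager as in Step 2...
}
```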
If you launch the app now, PicovoiceManager will initialize and start streaming audio. If you say the wake word, followed by any of the commands from the context, you’ll see them printed to the debug console. While this is all well and good, we now need these voice controls to actually control the app!
Step 5 — Integrate the Voice Controls
From the print statements in our _wakeWordCallback and _inferenceCallback, you can start to work out the code we’re going to add to parse the objects we receive from the PicovoiceManager. The good news is that _wakeWordCallback is simply called whenever the wake word is detected, while inference is an object with the following structure:
```
{
  isUnderstood: true,
  intent: 'setTimer',
  slots: {
    hours: 2,
    seconds: 31
  }
}
```
This is in stark contrast to the speech-to-text approach where we have to parse a completely unknown and unstructured string. After filling out these functions, we have callbacks that look like the following:
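Here’s a sketch of what those completed callbacks might look like (the _listeningForCommand flag and the per-intent helpers like _setAlarm are illustrative names, not from the actual source):

```dart
void _wakeWordCallback() {
  // Picovoice switches from Porcupine to Rhino internally;
  // we only need to tell the UI that the app is now listening.
  setState(() {
    _listeningForCommand = true; // e.g. turns the mic icon blue
  });
}

void _inferenceCallback(RhinoInference inference) {
  if (inference.isUnderstood ?? false) {
    final Map<String, String> slots = inference.slots ?? {};
    // dispatch the intent to the matching app function
    switch (inference.intent) {
      case 'setTimer':
        _setTimer(slots);
        break;
      case 'setAlarm':
        _setAlarm(slots);
        break;
      // ...remaining intents from the context...
    }
  }
  setState(() {
    _listeningForCommand = false; // done listening, reset the mic icon
  });
}
```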
As you can see, the wake word callback simply changes the UI state to listening; Picovoice manages the switch between Porcupine and Rhino internally. In our app’s case, this will turn the microphone blue, letting the user know they can issue a command to the app.
The inference callback requires some additional logic to translate the inference object into a command. First, we check if the inference engine understood the spoken phrase as a command that exists within its context. If it was understood, we need to determine which intent to execute and then pass the slots (i.e. command variables) to the relevant app method. For example, the _setTimer function takes the slots and turns them into a timer duration:
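A sketch of that conversion, given that Rhino slot values arrive as strings (_startTimer is an assumed helper that kicks off the countdown):

```dart
void _setTimer(Map<String, String> slots) {
  // parse whichever time slots the spoken command included
  final Duration timerDuration = Duration(
    hours: int.tryParse(slots['hours'] ?? '') ?? 0,
    minutes: int.tryParse(slots['minutes'] ?? '') ?? 0,
    seconds: int.tryParse(slots['seconds'] ?? '') ?? 0,
  );
  if (timerDuration > Duration.zero) {
    _startTimer(timerDuration); // begin the countdown and update the UI
  }
}
```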
Once we’ve filled in all the control functions we should have a completely hands-free and cross-platform Flutter clock, as demonstrated in the following video:
Hope you enjoyed this tutorial! Feel free to browse and reuse the source code here. Learn more about Picovoice and our technology by visiting our website, GitHub, or developer documentation.