From GUI to VUI: Voice-Enabling a Cross-Platform .NET Desktop Application
In a world where developers are racing to deliver MVPs and users are expecting seamless UX across their ever-expanding arsenal of devices, few things are as important to modern app developers as a good cross-platform framework. Up to this point, there has not been a strong solution for adding an offline voice-user interface (VUI) to your cross-platform application. Picovoice has recently removed this limitation for developers by releasing a .NET SDK that will allow you to add voice support across desktop platforms.
For .NET developers, the road to cross-platform support has been long and winding. The introduction of .NET Core delivered some great cross-platform possibilities, but it lacked a cross-platform GUI framework. Luckily, the community responded and created its own. AvaloniaUI is an amazing open-source project for .NET Core that already allows pixel-perfect rendering across all desktop platforms, with mobile platform support and WebAssembly browser support in active development.
As enormously complex as a cross-platform GUI may be under the hood, the result is still a collection of controls that require a user’s touch or keyboard input. The COVID-19 pandemic has also brought a renewed market demand for touchless interfaces. Beyond that, a touch-only interface narrows your app’s user base from an accessibility standpoint and ignores the often-desired hands-free use case. Voice-enabling a GUI is a great way to bring a whole new dimension to your interface.
In this article, I’m going to build a basic GUI with AvaloniaUI and then voice-enable it — without changing any of the UI code — with Picovoice’s Porcupine wake word engine. The UI is going to consist of some radio buttons that we’re going to toggle to change the color of the background. Let’s jump in!
The easiest way to get started with Avalonia is to download their Visual Studio Extension, which provides a designer and some project templates. Then, open Visual Studio and create a new project using the Avalonia MVVM Application template.
The template will provide you with some basic app code and MVVM glue. For this tutorial, we’re only going to be adding code to MainWindow.axaml and the view model. In MainWindow.axaml, we’re going to build our UI: four radio buttons in a centered stack panel (…I said it was going to be basic!). We’ll also add bindings to the IsChecked property of each radio button and to the Color property of the window’s background brush. Our goal here is to have the window color change depending on which radio button is selected.
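A minimal sketch of what that markup might look like — the class name, window title, and bound property names (BackgroundColor, IsBlueberryChecked, and so on) are illustrative choices, not taken verbatim from the project source:

```xml
<!-- Views/MainWindow.axaml — a minimal sketch -->
<Window xmlns="https://github.com/avaloniaui"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        x:Class="VoiceDemo.Views.MainWindow"
        Title="VoiceDemo" Width="400" Height="300">
  <!-- Bind the window background's Color to the view model. -->
  <Window.Background>
    <SolidColorBrush Color="{Binding BackgroundColor}"/>
  </Window.Background>
  <!-- Four radio buttons in a centered stack panel. -->
  <StackPanel HorizontalAlignment="Center" VerticalAlignment="Center">
    <RadioButton Content="Blueberry"  IsChecked="{Binding IsBlueberryChecked}"/>
    <RadioButton Content="Grapefruit" IsChecked="{Binding IsGrapefruitChecked}"/>
    <RadioButton Content="Bumblebee"  IsChecked="{Binding IsBumblebeeChecked}"/>
    <RadioButton Content="Picovoice"  IsChecked="{Binding IsPicovoiceChecked}"/>
  </StackPanel>
</Window>
```

Because the radio buttons share a parent panel, Avalonia treats them as one group, so checking one unchecks the others.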
Now we can head to the view model, MainWindowViewModel.cs, and add the properties that we’ve specified bindings for: a boolean property for each radio button and a color property for the window background. We’ll also add a list of four colors that we’ll use to change the background depending on which radio button is selected. To change the background when a radio button is selected, we add code to the IsChecked setters that swaps the background color for the matching one from our list.
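Here’s a sketch of that view model, assuming a ReactiveUI-based MVVM setup (which the Avalonia MVVM template uses); the class, property, and color choices are illustrative:

```csharp
using Avalonia.Media;
using ReactiveUI;

namespace VoiceDemo.ViewModels
{
    public class MainWindowViewModel : ReactiveObject
    {
        // One background color per radio button, in button order.
        private static readonly Color[] BackgroundColors =
        {
            Colors.LightBlue,   // Blueberry
            Colors.LightPink,   // Grapefruit
            Colors.LightYellow, // Bumblebee
            Colors.LightGreen   // Picovoice
        };

        private Color _backgroundColor = BackgroundColors[0];
        public Color BackgroundColor
        {
            get => _backgroundColor;
            set => this.RaiseAndSetIfChanged(ref _backgroundColor, value);
        }

        private bool _isBlueberryChecked = true;
        public bool IsBlueberryChecked
        {
            get => _isBlueberryChecked;
            set
            {
                this.RaiseAndSetIfChanged(ref _isBlueberryChecked, value);
                // Swap in this button's background color when it is selected.
                if (value) BackgroundColor = BackgroundColors[0];
            }
        }

        // IsGrapefruitChecked, IsBumblebeeChecked, and IsPicovoiceChecked follow
        // the same pattern, swapping in BackgroundColors[1], [2], and [3].
    }
}
```

The remaining three properties are omitted for brevity since they only differ by backing field and color index.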
If we run the app, we’ll see the four radio buttons and can change the window color by clicking on any of them.
Now we’re going to swap those clicks for voice commands. We’ll use Porcupine to detect which of the words we’ve said and then select the appropriate radio button. Add the Porcupine NuGet package to the project and create an instance of the wake word engine like so:
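A sketch of creating the engine with four of Porcupine’s built-in keywords — the keyword choice is illustrative, and the exact factory method depends on your SDK version (newer releases require a Picovoice AccessKey, e.g. via Porcupine.FromBuiltInKeywords), so check the current Porcupine .NET documentation:

```csharp
using System.Collections.Generic;
using Pv;

// Create a Porcupine instance listening for four built-in wake words.
// The order of this list determines the keyword index returned later.
Porcupine porcupine = Porcupine.Create(
    keywords: new List<string> { "blueberry", "grapefruit", "bumblebee", "picovoice" });
```
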
To detect the keywords, we need to capture frames of audio and pass them to Porcupine for processing. A great open-source, cross-platform audio library is OpenAL, which can be accessed through the OpenTK NuGet package. We’ll add that to our project, along with some code that captures frames of audio from the default capture device. Each frame is passed to Porcupine for processing, which returns an index: -1 means no wake word was detected, while 0 to 3 means one of our four wake words was detected. To select a radio button when its wake word is detected, we call the setter for that radio button’s IsChecked property. Finally, we’ll run the whole thing on a separate thread so that it doesn’t block the UI. Now our view model constructor will look like this:
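The loop described above might be sketched like this. The OpenAL names follow OpenTK 4.x and should be verified against the version you install; SelectRadioButton is a hypothetical helper that sets the IsChecked property matching the keyword index:

```csharp
using System.Threading;
using System.Threading.Tasks;
using Avalonia.Threading;
using OpenTK.Audio.OpenAL;

// Inside the MainWindowViewModel constructor:
Task.Factory.StartNew(() =>
{
    // Open the default capture device (null = default) at Porcupine's
    // required sample rate, capturing 16-bit mono audio.
    ALCaptureDevice captureDevice = ALC.CaptureOpenDevice(
        null, porcupine.SampleRate, ALFormat.Mono16, porcupine.FrameLength * 2);
    ALC.CaptureStart(captureDevice);

    short[] frame = new short[porcupine.FrameLength];
    while (true)
    {
        // Wait until a full frame of samples has accumulated.
        if (ALC.GetAvailableSamples(captureDevice) < frame.Length)
        {
            Thread.Sleep(5);
            continue;
        }
        ALC.CaptureSamples(captureDevice, ref frame[0], frame.Length);

        // -1 means no wake word; 0-3 maps to one of our four keywords.
        int keywordIndex = porcupine.Process(frame);
        if (keywordIndex >= 0)
        {
            // Marshal back to the UI thread before touching bound properties.
            Dispatcher.UIThread.Post(() => SelectRadioButton(keywordIndex));
        }
    }
}, TaskCreationOptions.LongRunning);
```

TaskCreationOptions.LongRunning hints to the scheduler that this loop will hold its thread indefinitely, keeping it off the regular thread pool.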
Launch the app, and you should now be able to say any of the four commands to select the corresponding radio button without any mouse input.
And there you have it — you’ve now added voice commands to an app without changing any of the UI code. In a real-world app, we could attach commands for “Save”, “Open File” or “Settings” and enable more complex inputs with Picovoice’s Rhino Speech-to-Intent engine. Adding this new dimension of interaction to your app has tremendous value and could be the feature that gives your app the edge in a crowded marketplace.
The full source code for this project can be found here.