Talking Chess: Adding Offline, Cross-Platform Voice Controls to Chess in .NET Core
For years, developers looking to add speech recognition to their applications have faced a daunting task: either open the application up to big cloud speech recognition providers, or spend weeks integrating a subpar offline approach. At Picovoice, we are shifting this paradigm. Our goal is to give developers the power to design, build and integrate offline voice interfaces in as little time as possible.
Since developer experience is paramount to us, it’s important that we frequently put ourselves in the shoes of a developer using our SDKs. To see if our developer experience is as seamless as we claim, I recently decided to use our new .NET SDK to build a voice interface for a game. A lot of people are talking chess right now thanks to Netflix’s new hit series Queen’s Gambit, so I thought it would be a good idea to make chess you can talk to!
To accurately simulate this experience, I had to impose some constraints:
- Use .NET Core, but keep it cross-platform (should be able to play on Windows, Linux and Mac)
- Use Picovoice NuGet package
- Use Picovoice Console to design my voice commands
- Invest no more than a couple of hours of my time
From these constraints, you can probably infer that I didn’t want to build a chess game; I just wanted to integrate a voice interface for one. Luckily, I found a great open-source project on GitHub called ChessCore that implemented a chess engine in .NET Core, was cross-platform, and had a text-based interface. Perfect — time to get to work!
Out With the Old, In With the New
Conveniently, ChessCore has kept the chess-playing engine completely separate from the interface. This allowed me to remove the text-based interface and replace it with an entirely voice-controlled one.
We don’t want to throw out everything from this interface, however. First, I want to extract the functions that execute and validate player moves, translate moves to the coordinate system used by the engine, and draw the chessboard to the console. After extracting all of the useful items from the original Program.cs, we have something that looks like this:
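A minimal sketch of the trimmed-down Program.cs, using hypothetical helper names (MakeMove, TranslateToCoordinate, DrawBoard) for the extracted functions; the actual names and engine types in the ChessCore fork may differ:

```csharp
using System;
using ChessCore; // the open-source chess engine this project builds on

namespace PicoChess
{
    class Program
    {
        static bool _quit;

        static void Main(string[] args)
        {
            RunGame();
        }

        // Extracted: validates a move against the engine and executes it
        static bool MakeMove(string src, string dst) { /* ... */ return true; }

        // Extracted: translates descriptive notation into engine coordinates
        static string TranslateToCoordinate(string side, string file, string rank)
        { /* ... */ return ""; }

        // Extracted: renders the current board state to the console
        static void DrawBoard() { /* ... */ }

        // Empty for now; this is where the voice interface will live
        static void RunGame()
        {
        }
    }
}
```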
Now we need to fill in RunGame() with our new voice interface, but first, we need to design our voice experience in the Picovoice Console.
Designing a Voice UI with the Picovoice Console
First, we’ll want a wake word for our game: a phrase that is always listened for and serves to open the gate to our voice command interface. A wake word is a great way of telling a voice interface that it should start listening for commands in the incoming audio. Picovoice offers a highly-accurate wake word engine called Porcupine that accepts custom wake words designed and built in the Picovoice Console.
To make a custom wake word, we can go to the Porcupine section of the Picovoice Console and simply type in a phrase we want our game to respond to. For this game, I’ll use the wake word “Pico Chess”. You can then select what platform you want the wake word model to run on. To keep it cross-platform we’ll make one for Windows, macOS and Linux. After testing it with the microphone widget to ensure it responds well, I’ll click the Train Wake Word button to start training my custom wake word.
It’ll take about an hour for this to train — in the meantime we can head over to the Rhino section to design our command Context.
Rhino is Picovoice’s Speech-to-Intent Engine. What does this mean? Well — it is essentially a speech recognition engine that is focused on recognizing a defined set of commands within a specific context. In our case, our context is chess and we are looking to design a set of commands that allow us to play the game with only our voice.
To do this in the Picovoice Console, we’ll start by creating an Empty context called “Chess”. Looking at our chess game, we can identify a few discrete actions we want to allow players to execute with voice commands. We want them to be able to move their pieces, undo their last move, start a new game and quit the game. These actions we capture as Intents in our context:
These four intents will allow the user to play through multiple games entirely hands-free. Clicking on each intent will allow us to define the Expressions we want to allow the user to say when they want to trigger that intent. For newGame, undo and quit, these are easy:
newGame: “new game”
undo: “undo last move”
quit: “quit game”
For move, we need to ramp up the complexity a bit. We know a move command has the format “Move [source] to [destination]”, but source and destination have numerous possibilities; this is where Slots come into play. Each slot serves as a variable that can be filled with several possibilities. For our chess game, we’re using descriptive notation, which has the format [side] [file] [rank] (e.g. “queen’s bishop one to king three”), so we have to create three slots to hold these variables. Our slots will have the following elements:
side: “king”, “king’s”, “queen”, “queen’s”
file: “rook”, “knight”, “bishop”
rank: “one”, “two”, …“eight”
Now we can go back to the move intent and fill in the expressions using our newly created slots. In the expressions, we’ll make the words “move” and “to” optional. Upon completion, it’ll look something like this:
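As a rough sketch of the finished design (in Rhino’s expression syntax, parentheses mark optional words and $slotType:slotName references a slot; the exact YAML shape of a Console export may differ):

```yaml
context:
  expressions:
    move:
      - "(move) $side:srcSide ($file:srcFile) $rank:srcRank (to) $side:dstSide ($file:dstFile) $rank:dstRank"
    newGame:
      - "new game"
    undo:
      - "undo last move"
    quit:
      - "quit game"
  slots:
    side:
      - king
      - king's
      - queen
      - queen's
    file:
      - rook
      - knight
      - bishop
    rank:
      - one
      - two
      - three
      - four
      - five
      - six
      - seven
      - eight
```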
Here you’ll see that we give each slot an additional label (e.g. srcSide); you’ll see later in our code how we use this to retrieve the value of the slot.
With our context complete, we can test it with the microphone widget and then initiate the model training. Just like with the Porcupine model, when we click Train we’ll also have to choose which platform we want the model to run on — we’ll again make one for Windows, macOS and Linux. We can then go to the Models section and download our completed models. Once we get an email informing us that our wake words have finished training, we can head to the Porcupine section of the Picovoice Console to download those as well.
Chess to PicoChess
Now we’re ready to start integrating the Picovoice end-to-end platform into our chess game. The Picovoice platform is simply a combination of Porcupine and Rhino. Porcupine waits for a given wake word (“PicoChess”) and then passes the follow-on command (e.g. “move queen two to queen three”) to Rhino.
To add the Picovoice platform to our game we first need to add the Picovoice NuGet package and our model files to our ChessCore project. We’ll mark all the model files to be copied to the output directory so that Picovoice can find them easily.
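Assuming the model files were downloaded into a resources folder (folder layout here is illustrative), the .csproj entries could look something like:

```xml
<ItemGroup>
  <!-- copy all Porcupine (.ppn) and Rhino (.rhn) models next to the binary -->
  <None Update="resources\**\*.ppn;resources\**\*.rhn">
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </None>
</ItemGroup>
```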
Time to write some code! Importing the new package with using Pv; will allow us to construct an instance of Picovoice. The constructor for Picovoice requires four main arguments:
keywordPath refers to the Porcupine wake word model file (.ppn), while contextPath refers to the Rhino context model file (.rhn). The wakeWordCallback is a function that we want to execute when Porcupine detects the given keyword, while inferenceCallback is a function that will process the inference of the command that follows the wake word. Picovoice is constructed within a using statement to ensure resources are released when the game is over.
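A sketch of that setup, following the four arguments described above (file names are illustrative, and note that later versions of the SDK construct the engine through a factory method and also require a Picovoice AccessKey):

```csharp
using System;
using Pv; // the Picovoice NuGet package

class Program
{
    static void WakeWordCallback()
    {
        Console.WriteLine("Wake word detected. Listening for a command...");
    }

    static void InferenceCallback(Inference inference)
    {
        // parse the inference and execute the command (shown in the next section)
    }

    static void Main(string[] args)
    {
        // model files trained in the Picovoice Console (names are illustrative)
        string keywordPath = "pico_chess_windows.ppn";
        string contextPath = "chess_windows.rhn";

        // 'using' ensures the native resources are released when the game is over
        using (Picovoice picovoice = new Picovoice(keywordPath, WakeWordCallback,
                                                   contextPath, InferenceCallback))
        {
            // audio capture loop goes here
        }
    }
}
```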
Picovoice will handle switching between wake word detection and intent inference internally, so in WakeWordCallback we can just let the user know that we’re now listening for a command phrase. InferenceCallback will have much more to do; in this function, we will need to parse the Inference object it receives from Rhino and execute the desired command. In our .NET SDK, the Inference class has three immutable properties that you can access:
- IsUnderstood: whether Rhino matched one of the commands or not
- Intent: if understood, which intent was inferred
- Slots: if understood, a dictionary with data relating to the intent
In our case, the intents for our chess game are move, undo, newGame and quit. Slots will be empty for every intent but move, where you will find a dictionary containing the coordinates of the source and destination of the move. This is where the variable names we gave to the slots come in: they serve as the keys of the dictionary, while the values provide the coordinates of the move. With this in mind, our inference callback will look something like this:
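A sketch of that callback, reusing the hypothetical helpers from earlier (TranslateToCoordinate, MakeMove and friends are stand-ins for the functions extracted from the original interface):

```csharp
using System;
using System.Collections.Generic;
using Pv;

// ...

static void InferenceCallback(Inference inference)
{
    if (!inference.IsUnderstood)
    {
        Console.WriteLine("Sorry, I didn't catch that.");
        return;
    }

    switch (inference.Intent)
    {
        case "move":
            // the slot labels (srcSide, srcFile, ...) are the dictionary keys
            Dictionary<string, string> slots = inference.Slots;
            slots.TryGetValue("srcSide", out string srcSide);
            slots.TryGetValue("srcFile", out string srcFile);
            slots.TryGetValue("srcRank", out string srcRank);
            slots.TryGetValue("dstSide", out string dstSide);
            slots.TryGetValue("dstFile", out string dstFile);
            slots.TryGetValue("dstRank", out string dstRank);

            // translate descriptive notation into the engine's coordinates
            string src = TranslateToCoordinate(srcSide, srcFile, srcRank);
            string dst = TranslateToCoordinate(dstSide, dstFile, dstRank);
            MakeMove(src, dst);
            break;
        case "undo":
            UndoLastMove();   // hypothetical helper
            break;
        case "newGame":
            NewGame();        // hypothetical helper
            break;
        case "quit":
            _quit = true;     // flag checked by the audio loop
            break;
    }
    DrawBoard();
}
```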
Now we have an instance of Picovoice in our chess game with all the logic required to parse its results, but it has no way to hear us. Time to give PicoChess some ears!
Listen Up, PicoChess!
In .NET Core, cross-platform microphone control is tough. There are plenty of great Windows-only options out there, but one of our constraints is to keep it cross-platform — enter OpenAL (Open Audio Library). OpenAL comes as part of the OpenTK package and supports cross-platform audio capture, so we’ll add this NuGet package to our project.
You’ll have to perform an additional step to install OpenAL if it’s not found on your system:
- On Windows, install using the OpenAL Windows Installer.
- On Linux, use sudo apt-get install libopenal-dev
- On macOS, use brew install openal-soft
With OpenAL, we can access the default audio capture device on the system and tell it to capture audio in the format Picovoice requires (mono, 16-bit, linearly encoded PCM). We’ll also tell it to record at the sample rate and audio frame length specified by Picovoice. As these frames come in, we’ll pass them to Picovoice’s Process(short[] pcm) function, which will route them to either Porcupine or Rhino and trigger one of the callbacks if an engine returns a result. Now our RunGame() function will simply start the audio loop and feed audio frames to Picovoice until the user exits the game:
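A sketch of that loop using OpenTK’s ALC capture bindings (method names follow the OpenTK 4 ALC class; check your OpenTK version, as these bindings have shifted between releases, and here RunGame is assumed to receive the already-constructed Picovoice instance):

```csharp
using System.Threading;
using OpenTK.Audio.OpenAL;
using Pv;

// ...

static void RunGame(Picovoice picovoice)
{
    short[] frame = new short[picovoice.FrameLength];

    // open the default capture device in the format Picovoice expects:
    // mono, 16-bit linear PCM, at the engine's required sample rate
    ALCaptureDevice device = ALC.CaptureOpenDevice(
        null, picovoice.SampleRate, ALFormat.Mono16, picovoice.FrameLength * 2);
    ALC.CaptureStart(device);

    DrawBoard();
    while (!_quit)
    {
        // process a frame whenever enough samples have accumulated
        if (ALC.GetAvailableSamples(device) >= picovoice.FrameLength)
        {
            ALC.CaptureSamples(device, ref frame[0], picovoice.FrameLength);
            picovoice.Process(frame);
        }
        Thread.Yield();
    }

    ALC.CaptureStop(device);
    ALC.CaptureCloseDevice(device);
}
```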
And that’s basically it! It took me all of two and a half hours to design, build and integrate a voice interface to this existing project and, unlike other voice solutions currently available for .NET Core, the Picovoice .NET SDK is offline and desktop cross-platform. Add the NuGet package to your project and give it a shot yourself!
You can find my fork of the ChessCore project, with all the relevant source code here.