How to Bring Voice to your Unity Games With Houndify Voice AI

Daniel Core
Published in Houndify
6 min read · Feb 14, 2020

Gaming is moving from a hand-held experience to a hands-free encounter with characters and worlds formerly only accessible through a controller or keyboard. In the future, all games will have a voice user interface that will heighten user experiences and advance the gaming industry beyond what is available today.

To make the integration of Houndify into your games easier, we’re providing a step-by-step guide for streaming audio, sending text requests, and connecting both of those actions with custom commands in games.

In this tutorial, we’ll walk through a sample game, Mystery at the Hound Casino, to show how to use Houndify to integrate voice control into your own game. Get ready to solve the mystery!

Before getting started with a voice user interface

For this tutorial, you will need a basic understanding of:

  • Unity
  • C#

Make sure to have Unity installed so that you can edit and run the game.

If you haven’t already created an account, head to Houndify and follow the instructions to get started for free.

After creating your Houndify account, clone the example project from here and import it into Unity.

Step one: Configure your Houndify client

To get started, create your Houndify client so that you can connect the Unity game to Houndify.

After creating the client, enable the Query Glue domain. If you are signed up for a paid or promo account, enable the Selvy TTS Voice Collection for server-side TTS voice playback. Any domain can be used with the C# SDK, so you can be as creative as you want with your apps.

Step two: Enable custom commands

After enabling these domains, go to the main page of your client and select Custom Commands on the left panel.

Click on Enable Custom Commands. Then click on the New Page button to the right.

Step three: Add game characters

In this game, one of the characters will respond “Yes” and the other will respond “No” to the same voice query, “Are you the killer?”

To enable this, you will add two pages, one for Anna and one for Bob. First, click the New Page button and enter Anna for the name. After that, you will be prompted to create a custom command. Write the phrase “are you the killer” in the Expression field. This allows Houndify to recognize your custom query. In the Result JSON field, you can specify any JSON data that you want returned with the result. In the Response field, enter the response that will be returned as both text and speech.

This is how Anna’s page will look:

Make sure to set the default matching behavior to false after saving the custom command. After making the page for Anna, you are ready to move on to Bob — the guilty character.

This is how Bob’s page will look:

Step four: Create an initial response

In the beginning, the game will make a text request with the phrase “speak_this”. This request is used to return the intro of the game.

The full text is:

“Welcome to Hound Casino the most famous casino in town. Everything seemed like a normal week when on a Thursday night screams are heard in the poker room. Newspaper tycoon Tom Bride falls dead from his chair. Anna and Bob are the prime suspects. It’s up to you to find out who was responsible for Tom’s death”.

To create the response to this query, click on New Page again to create a page.

Next, create a Custom Command in this page. It should look like this:

Make sure to set your page as Default On so the page name doesn’t need to be passed in the RequestInfo.

After your client is fully configured, copy your Client ID and Client Key. To find them, click on Overview & API Keys in the left panel.

Step five: Set up the Unity app

After downloading the project and opening it in Unity, enter your Client ID and Client Key in the Settings.cs file inside the Assets/Settings/Scripts folder.
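The contents of Settings.cs are not reproduced in this article, so the following is only a hypothetical sketch of what a credentials holder of this kind might look like. The class name, field names, and the IsConfigured helper are assumptions made for illustration; use the names that actually appear in the project's file.

```csharp
using System;

// Hypothetical sketch; the real Settings.cs in the example project
// may use different names and structure.
public static class HoundifySettingsSketch
{
    // Replace with the values shown under "Overview & API Keys".
    public const string ClientId = "YOUR_CLIENT_ID";
    public const string ClientKey = "YOUR_CLIENT_KEY";

    // Returns true only when both values are non-empty and are no longer
    // the placeholder strings above.
    public static bool IsConfigured(string clientId, string clientKey)
    {
        return !string.IsNullOrWhiteSpace(clientId)
            && !string.IsNullOrWhiteSpace(clientKey)
            && clientId != "YOUR_CLIENT_ID"
            && clientKey != "YOUR_CLIENT_KEY";
    }
}
```

A check like this, run once at startup, can turn a confusing authentication failure into a clear "credentials not set" message.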

The game consists of one scene with sprites for the background and characters. The key parts of the scene are the buttons which trigger a common action that sends the right information to Houndify. We’ll leave the details of UI design to be explored in the Unity editor after importing.

Step six: Integrating with Houndify

For the purposes of this tutorial, we will focus on the integration with the Houndify Voice AI platform.

Houndify supports both audio and text:

Streaming Audio Requests: These pass audio to Houndify and receive partial transcripts throughout the transmission.

Text requests: These send a text query instead of audio. In this game, a text request is used with TTS to generate the game intro, but text requests work with any domain.

Audio requests with the Houndify platform

Audio requests are the focus of Houndify, and our SDKs make them easy to use in your app. A successful audio query has three parts: sending the audio, receiving the partial transcripts, and stopping the audio when the server detects the end of speech.

In the button.cs script, the start of the request is tied to a click on the Ask button.

Sending Audio

The following snippet listens for the click on the Ask button and kicks off the Record function which will start the sending process.

public void OnPointerDown()
{
    GameObject.Find("Button Speak").GetComponentInChildren<UnityEngine.UI.Text>().text = "Listening";
    StartCoroutine("Record");
}

In the Record coroutine, the request is initiated, its status is monitored, and the results are returned. The start_voice_request() method starts the voice request, so the audio captured from the microphone can be streamed to Houndify for processing.

IEnumerator Record()
{
    character_pages.initValue();
    ...
    // Record from the first microphone into a looping 1000-second clip at 16 kHz
    m_clip = Microphone.Start(Microphone.devices[0], true, 1000, 16000);
    lastPos = 0;
    // Create a HoundRequester.VoiceRequest object
    lph.bt = this;
    request = requester.start_voice_request(null, request_info, lph);
    sending = true;
    stopRecording = false;
    // Wait, one frame at a time, until recording has stopped
    while (Microphone.IsRecording(Microphone.devices[0]))
    {
        yield return null;
    }
    sending = false;
    // Close the request and collect the final result
    hound_result = request.finish();
    ...
}
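The excerpt above does not show how the recorded samples are turned into bytes for the request. As a rough sketch, assuming the request expects 16-bit little-endian PCM and that the clip is read as a looping buffer, two small helpers like the following could do the bookkeeping. MicAudioSketch and both method names are hypothetical and are not part of the Houndify SDK or the example project.

```csharp
using System;

// Hypothetical helpers; not part of the Houndify SDK or the example project.
public static class MicAudioSketch
{
    // Number of samples written since lastPos in a looping (ring) buffer
    // that holds clipSamples samples in total.
    public static int NewSampleCount(int lastPos, int currentPos, int clipSamples)
    {
        int diff = currentPos - lastPos;
        return diff >= 0 ? diff : diff + clipSamples;
    }

    // Converts float samples in [-1, 1] to little-endian 16-bit PCM bytes.
    // Out-of-range input is clamped rather than allowed to wrap around.
    public static byte[] ToPcm16(float[] samples)
    {
        byte[] bytes = new byte[samples.Length * 2];
        for (int i = 0; i < samples.Length; i++)
        {
            float clamped = Math.Max(-1f, Math.Min(1f, samples[i]));
            short value = (short)Math.Round(clamped * short.MaxValue);
            bytes[2 * i] = (byte)(value & 0xFF);            // low byte first
            bytes[2 * i + 1] = (byte)((value >> 8) & 0xFF); // then high byte
        }
        return bytes;
    }
}
```

In Unity, the current write position would typically come from Microphone.GetPosition and the samples from AudioClip.GetData; how the resulting bytes are handed to the voice request depends on the SDK and is not covered by this sketch.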

Now that we are sending audio to the server, how do we monitor progress?

Partial transcripts and stopping audio

The handle method of the LocalPartialHandler class is called whenever there is an update from the HoundRequester that is sending the audio to the Houndify server. This happens when the transcript changes and when the server-side audio detection finds the end of the user’s speech. When partial.getSafeToStopAudio() returns true, the server has detected the end of speech and it is safe to stop the recording.

class LocalPartialHandler : HoundRequester.PartialHandler
{
    private bool show_transcript;
    public button bt;

    public LocalPartialHandler(bool init_show_transcript)
    {
        show_transcript = init_show_transcript;
    }

    // The handle method is called whenever a partial transcript is received by the client
    public override void handle(HoundPartialTranscriptJSON partial)
    {
        if (show_transcript)
        {
            Debug.Log(partial.getPartialTranscript());
            // The server has detected the end of speech, so signal the button script to stop
            if (partial.getSafeToStopAudio())
            {
                bt.stopRecording = true;
            }
        }
    }
}
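The handler only sets bt.stopRecording; the code that reads the flag is not shown in the excerpts. One plausible pattern, sketched below in plain C# so it can run outside Unity, is for the recording loop to poll the flag once per frame and stop when it is set. StopSignal and RecordingLoopSketch are hypothetical names, and the thread-safe flag is a precaution for the case where the handler is invoked off the main thread, which this article does not confirm.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the stopRecording flag set by the partial handler.
public sealed class StopSignal
{
    private int stopRequested; // 0 = keep recording, 1 = stop

    // Called by the handler when the server reports the end of speech.
    public void RequestStop() => Interlocked.Exchange(ref stopRequested, 1);

    // Polled by the recording loop.
    public bool IsStopRequested => Volatile.Read(ref stopRequested) == 1;
}

public static class RecordingLoopSketch
{
    // Calls onFrame once per iteration until the signal is set or maxFrames
    // is reached, and returns how many frames were processed.
    public static int Run(StopSignal signal, int maxFrames, Action<int> onFrame)
    {
        int frame = 0;
        while (frame < maxFrames && !signal.IsStopRequested)
        {
            onFrame(frame); // e.g. send the next chunk of audio
            frame++;
        }
        return frame;
    }
}
```

In a Unity coroutine, the body of the loop would end with yield return null, and the stop branch would typically call Microphone.End so that Microphone.IsRecording becomes false.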

Text Requests

Text requests can be used, as an alternative to voice, with any enabled domain. In an app like the Hound assistant, that could be a query about the weather or a hotel. In the Hound Casino app, a text request is combined with custom commands to generate the audio for the game intro.

Sending the request:

Below, the do_text_request() method takes the text to send to Houndify and the request info. The returned result can then be processed to get the audio.

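HoundServerJSON hound_result;
// The query text must match the Expression of the intro custom command created in step four
hound_result = requester.do_text_request("say_this", null, request_info);
// Take the first result and decode its Base64-encoded response audio
CommandResultJSON my_answer = hound_result.getAllResults()[0];
string bytes_audio = my_answer.getResponseAudioBytes();
byte[] bytes = System.Convert.FromBase64String(bytes_audio);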

Playback

The following gets the AudioSource component, wraps the WAV data returned from Houndify in an AudioClip, and plays it back.

audioSource = GetComponent<AudioSource>();
Debug.Log("Intro started...");
WAV wav = new WAV(bytes);
// Mono, non-streamed clip; the older overload with an extra 3D flag is obsolete in current Unity versions
AudioClip audioClip = AudioClip.Create("testSound", wav.SampleCount, 1, wav.Frequency, false);
audioClip.SetData(wav.LeftChannel, 0);
audioSource.clip = audioClip;
audioSource.Play();
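The WAV helper class comes with the example project and its internals are not shown here. To illustrate what a helper like this has to do, the following is a minimal, hypothetical parser that assumes 16-bit PCM data in a standard RIFF/WAVE container and a little-endian host. WavSketch and its members are illustrative names, not the project's API.

```csharp
using System;
using System.Text;

// Hypothetical minimal WAV reader; the project's own WAV class may differ.
public sealed class WavSketch
{
    public int Channels { get; }
    public int Frequency { get; }
    public float[] Samples { get; } // interleaved when Channels > 1

    public WavSketch(byte[] wav)
    {
        if (wav.Length < 12 || ReadTag(wav, 0) != "RIFF" || ReadTag(wav, 8) != "WAVE")
            throw new ArgumentException("Not a RIFF/WAVE file");

        int pos = 12;
        int bits = 0;
        // Walk the chunk list: 4-byte id, 4-byte size, then the chunk body.
        while (pos + 8 <= wav.Length)
        {
            string id = ReadTag(wav, pos);
            int size = BitConverter.ToInt32(wav, pos + 4);
            int body = pos + 8;
            if (id == "fmt ")
            {
                Channels = BitConverter.ToInt16(wav, body + 2);
                Frequency = BitConverter.ToInt32(wav, body + 4);
                bits = BitConverter.ToInt16(wav, body + 14);
            }
            else if (id == "data")
            {
                if (bits != 16) throw new NotSupportedException("Only 16-bit PCM is handled");
                // Streamed files may declare a bogus size; fall back to what is present.
                int available = wav.Length - body;
                int dataBytes = (size < 0 || size > available) ? available : size;
                Samples = new float[dataBytes / 2];
                for (int i = 0; i < Samples.Length; i++)
                    Samples[i] = BitConverter.ToInt16(wav, body + 2 * i) / 32768f;
                return;
            }
            if (size < 0) break;            // malformed chunk size
            pos = body + size + (size & 1); // chunk bodies are padded to even length
        }
        throw new ArgumentException("No data chunk found");
    }

    private static string ReadTag(byte[] b, int offset) =>
        Encoding.ASCII.GetString(b, offset, 4);
}
```

With a parser like this, Samples would play the role of wav.LeftChannel for a mono clip, and Samples.Length the role of wav.SampleCount.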

Now you are ready to run your game and solve the mystery by clicking a character and either asking them “are you the killer” or blaming them for the crime.

Have fun and remember: “We all make choices in life, but in the end our choices make us.” Andrew Ryan, BioShock.
