Experiments with HoloLens, Bot Framework and LUIS: adding text to speech

Previously I blogged about creating a Mixed Reality 2D app integrating with a Bot using LUIS via the Direct Line channel available in the Bot Framework.

I decided to add more interactivity to the app by also enabling text-to-speech for the messages received from the Bot. This required adding a new MediaElement for the speech synthesiser to the main XAML page:

<Page
    x:Class="HoloLensBotDemo.MainPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    mc:Ignorable="d">
    <Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
        <Grid.ColumnDefinitions>
            <ColumnDefinition Width="10"/>
            <ColumnDefinition Width="Auto"/>
            <ColumnDefinition Width="10"/>
            <ColumnDefinition Width="*"/>
            <ColumnDefinition Width="10"/>
        </Grid.ColumnDefinitions>
        <Grid.RowDefinitions>
            <RowDefinition Height="50"/>
            <RowDefinition Height="50"/>
            <RowDefinition Height="50"/>
            <RowDefinition Height="Auto"/>
        </Grid.RowDefinitions>
        <TextBlock Text="Command received: " Grid.Column="1" VerticalAlignment="Center" />
        <TextBox x:Name="TextCommand" Grid.Column="3" VerticalAlignment="Center"/>
        <Button Content="Start Recognition" Click="StartRecognitionButton_Click" Grid.Row="1" Grid.Column="1" VerticalAlignment="Center" />
        <TextBlock Text="Status: " Grid.Column="1" VerticalAlignment="Center" Grid.Row="2" />
        <TextBlock x:Name="TextStatus" Grid.Column="3" VerticalAlignment="Center" Grid.Row="2"/>
        <TextBlock Text="Bot response: " Grid.Column="1" VerticalAlignment="Center" Grid.Row="3" />
        <TextBlock x:Name="TextOutputBot" Foreground="Red" Grid.Column="3"
                   VerticalAlignment="Center" Width="Auto" Height="Auto" Grid.Row="3"
                   TextWrapping="Wrap" />
        <MediaElement x:Name="media" />
    </Grid>
</Page>

Then I initialized a new SpeechSynthesizer when the page is created:

public sealed partial class MainPage : Page
{
    private SpeechSynthesizer synthesizer;
    private SpeechRecognizer recognizer;

    public MainPage()
    {
        this.InitializeComponent();
        InitializeSpeech();
    }

    private async void InitializeSpeech()
    {
        synthesizer = new SpeechSynthesizer();
        recognizer = new SpeechRecognizer();
        media.MediaEnded += Media_MediaEnded;
        recognizer.StateChanged += Recognizer_StateChanged;

        // Compile the dictation grammar by default.
        await recognizer.CompileConstraintsAsync();
    }

    private void Recognizer_StateChanged(SpeechRecognizer sender, SpeechRecognizerStateChangedEventArgs args)
    {
        if (args.State == SpeechRecognizerState.Idle)
        {
            SetTextStatus(string.Empty);
        }

        if (args.State == SpeechRecognizerState.Capturing)
        {
            SetTextStatus("Listening....");
        }
    }

    // …

And added a new Speech() method using the media element:

private async void Speech(string text)
{
    // If audio is already playing, stop it; otherwise synthesize
    // the new text and play the resulting stream.
    if (media.CurrentState == MediaElementState.Playing)
    {
        media.Stop();
    }
    else
    {
        try
        {
            // Create a stream from the text. This will be played using a media element.
            SpeechSynthesisStream synthesisStream = await synthesizer.SynthesizeTextToStreamAsync(text);

            // Set the source and start playing the synthesized audio stream.
            media.AutoPlay = true;
            media.SetSource(synthesisStream, synthesisStream.ContentType);
            media.Play();
        }
        catch (System.IO.FileNotFoundException)
        {
            var messageDialog = new Windows.UI.Popups.MessageDialog("Media player components unavailable");
            await messageDialog.ShowAsync();
        }
        catch (Exception)
        {
            media.AutoPlay = false;
            var messageDialog = new Windows.UI.Popups.MessageDialog("Unable to synthesize text");
            await messageDialog.ShowAsync();
        }
    }
}

When a new response is received from the Bot, the new Speech() method is called:

var result = await directLine.Conversations.GetActivitiesAsync(convId);
if (result.Activities.Count > 0)
{
    var botResponse = result
        .Activities
        .LastOrDefault(a => a.From != null && a.From.Name != null && a.From.Name.Equals("Davide Personal Bot"));

    if (botResponse != null && !string.IsNullOrEmpty(botResponse.Text))
    {
        var response = botResponse.Text;
        TextOutputBot.Text = "Bot response: " + response;
        TextStatus.Text = string.Empty;
        Speech(response);
    }
}
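As a side note, calling GetActivitiesAsync() with only the conversation id returns the full activity history each time. The Direct Line client also accepts a watermark parameter so that each poll returns only activities newer than the last batch processed. A minimal sketch of this approach (the watermark field and PollBotResponsesAsync() method are my own additions, not part of the original sample):

private string watermark;

private async Task PollBotResponsesAsync(string convId)
{
    // Pass the last watermark so only new activities are returned.
    var result = await directLine.Conversations.GetActivitiesAsync(convId, watermark);
    watermark = result.Watermark;

    foreach (var activity in result.Activities)
    {
        if (activity.From?.Name == "Davide Personal Bot" && !string.IsNullOrEmpty(activity.Text))
        {
            TextOutputBot.Text = "Bot response: " + activity.Text;
            TextStatus.Text = string.Empty;
            Speech(activity.Text);
        }
    }
}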

Then, when the MediaEnded event fires, recognition of a new phrase is started again, simulating a conversation between the user and the Bot:

private void Media_MediaEnded(object sender, Windows.UI.Xaml.RoutedEventArgs e)
{
    StartRecognitionButton_Click(null, null);
}
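The StartRecognitionButton_Click handler itself is not shown in this post. A rough sketch of what it might look like, assuming the recognizer field initialized in InitializeSpeech() and the SetTextStatus() helper seen earlier (the handler body below is my own illustration, not the original implementation):

private async void StartRecognitionButton_Click(object sender, RoutedEventArgs e)
{
    try
    {
        // Listen for a single phrase using the compiled dictation grammar.
        SpeechRecognitionResult result = await recognizer.RecognizeAsync();

        if (result.Status == SpeechRecognitionResultStatus.Success)
        {
            // Show the recognised phrase and forward it to the Bot via Direct Line.
            TextCommand.Text = result.Text;
        }
    }
    catch (Exception)
    {
        SetTextStatus("Recognition failed");
    }
}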

As usual, the source code is available for download on GitHub.

Originally published at https://www.davidezordan.net/blog/?p=8187
