Experiments with HoloLens, Bot Framework, LUIS and Speech Recognition

Recently I had the opportunity to use a HoloLens device for some personal training and to build some simple demos.

One of the scenarios that I find very intriguing is the possibility of integrating Mixed Reality and Artificial Intelligence (AI) in order to create immersive experiences for the user.

I decided to run an experiment integrating a Bot, Language Understanding Intelligent Service (LUIS), Speech Recognition and Mixed Reality via a Holographic 2D app.

The idea was to create a sort of “digital assistant” of myself that can be contacted using Mixed Reality. The first implementation contains only basic interactions (answering questions like “What are your favourite technologies” or “What’s your name”), but these could easily be expanded in the future with features like time management (via the Graph APIs), tracking project status, etc.

Creating the LUIS application

To start, I created a new LUIS application in the portal with a list of intents that needed to be handled:

In the future, this could be further extended with extra capabilities.

After defining the intents and utterances, I trained and published my LUIS app to Azure and copied the key and URL for use in my Bot:

Creating the Bot

I then created the Bot with the Microsoft Bot Framework, downloading the Visual Studio template and creating a new project:

The Bot template already defined a dialog named RootDialog so I extended the generated project with the classes required for parsing the JSON from the LUIS endpoint:

// Namespaces needed by this class: System, System.Net.Http,
// System.Threading.Tasks and Newtonsoft.Json.
public class LUISDavideBot
{
    // Sends the user's text to the LUIS endpoint and returns the parsed
    // response, or null when the HTTP call does not succeed.
    public static async Task<DavideBotLUIS> ParseUserInput(string strInput)
    {
        // Escape the text so it can be appended safely to the query string.
        string strEscaped = Uri.EscapeDataString(strInput);
        using (var client = new HttpClient())
        {
            string uri = "<LUIS url here>" + strEscaped;
            HttpResponseMessage msg = await client.GetAsync(uri);
            if (msg.IsSuccessStatusCode)
            {
                var jsonResponse = await msg.Content.ReadAsStringAsync();
                return JsonConvert.DeserializeObject<DavideBotLUIS>(jsonResponse);
            }
        }
        return null;
    }
}

public class DavideBotLUIS
{
    public string query { get; set; }
    public Topscoringintent topScoringIntent { get; set; }
    public Intent[] intents { get; set; }
    public Entity[] entities { get; set; }
    public Dialog dialog { get; set; }
}

public class Topscoringintent
{
    public string intent { get; set; }
    public float score { get; set; }
    public Action[] actions { get; set; }
}

public class Action
{
    public bool triggered { get; set; }
    public string name { get; set; }
    public object[] parameters { get; set; }
}

public class Dialog
{
    public string contextId { get; set; }
    public string status { get; set; }
}

public class Intent
{
    public string intent { get; set; }
    public float score { get; set; }
    public Action1[] actions { get; set; }
}

public class Action1
{
    public bool triggered { get; set; }
    public string name { get; set; }
    public object[] parameters { get; set; }
}

public class Entity
{
    public string entity { get; set; }
    public string type { get; set; }
    public int startIndex { get; set; }
    public int endIndex { get; set; }
    public float score { get; set; }
}
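
To make the mapping between the LUIS JSON and these classes more concrete, here is a minimal sketch that parses a hand-written payload. It is only an illustration: the payload values are made up, the SampleLuisResponse and SampleTopIntent types are hypothetical trimmed copies of the classes above, and it uses System.Text.Json instead of Json.NET so that it runs without extra packages.

```csharp
using System;
using System.Text.Json;

// Hypothetical, trimmed copies of the response classes, enough for this sketch.
public class SampleTopIntent
{
    public string intent { get; set; }
    public float score { get; set; }
}

public class SampleLuisResponse
{
    public string query { get; set; }
    public SampleTopIntent topScoringIntent { get; set; }
}

public static class LuisJsonSample
{
    // Hand-written payload: the query, intent name and score are illustrative only.
    public const string Json = @"{
      ""query"": ""what is your name"",
      ""topScoringIntent"": { ""intent"": ""WhatsYourNameIntent"", ""score"": 0.97 },
      ""entities"": []
    }";

    public static SampleLuisResponse Parse(string json)
    {
        // Property names match the JSON keys exactly, so the default options are
        // enough; keys without a matching property (entities here) are ignored.
        return JsonSerializer.Deserialize<SampleLuisResponse>(json);
    }
}
```

Calling LuisJsonSample.Parse(LuisJsonSample.Json) yields an object whose topScoringIntent.intent is the string the Bot can branch on.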

I then processed the various LUIS intents in RootDialog (another option is to use the LuisDialog and LuisModel classes, as explained here):

[Serializable]
public class RootDialog : IDialog<object>
{
    public Task StartAsync(IDialogContext context)
    {
        context.Wait(MessageReceivedAsync);
        return Task.CompletedTask;
    }

    private async Task MessageReceivedAsync(IDialogContext context, IAwaitable<object> result)
    {
        var activity = await result as Activity;
        if (activity != null)
        {
            await ProcessLuis(activity, context);
        }
        // Keep listening for the next message.
        context.Wait(MessageReceivedAsync);
    }

    private async Task ProcessLuis(Activity activity, IDialogContext context)
    {
        if (string.IsNullOrEmpty(activity.Text))
        {
            return;
        }

        var errorResult = "Try something else, please.";
        var stLuis = await LUISDavideBot.ParseUserInput(activity.Text);

        // Reply according to the intent LUIS scored highest.
        var intent = stLuis?.topScoringIntent?.intent;
        if (!string.IsNullOrEmpty(intent))
        {
            switch (intent)
            {
                case "FavouriteTechnologiesIntent":
                    await context.PostAsync("My favourite technologies are Azure, Mixed Reality and Xamarin!");
                    break;
                case "FavouriteColorIntent":
                    await context.PostAsync("My favourite color is Blue!");
                    break;
                case "WhatsYourNameIntent":
                    await context.PostAsync("My name is Davide, of course :)");
                    break;
                case "HelloIntent":
                    await context.PostAsync(@"Hi there! Davide here :) This is my personal Bot. Try asking 'What are your favourite technologies?'");
                    break;
                case "GoodByeIntent":
                    await context.PostAsync("Thanks for the chat! See you soon :)");
                    break;
                // "None" and any unknown intent get the same fallback reply.
                case "None":
                default:
                    await context.PostAsync(errorResult);
                    break;
            }
        }
    }
}
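
As the list of intents grows, the switch could be replaced by a lookup table. The sketch below is one possible way to do that, not what the project does: IntentReplies and its For method are hypothetical names, only a few of the replies above are included, and a missing or unrecognised intent falls back to the default message.

```csharp
using System;
using System.Collections.Generic;

public static class IntentReplies
{
    private const string Fallback = "Try something else, please.";

    // Intent name -> reply text, copied from the replies used in RootDialog.
    private static readonly Dictionary<string, string> Replies =
        new Dictionary<string, string>
        {
            ["FavouriteTechnologiesIntent"] = "My favourite technologies are Azure, Mixed Reality and Xamarin!",
            ["FavouriteColorIntent"] = "My favourite color is Blue!",
            ["WhatsYourNameIntent"] = "My name is Davide, of course :)",
            ["GoodByeIntent"] = "Thanks for the chat! See you soon :)"
        };

    // Returns the reply for an intent, or the fallback when the intent is
    // null, empty, "None" or otherwise not in the table.
    public static string For(string intent)
    {
        if (!string.IsNullOrEmpty(intent) && Replies.TryGetValue(intent, out var reply))
        {
            return reply;
        }
        return Fallback;
    }
}
```

Inside ProcessLuis the whole switch would then shrink to a single call such as await context.PostAsync(IntentReplies.For(intent)).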

Then, I tested the implementation using the Bot Framework Emulator:

I also created a new Bot definition in the Bot Framework portal.

After that, I published it to Azure, with Web.config updated to include the generated Microsoft App ID and password:

Since the final goal was to communicate with a UWP HoloLens application, I enabled the Direct Line channel:

Creating the Holographic 2D app

Windows 10 UWP apps are executed on the HoloLens device as Holographic 2D apps that can be pinned in the environment.

I created a new project using the default Visual Studio Template:

I then added some simple text controls in XAML to receive the input and display the response from the Bot:

<Page
    x:Class="HoloLensBotDemo.MainPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    mc:Ignorable="d">
    <Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
        <Grid.ColumnDefinitions>
            <ColumnDefinition Width="10"/>
            <ColumnDefinition Width="Auto"/>
            <ColumnDefinition Width="10"/>
            <ColumnDefinition Width="*"/>
            <ColumnDefinition Width="10"/>
        </Grid.ColumnDefinitions>
        <Grid.RowDefinitions>
            <RowDefinition Height="50"/>
            <RowDefinition Height="50"/>
            <RowDefinition Height="50"/>
            <RowDefinition Height="Auto"/>
        </Grid.RowDefinitions>
        <TextBlock Text="Command received: " Grid.Column="1" VerticalAlignment="Center" />
        <TextBox x:Name="TextCommand" Grid.Column="3" VerticalAlignment="Center"/>
        <Button Content="Start Recognition" Click="Button_Click" Grid.Row="1" Grid.Column="1" VerticalAlignment="Center" />
        <TextBlock Text="Status: " Grid.Column="1" VerticalAlignment="Center" Grid.Row="2" />
        <TextBlock x:Name="TextStatus" Grid.Column="3" VerticalAlignment="Center" Grid.Row="2"/>
        <TextBlock Text="Bot response: " Grid.Column="1" VerticalAlignment="Center" Grid.Row="3" />
        <TextBlock x:Name="TextOutputBot" Foreground="Red" Grid.Column="3"
                   VerticalAlignment="Center" Width="Auto" Height="Auto" Grid.Row="3"
                   TextWrapping="Wrap" />
    </Grid>
</Page>

I decided to use the SpeechRecognizer APIs to receive the input via voice (another option would be to use Cognitive Services):

private async void Button_Click(object sender, Windows.UI.Xaml.RoutedEventArgs e)
{
    TextStatus.Text = "Listening....";

    // Create an instance of SpeechRecognizer and dispose it once recognition is done.
    using (var speechRecognizer = new Windows.Media.SpeechRecognition.SpeechRecognizer())
    {
        // With no constraints added, this compiles the default dictation grammar.
        await speechRecognizer.CompileConstraintsAsync();
        // Start recognition using the built-in UI.
        Windows.Media.SpeechRecognition.SpeechRecognitionResult speechRecognitionResult = await speechRecognizer.RecognizeWithUIAsync();
        TextCommand.Text = speechRecognitionResult.Text;
    }

    await SendToBot(TextCommand.Text);
}

The SendToBot() method makes use of the Direct Line APIs which permit communication with the Bot using the channel previously defined:

// Requires the Microsoft.Bot.Connector.DirectLine NuGet package and System.Linq.
private async Task SendToBot(string inputText)
{
    if (!string.IsNullOrEmpty(inputText))
    {
        var directLine = new DirectLineClient("<Direct Line Secret Key here>");

        // Every call starts a new conversation with the Bot.
        var conversation = await directLine.Conversations.StartConversationWithHttpMessagesAsync();
        var convId = conversation.Body.ConversationId;

        TextStatus.Text = "Sending text to Bot....";
        await directLine.Conversations.PostActivityWithHttpMessagesAsync(convId,
            new Activity()
            {
                Type = "message",
                From = new ChannelAccount()
                {
                    Id = "Davide"
                },
                Text = inputText
            });

        // Read the conversation back and pick the first activity sent by the Bot.
        var result = await directLine.Conversations.GetActivitiesAsync(convId);
        if (result.Activities.Count > 0)
        {
            var firstOrDefault = result
                .Activities
                .FirstOrDefault(a => a.From != null
                    && a.From.Name != null
                    && a.From.Name.Equals("Davide Personal Bot"));
            if (firstOrDefault != null)
            {
                TextOutputBot.Text = "Bot response: " + firstOrDefault.Text;
                TextStatus.Text = string.Empty;
            }
        }
    }
}
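
Direct Line is asynchronous, so the Bot's reply is not guaranteed to be in the conversation on the first read after posting. A small polling helper is one way to cope with that. The sketch below is an assumption about how this could be handled rather than part of the project: Polling and UntilNotNullAsync are hypothetical names, and the attempt count and delay are left to the caller.

```csharp
using System;
using System.Threading.Tasks;

public static class Polling
{
    // Calls fetch repeatedly until it returns a non-null value or the
    // attempts run out; returns null if nothing arrived in time.
    public static async Task<T> UntilNotNullAsync<T>(
        Func<Task<T>> fetch, int maxAttempts, TimeSpan delay) where T : class
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            T value = await fetch();
            if (value != null)
            {
                return value;
            }

            // Wait before the next try, but not after the last one.
            if (attempt < maxAttempts)
            {
                await Task.Delay(delay);
            }
        }
        return null;
    }
}
```

In SendToBot, the fetch delegate would wrap the GetActivitiesAsync call together with the FirstOrDefault lookup, so that the method keeps reading the conversation for a short while instead of giving up after a single read.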

I then got the app running on HoloLens, interfacing with a Bot that uses LUIS for language understanding, with speech recognition for input:

The source code of the project is available on GitHub here.

Happy coding!

Originally published at http://www.davidezordan.net/blog/?p=8130
