Building an Alexa skill with .NET and Azure

Corrado Cavalli
Dec 29, 2018 · 12 min read

The whole story behind my Meteosat Alexa skill.

I admit I am not a fan of voice-activated devices. While I understand they will rule the world in a not-so-distant future, I always feel a bit silly talking to something that is not “alive”. In spite of this, thanks to an Amazon promotional offer, the geek in me decided to buy an Alexa device, and since I wanted a more complete experience I bought an Amazon Echo Spot, the model that also includes a 2.5" screen.

Amazon Echo Spot

Setup was immediate, voice recognition works extremely well, and I had fun installing some skills and playing with them. After a while, though, the developer in me felt the need to be part of this new ecosystem, so I decided it was time to write my first Alexa skill. The first problem was: what kind of skill should I write?
Honestly, excluding the famous branded ones, the rest of the Amazon store is filled with the same boring, repetitive apps, and I wanted to create one I would use not only for some development experiments but also in my day-to-day life.

Since I have a display-enabled device and I often look at Meteosat images to get a better weather overview, I decided to create a skill that displays the latest images from the Meteosat satellite.
It was also a challenge to myself, since I knew absolutely nothing about skill development.

Where do I start?

The first step was to register as an Amazon developer at the Amazon Alexa portal using my Amazon account (you should create one if you don’t have one yet). There I found all the resources needed to become a “professional” Alexa developer.

Amazon Alexa Portal

If, like me, your goal is to become a .NET Alexa developer, unfortunately you won’t find any .NET-specific information on the portal, but what matters is that we now have access to the Alexa portal and can start creating our first skill.
Let’s navigate here and click the Create Skill button, then enter the Skill name and Default language (I started my skill with just Italian for simplicity, but multiple language support is planned…).
The portal lets you choose between some preconfigured build models; in my case, since I wanted maximum flexibility, I selected Custom.
After that, click the Create Skill button again to confirm.

You will now be redirected to the skill console: at the top you will see the skill creation steps and on the right the Skill builder checklist. Let’s now dig into the required steps.

Skill builder page

Step 1: Invocation Name

In this section you enter the command the user will say to activate the skill; in my case the user has to say “Alexa, open vista meteo sat” to activate it.
I had to split meteosat into two words (meteo and sat), because otherwise the word “Meteosat” was sometimes not recognized by Alexa, probably due to the Italian language setting.
Important: the first time I submitted the skill I used just “meteosat” as the invocation name and it was rejected, because the rules say an invocation name must be two or more words, so choose it carefully.
Don’t forget to click the Save Model button when you are done with a step.

Step 2: Intents, Samples and Slots

In this step you can define the intents (voice commands) and slot parameters recognized by the skill; since mine doesn’t use any slots, I largely ignored this section.
If you want to know more about this, just read here.
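Even though the skill doesn’t use slots, it does define a few simple custom intents (infrared, rain, snow and so on, as you will see in the code later). For reference, an intent and its sample utterances end up in the interaction model JSON roughly like this; the Italian sample phrases here are purely illustrative and not necessarily the ones the skill actually uses:

{
  "interactionModel": {
    "languageModel": {
      "invocationName": "vista meteo sat",
      "intents": [
        {
          "name": "infrared",
          "slots": [],
          "samples": [ "infrarosso", "vista infrarosso" ]
        }
      ]
    }
  }
}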

Step 3: Build the model

Each time you change the interaction model, for example because you changed some intents, you have to rebuild it via the Build Model button available at the top of each section.
When the build completes, a confirmation message appears in the bottom right corner.

Step 4: Endpoint

This is the most important step of the entire skill configuration because, in the end, an Alexa device is nothing more than a very clever machine that issues voice-activated POST calls to a remote URI using a predefined JSON schema, expects a well-defined JSON response and processes the received information locally.
For this clever mechanism to work properly, it is crucial to have a remote endpoint the skill can reach, send it the information requested by the user, process it server side and return the results to the device for presentation in text, audio or video form.

We can use whatever technology we like on the server side: as long as the JSON schema is respected, Alexa doesn’t care whether the server runs on Azure, AWS Lambda (although that is warmly encouraged) or any other server-side host; only the contract and the processing time limits are mandatory.
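Just to give a feel for the contract, a heavily trimmed launch request and a matching response look roughly like this (most fields are omitted; the full payloads are documented on the Alexa portal):

Request sent by Alexa (trimmed):
{
  "version": "1.0",
  "session": { "new": true },
  "context": { "System": { "device": { "supportedInterfaces": { "Display": {} } } } },
  "request": { "type": "LaunchRequest", "requestId": "...", "timestamp": "..." }
}

Response returned by the skill (trimmed):
{
  "version": "1.0",
  "response": {
    "outputSpeech": { "type": "PlainText", "text": "..." },
    "shouldEndSession": false
  }
}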

Using Azure Functions

In my case, since in the end this was nothing more than an experiment, I decided to use the smartest (and cheapest) server-side infrastructure available, so I opted for serverless computing with Azure Functions.

When I initially wrote the skill I used Visual Studio’s Azure Functions template and set it up accordingly, but in the meantime Microsoft MVP Marco Minerva created an Alexa Skill Project Template that sets up the entire project based on Azure Functions, so if you like the easy way (why not?) go use it.

The entire Meteosat skill project is available on GitHub; it differs from the one produced by the template mentioned above, but the overall concepts remain the same.
As said, Alexa just issues POST web requests and expects responses in a predefined JSON schema, so, as you can imagine, there is a lot of JSON/schema-related work. Luckily Tim Heuer (Principal Manager, Developer Relations at Microsoft) took care of all this and created the Alexa.NET NuGet package, which makes dealing with the Alexa plumbing fairly easy, so after creating the function skeleton project I added it to the solution.
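If you are starting from scratch, the package can be added to the function project from the command line (pick whatever version is current):

dotnet add package Alexa.NET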
Here’s what the MeteoSat skill solution looks like:

MeteoSat Alexa solution

How do I debug the Azure function?

Let’s say for a moment that we have written some initial processing code and want to test it: how can we do it?
The Alexa portal lets you specify the endpoint to invoke, so one way would be to publish our function to Azure, add the function endpoint inside the Alexa portal and then attach the Visual Studio debugger to the remotely hosted function, but this whole process is very tedious.
Thanks to my friend Matteo Pagani I discovered a tool that really changed my life: it’s called ngrok.
What ngrok does, once run locally, is expose a local machine port to the public internet, making it reachable from the entire outside world, including your Alexa device and the portal tester! (I fell in love with this tool, I admit.)

So, the steps I followed were:
-Register at ngrok.io and download the tool.
-Run the Azure Function locally from Visual Studio.
-Make a note of the local port used by the function.
-Run ngrok locally and make that port available on the internet.
-Add the ngrok temporary address into the Alexa portal and save the model.
-Use an Alexa developer test session to invoke the function and debug it.

As you can see below, my Azure Function listens at /api/AlexaMeteosat on port 7071.

By typing ngrok http 7071 I made port 7071 publicly available at https://4b2a9895.ngrok.io, which maps to localhost:7071, so now everyone invoking https://4b2a9895.ngrok.io/api/AlexaMeteosat can reach my local function.

I then went back to the Alexa portal and entered this address as the endpoint, specifying that I won’t use AWS services but another HTTPS service, and that the endpoint is a sub-domain of a domain that has a wildcard certificate.

Note: don’t forget to click the Save Endpoint button at the top after editing.

Testing the skill

On the Alexa developer portal, click the Test link at the top to enter the Test section: from there you can test the skill without a real device, even though I strongly recommend doing some additional tests with a real one before submission.

Alexa portal test section

To test my skill, with Visual Studio running, I just held down the mic icon (yes, you can use your own voice for testing) and said “Alexa, apri vista meteo sat” (the expected activation phrase), and I could see my local Azure Function being invoked inside Visual Studio and any breakpoints being hit, with no Azure deployment required.

Note: Alexa expects to receive an answer within a specific amount of time; if the response takes longer than the allotted time, perhaps just because you are debugging, it will time out with a spoken error message.

Azure Function processing
ngrok processing

Publish the Azure Function

Since I presume you don’t want to keep Visual Studio and your machine running all the time just to have your skill fully functional, it is time to publish the Azure Function to Azure. This is indeed a simple step: just right-click the project, select the Publish… menu to open the Publish wizard and, from there, click the Publish button again.
From the wizard you can also read the official publication address of the function and use it to update the Alexa endpoint section, so that any further invocation is posted to the Azure address and no longer to the temporary ngrok address.

Azure Function publication

Time to send the skill to certification!

Once your skill is ready and tested, it is time to distribute it and send it for certification; at least in my personal experience, be patient, it will take more than the stated 5 days.
To distribute it, go to the Distribution section of the portal and fill in the required fields. Just one note here, since my skill got rejected once because of this: in the Example Phrase 101 field don’t use single or double quotes, so write Alexa, apri vista meteo sat rather than “Alexa, apri vista meteo sat”.

No quotes

Inside the Distribution section, in the Availability tab, you can also grant some beta testers immediate access to your skill.

Certification

You have now reached the final step before the glory: validating the skill before the official submission.
Just click the Run button, first inside Validation and later inside Functional test, to run some acceptance tests against your skill. If everything goes well you are ready to submit the skill for certification, otherwise you have to fix the reported issues first and try again.
A common issue that might arise during the validation phase, even if everything works fine during testing, is the following:

Validation error

This error occurs because we are not hosting the service inside AWS Lambda: the validation page clearly states that the service must verify that requests come from Alexa and not from an unknown origin, so we have to include some validation code that runs as soon as a request is received server side.
Thanks to Alexa.NET, implementing this part is quite straightforward (see the About the code section later for details).

Service Request

After adding the validation code, the validation process should pass and the skill is ready for the long (and slow) road to public success.

About the code

Since this is a developer-related blog, I assume you want to know more about how the code works, right?
Let’s take the service code and analyze it in detail (I suggest you clone the repository and open it in Visual Studio if you are really curious about it).

[FunctionName("AlexaMeteosat")]
public static async Task<IActionResult> Run(
    [HttpTrigger(AuthorizationLevel.Anonymous, "post", Route = null)] HttpRequest req,
    TraceWriter log)
{
    string json = await req.ReadAsStringAsync();
    SkillRequest skillRequest = JsonConvert.DeserializeObject<SkillRequest>(json);

#if DEPLOY
    bool isValid = await Alexa.ValidateRequest(req, skillRequest);
    if (!isValid)
    {
        return new BadRequestResult();
    }
#endif

    // Check whether the invoking device supports a display
    if (!skillRequest.Context.System.Device.IsInterfaceSupported("Display"))
    {
        var notSupportedResponse = ResponseBuilder.Tell("Mi spiace, questa skill é supportata solo da dispositivi muniti di schermo.");
        return new OkObjectResult(notSupportedResponse);
    }

    var requestType = skillRequest.GetRequestType();

    SkillResponse response = null;

    if (requestType == typeof(LaunchRequest))
    {
        response = Alexa.CreateResponse(ViewMode.Normal);
    }
    else if (requestType == typeof(IntentRequest))
    {
        var intentRequest = skillRequest.Request as IntentRequest;
        switch (intentRequest.Intent.Name)
        {
            case "infrared":
                response = Alexa.CreateResponse(ViewMode.Infrared);
                break;
            case "normal":
            case "AMAZON.NavigateHomeIntent":
                response = Alexa.CreateResponse(ViewMode.Normal);
                break;
            case "rain":
                response = Alexa.CreateResponse(ViewMode.Rain);
                break;
            case "snow":
                response = Alexa.CreateResponse(ViewMode.Snow);
                break;
            case "AMAZON.StopIntent":
            case "AMAZON.CancelIntent":
                response = Alexa.CreateGoodbyeResponse();
                break;
            case "AMAZON.HelpIntent":
                response = Alexa.CreateHelpResponse();
                break;
            case "AMAZON.NextIntent":
                response = Alexa.CreteScrollResponse(false);
                break;
            case "AMAZON.PreviousIntent":
                response = Alexa.CreteScrollResponse(true);
                break;
            default:
                response = ResponseBuilder.Empty();
                response.Response.ShouldEndSession = false;
                break;
        }
    }
    else if (requestType == typeof(SessionEndedRequest))
    {
        response = Alexa.CreateGoodbyeResponse();
        response.Response.ShouldEndSession = true;
    }

    return new OkObjectResult(response);
}

The [FunctionName("AlexaMeteosat")] attribute indicates that the function is reachable at the [address]/api/AlexaMeteosat endpoint.

The following code deserializes the request into a SkillRequest object (thanks to the Alexa.NET package) and then ensures the request comes from an Alexa device by invoking the ValidateRequest method; if it is not valid, processing stops and a BadRequest is returned. Note that in the Run method this check sits behind an #if DEPLOY conditional, so it is only compiled when the DEPLOY symbol is defined and doesn’t get in the way of local debugging.

private static async Task<bool> ValidateRequest(HttpRequest request, SkillRequest skillRequest)
{
    // The signature certificate chain URL header must be present and valid
    request.Headers.TryGetValue("SignatureCertChainUrl", out var signatureChainUrl);
    if (string.IsNullOrWhiteSpace(signatureChainUrl))
    {
        return false;
    }

    Uri certUrl;
    try
    {
        certUrl = new Uri(signatureChainUrl);
    }
    catch
    {
        return false;
    }

    // The request signature header must be present as well
    request.Headers.TryGetValue("Signature", out var signature);
    if (string.IsNullOrWhiteSpace(signature))
    {
        return false;
    }

    // Re-read the raw body so it can be verified against the signature
    request.Body.Position = 0;
    var body = await request.ReadAsStringAsync();
    request.Body.Position = 0;

    if (string.IsNullOrWhiteSpace(body))
    {
        return false;
    }

    bool valid = await RequestVerification.Verify(signature, certUrl, body);
    bool isTimestampValid = RequestVerification.RequestTimestampWithinTolerance(skillRequest);

    if (!isTimestampValid)
    {
        valid = false;
    }

    return valid;
}

Since this skill returns a collection of images, it is totally useless on devices without a display, so by calling IsInterfaceSupported("Display") the code queries the display capability and, if it is missing, uses ResponseBuilder.Tell (also courtesy of the Alexa.NET library) to return a response that makes the Alexa device say that the skill is not compatible with that device.
If the request is of type LaunchRequest, meaning it is the first time the skill has been activated, the CreateResponse method is invoked.

private static SkillResponse CreateResponse(ViewMode mode)
{
    string text = null;
    string url = null;

    string help = "Puoi dire aiuto per conoscere le opzioni disponibili.";

    // Pick the spoken text matching the requested view
    switch (mode)
    {
        case ViewMode.Normal:
            text = $"Ecco le ultime immagini dal satellite meteosàt, {help}";
            break;
        case ViewMode.Infrared:
            text = $"Ecco le ultime immagini all' infrarosso dal satellite meteosàt, {help}";
            break;
        case ViewMode.Rain:
            text = $"Ecco le ultime immagini radar pioggia dal satellite meteosàt, {help}";
            break;
        case ViewMode.Snow:
            text = $"Ecco le ultime immagini radar neve dal satellite meteosàt, {help}";
            break;
        default:
            throw new ArgumentOutOfRangeException(nameof(mode), mode, null);
    }

    SkillResponse response = ResponseBuilder.Tell(text);
    DisplayRenderTemplateDirective display = new DisplayRenderTemplateDirective();

    var bodyTemplate = new ListTemplate2
    {
        Title = "Immagini meteosat",
        BackButton = "HIDDEN"
    };

    foreach (KeyValuePair<string, string> info in infos)
    {
        var image = new TemplateImage() { ContentDescription = $"Vista {info.Key}" };

        // Build the sat24.com image url for the requested view
        switch (mode)
        {
            case ViewMode.Normal:
                url = $"https://api.sat24.com/mostrecent/{info.Value}/visual5hdcomplete";
                break;
            case ViewMode.Infrared:
                url = $"https://api.sat24.com/mostrecent/{info.Value}/infraPolair";
                break;
            case ViewMode.Rain:
                url = $"https://api.sat24.com/mostrecent/{info.Value}/rainTMC";
                break;
            case ViewMode.Snow:
                url = $"https://api.sat24.com/mostrecent/{info.Value}/snow";
                break;
            default:
                throw new ArgumentOutOfRangeException(nameof(mode), mode, null);
        }

        image.Sources.Add(new ImageSource()
        {
            Url = url,
            Height = 615,
            Width = 845,
        });

        ListItem item = new ListItem
        {
            Image = image,
            Content = new TemplateContent
            {
                Primary = new TemplateText()
                {
                    Text = $"{info.Key}",
                    Type = "PlainText"
                }
            }
        };

        bodyTemplate.Items.Add(item);
    }

    display.Template = bodyTemplate;
    response.Response.Directives.Add(display);
    response.Response.ShouldEndSession = false;
    return response;
}

This is a general method that takes care of generating the correct skill response. Since we are targeting display-enabled devices, a specific DisplayRenderTemplateDirective must be added to the response, together with the template used to render the images; the ListTemplate2 gallery is the one used by the Meteosat skill.
Each ListItem added to the list template includes an image URL and some optional text, as stated in the documentation.
The Meteosat images are kindly provided by the sat24.com site.
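The infos collection used in the loop above is defined elsewhere in the class and simply maps the label shown on screen to the sat24.com region code used to build the image URL. A minimal sketch of it, with purely illustrative entries (the real ones are in the GitHub repository), could be:

// Illustrative only: maps the on-screen label to the sat24.com region code.
private static readonly Dictionary<string, string> infos = new Dictionary<string, string>
{
    ["Italia"] = "it",
    ["Europa"] = "eu"
};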

The code just creates the proper template, adds the RenderTemplate directive and returns the formatted response.
The additional supported intents, like infrared, rain and snow, are processed by the same CreateResponse method, while some built-in ones, like AMAZON.StopIntent (the one sent when the user says “Alexa, Stop”), are handled by dedicated response builders such as the simple CreateGoodbyeResponse, which just returns a goodbye message.

private static SkillResponse CreateGoodbyeResponse()
{
    return ResponseBuilder.Tell("Arrivederci!");
}
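The CreateHelpResponse helper referenced in the switch above lives in the same file; a minimal sketch of it, following the same ResponseBuilder pattern used by the other helpers (the actual Italian wording is in the repository), could look like this:

private static SkillResponse CreateHelpResponse()
{
    // Illustrative text only: the real help message is defined in the repository.
    SkillResponse response = ResponseBuilder.Tell("Puoi dire normale, infrarosso, pioggia o neve per cambiare vista.");
    // Keep the session open so the user can immediately say one of the options.
    response.Response.ShouldEndSession = false;
    return response;
}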

Recap

I hope you got an overall idea of how to start playing with Alexa skills using .NET and Azure. In the end, as said, it is nothing more than handling a POST request and reacting accordingly; probably the most complicated part of the whole process is, as usual, finding the really killer idea.
If you have an Echo Spot device and want to see the skill in action, it is available here.

References

Matteo Pagani wrote a series of very interesting posts about Alexa development; I highly encourage you to read them if you want to know more.

Corrado Cavalli

Senior Software Engineer at Microsoft, former Xamarin/Microsoft MVP, mad about technology. MTB & ski mountaineering addict.