Updating Your Skill for Echo Show

From voice to visuals and back again

Tom Hudson
thirteen23

--

Amazon has come out with the new Echo Show, an expansion of its smart speaker lineup that lets you add a visual, interactive interface to your conversations with Alexa.

Given my recent recognition from Amazon as an Alexa Champion, I had the opportunity for a first look at the Echo Show and a chance to update my Alexa skills to support this new visual interface. I started with ATX Dillo, a skill that helps you find happy hours and events around Austin. This article describes that process and aims to give you a starting point for updating your own skill to support the Echo Show.

Meet Dillo

If you didn’t already know, Dillo is a fun, interactive way to explore happy hours and events around Austin, Texas, using your Echo device. It gives you event information for whichever day of the week you ask about, up to a week ahead. You can read more about Dillo here:

Specific to happy hours, Alexa sometimes has trouble saying (very important) words such as “micheladas”, “sake”, and (very surprisingly) “beer”, but for the most part it gets the information across in an understandable manner. For instance, Dillo might say something like this:

I sure would like some drinks, here in Austin, Texas. Maybe a few micheladas, or perhaps an unfiltered sake? Or how about a nice cold beer. Not a bear in the woods, a beer to drink.


Taking Dillo to the Show

In porting Dillo to the Echo Show, I had the chance to add some eye candy to the experience. Luckily, my data feed already provides images to display alongside the event information.

Editing global fields

The first thing to do is go into the developer portal and update your Global Fields under the Skill Information tab. You’ll see a new item called Render Template. Templates are the different layout styles for rendering content on the Echo Show. Toggle this to “Yes”.

Build your template data for display

The next step is to build the Render Template JSON for displaying your content. There are multiple templates to choose from; for this example I am using the first one, BodyTemplate1. My data feed gives me an image for each event, so I want to use the “backgroundImage” property to display that image as a backdrop behind the name and description of the venue or event.

In order to support this template, I have to build those properties into my JSON response. A typical response that supports the regular audio-only version of the Echo looks like this:

{
  "version": "1.0",
  "response": {
    "outputSpeech": {
      "type": "PlainText",
      "text": "Happy Hour 5-8 PM: Over 60 Beers, 60 Spirits and 16 Drafts"
    },
    "card": {
      "content": "Happy Hour 5-8 PM: Over 60 Beers, 60 Spirits and 16 Drafts",
      "title": "Mohawk Austin",
      "type": "Simple"
    },
    "speechletResponse": {
      "outputSpeech": {
        "text": "Happy Hour 5-8 PM: Over 60 Beers, 60 Spirits and 16 Drafts"
      },
      "card": {
        "title": "Mohawk Austin",
        "content": "Happy Hour 5-8 PM: Over 60 Beers, 60 Spirits and 16 Drafts"
      },
      "shouldEndSession": false
    }
  },
  "sessionAttributes": {}
}

Including BodyTemplate1 for the Echo Show looks like this:

{
  "version": "1.0",
  "response": {
    "outputSpeech": {
      "type": "PlainText",
      "text": "Happy Hour 5-8 PM: Over 60 Beers, 60 Spirits and 16 Drafts"
    },
    "card": {
      "content": "<font size='4'><b>Happy Hour 5-8 PM: Over 60 Beers, 60 Spirits and 16 Drafts</b></font>",
      "title": "Mohawk Austin",
      "type": "Simple"
    },
    "speechletResponse": {
      "outputSpeech": {
        "text": "Happy Hour 5-8 PM: Over 60 Beers, 60 Spirits and 16 Drafts"
      },
      "card": {
        "title": "Mohawk Austin",
        "content": "<font size='4'><b>Happy Hour 5-8 PM: Over 60 Beers, 60 Spirits and 16 Drafts</b></font>"
      },
      "directives": [
        {
          "type": "Display.RenderTemplate",
          "template": {
            "type": "BodyTemplate1",
            "backButtonBehavior": "VISIBLE",
            "backgroundImage": {
              "contentDescription": "The Image",
              "sources": [
                {
                  "url": "https://example.com/barimage.png",
                  "size": "x-small",
                  "widthPixels": 0,
                  "heightPixels": 0
                }
              ]
            },
            "title": "Mohawk Austin",
            "textContent": {
              "primaryText": {
                "text": "<font size='4'><b>Happy Hour 5-8 PM: Over 60 Beers, 60 Spirits and 16 Drafts</b></font>"
              }
            }
          }
        }
      ],
      "shouldEndSession": false
    }
  },
  "sessionAttributes": {}
}

Notice the added “directives” array. This is where we define a directive of type “Display.RenderTemplate” containing a “BodyTemplate1” template. From there, you include the properties that BodyTemplate1 supports in the response JSON.

Also, notice that there are tags such as “font” and “b” for styling the text. Amazon supports a limited set of markup for styling text in your template.
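Since the same styled string ends up in both the card and the template, a tiny helper keeps the markup consistent. Here is a minimal sketch in Node.js (the helper name is my own, not part of any SDK):

// Wrap a line of template text in the limited markup the display
// templates understand (the documentation lists bold, italics, underline,
// line breaks, and font sizes). Arbitrary HTML or CSS is not supported.
function emphasize(text, fontSize) {
  return "<font size='" + fontSize + "'><b>" + text + "</b></font>";
}

// Produces the same styled string used in the card and template above.
console.log(emphasize("Happy Hour 5-8 PM: Over 60 Beers, 60 Spirits and 16 Drafts", 4));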

The backend code for the ATX Dillo skill is hosted on AWS Lambda and written in Node.js. Regardless of the language your skill is written in, it should be fairly easy to add these objects to your existing JSON response. Note: if you used the Alexa Skills Kit SDK for Node.js, the new display directives are not yet supported there as of this writing. This is a known issue and is on the roadmap to be addressed soon. For the code below, I wrote out all of the nested objects in the directives portion of the response explicitly, so you can work from the bottom up and see how it is built.
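Simplified down to the essentials, that bottom-up construction looks something like this in plain Node.js (the variable names and the image URL are placeholders):

// Build the pieces of the Display.RenderTemplate directive from the bottom up.

// The Image element for the background: a description plus one or more sources.
// Each source can also carry optional size, widthPixels, and heightPixels values.
const backgroundImage = {
  contentDescription: "The Image",
  sources: [
    { url: "https://example.com/barimage.png" }
  ]
};

// The TextContent element: the text shown on screen for this event.
const textContent = {
  primaryText: {
    text: "<font size='4'><b>Happy Hour 5-8 PM: Over 60 Beers, 60 Spirits and 16 Drafts</b></font>"
  }
};

// The BodyTemplate1 template ties the title, background image, and text together.
const template = {
  type: "BodyTemplate1",
  backButtonBehavior: "VISIBLE",
  title: "Mohawk Austin",
  backgroundImage: backgroundImage,
  textContent: textContent
};

// The directive tells the Echo Show to render that template.
const renderTemplateDirective = {
  type: "Display.RenderTemplate",
  template: template
};

// Finally, the directive gets added to the "directives" array of the response,
// exactly where it appears in the JSON example above.
const directives = [renderTemplateDirective];
console.log(JSON.stringify(directives, null, 2));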

You may notice in the definition for BodyTemplate1 that properties like backgroundImage and textContent take object values of type Image and TextContent, respectively. To see what those objects look like, scroll down the documentation page to Display Template Elements and view their definitions.

The ATX Dillo Skill has been updated and submitted for certification. Once Amazon has approved it, you can see it work on your Echo Show!

And the show’s over!

Depending on the complexity of your Alexa Skill, it could be as easy as my example above, or more time-intensive if you wish to use complex interface templates such as ListTemplate2. Regardless, this should get you started on the path to adding more life to your Alexa Skills on the Echo Show.

Good luck, and let me know in the comments if you run into any issues or have any tips of your own!

thirteen23 is a digital product studio in Austin, Texas. We are an award-winning team specializing in human-centered design. Together with our clients, we conceive, design, and develop intelligent software. Ready to build the next big thing? Let’s talk.

Find us on Facebook, Twitter, and Instagram or get in touch at thirteen23.com.
