Playing with Surface Capabilities in your Actions
One of the most important steps when building our Actions is to clearly define the conversational flow. Which path should we take if the device running Google Assistant has a screen or not? A web browser? At least sound output? Mapping all of this out is essential to define the flow correctly and deliver the best user experience. This is where the concept of Surface Capabilities comes in.
You can define which features your application supports, and whether a screen, sound, or web browser is a hard requirement. All of this is configured from your Actions on Google console: open your project, click Surface in the left panel, and then choose on the right which capabilities your application requires.
For example, in the image, I have marked none of these features as required, so my application will be available on any device with Google Assistant.
But if, for example, I declared that a screen is required to display information, then my application would no longer be available on any version of Google Home. Don’t worry, the console will always warn you about this.
All right, but how do you handle this at runtime? 🤔
We can also manage Surface Capabilities from our code and deal with them at runtime.
Runtime Surface Capabilities
When we work on our Actions and consider their runtime behavior, we run into different kinds of situations: for example, always displaying information on visually appealing cards, using media controls, or keeping the same information structure regardless of how it’s presented to us.
The documentation tells us about the 3 most common cases that can occur when we work on our actions:
- Response branching
- Conversation branching
- Multi-surface conversations
1 - Response Branching
Imagine you want to build an application that tells you the time, the weather, or even the current dollar value. If you have a Google Home and you ask it for the current value of the dollar, it can tell you out loud without any problem. And if you ask from a phone, you could attach an image to complement the answer.
In other words, in this kind of case, neither the flow nor the structure of the information changes with the type of surface; only the presentation does.
The Actions library provides constants that help us identify the type of surface available. We can access them through the conversation object “conv” and its “surface” property.
Let’s build a small example and define an intent where we ask for the current value of the dollar.
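The original code screenshot isn’t reproduced here, so below is a minimal sketch of what that handler could look like, assuming the actions-on-google Node.js client library; the intent name `dollar_value`, the exchange rate, and the card contents are placeholders of my own:

```javascript
// Capability constant exposed by the Actions library for screen detection.
const SCREEN_OUTPUT = 'actions.capability.SCREEN_OUTPUT';

// Pure helper so the branching logic is easy to follow (and to test):
// given the surface capabilities, return speech only or speech plus a card.
function buildDollarResponse(capabilities, rate) {
  const speech = `The current dollar value is ${rate}.`;
  if (capabilities.has(SCREEN_OUTPUT)) {
    // Screen available: enrich the spoken answer with a visual card.
    return { speech, card: { title: 'Current dollar value', subtitle: rate } };
  }
  // Voice-only surface (e.g. Google Home): speech alone.
  return { speech };
}

// Wiring it into the webhook would look roughly like this (sketch):
// const { dialogflow, BasicCard } = require('actions-on-google');
// const app = dialogflow();
// app.intent('dollar_value', (conv) => {
//   const res = buildDollarResponse(conv.surface.capabilities, fetchRate());
//   conv.ask(res.speech);
//   if (res.card) conv.ask(new BasicCard(res.card));
// });
```

The helper keeps one spoken answer for every surface and only adds the card when a screen is present, which is exactly the “same flow, richer presentation” idea of response branching.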
In the handler, I check whether the device that invoked the Action supports some kind of screen, using the conv.surface.capabilities.has function with the parameter actions.capability.SCREEN_OUTPUT.
So the situation is simpler than it first seemed: we now have a plain true-or-false check. If it’s true, we return the desired text plus any image we need to enrich the answer; otherwise, without a screen, we simply speak the answer. It’s worth noting that in either case neither the flow of the app nor its response has changed; the only thing that changed is how the information is shown.
2 - Conversation Branching
Well, let’s face it: certain application flows can be a bit long and tedious with voice commands alone, while on a screen that receives all the user’s input the process is shorter and faster.
Imagine, for example, that you need to give the user a set of options to choose from, each with a different description. You then have two choices: read them one by one along with their descriptions 😐 and, in case the user doesn’t follow, repeat all the options again 😱 😨; or show everything in a carousel of cards with all the necessary information, letting the user browse through them and pick the desired option at their own pace.
This means we have to split certain flows of our application depending on the type of surface the Assistant has available at the moment.
So, how do I indicate which “Intent” should react to which type of surface? 🤔
With contexts! Just as our “Intents” can receive contexts defined by ourselves or set by other “Intents”, the Assistant also sets its own contexts according to the type of device it’s running on, so the surface can be easily recognized. The following surface-related contexts are defined:
actions_capability_audio_output - The device has a speaker.
actions_capability_screen_output - The device has an output display screen.
actions_capability_media_response_audio - The device supports playback of media content.
actions_capability_web_browser - The device supports a web browser.
We just have to choose under which context we want our Intent to be triggered and set it as an input context for that Intent in the Dialogflow console.
For example, here is an Intent that will only trigger on surfaces with screens, using the actions_capability_screen_output context:
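The console screenshot isn’t reproduced here, but the same gating idea can be sketched in the webhook too. Below is a hedged example of a routing helper that picks a conversation branch based on which surface contexts the Assistant attached to the request; the branch names are hypothetical, not part of the library:

```javascript
// Context name the Assistant attaches on devices with a screen.
const SCREEN_CONTEXT = 'actions_capability_screen_output';

// Pure helper: given the names of the contexts present on the incoming
// request, decide which conversation branch to follow.
function pickBranch(contextNames) {
  if (contextNames.includes(SCREEN_CONTEXT)) {
    // Screen available: show a carousel-based selection flow.
    return 'choose_option_visual';
  }
  // Voice only: fall back to reading the options one by one.
  return 'choose_option_spoken';
}
```

In practice you would usually let Dialogflow do this routing for you via input contexts, keeping the webhook handlers themselves surface-specific.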
3 - Multi-surface Conversations
Now let’s look at the case where we’ve started a conversation on a Google Home, looking for information about something in particular; the conversation has gone very well, but at the end you need to see the resulting information on a screen. What do you do? You move the conversation somewhere else. Literally 😆.
To move the conversation to another surface, we first need to verify that one is available. In this case, we want to move from a situation where we don’t have a screen to show data to one where we do.
Let’s verify that a screen is available to work with; otherwise, we won’t be able to continue.
Using the function conv.available.surfaces.capabilities.has(), we check whether the current user has a screen available to which we can send the data, such as their Android or iOS phone. The only thing the function asks for is the type of surface we’re looking for; in our case, SCREEN_OUTPUT.
If the above function returns true, it means we do have somewhere to send our data. But how do we do it? With the NewSurface class.
NewSurface() — Requests the user to switch to another surface during the conversation. (Action on Google Documentation)
To create a NewSurface request, we need three parameters:
- Capabilities: the kind of capabilities we’re looking for. In this case, only SCREEN_OUTPUT.
- Context: the Assistant will (of course) ask the user for permission before sending data to their phone. The context gives a general idea of what will be sent when the Assistant asks whether it has permission to send the information to a new screen.
- Notification: the title with which the notification will appear on our phone.
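Putting those three parameters together, here is a hedged sketch of the handoff, again assuming the actions-on-google Node.js library; the wording of the context and the notification title are placeholders of my own:

```javascript
const SCREEN_OUTPUT = 'actions.capability.SCREEN_OUTPUT';

// Pure helper: build the options for a NewSurface request, or return null
// when no screen surface is available for this user.
function buildNewSurfaceOptions(availableCapabilities) {
  if (!availableCapabilities.has(SCREEN_OUTPUT)) {
    return null; // nowhere to send the data
  }
  return {
    capabilities: SCREEN_OUTPUT,                       // what we need
    context: 'I have the full dollar report for you.', // why we are asking
    notification: 'Dollar value report',               // notification title
  };
}

// In the webhook (sketch):
// const { dialogflow, NewSurface } = require('actions-on-google');
// const app = dialogflow();
// app.intent('send_to_screen', (conv) => {
//   const options = buildNewSurfaceOptions(conv.available.surfaces.capabilities);
//   if (options) {
//     conv.ask(new NewSurface(options));
//   } else {
//     conv.close('Sorry, I need a device with a screen to show that.');
//   }
// });
```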
You can start trying this out, either from the simulator or from your own Google Home, but it has to be an Assistant without a screen in order to test moving the conversation to another device.
As you can see, I’m using the simulator, but with the surface set to Google Home to simulate launching our app on an Assistant without a screen.
Please note that the simulator or Google Home you’re using and the device you want to send the notification to must both be signed in to the same Google account for this to work.
If you granted permission in the conversation to send the information to your phone, a notification should have arrived on your device; in my case, I’m using my Android device.
Remember that the title of your notification was defined earlier in your code.
So we launch our new-surface request, and that’s it? Well, no. We need to catch this event in the Dialogflow console through an Intent with the actions_intent_NEW_SURFACE event, because when the user taps the notification, you must handle what happens next.
Then we can handle it through our webhook when it is invoked.
First I verify that the status of the new surface request is OK, and then I render the result however I like; for this example I used a BrowseCarousel. This time I’m certain I have a screen where I can show a richer response.
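The handler screenshot isn’t reproduced here, so as a hedged sketch, assuming the actions-on-google library and a Dialogflow intent named `new_surface` mapped to the actions_intent_NEW_SURFACE event (both names and the responses are placeholders):

```javascript
// Pure helper: given the NEW_SURFACE argument status, decide what to answer.
function handleNewSurface(status) {
  if (status === 'OK') {
    // The user accepted: we now have a screen for a richer answer.
    return { speech: 'Here is the dollar report.', showCarousel: true };
  }
  // The user declined or the transfer failed: stay on the voice surface.
  return { speech: 'Ok, I will keep it on this device.', showCarousel: false };
}

// Webhook wiring (sketch): the third handler argument carries the
// NEW_SURFACE result, including its status.
// app.intent('new_surface', (conv, params, newSurface) => {
//   const res = handleNewSurface(newSurface.status);
//   conv.ask(res.speech);
//   if (res.showCarousel) {
//     conv.ask(new BrowseCarousel({ items: [/* BrowseCarouselItem[] */] }));
//   }
// });
```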
And that’s all, now we have a conversation that started in a Google Home but continued on our phones.
- Before developing your application, it’s always good to map out its entire flow. This will make it easier to understand in which situations you need a screen or just sound.
- Remember that you can limit Surface Capabilities from your console or at runtime with the help of the SDK in your webhook.
- Always try not to interrupt the user’s conversation; offer alternate flows when a surface isn’t available, but in the end, always give users what they came looking for.