Alexa Presentation Language for Audio

Or APLA for short — creating more immersive audio experiences.

Tom Berwick
Alexa Skills Dev
4 min read · Sep 28, 2020


One of the frustrations I’ve often found with Alexa skills is the inability to mix and match audio files effectively. You were limited to five audio files per response, and Alexa couldn’t talk over them. This meant you couldn’t do some really cool things like having sound effects play behind the speech.

Recently, however, Amazon have announced a new APL offering: APLA. This is an audio version of the Alexa Presentation Language and allows you to mix speech with audio files, add delays, and more. A much needed boost to help create more immersive experiences! The best part is that it’s fully backward compatible, even with Echo Dots, so you don’t need to check for the capability.

In this article I’m going to explain some of the basics of how to use APLA to enhance your current skills.

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

APLA Documents

APLA documents are structured like other APL documents, using a JSON format to specify what audio you want played and how. An example can be found below.
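
Here’s a rough sketch of the kind of document we’ll walk through (the sound library clip, the one-second pause and the speech text are just placeholders, so swap in your own, and the version field should match whatever the authoring tool generates for you):

```json
{
  "type": "APLA",
  "version": "0.91",
  "mainTemplate": {
    "parameters": ["payload"],
    "item": {
      "type": "Mixer",
      "items": [
        {
          "type": "Sequencer",
          "items": [
            {
              "type": "Silence",
              "duration": 1000
            },
            {
              "type": "Speech",
              "contentType": "SSML",
              "content": "<speak>Welcome back, adventurer!</speak>"
            }
          ]
        },
        {
          "type": "Audio",
          "source": "soundbank://soundlibrary/transportation/amzn_sfx_car_accelerate_01"
        }
      ]
    }
  }
}
```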

The example above uses a Mixer element as its root item. A Mixer element allows you to “mix” your audio clips, in other words play multiple audio files at the same time! The Mixer contains two items. The first is a “Sequencer”, which also contains multiple items. The difference between a “Sequencer” and a “Mixer”, however, is that the “Sequencer” plays its items in sequence, i.e. one after the other.

In our case the items that make up the Sequencer are “Silence” and “Speech”. “Silence” is, as you’ve probably guessed, a pause (the equivalent of a “break” in SSML). You simply provide it with a “duration” property in milliseconds. Our second item is of type “Speech”, which is the text that you want Alexa to read out. You can think of this as your response text, and in our case we’ve supplied a contentType property of “SSML”.

Finally, back in our “Mixer” element, the second item is of type “Audio”. This simply plays the source file you provide. In our case that happens to be one of the Alexa Skills Kit sound library files.

The above APLA document would start playing our sound file alongside a one-second pause. Once that pause has finished it would output the text in our “Speech” element, all the while the sound file would still be playing (provided it’s longer than the one-second pause, of course!).

A few things to note: APLA documents can still only have a combined audio length of 240 seconds, and they work alongside outputSpeech and reprompts. If you have those defined, the APLA audio will play after the outputSpeech, so you may want to omit the outputSpeech property if your response includes APLA. Also, when using the authoring tool and referring to a document by type “Link”, make sure you build your interaction model before testing. You can of course provide the full JSON inline rather than using a Link.

You can create your APLA documents in the APL authoring tool, which you’ll find under the “Multimodal Responses” section of the Alexa developer console. APLA documents are saved against a single skill ID. Once you’ve given a document a name you can reference it in your code. To add an APLA document to a response you simply do the following:
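
Something along these lines with the ASK SDK v2 for Node.js (the document name “hello_apla”, the token and the datasource values are just examples, so substitute your own):

```typescript
import * as Alexa from 'ask-sdk-core';

// Sketch of a launch handler that responds with an APLA document saved in
// the authoring tool. "hello_apla" is whatever name you gave the document.
const LaunchRequestHandler: Alexa.RequestHandler = {
  canHandle(handlerInput) {
    return Alexa.getRequestType(handlerInput.requestEnvelope) === 'LaunchRequest';
  },
  handle(handlerInput) {
    return handlerInput.responseBuilder
      .addDirective({
        type: 'Alexa.Presentation.APLA.RenderDocument',
        token: 'welcome-token',
        document: {
          type: 'Link',
          src: 'doc://alexa/apla/documents/hello_apla'
        },
        datasources: {
          user: {
            // Available inside the document as ${payload.user.name}
            name: 'Tom'
          }
        }
      })
      .getResponse();
  }
};
```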

This also shows how you can bind datasources to a document, allowing you to pass in variables, entities, etc.

There’s more you can do with APLA documents, including adding filters for fade-in/fade-out and adjusting the volume of audio clips. I suggest you check out the docs for more info: https://developer.amazon.com/en-US/docs/alexa/alexa-presentation-language/apla-document.html
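
For example, an Audio component can take a “filters” array along these lines (the values here are arbitrary; durations are in milliseconds, and an amount of 0.5 should halve the clip’s volume):

```json
{
  "type": "Audio",
  "source": "soundbank://soundlibrary/transportation/amzn_sfx_car_accelerate_01",
  "filters": [
    { "type": "FadeIn", "duration": 1000 },
    { "type": "FadeOut", "duration": 2000 },
    { "type": "Volume", "amount": 0.5 }
  ]
}
```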

One current limitation of the designer is that you can’t see how long the clip is. You can follow the instructions below to add a local override in Chrome DevTools to surface this information:

There’s probably a better/easier way of doing this, but it might be of use to someone. For my project I wanted to know how long my APLA file was, especially when using generated speech and wanting to match sound effects etc. to it. On the APLA designer page, open up Chrome’s developer tools and select the “Network” tab. Reload the page if needed and find a file that looks like “main.e2c051b1a7bfff50c5e1.js”. Right-click this file and select “Save for overrides”. I would suggest finding where your local overrides folder is saved on your computer and editing the file there, rather than trying to do it in the DevTools editor, as it will be extremely slow.

You can use my gist here to override your local file; it also adds a div that displays the elapsed time on every play. Just be aware that if Amazon change the code on this page it may break and/or stop working, so you will need to disable or delete your overrides.

You may also need to refresh the page once you’ve opened the devtools to get this to work for some reason.

Finally, I’d like to apologise for the lack of updates recently. I’ve got some more articles planned and just need to find some time to write them. Please feel free to join my Facebook group at https://www.facebook.com/groups/alexaskillsdevelopers and let me know if there’s anything else you’d like me to cover.
