Step-By-Step Tutorial: Adding Voice Controls with the Android Media Session API on Amazon Fire TV

Published in Amazon Developers, Oct 31, 2018

Amazon Fire TV lets customers enjoy media content like movies and TV in many different ways. Using your voice, for example, you can interact with your living environment.

In addition to the voice remote that ships with Amazon Fire TV, starting this year (2018) in the US, UK, and Germany, you can connect an Amazon Fire TV to an Echo device and then use your voice to control video playback.

In-app voice controls are achieved by integrating the Android Media Session API into your application. Follow along with the step-by-step instructions here, or watch the accompanying video tutorial.

Add the voice control permission to your Android manifest

The first step is to add the voice control permission to your Android manifest. This allows Amazon Fire TV to identify your application as voice-enabled.

<uses-permission android:name="com.amazon.permission.media.session.voicecommandcontrol"/>

Quite a lot of commands are available through Media Session, and they all relate to media playback: controls like play, pause, fast-forward, rewind, and skip to next or previous.

Understanding the main components of the Media Session API implementation

There are two main components: the Media Session and the Media Player itself. The Media Session provides a set of callbacks which are linked to all the actions that are available to the media player, like play, pause, skip to next, etc. The voice controls directly map into these Media Session callbacks, meaning that every time a customer executes a voice command, the Media Session callback is invoked.

Let’s assume, for example, that the customer says “Alexa, pause.” The Media Session’s onPause() callback is then invoked, and from within that callback we call the media player’s own pause() method. In a way, Media Session acts as the middleman between the Alexa command and the actual media player in your app.
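To make the middleman idea concrete, here is a plain-Java sketch that runs without the Android SDK. `Player`, `FakePlayer`, and `VoiceSessionCallback` are hypothetical stand-ins for VideoView and MediaSession.Callback; the only point is that each callback holds no playback logic of its own and simply forwards to the player.

```java
// Hypothetical stand-in for the media player (VideoView in this tutorial).
interface Player {
    void start();
    void pause();
    boolean isPlaying();
}

// Minimal in-memory player used only for this sketch.
class FakePlayer implements Player {
    private boolean playing;

    public void start() { playing = true; }
    public void pause() { playing = false; }
    public boolean isPlaying() { return playing; }
}

// Hypothetical stand-in for MediaSession.Callback: it forwards each
// command to the real player and does nothing else.
class VoiceSessionCallback {
    private final Player player;

    VoiceSessionCallback(Player player) {
        this.player = player;
    }

    // Would be invoked for "Alexa, play".
    void onPlay() { player.start(); }

    // Would be invoked for "Alexa, pause".
    void onPause() { player.pause(); }
}
```

Calling `callback.onPlay()` and then `callback.onPause()` leaves `player.isPlaying()` false: the session layer translated the command, and the player did the work.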

There are five main steps to implement Media Session correctly in your Fire TV application:

Initialize the video player.
Initialize the Media Session.
Configure the actions our application is capable of performing, for example play, pause, next, etc.
Properly manage the Media Session inside the activity life cycle.
Set up the Media Session callbacks. That’s where the magic happens: where we connect the voice commands themselves to the actual Media Player inside our app.

Step 1 — Initialize the video player.

onCreate()

@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);

    // Set the video player view
    setContentView(R.layout.activity_media_session);
    mVideo = (VideoView) findViewById(R.id.video_view);
    ...
}

The first thing we do, in the onCreate() method of our activity, is set the content view to our layout and then initialize the video player itself. There are many different media players that you can use in your Fire TV application. For the sake of this tutorial, we will use the standard Android VideoView, which provides the most common playback methods you would expect from a media player (start, pause, seekTo, and so on).

onStart()

@Override
protected void onStart() {
    super.onStart();

    // Initialize the media controller for the video player
    // (kept in a field so the Media Session callbacks can show or hide it)
    mMediaController = new MediaController(MediaSessionActivity.this);
    mMediaController.hide();

    // Attach the media controller to the video player
    mVideo.setMediaController(mMediaController);

    // Set the URI of the video (setVideoURI() takes an android.net.Uri)
    mVideo.setVideoURI(Uri.parse("myvideoURL.mp4"));
    mVideo.requestFocus();

    ...

Inside the onStart() method of the activity, we initialize the MediaController, the component that displays the current playback status of the media on screen along with some basic UI for the controls. We pass the MediaController to the media player using setMediaController(). We then pass the video URI, which is the address of the media that we want to play. In this case, we have hard-coded an MP4 file; in a real-world scenario, you would probably fetch this address from your back-end service. Finally, we request focus for the video.

Step 2 — Initialize the Media Session

It is good practice to initialize the Media Session once the video player is ready to play content. Most video players provide a listener that signals this. VideoView accepts a MediaPlayer.OnPreparedListener through setOnPreparedListener(), and its onPrepared() callback signals that the video player is ready to go.

mVideo.setOnPreparedListener(new MediaPlayer.OnPreparedListener() {

    @Override
    public void onPrepared(MediaPlayer mp) {

        // Initialize the media session
        mMediaSession = new MediaSession(getApplicationContext(), TAG);

        // Assign the callbacks to the Media Session
        mMediaSession.setCallback(getMediaSessionCallback());

        // Set the flags that allow the app to take over the remote controls
        mMediaSession.setFlags(MediaSession.FLAG_HANDLES_MEDIA_BUTTONS |
                MediaSession.FLAG_HANDLES_TRANSPORT_CONTROLS);

        ...

We create the new Media Session, passing the application context and a string tag used to identify the session for debugging. Then we assign the callbacks using setCallback(). Don’t worry about these callbacks yet; we define them in Step 5. Finally, we set the Media Session flags FLAG_HANDLES_MEDIA_BUTTONS and FLAG_HANDLES_TRANSPORT_CONTROLS. These flags map the basic remote control buttons to our Media Session. This matters because, even though customers might use voice to control playback, they might also use the physical remote control.

Step 3 — Add the actions to the Media Session

@Override
public void onPrepared(MediaPlayer mp) {
    ...
    // PlaybackState (not PlaybackStateCompat) matches the framework MediaSession
    PlaybackState state = new PlaybackState.Builder()
            .setActions(PlaybackState.ACTION_PLAY_PAUSE
                    | PlaybackState.ACTION_PLAY
                    | PlaybackState.ACTION_PAUSE
                    | PlaybackState.ACTION_FAST_FORWARD
                    | PlaybackState.ACTION_REWIND
                    | PlaybackState.ACTION_SKIP_TO_NEXT
                    | PlaybackState.ACTION_SKIP_TO_PREVIOUS
                    | PlaybackState.ACTION_SEEK_TO)  // advertises the onSeekTo() callback from Step 5
            .setState(PlaybackState.STATE_PLAYING, mVideo.getCurrentPosition(), 1.0f)
            .build();

    mMediaSession.setPlaybackState(state);
    mMediaSession.setActive(true);
}

In the onPrepared() method where we initialized the Media Session, we now build a PlaybackState. This object tells the Media Session what the current playback status is and also defines which actions the Media Session can perform.

This is why we use the method setActions() — to set all the actions that we want our application to be able to perform, like play, pause, fast-forward, etc.
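PlaybackState actions are single-bit `long` flags, which is why the builder call above chains them with `|`. The plain-Java sketch below reproduces that pattern. The constants are local stand-ins with their own bit values (in an app you would use `PlaybackState.ACTION_*` directly), and `supports()` is a hypothetical helper, not part of the Android API.

```java
// Local stand-ins for PlaybackState.ACTION_* flags: each one occupies a
// distinct bit of a long, so any combination fits in a single value.
final class Actions {
    static final long PLAY             = 1L << 0;
    static final long PAUSE            = 1L << 1;
    static final long STOP             = 1L << 2;
    static final long FAST_FORWARD     = 1L << 3;
    static final long REWIND           = 1L << 4;
    static final long SKIP_TO_NEXT     = 1L << 5;
    static final long SKIP_TO_PREVIOUS = 1L << 6;

    private Actions() {}

    // The set of actions this hypothetical app advertises (no STOP).
    static long supported() {
        return PLAY | PAUSE | FAST_FORWARD | REWIND | SKIP_TO_NEXT | SKIP_TO_PREVIOUS;
    }

    // True only if every bit of the requested action(s) is set in the mask.
    static boolean supports(long mask, long action) {
        return (mask & action) == action;
    }
}
```

With this mask, `supports(mask, Actions.PLAY)` is true while `supports(mask, Actions.STOP)` is false: an action left out of setActions() is simply not advertised.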

We then set the state to STATE_PLAYING. We pass the current position in milliseconds (mVideo.getCurrentPosition(), which is 0 for a freshly prepared video) and then a float that indicates the playback speed, with 1.0f meaning normal speed.
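The position and speed passed to setState() are a snapshot, not a live value. A controller that needs the current position can extrapolate it from that snapshot, roughly as in the simplified plain-Java model below. `PositionSnapshot` and `estimatePosition()` are hypothetical; Android’s PlaybackState records the update time itself (see getLastPositionUpdateTime()), while this sketch takes it as a parameter so it can be tested.

```java
// Simplified model of a PlaybackState snapshot: state, position (ms),
// speed, and the clock time (ms) at which the position was recorded.
final class PositionSnapshot {
    enum State { PLAYING, PAUSED, STOPPED }

    final State state;
    final long positionMs;
    final float speed;
    final long updateTimeMs;

    PositionSnapshot(State state, long positionMs, float speed, long updateTimeMs) {
        this.state = state;
        this.positionMs = positionMs;
        this.speed = speed;
        this.updateTimeMs = updateTimeMs;
    }

    // Estimate the position at nowMs. Only a PLAYING snapshot advances;
    // a paused or stopped one stays where it was recorded.
    long estimatePosition(long nowMs) {
        if (state != State.PLAYING) {
            return positionMs;
        }
        long elapsed = Math.max(0, nowMs - updateTimeMs);
        return positionMs + (long) (elapsed * speed);
    }
}
```

With speed 1.0f, a PLAYING snapshot recorded at 5,000 ms is estimated at 7,000 ms two seconds later, while a PAUSED one stays at 5,000 ms. This is one reason to publish a fresh PlaybackState whenever playback starts, pauses, or seeks.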

Finally, we pass the PlaybackState to the Media Session using MediaSession.setPlaybackState(state) and activate the Media Session using setActive(true).

Step 4 — Properly manage the Media Session in activity life cycle

When the customer interacts via voice with their Fire TV, an overlay appears on the screen to show the Alexa interaction. At this stage, we need to pause the playback and gracefully manage the behavior of our application.

onPause()

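@Override
protected void onPause() {
    // Pause the activity
    super.onPause();

    // Pause the video player
    mPlaybackState = PlaybackState.STATE_PAUSED;
    mVideo.pause();

    // The session is only created in onPrepared(), so it may still be null
    if (mMediaSession == null) {
        return;
    }

    // Deactivate the media session and publish the paused state, setting the
    // actions again (getActions() is assumed to return the flags from Step 3)
    mMediaSession.setActive(false);
    mMediaSession.setPlaybackState(new PlaybackState.Builder()
            .setState(mPlaybackState, mVideo.getCurrentPosition(), 1.0f)
            .setActions(getActions())
            .build());
}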

In the onPause() method of the activity, we pause the video player, set the Media Session to inactive, and pass it a newly created PlaybackState that reflects the paused state.

IMPORTANT: Notice that we need to set the actions again. If we didn’t, the newly created PlaybackState would have no actions, so the voice controls could not be mapped back to the Media Session.
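The underlying reason is that every PlaybackState is built from scratch: any field you don’t set on the builder keeps its default, and for actions that default is an empty mask. The plain-Java model below mirrors that behavior. `SimpleState`, its local state constant, and `getActions()` are hypothetical (the tutorial calls a getActions() helper without showing it); the idea is to define the action mask in one place and reuse it every time a new state is built.

```java
// Hypothetical immutable playback state modeled on PlaybackState.Builder:
// any field not set on the builder keeps its default value.
final class SimpleState {
    static final int STATE_PAUSED = 2;  // local stand-in constant

    final int state;
    final long actions;

    private SimpleState(int state, long actions) {
        this.state = state;
        this.actions = actions;
    }

    static final class Builder {
        private int state;
        private long actions;  // stays 0 unless setActions() is called

        Builder setState(int state) { this.state = state; return this; }
        Builder setActions(long actions) { this.actions = actions; return this; }
        SimpleState build() { return new SimpleState(state, actions); }
    }

    // Hypothetical getActions(): the one shared definition of the mask,
    // here three stand-in bits (for example play | pause | fast-forward).
    static long getActions() {
        return 0b111L;
    }
}
```

`new SimpleState.Builder().setState(SimpleState.STATE_PAUSED).build()` has an action mask of 0, while adding `.setActions(SimpleState.getActions())` carries the full mask into the new state.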

onResume()

@Override
protected void onResume() {
    super.onResume();

    // The session doesn't exist until the video is prepared, so guard against null
    if (mMediaSession != null) {
        mMediaSession.setActive(true);
        mPlaybackState = PlaybackState.STATE_PLAYING;

        mMediaSession.setPlaybackState(new PlaybackState.Builder()
                .setState(mPlaybackState, mVideo.getCurrentPosition(), 1.0f)
                .setActions(getActions())
                .build());
    }

    mVideo.requestFocus();
    mVideo.start();
}

We need to do the same thing in the onResume() method: we set the Media Session to active, set the playback state to STATE_PLAYING, publish a new PlaybackState (again including the actions), and restart the video.

onStop()

@Override
protected void onStop() {
    // Stop the activity
    super.onStop();

    // Stop the video player
    mPlaybackState = PlaybackState.STATE_STOPPED;
    mVideo.stopPlayback();

    // Deactivate the media session (if it was ever created)
    if (mMediaSession != null) {
        mMediaSession.setActive(false);
        mMediaSession.setPlaybackState(new PlaybackState.Builder()
                .setState(mPlaybackState, mVideo.getCurrentPosition(), 1.0f)
                .setActions(getActions())
                .build());
    }
}

We do the same thing when our activity is stopped, this time using STATE_STOPPED.

onDestroy()

@Override
protected void onDestroy() {
    // We release the media session as we’re destroying the Activity
    if (mMediaSession != null) {
        mMediaSession.release();
        mMediaSession = null;
    }
    super.onDestroy();
}

The final activity lifecycle event that we need to manage is onDestroy(), which is called when the activity is being destroyed. Here, the only thing we need to do is release the Media Session, so the system no longer routes media commands to it.
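Because the Media Session is only created in onPrepared(), which runs asynchronously after the video is loaded, the session can still be null when the activity first resumes. The plain-Java model below traces the lifecycle handling from Step 4 with a null check in each method. `FakeSession` and `LifecycleModel` are hypothetical stand-ins for MediaSession and the activity, and the states are plain strings to keep the sketch short.

```java
// Hypothetical stand-in for MediaSession: records whether it is active,
// the last published state, and whether it has been released.
final class FakeSession {
    boolean active;
    String state = "NONE";
    boolean released;

    void setActive(boolean active) { this.active = active; }
    void setState(String state) { this.state = state; }
    void release() { released = true; }
}

// Models the activity callbacks from Step 4. The session field stays null
// until onPrepared(), so every lifecycle method checks it first.
final class LifecycleModel {
    FakeSession session;

    void onPrepared() {
        session = new FakeSession();
        session.setState("PLAYING");
        session.setActive(true);
    }

    void onResume() {
        if (session == null) return;  // video not prepared yet
        session.setActive(true);
        session.setState("PLAYING");
    }

    void onPause() {
        if (session == null) return;
        session.setActive(false);
        session.setState("PAUSED");
    }

    void onStop() {
        if (session == null) return;
        session.setActive(false);
        session.setState("STOPPED");
    }

    void onDestroy() {
        if (session != null) {
            session.release();
            session = null;  // a repeated call is now a harmless no-op
        }
    }
}
```

Calling onResume() before onPrepared() is simply a no-op here, which is the situation the null checks in the activity code guard against.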

Step 5 — Set the Media Session callbacks

This is where the magic happens and where we actually connect the voice commands and the actual behavior in our Media Player.

private MediaSession.Callback getMediaSessionCallback() {

    return new MediaSession.Callback() {
        @Override
        public void onPlay() { }

        @Override
        public void onPause() { }

        @Override
        public void onSeekTo(long pos) { }

        @Override
        public void onFastForward() { }

        @Override
        public void onSkipToNext() { }

        ...

There are many Media Session callbacks, and they mirror the commands available in the media player. The ones we cover here are play, pause, seek to, fast-forward, and skip to next.

onPlay()

private MediaSession.Callback getMediaSessionCallback() {

    return new MediaSession.Callback() {
        @Override
        public void onPlay() {
            mPlaybackState = PlaybackState.STATE_PLAYING;
            mVideo.start();

            // Publish the new state (updatePlaybackState() is an app helper not shown here)
            updatePlaybackState(mPlaybackState, mVideo.getCurrentPosition(), 1.0f);
            mMediaController.hide();
        }
        ...

The onPlay() callback is pretty simple: we set the playback state to STATE_PLAYING, start the video player, update the PlaybackState, and hide the MediaController so it doesn’t cover the video.

onPause()

private MediaSession.Callback getMediaSessionCallback() {

    return new MediaSession.Callback() {
        ...
        @Override
        public void onPause() {
            mPlaybackState = PlaybackState.STATE_PAUSED;
            mVideo.pause();

            // Publish the paused state and show the playback UI
            updatePlaybackState(mPlaybackState, mVideo.getCurrentPosition(), 1.0f);
            mMediaController.show();
        }
        ...

onPause() is similar to onPlay(): we set the playback state to STATE_PAUSED, pause the media player, and update the PlaybackState accordingly. In this case, we show the MediaController, because we want to display the UI to the customer and show where playback is.
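Both callbacks rely on an updatePlaybackState() helper that the tutorial does not show. One plausible shape, modeled in plain Java below, is a single method that builds a new state from its arguments plus the shared action mask and publishes it. `PlaybackPublisher`, its `history` list, and the `controllerVisible` flag (standing in for MediaController.show()/hide()) are all hypothetical; in the app, the publish step would be a mMediaSession.setPlaybackState(...) call like the ones in Step 4.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the onPlay()/onPause() callbacks and the
// updatePlaybackState() helper they share.
final class PlaybackPublisher {
    static final long ACTIONS = 0b1111L;  // stand-in for getActions()

    // Immutable record of one published state.
    static final class Published {
        final String state;
        final long positionMs;
        final float speed;
        final long actions;

        Published(String state, long positionMs, float speed, long actions) {
            this.state = state;
            this.positionMs = positionMs;
            this.speed = speed;
            this.actions = actions;
        }
    }

    final List<Published> history = new ArrayList<>();
    boolean controllerVisible;
    boolean playing;
    long positionMs;

    // Every update carries the action mask, so no published state is
    // ever actions-less.
    void updatePlaybackState(String state, long positionMs, float speed) {
        history.add(new Published(state, positionMs, speed, ACTIONS));
    }

    void onPlay() {
        playing = true;
        updatePlaybackState("PLAYING", positionMs, 1.0f);
        controllerVisible = false;  // like mMediaController.hide()
    }

    void onPause() {
        playing = false;
        updatePlaybackState("PAUSED", positionMs, 1.0f);
        controllerVisible = true;   // like mMediaController.show()
    }
}
```

Routing every update through one helper is the design choice that makes the “set the actions again” rule from Step 4 hard to forget.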

onSeekTo(long pos)

private MediaSession.Callback getMediaSessionCallback() {
    return new MediaSession.Callback() {
        ...
        @Override
        public void onSeekTo(long pos) {
            mPlaybackState = PlaybackState.STATE_BUFFERING;

            // Android documents pos as a target position in ms; this sample
            // treats it as an offset from the current position
            mVideo.seekTo(mVideo.getCurrentPosition() + (int) pos);
        }
        ...

The onSeekTo() callback is interesting, and it shows how powerful the voice control implementation is. This method lets us fast-forward the content by a specific amount of time: seconds, minutes, or even hours.

This method is invoked when the customer says “Alexa, fast-forward…” followed by a duration (for example, “Alexa, fast-forward one minute and thirty seconds”). Alexa automatically converts what the customer says into milliseconds and passes that value to onSeekTo() as the long parameter pos.

The great advantage here is that we don’t need to parse what the customer says, because Alexa handles that automatically.
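Two small pieces of arithmetic sit behind this callback. The first is the conversion Alexa does for you: “one minute and thirty seconds” is 1 × 60,000 + 30 × 1,000 = 90,000 ms. The second is yours: turning pos into a target for VideoView.seekTo(int). The sketch below follows the tutorial in treating pos as an offset from the current position and additionally clamps the result to the video’s duration, since a spoken offset can overshoot the end. `SeekMath` and its methods are hypothetical helpers.

```java
// Hypothetical helpers for the onSeekTo(long pos) callback.
final class SeekMath {
    private SeekMath() {}

    // Convert a spoken duration to milliseconds, as Alexa does before
    // invoking onSeekTo().
    static long toMillis(long hours, long minutes, long seconds) {
        return ((hours * 60 + minutes) * 60 + seconds) * 1000;
    }

    // Target for VideoView.seekTo(int): current position plus the offset,
    // clamped to [0, duration] so a long request stops at the end.
    static int seekTarget(int currentMs, long offsetMs, int durationMs) {
        long target = (long) currentMs + offsetMs;
        if (target < 0) return 0;
        if (target > durationMs) return durationMs;
        return (int) target;
    }
}
```

In the callback, the seek line could then read `mVideo.seekTo(SeekMath.seekTarget(mVideo.getCurrentPosition(), pos, mVideo.getDuration()));`.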

onFastForward()

private MediaSession.Callback getMediaSessionCallback() {

    return new MediaSession.Callback() {
        ...
        @Override
        public void onFastForward() {
            mPlaybackState = PlaybackState.STATE_BUFFERING;

            // Skip ahead by a fixed 10 seconds (10 * 1000 ms)
            mVideo.seekTo(mVideo.getCurrentPosition() + 10 * 1000);
        }
        ...

However, the customer might also just say “Alexa, fast-forward” without specifying a number of seconds, minutes, or hours. In this case, onFastForward() is invoked, and we skip ahead by a predefined number of milliseconds. In this example, we use 10,000 milliseconds, or 10 seconds.
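Because the step is a fixed constant, the same idea can serve a rewind handler that steps backward (MediaSession.Callback also has onRewind(), and ACTION_REWIND is among the actions declared in Step 3). The plain-Java sketch below assumes a hypothetical SKIP_STEP_MS constant of 10,000 ms, the value used above, and clamps at both ends so a rewind near the start lands at 0 rather than at a negative position.

```java
// Hypothetical fixed-step seeking for onFastForward() and onRewind().
final class StepSeek {
    static final int SKIP_STEP_MS = 10 * 1000;  // 10 seconds, as in the tutorial

    private StepSeek() {}

    // Position after a fast-forward with no duration given,
    // never past the end of the video.
    static int fastForward(int currentMs, int durationMs) {
        return Math.min(currentMs + SKIP_STEP_MS, durationMs);
    }

    // Position after a fixed-step rewind, never before the start.
    static int rewind(int currentMs) {
        return Math.max(currentMs - SKIP_STEP_MS, 0);
    }
}
```

Used from the callback, this would be `mVideo.seekTo(StepSeek.fastForward(mVideo.getCurrentPosition(), mVideo.getDuration()));`.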

onSkipToNext()

private MediaSession.Callback getMediaSessionCallback() {

    return new MediaSession.Callback() {
        ...
        @Override
        public void onSkipToNext() {
            mPlaybackState = PlaybackState.STATE_SKIPPING_TO_NEXT;

            // Skip to the next video in the playlist (app-specific helper)
            skipToNextVideo();
        }
        ...

The last callback we look at is onSkipToNext(). Its implementation really depends on how your application is built and how you fetch the next video. You would likely create a skipToNextVideo() method that fetches all the information needed to move to the next video. Just remember to set the state to STATE_SKIPPING_TO_NEXT, because loading a new video might take a while and you want to show some buffering in your video player.
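What skipToNextVideo() does is up to your app. One hypothetical shape, sketched in plain Java below, keeps the playlist as a list of URLs plus an index, returns the next URL to load, and reports the end of the list by returning null so the callback can decide what to do (for example, stop playback). `Playlist` and its methods are assumptions, not part of any Fire TV API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical playlist backing skipToNextVideo().
final class Playlist {
    private final List<String> urls;
    private int index;

    Playlist(List<String> urls) {
        this.urls = new ArrayList<>(urls);  // defensive copy
    }

    // URL of the video currently selected.
    String current() {
        return urls.get(index);
    }

    // Advances to the next entry and returns its URL, or returns null
    // (leaving the index unchanged) when already at the last entry.
    String skipToNext() {
        if (index + 1 >= urls.size()) {
            return null;
        }
        index++;
        return urls.get(index);
    }
}
```

In the callback, a non-null result would be passed to `mVideo.setVideoURI(Uri.parse(url))` while the state is STATE_SKIPPING_TO_NEXT.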

Building apps in minutes

We want to enable you to build high-quality applications for Amazon Fire TV in just a few minutes. In order to do this, you can leverage the Amazon Fire App Builder.

Amazon Fire App Builder is a plug-and-play template for audio and video applications that lets you create an app in less than an hour. It contains modules that enable advanced functionality. It handles JSON feeds, custom branding and customization, fully supports the Amazon Fire TV family, and fully integrates the Media Session API and voice controls that we’ve seen in this tutorial. This works out of the box, so you don’t have to do anything to enable it when using Fire App Builder. Fire App Builder provides modules for features like in-app purchasing, subscriptions, social logins, advertising, analytics, and custom media players. You can swap these modules in and out of your application depending on what you need.

To find out more about Fire App Builder visit https://github.com/amzn/fire-app-builder.

If you’re interested in finding out more about Fire TV development, download our free eBook, “How to Develop Media Streaming Apps for Amazon Fire TV”.

Thanks for reading!

Mario Viviani — @mariuxtheone

Originally published at developer.amazon.com.
