Improving ExoPlayer rendering performance

Christos Tsilopoulos
Published in AndroidX Media3
Feb 21, 2020

We recently pushed a new experimental feature to the ExoPlayer dev-v2 branch as part of an overall effort to improve the player’s rendering performance, i.e., to reduce dropped frames and audio underruns. The feature changes ExoPlayer in two ways:

  1. The player operates MediaCodec in asynchronous mode.
  2. The player submits input buffers to the MediaCodec on a separate thread.

We are still evaluating the feature internally, but we are opening it to the community at a very early stage. We welcome anyone interested in experimenting with the new feature to do so and to send us feedback.

Read on to learn how to enable the new feature in your application and how it is implemented. Make sure to read the known issues at the bottom of this post before trying it.

API usage

We have added the method experimental_setMediaCodecOperationMode() to DefaultRenderersFactory. After setting the mode, you can pass the factory to either SimpleExoPlayer.Builder or ExoPlayer.Builder.

If you are using a custom RenderersFactory or pass a Renderer[] when constructing an ExoPlayer instance, you can call experimental_setMediaCodecOperationMode() directly on any MediaCodecVideoRenderer and MediaCodecAudioRenderer instances your application creates. Make sure you set the same operation mode on all MediaCodecRenderer instances you are using: our internal experimentation suggests that mixing modes does not work well.

The supported operation modes are:

  • OPERATION_MODE_SYNCHRONOUS: Maintains the current behavior. This is the default value.
  • OPERATION_MODE_ASYNCHRONOUS_PLAYBACK_THREAD: In this mode, ExoPlayer operates the MediaCodec in asynchronous mode and MediaCodec callbacks are routed to the player’s playback thread. This mode is applicable when API level ≥ 21. If API level < 21, the operation will fall back to OPERATION_MODE_SYNCHRONOUS.
  • OPERATION_MODE_ASYNCHRONOUS_DEDICATED_THREAD: In this mode, ExoPlayer operates the MediaCodec in asynchronous mode and callbacks are routed to a separate thread, one for each MediaCodec instance. This mode is applicable when API level ≥ 23. If API level < 23, the operation will fall back to OPERATION_MODE_SYNCHRONOUS.
  • OPERATION_MODE_ASYNCHRONOUS_DEDICATED_THREAD_MULTI_LOCK: Same as OPERATION_MODE_ASYNCHRONOUS_DEDICATED_THREAD, but the internal implementation is slightly optimized by performing more fine-grained locking on the internal data structures for input and output buffers.
  • OPERATION_MODE_ASYNCHRONOUS_DEDICATED_THREAD_ASYNCHRONOUS_QUEUEING: This mode extends OPERATION_MODE_ASYNCHRONOUS_DEDICATED_THREAD and offloads queueing input buffers to another thread.
  • OPERATION_MODE_ASYNCHRONOUS_DEDICATED_THREAD_MULTI_LOCK_ASYNCHRONOUS_QUEUEING: This mode extends OPERATION_MODE_ASYNCHRONOUS_DEDICATED_THREAD_MULTI_LOCK and offloads queueing input buffers to another thread.
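
The fallback rules in the list above can be modeled with a small, self-contained sketch. This is not ExoPlayer code: the OperationMode enum and the effectiveMode() helper are hypothetical names used only for illustration. The list states the API level requirement only for the first three modes, so the sketch assumes that the MULTI_LOCK and ASYNCHRONOUS_QUEUEING variants inherit the API level 23 requirement of the mode they extend.

```java
/** Hypothetical model of the API-level fallback described above; not ExoPlayer code. */
enum OperationMode {
  SYNCHRONOUS(0),
  ASYNCHRONOUS_PLAYBACK_THREAD(21),
  ASYNCHRONOUS_DEDICATED_THREAD(23),
  // Assumption: the variants below inherit the API 23 requirement of the mode they extend.
  ASYNCHRONOUS_DEDICATED_THREAD_MULTI_LOCK(23),
  ASYNCHRONOUS_DEDICATED_THREAD_ASYNCHRONOUS_QUEUEING(23),
  ASYNCHRONOUS_DEDICATED_THREAD_MULTI_LOCK_ASYNCHRONOUS_QUEUEING(23);

  /** Lowest API level on which the mode applies. */
  final int minApiLevel;

  OperationMode(int minApiLevel) {
    this.minApiLevel = minApiLevel;
  }

  /** Returns the mode that would take effect on a device with the given API level. */
  static OperationMode effectiveMode(OperationMode requested, int apiLevel) {
    // Below the required API level, operation falls back to the synchronous mode.
    return apiLevel >= requested.minApiLevel ? requested : SYNCHRONOUS;
  }
}
```

Under these assumptions, requesting ASYNCHRONOUS_DEDICATED_THREAD on an API level 22 device resolves to SYNCHRONOUS, while ASYNCHRONOUS_PLAYBACK_THREAD is kept as requested.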

You can experiment with the feature in the demo app by setting the MediaCodecOperationMode on the app’s Renderer factory.

We discuss each mode’s implementation details and compare their differences below.

Background

By default, ExoPlayer uses the platform’s decoders to decode video and audio via the Android MediaCodec API. The MediaCodec processes data asynchronously on its own internal threads. Clients interact with the MediaCodec by dequeueing and enqueueing input and output buffers. While the actual data processing happens on a separate thread, the MediaCodec exposes both a synchronous and an asynchronous API to clients for dequeueing and enqueueing buffers.

ExoPlayer has an internal playback thread that periodically checks the MediaCodec for available input or output buffers using MediaCodec’s synchronous APIs. On each iteration of the playback thread, ExoPlayer’s MediaCodecRenderers first try to dequeue as many output buffers as possible and then try to enqueue as many samples as possible to the MediaCodec for decoding. To maintain smooth playback, the loop needs to iterate frequently enough and handle buffers at a rate that, on average, matches the content’s rate.

We have seen that some of MediaCodec’s methods can take a few milliseconds to complete on specific devices when playing certain types of content. That causes playback issues because video frames or audio samples are processed too late, especially when playing high frame rate or high resolution video.
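
To get a feel for why a few milliseconds matter, it helps to compare a blocking call's duration with the time available per frame. The sketch below is plain arithmetic with hypothetical names (FrameBudget, frameIntervalMs, budgetFraction), and the 3 ms duration used in the note after it is an illustrative value, not a measurement from this post.

```java
/** Hypothetical helper for frame-interval arithmetic; not part of ExoPlayer. */
final class FrameBudget {
  private FrameBudget() {}

  /** Time between consecutive frames, in milliseconds. */
  static double frameIntervalMs(double framesPerSecond) {
    return 1000.0 / framesPerSecond;
  }

  /** Fraction of one frame interval consumed by a blocking call of the given duration. */
  static double budgetFraction(double callDurationMs, double framesPerSecond) {
    return callDurationMs / frameIntervalMs(framesPerSecond);
  }
}
```

At 60 frames per second the frame interval is about 16.7 ms, so a single call that blocks the playback thread for 3 ms would use roughly 18% of it, before any other work in the loop is counted.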

The experimental feature changes the player in two ways:

  1. ExoPlayer operates MediaCodec in asynchronous mode and obtains available input and output buffers via MediaCodec callbacks. On specific test devices, we have seen that this change alone improved the MediaCodec’s performance (e.g., we observed that MediaCodec.queueSecureInputBuffer() took less time to complete on average).
  2. The player submits input buffers to MediaCodec (i.e., calls to MediaCodec’s queueInputBuffer() and queueSecureInputBuffer()) on a separate thread in order to keep the playback thread unblocked.
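
The second change, moving the queueing calls off the playback thread, can be sketched with a single-threaded executor. This is a minimal illustration and not how ExoPlayer implements it: InputQueue and AsyncInputQueue are hypothetical names, and InputQueue is only a stand-in for the codec's blocking queue call, not the MediaCodec API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the codec's blocking queue call; not the MediaCodec API. */
interface InputQueue {
  void queueInputBuffer(int index, long presentationTimeUs);
}

/** Forwards queue requests to one worker thread so that the caller returns immediately. */
final class AsyncInputQueue implements InputQueue {
  private final InputQueue delegate;
  private final ExecutorService worker = Executors.newSingleThreadExecutor();

  AsyncInputQueue(InputQueue delegate) {
    this.delegate = delegate;
  }

  @Override
  public void queueInputBuffer(int index, long presentationTimeUs) {
    // A single-threaded executor runs tasks in submission order, so buffers
    // reach the delegate in the same order in which they were queued.
    worker.execute(() -> delegate.queueInputBuffer(index, presentationTimeUs));
  }

  /** Waits for pending requests to finish and stops the worker; returns false on timeout. */
  boolean shutdown(long timeoutMs) {
    worker.shutdown();
    try {
      return worker.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

The caller pays only the cost of handing a task to the executor. Any time spent inside the delegate's queue call is absorbed by the worker thread instead of the thread that drives playback.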

Implementation details

When you set the MediaCodecOperationMode to OPERATION_MODE_ASYNCHRONOUS_PLAYBACK_THREAD, ExoPlayer registers a MediaCodec callback that is routed to the player’s playback thread. The playback thread stores input/output buffer references between iterations of its main loop, for handling on the next iteration. As a consequence, when the player dequeues input and output buffers, it obtains the buffers that have been received up to that point. We have seen this mode offer some performance improvements because MediaCodec operated slightly faster in asynchronous mode on some specific devices.
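
The bookkeeping described above can be sketched as two queues of buffer indices that callbacks fill and the playback loop drains. This is a simplified model with hypothetical names (PlaybackThreadBufferTracker and its methods), not ExoPlayer's implementation. Returning -1 when nothing is available mirrors the value of MediaCodec.INFO_TRY_AGAIN_LATER, which is a choice made for this sketch.

```java
import java.util.ArrayDeque;

/** Hypothetical holder for buffer indices reported by codec callbacks; not ExoPlayer code. */
final class PlaybackThreadBufferTracker {
  private final ArrayDeque<Integer> availableInputBuffers = new ArrayDeque<>();
  private final ArrayDeque<Integer> availableOutputBuffers = new ArrayDeque<>();

  // In this mode the callbacks and the dequeue calls all run on the playback
  // thread, so the queues need no locking.
  void onInputBufferAvailable(int index) {
    availableInputBuffers.add(index);
  }

  void onOutputBufferAvailable(int index) {
    availableOutputBuffers.add(index);
  }

  /** Returns the oldest reported input buffer index, or -1 if none is pending. */
  int dequeueInputBufferIndex() {
    Integer index = availableInputBuffers.poll();
    return index == null ? -1 : index;
  }

  /** Returns the oldest reported output buffer index, or -1 if none is pending. */
  int dequeueOutputBufferIndex() {
    Integer index = availableOutputBuffers.poll();
    return index == null ? -1 : index;
  }
}
```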

With OPERATION_MODE_ASYNCHRONOUS_DEDICATED_THREAD, ExoPlayer handles MediaCodec’s callbacks on a separate thread, one per MediaCodec instance. Unless you are using extensions, ExoPlayer usually uses two MediaCodec instances (one for audio and one for video), so this mode creates two additional threads. In contrast to OPERATION_MODE_ASYNCHRONOUS_PLAYBACK_THREAD, MediaCodec callbacks can still execute in parallel while the player is dequeueing input and output buffers on the playback thread. Our initial experimentation shows that this mode further improves rendering performance compared to OPERATION_MODE_ASYNCHRONOUS_PLAYBACK_THREAD. However, we are still evaluating this mode’s effectiveness on devices that have a limited number of available CPU cores, as well as the overall impact on battery usage.

The OPERATION_MODE_ASYNCHRONOUS_DEDICATED_THREAD_MULTI_LOCK mode is functionally equivalent to OPERATION_MODE_ASYNCHRONOUS_DEDICATED_THREAD with the difference that it uses a finer-grained locking mechanism. Though locked/synchronized operations are lightweight anyway, we want to evaluate whether granular locking delivers additional performance improvements that could justify the slightly more complicated code.
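
To make the difference concrete, here is a minimal sketch of the finer-grained variant, assuming that "fine-grained" means one lock for the input buffer queue and another for the output buffer queue. MultiLockBufferTracker and its methods are hypothetical names, and ExoPlayer's actual data structures may differ. A single-lock variant would look the same except that all four methods would synchronize on one shared object.

```java
import java.util.ArrayDeque;

/** Hypothetical sketch of per-queue locking; not ExoPlayer code. */
final class MultiLockBufferTracker {
  private final Object inputLock = new Object();
  private final Object outputLock = new Object();
  private final ArrayDeque<Integer> availableInputBuffers = new ArrayDeque<>();
  private final ArrayDeque<Integer> availableOutputBuffers = new ArrayDeque<>();

  // The two methods below are meant to be called on the dedicated callback thread.
  void onInputBufferAvailable(int index) {
    synchronized (inputLock) {
      availableInputBuffers.add(index);
    }
  }

  void onOutputBufferAvailable(int index) {
    synchronized (outputLock) {
      availableOutputBuffers.add(index);
    }
  }

  // The two methods below are meant to be called on the playback thread.
  /** Returns the oldest reported input buffer index, or -1 if none is pending. */
  int dequeueInputBufferIndex() {
    synchronized (inputLock) {
      Integer index = availableInputBuffers.poll();
      return index == null ? -1 : index;
    }
  }

  /** Returns the oldest reported output buffer index, or -1 if none is pending. */
  int dequeueOutputBufferIndex() {
    synchronized (outputLock) {
      Integer index = availableOutputBuffers.poll();
      return index == null ? -1 : index;
    }
  }
}
```

With separate locks, a callback that reports an output buffer does not wait for a playback-thread call that is touching the input queue, and vice versa. Whether that matters in practice is exactly the open question raised above.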

The OPERATION_MODE_ASYNCHRONOUS_DEDICATED_THREAD_ASYNCHRONOUS_QUEUEING mode extends OPERATION_MODE_ASYNCHRONOUS_DEDICATED_THREAD by submitting input buffers on a separate thread, one per MediaCodec instance. As discussed, unless you are using extensions, the player creates two MediaCodec instances, so this mode creates two more threads on top of the two callback threads. In total, four additional threads operate in this mode compared to the default OPERATION_MODE_SYNCHRONOUS. We have seen this mode boost rendering performance on high frame-rate content.

Last, OPERATION_MODE_ASYNCHRONOUS_DEDICATED_THREAD_MULTI_LOCK_ASYNCHRONOUS_QUEUEING extends OPERATION_MODE_ASYNCHRONOUS_DEDICATED_THREAD_MULTI_LOCK by submitting input buffers on a separate thread, as described above.
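
The thread counts quoted in this section follow from simple arithmetic: each MediaCodec instance gets one extra thread for dedicated callbacks and one more for asynchronous queueing. The helper below is a hypothetical illustration of that count (ExtraThreads and additionalThreads are not ExoPlayer names).

```java
/** Hypothetical helper that counts the extra threads a mode creates; not ExoPlayer code. */
final class ExtraThreads {
  private ExtraThreads() {}

  /** Returns the number of threads added compared to the synchronous mode. */
  static int additionalThreads(
      boolean dedicatedCallbackThread, boolean asynchronousQueueing, int codecCount) {
    // Each enabled feature adds one thread per codec instance.
    int perCodec = (dedicatedCallbackThread ? 1 : 0) + (asynchronousQueueing ? 1 : 0);
    return perCodec * codecCount;
  }
}
```

For the usual case of two MediaCodec instances, additionalThreads(true, false, 2) gives 2 and additionalThreads(true, true, 2) gives 4, matching the figures above.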

Video frame processing offset

We have added a new metric for evaluating rendering performance called Video Frame Processing Offset, or vfpo for short. vfpo measures how early the player processes a video frame compared to its presentation time, in microseconds. For example, if the player processes a video frame 30 ms before the frame should be displayed on screen, the vfpo of this frame is 30000. Similarly, if a video frame is processed 10 ms too late (the player’s current position has progressed beyond the frame’s presentation time), the vfpo for this frame is -10000 (and the video frame is dropped).
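
The two examples above can be reproduced with a small sketch. It assumes that the offset can be modeled as the frame's presentation time minus the player's current position at the moment the frame is processed; the player's actual measurement may account for more than that. VideoFrameProcessingOffset and its methods are hypothetical names, not ExoPlayer classes.

```java
/** Hypothetical model of the vfpo metric and its running average; not ExoPlayer code. */
final class VideoFrameProcessingOffset {
  private long totalOffsetUs;
  private int frameCount;

  /** Offset for one frame: positive if processed early, negative if processed late. */
  static long offsetUs(long presentationTimeUs, long positionUs) {
    // Assumption: the offset is presentation time minus current playback position.
    return presentationTimeUs - positionUs;
  }

  /** Records one processed frame. */
  void addFrame(long presentationTimeUs, long positionUs) {
    totalOffsetUs += offsetUs(presentationTimeUs, positionUs);
    frameCount++;
  }

  /** Returns the average offset over all recorded frames, or 0 if none were recorded. */
  double averageOffsetUs() {
    return frameCount == 0 ? 0 : (double) totalOffsetUs / frameCount;
  }
}
```

A frame with a presentation time of 1,030,000 µs processed at position 1,000,000 µs has an offset of 30000, and a frame with a presentation time of 1,000,000 µs processed at position 1,010,000 µs has an offset of -10000.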

In the demo app, the video debug information displayed at the top includes the average vfpo during playback (see Fig. 1 below). We expect smooth playback without dropped frames or audio underruns when the average vfpo is above 40000.

Figure 1. vfpo is highlighted.

Known issues

As mentioned already, the feature is experimental and we are internally evaluating its effectiveness and exploring possible side-effects. We know of at least two devices (Samsung S8 and S9) that may produce garbled video when asynchronous buffer queueing is enabled (modes OPERATION_MODE_ASYNCHRONOUS_DEDICATED_THREAD_ASYNCHRONOUS_QUEUEING and OPERATION_MODE_ASYNCHRONOUS_DEDICATED_THREAD_MULTI_LOCK_ASYNCHRONOUS_QUEUEING) and the player is playing Widevine-encrypted content. We have not implemented a workaround at the time of writing this post.
