Media Projection and Audio Capture

Júlio Zynger
Oct 30, 2019


Photo by June Wong on Unsplash

Since Android Lollipop, developers have had an API for capturing part or all of a device’s screen contents: MediaProjection. James O’Brien gave a great description of that usage in this article.

In Android 10, the MediaProjection API was extended to support audio capture. That is especially interesting if your app does any sort of streaming or Twitch-like broadcasting. In some cases, though, you might want finer control over when or what can be captured, either for user-privacy reasons or for content protection (e.g. copyright).

At SoundCloud, we cared about both use-cases, and as we prepared our app to target API Level 29, making sure only the expected actors could capture our audio was critical. We want to support use-cases like Live Caption, but also protect our copyrighted content against leaks and piracy.

To do so, simply following the documentation wasn’t enough; we also wanted to verify that the solution actually worked. For that reason, we built a demo application that uses the audio capture API to interact with our app, just like a third-party app would. Here’s how we did it.

Before we get started

Given we will be recording what the user’s device is playing, we are first required to request a few permissions. Make sure to prompt the user at the appropriate time for the RECORD_AUDIO permission.

Also, since the audio capture will be a long-running operation, we will need a foreground service to keep the user informed while it executes. For that reason, don’t forget to declare the FOREGROUND_SERVICE permission in your AndroidManifest file, too.

Media Projection

For privacy reasons, audio/video capturing is special on Android in comparison to other permission requests, in the sense that a capturing app must prompt the user for explicit approval every time a MediaProjection is needed.

A request to the MediaProjectionManager system service can be made with a one-liner, but it must happen in a UI context (such as an Activity) so that a confirmation dialog can be displayed.
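As a rough illustration, the request could look like the sketch below. It assumes a plain Activity on API 29 and only compiles inside an Android project; the class name `CaptureRequestActivity` and the request code are made up for this example and are not taken from our sample app.

```java
import android.app.Activity;
import android.content.Context;
import android.content.Intent;
import android.media.projection.MediaProjectionManager;

// Hypothetical Activity that asks the user to approve a capture session.
public class CaptureRequestActivity extends Activity {
    // Arbitrary request code, chosen for this example only.
    private static final int REQUEST_MEDIA_PROJECTION = 1;

    void requestCapture() {
        MediaProjectionManager manager =
                (MediaProjectionManager) getSystemService(Context.MEDIA_PROJECTION_SERVICE);
        // Shows the system confirmation dialog; the answer arrives in onActivityResult.
        startActivityForResult(manager.createScreenCaptureIntent(), REQUEST_MEDIA_PROJECTION);
    }

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
        if (requestCode == REQUEST_MEDIA_PROJECTION && resultCode == RESULT_OK) {
            // The user approved. Keep resultCode and data: they are the arguments
            // later passed to MediaProjectionManager.getMediaProjection(resultCode, data).
        }
    }
}
```

The approval is only valid for this session, so the dialog will be shown again the next time a MediaProjection is needed.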

From Android 10 onwards, a Service must be running and must have called startForeground to post a Notification before we can obtain the MediaProjection instance; failing to do so will cause a SecurityException. The actual audio capturing operation does not need to be done in the Service code, but you absolutely need a Service, even if its sole purpose is to manage the lifecycle of the Notification. Don’t forget to declare the foregroundServiceType (mediaProjection) on the service’s entry in your AndroidManifest.
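A minimal sketch of such a Service follows, assuming API 29 and an Android project to compile it in. The class name, channel id, notification text and icon are placeholders picked for this example; the matching manifest entry would carry `android:foregroundServiceType="mediaProjection"`.

```java
import android.app.Notification;
import android.app.NotificationChannel;
import android.app.NotificationManager;
import android.app.Service;
import android.content.Intent;
import android.os.IBinder;

// Hypothetical foreground service whose only job is to own the notification.
public class CaptureService extends Service {
    private static final String CHANNEL_ID = "capture";  // placeholder id
    private static final int NOTIFICATION_ID = 1;        // placeholder id

    @Override
    public int onStartCommand(Intent intent, int flags, int startId) {
        NotificationManager notifications = getSystemService(NotificationManager.class);
        notifications.createNotificationChannel(new NotificationChannel(
                CHANNEL_ID, "Audio capture", NotificationManager.IMPORTANCE_LOW));

        Notification notification = new Notification.Builder(this, CHANNEL_ID)
                .setContentTitle("Capturing audio")
                .setSmallIcon(android.R.drawable.ic_btn_speak_now)
                .build();

        // Must happen before MediaProjectionManager.getMediaProjection(...) is called,
        // otherwise that call throws a SecurityException on Android 10.
        startForeground(NOTIFICATION_ID, notification);
        return START_NOT_STICKY;
    }

    @Override
    public IBinder onBind(Intent intent) {
        return null; // started service only; binding is not used in this sketch
    }
}
```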

Now that we have all the pieces in place, we can obtain the MediaProjection instance to perform the audio capture.

Audio Capture

Here is where things get interesting: the audio capture configuration API is very flexible and provides many hooks, so we can tune it for our specific media use-cases. We will pass both an AudioPlaybackCaptureConfiguration and an AudioFormat object to the AudioRecord instance we will use to fill our audio data buffers.

The first object defines which types of media we will capture (for example USAGE_MEDIA or USAGE_GAME), and we can optionally add app UIDs to include or exclude, to filter which apps we are (or aren’t, respectively) interested in capturing.
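The two kinds of filter described above could be sketched as follows. This assumes an already obtained `MediaProjection` and an Android project targeting API 29; the class and method names (`CaptureConfigs`, `mediaAndGames`, `onlyFromApp`) are invented for illustration.

```java
import android.media.AudioAttributes;
import android.media.AudioPlaybackCaptureConfiguration;
import android.media.projection.MediaProjection;

// Hypothetical helper showing usage-based and UID-based capture filters.
final class CaptureConfigs {
    private CaptureConfigs() {}

    // Capture anything played with a media or game usage, from any app that allows it.
    static AudioPlaybackCaptureConfiguration mediaAndGames(MediaProjection projection) {
        return new AudioPlaybackCaptureConfiguration.Builder(projection)
                .addMatchingUsage(AudioAttributes.USAGE_MEDIA)
                .addMatchingUsage(AudioAttributes.USAGE_GAME)
                .build();
    }

    // Capture media playback from a single app, identified by its UID.
    static AudioPlaybackCaptureConfiguration onlyFromApp(MediaProjection projection, int targetUid) {
        return new AudioPlaybackCaptureConfiguration.Builder(projection)
                .addMatchingUsage(AudioAttributes.USAGE_MEDIA)
                .addMatchingUid(targetUid)
                .build();
    }
}
```

The exclusion counterparts are `excludeUsage` and `excludeUid` on the same builder.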

The AudioFormat defines how the audio data will be encoded. We can set the capture sample rate in Hz, the number of channels, and the encoding to use: raw PCM samples at 8-bit, 16-bit or floating-point precision, or compressed formats such as MP3, AAC and AC3, depending on the device’s encoding capabilities.

Notice how important these parameters are depending on the use-case: if you plan to upload the captured audio to the cloud or store it on disk, you might prefer compressed samples for their smaller size, but if your use-case involves demuxing or post-processing, you might be more interested in the more precise PCM renditions.
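To put rough numbers on the uncompressed side of that trade-off, the data rate of raw PCM is simply sample rate × channels × bytes per sample. The helper below is an illustrative sketch in plain Java; `PcmDataRate` is a name made up for this example.

```java
// Hypothetical helper: how many bytes one second of raw PCM occupies.
final class PcmDataRate {
    private PcmDataRate() {}

    static long bytesPerSecond(int sampleRateHz, int channels, int bytesPerSample) {
        return (long) sampleRateHz * channels * bytesPerSample;
    }

    public static void main(String[] args) {
        // Mono PCM-16 at 8 kHz: prints 16000
        System.out.println(bytesPerSecond(8_000, 1, 2));
        // Stereo 32-bit float PCM at 44.1 kHz: prints 352800
        System.out.println(bytesPerSecond(44_100, 2, 4));
    }
}
```

A minute of the second rendition is therefore around 21 MB, which is why compressed encodings become attractive for storage or upload.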

For our example, validating that our first-party app can be captured by a third party, it is enough to define the simplest combination of static properties: mono PCM-16 raw audio. For a more complex use-case, one could make them dynamic, based on the properties of the material being captured. Fortunately, the documentation for AudioFormat is quite extensive and describes all the available options well.

Specifying a mono PCM-16 rendition with a sample rate of 8 kHz
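One way to express that rendition is sketched below. It assumes the RECORD_AUDIO permission has already been granted and that a capture configuration is available; the class name `MonoPcm16Recorder` and the choice of twice the minimum buffer size are assumptions of this example, not requirements of the API. It needs an Android project targeting API 29 to compile.

```java
import android.media.AudioFormat;
import android.media.AudioPlaybackCaptureConfiguration;
import android.media.AudioRecord;

// Hypothetical factory for an AudioRecord capturing mono PCM-16 at 8 kHz.
final class MonoPcm16Recorder {
    static final int SAMPLE_RATE_HZ = 8_000;

    private MonoPcm16Recorder() {}

    static AudioRecord create(AudioPlaybackCaptureConfiguration config) {
        AudioFormat format = new AudioFormat.Builder()
                .setEncoding(AudioFormat.ENCODING_PCM_16BIT)
                .setSampleRate(SAMPLE_RATE_HZ)
                .setChannelMask(AudioFormat.CHANNEL_IN_MONO)
                .build();

        // Smallest buffer the device accepts for this format; a negative value
        // signals an error and should be handled in real code.
        int minBufferBytes = AudioRecord.getMinBufferSize(
                SAMPLE_RATE_HZ, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);

        return new AudioRecord.Builder()
                .setAudioFormat(format)
                .setBufferSizeInBytes(2 * minBufferBytes) // some headroom; an arbitrary choice
                .setAudioPlaybackCaptureConfig(config)
                .build();
    }
}
```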

With the AudioRecord object in hand, we can call startRecording and then repeatedly read audio samples into our predefined buffer, writing them out to an in-memory or file OutputStream.

Notice this is a performance-critical operation: interruptions on the read thread will cause audio glitches and crackling (read more about low-latency audio rendering in my other article). On top of that, we also don’t want to block the UI thread with our recording, so it is advisable to run it in a Thread of its own.
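The shape of such a read loop can be sketched in plain Java. To keep the example runnable outside Android, the `SampleSource` interface below is a hypothetical stand-in that mirrors the signature of `AudioRecord.read(short[], int, int)`; `CaptureLoop` itself is also an invented name.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical read loop, meant to be run on its own thread.
final class CaptureLoop implements Runnable {
    /** Stand-in for AudioRecord.read(short[], int, int): returns samples read, or a negative error. */
    interface SampleSource {
        int read(short[] buffer, int offset, int size);
    }

    private final SampleSource source;
    private final OutputStream sink;
    private final AtomicBoolean running = new AtomicBoolean(true);

    CaptureLoop(SampleSource source, OutputStream sink) {
        this.source = source;
        this.sink = sink;
    }

    /** Asks the loop to finish after the current read; safe to call from the UI thread. */
    void stop() {
        running.set(false);
    }

    @Override
    public void run() {
        short[] samples = new short[1024];
        byte[] bytes = new byte[samples.length * 2];
        try {
            while (running.get()) {
                int read = source.read(samples, 0, samples.length);
                if (read < 0) break; // negative values are error codes
                for (int i = 0; i < read; i++) {
                    bytes[2 * i] = (byte) (samples[i] & 0xFF);            // low byte first
                    bytes[2 * i + 1] = (byte) ((samples[i] >> 8) & 0xFF); // then high byte
                }
                sink.write(bytes, 0, read * 2);
            }
            sink.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On Android, the source could be supplied as `audioRecord::read` and the loop started with `new Thread(loop, "audio-capture").start()`, calling `stop()` when the user ends the capture.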

In our example, we will convert the PCM-16 integer samples into a ByteArray to be written to disk, so we must keep note of the endianness to be able to properly perform the samples’ playback later on.
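A small sketch of that conversion in plain Java is shown below. It picks little-endian byte order explicitly, which is an assumption of this example; whichever order is chosen, the same one has to be used when importing or playing the file back. `Pcm16Bytes` is a made-up name.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper converting PCM-16 samples to bytes with a known byte order.
final class Pcm16Bytes {
    private Pcm16Bytes() {}

    /** Encodes the first {@code count} samples as little-endian bytes (low byte first). */
    static byte[] toLittleEndian(short[] samples, int count) {
        ByteBuffer buffer = ByteBuffer.allocate(count * 2).order(ByteOrder.LITTLE_ENDIAN);
        // The short view inherits the byte order set on the buffer above.
        buffer.asShortBuffer().put(samples, 0, count);
        return buffer.array();
    }
}
```

For instance, the sample `0x1234` is written as the bytes `0x34 0x12`.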

Once we’re done with the capture, we stop the AudioRecord, release all heavy resources and stop our foreground service.

Once we have the data…

Playing back the captured PCM data on Android is possible, though not through the friendlier and more commonly used MediaPlayer API; it requires AudioTrack instead.

For that reason, and for visualization purposes, we suggest pulling the captured data off the device and processing it with a desktop app such as the free Audacity audio editor. There, we have fine-grained control when importing the raw data and can specify all of the parameters we previously defined: encoding, sample rate, bit precision and even byte endianness.
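As an alternative to typing those parameters in by hand (this is not something our demo app does), the raw data can be prefixed with a canonical 44-byte WAV header, which records the same parameters so that most audio tools open the file directly. The sketch below is plain Java, assumes little-endian PCM-16 data, and `WavHeader` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper building a canonical WAV header for little-endian PCM-16 data.
final class WavHeader {
    private WavHeader() {}

    static byte[] forPcm16(int sampleRateHz, int channels, int dataBytes) {
        int bytesPerFrame = channels * 2; // 2 bytes per 16-bit sample
        ByteBuffer header = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        header.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        header.putInt(36 + dataBytes);                 // size of everything after this field
        header.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        header.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        header.putInt(16);                             // size of the fmt chunk for PCM
        header.putShort((short) 1);                    // audio format 1 = linear PCM
        header.putShort((short) channels);
        header.putInt(sampleRateHz);
        header.putInt(sampleRateHz * bytesPerFrame);   // byte rate
        header.putShort((short) bytesPerFrame);        // block align
        header.putShort((short) 16);                   // bits per sample
        header.put("data".getBytes(StandardCharsets.US_ASCII));
        header.putInt(dataBytes);                      // size of the sample data that follows
        return header.array();
    }
}
```

Writing `WavHeader.forPcm16(8_000, 1, capturedBytes.length)` followed by the captured bytes to a `.wav` file would describe the 8 kHz mono rendition used in this article.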

By using the demo app, we could then verify that capturing of audio samples only happened for the media we explicitly specified, and all of our content was protected according to our business requirements.

Audio capture is allowed for certain use-cases, but silenced in others.

For more details on how the API is designed and insights into our experiment at SoundCloud, check out the public GitHub repository for the sample app we’ve built. You can use it to verify that your app reacts to audio capture as you expect. There, you will also find a full implementation, further clarification and code comments to help you adapt the code to your own recording use-case.

Thanks to Riccardo and Danny for the proofreads.
