Media Projection and Audio Capture
Since Android Lollipop, developers have had an API for capturing all or part of what a device’s screen displays: MediaProjection. James O’Brien gave a great description of that usage in this article.
With Android 10, the MediaProjection API was extended to support audio capture. That is especially interesting if your app does any sort of streaming or Twitch-like broadcasting. In some cases, though, you might want finer control over when or what can be captured, either for user-privacy reasons or for content protection (e.g., copyright).
At SoundCloud, we cared about both use-cases, and as we prepared our app to target API Level 29, making sure only the expected actors could capture our audio was critical. We want to support use-cases like Live Caption while also protecting our copyrighted content against leaks and piracy.
To do so, simply following the documentation wasn’t enough; we also wanted to verify the solution actually worked. For that reason, we have built a demo application that uses the audio capturing API to interact with our app, just like a third-party app would do. Here’s how we’ve done it.
Before we get started
Since we will be recording what the user’s device is playing, we first need to request a few permissions. Make sure to prompt the user at the appropriate time for the RECORD_AUDIO permission.
Also, since audio capture is a long-running operation, we will need a foreground service to keep the user informed while it runs. For that reason, don’t forget to declare the FOREGROUND_SERVICE permission in your AndroidManifest file, too.
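As a minimal sketch of the permission step, assuming the runtime permission in question is RECORD_AUDIO and that the app uses the AndroidX core library, the check and prompt could look like this. The function name and request code are hypothetical.

```kotlin
import android.Manifest
import android.app.Activity
import android.content.pm.PackageManager
import androidx.core.app.ActivityCompat
import androidx.core.content.ContextCompat

// AndroidManifest.xml is assumed to declare:
//   <uses-permission android:name="android.permission.RECORD_AUDIO" />
//   <uses-permission android:name="android.permission.FOREGROUND_SERVICE" />

// Hypothetical request code; any integer unique within the app works.
private const val RECORD_AUDIO_REQUEST_CODE = 42

/** Returns true if RECORD_AUDIO is already granted; otherwise prompts the user and returns false. */
fun ensureRecordAudioPermission(activity: Activity): Boolean {
    val granted = ContextCompat.checkSelfPermission(
        activity, Manifest.permission.RECORD_AUDIO
    ) == PackageManager.PERMISSION_GRANTED
    if (!granted) {
        // The user's answer is delivered to Activity.onRequestPermissionsResult.
        ActivityCompat.requestPermissions(
            activity, arrayOf(Manifest.permission.RECORD_AUDIO), RECORD_AUDIO_REQUEST_CODE
        )
    }
    return granted
}
```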
For privacy reasons, audio/video capture is treated differently from other permission requests on Android: a capturing app must prompt the user for explicit approval every time a MediaProjection is needed.
A request to the MediaProjectionManager system service can be made with a one-liner, but it must be started from a UI context, such as an Activity, so that the confirmation dialog can be displayed.
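A sketch of that request follows, assuming it is launched from an Activity with the classic startActivityForResult flow. The function name and request code are hypothetical.

```kotlin
import android.app.Activity
import android.content.Context
import android.media.projection.MediaProjectionManager

// Hypothetical request code used to recognise the result later.
private const val MEDIA_PROJECTION_REQUEST_CODE = 13

fun requestMediaProjection(activity: Activity) {
    val manager =
        activity.getSystemService(Context.MEDIA_PROJECTION_SERVICE) as MediaProjectionManager
    // Shows the system confirmation dialog; the outcome arrives in Activity.onActivityResult.
    activity.startActivityForResult(
        manager.createScreenCaptureIntent(), MEDIA_PROJECTION_REQUEST_CODE
    )
}
```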
On Android 10 and later, a Service must be running and must have called startForeground to post a Notification before we can obtain the MediaProjection instance; failing to do so will cause a SecurityException. The actual audio capture does not need to happen in the Service code, but you absolutely need a Service, even if its sole purpose is to manage the lifecycle of the Notification. Don’t forget to declare the foregroundServiceType in the service’s AndroidManifest declaration.
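The following is a minimal sketch of such a service, assuming a minimum SDK of 29 and a manifest entry with the mediaProjection type, which is shown as a comment. The class name, channel id, notification id and texts are placeholders.

```kotlin
import android.app.Notification
import android.app.NotificationChannel
import android.app.NotificationManager
import android.app.Service
import android.content.Intent
import android.content.pm.ServiceInfo
import android.os.IBinder

// Manifest entry assumed for this sketch:
//   <service android:name=".AudioCaptureService"
//            android:foregroundServiceType="mediaProjection" />

class AudioCaptureService : Service() {

    override fun onStartCommand(intent: Intent?, flags: Int, startId: Int): Int {
        // A notification channel is mandatory on Android 8.0 and later.
        val channel = NotificationChannel(
            CHANNEL_ID, "Audio capture", NotificationManager.IMPORTANCE_LOW
        )
        getSystemService(NotificationManager::class.java).createNotificationChannel(channel)

        val notification = Notification.Builder(this, CHANNEL_ID)
            .setContentTitle("Capturing audio")
            .setSmallIcon(android.R.drawable.ic_media_play)
            .build()

        // The three-argument overload (API 29+) states the foreground service type explicitly.
        startForeground(
            NOTIFICATION_ID, notification, ServiceInfo.FOREGROUND_SERVICE_TYPE_MEDIA_PROJECTION
        )
        return START_NOT_STICKY
    }

    override fun onBind(intent: Intent?): IBinder? = null

    private companion object {
        const val CHANNEL_ID = "audio_capture" // hypothetical channel id
        const val NOTIFICATION_ID = 1          // hypothetical; must be non-zero
    }
}
```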
Now that we have all the pieces in place, we can obtain the MediaProjection instance to perform the audio capture.
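As a sketch, assuming the approval result was delivered through onActivityResult and the foreground service is already running, the instance could be obtained like this. The helper name is hypothetical.

```kotlin
import android.app.Activity
import android.content.Context
import android.content.Intent
import android.media.projection.MediaProjection
import android.media.projection.MediaProjectionManager

/** Returns the projection, or null if the user declined the confirmation dialog. */
fun obtainMediaProjection(context: Context, resultCode: Int, data: Intent?): MediaProjection? {
    if (resultCode != Activity.RESULT_OK || data == null) return null
    val manager =
        context.getSystemService(Context.MEDIA_PROJECTION_SERVICE) as MediaProjectionManager
    // Pass the result code and Intent exactly as they were received from the dialog.
    return manager.getMediaProjection(resultCode, data)
}
```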
Here is where things get interesting: the audio capture configuration API is very flexible and provides many hooks, so we can tailor it to our specific media use-cases. We will pass both an AudioPlaybackCaptureConfiguration and an AudioFormat object to the AudioRecord instance we will use to fill our audio data buffers.
The first object defines which types of media usage we will capture (for example, USAGE_GAME), and we can optionally include or exclude app UIDs to filter which apps we are, or are not, interested in capturing.
AudioFormat defines how the audio data will be encoded. We can set the capture sample rate in Hz, the number of channels, and the encoding to use: from raw PCM samples at 8-bit, 16-bit, or floating-point precision to compressed formats such as MP3, AAC, AC3 and more, depending on the device’s encoding capabilities.
Notice how much these parameters matter depending on the use-case: if you plan to upload the captured audio to the cloud or store it on disk, you might prefer compressed samples for their smaller size, but if your use-case involves demuxing or post-processing, you might prefer the more precise PCM rendition.
For our example, validating that our first-party app can be captured by a third-party one, the simplest combination of static properties is enough: mono, 16-bit raw PCM audio. A more complex use-case could make these properties dynamic, based on the properties of the captured material. Fortunately, the documentation for AudioFormat is quite extensive and describes all the available options well.
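The configuration described above can be sketched as follows, assuming mono 16-bit PCM and a hypothetical 44.1 kHz sample rate. The function name, the buffer sizing and the chosen usages are illustrative choices, and RECORD_AUDIO is assumed to have been granted before the function is called.

```kotlin
import android.annotation.SuppressLint
import android.media.AudioAttributes
import android.media.AudioFormat
import android.media.AudioPlaybackCaptureConfiguration
import android.media.AudioRecord
import android.media.projection.MediaProjection

const val SAMPLE_RATE_HZ = 44_100 // hypothetical; pick what suits the captured material

@SuppressLint("MissingPermission") // assumption: the caller has already checked RECORD_AUDIO
fun buildPlaybackCaptureRecord(projection: MediaProjection): AudioRecord {
    val captureConfig = AudioPlaybackCaptureConfiguration.Builder(projection)
        .addMatchingUsage(AudioAttributes.USAGE_MEDIA)
        .addMatchingUsage(AudioAttributes.USAGE_GAME)
        // Optionally narrow by app with either addMatchingUid(uid) or excludeUid(uid);
        // the two kinds of UID rule cannot be combined in one configuration.
        .build()

    val format = AudioFormat.Builder()
        .setEncoding(AudioFormat.ENCODING_PCM_16BIT)
        .setSampleRate(SAMPLE_RATE_HZ)
        .setChannelMask(AudioFormat.CHANNEL_IN_MONO)
        .build()

    val minBuffer = AudioRecord.getMinBufferSize(
        SAMPLE_RATE_HZ, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT
    )
    require(minBuffer > 0) { "Unsupported audio parameters (code $minBuffer)" }

    return AudioRecord.Builder()
        .setAudioFormat(format)
        .setBufferSizeInBytes(minBuffer * 2) // some headroom over the minimum
        .setAudioPlaybackCaptureConfig(captureConfig)
        .build()
}
```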
With the AudioRecord object built, we can call startRecording to begin capturing and then read the audio samples into our predefined buffer, which can be kept in memory or written to a file.
Notice that this is a performance-critical operation: interruptions on the read thread will cause audio glitches and crackling (read more about low-latency audio rendering in my other article). On top of that, we don’t want to block the UI thread with our recording, so it is advisable to run it in a Thread of its own.
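A minimal sketch of such a read loop on a dedicated thread follows. The function name, the buffer size, the AtomicBoolean stop flag and the onSamples callback are all assumptions made for illustration.

```kotlin
import android.media.AudioRecord
import java.util.concurrent.atomic.AtomicBoolean
import kotlin.concurrent.thread

/**
 * Starts recording and reads 16-bit samples on a background thread until
 * [keepRunning] becomes false. [onSamples] receives the reused buffer and the
 * number of valid samples in it, and must consume them before returning.
 */
fun startCaptureThread(
    record: AudioRecord,
    keepRunning: AtomicBoolean,
    onSamples: (ShortArray, Int) -> Unit
): Thread {
    record.startRecording()
    return thread(name = "audio-capture") {
        val buffer = ShortArray(1024) // hypothetical size, in samples
        while (keepRunning.get()) {
            // Blocks until samples are available; negative values are error codes.
            val read = record.read(buffer, 0, buffer.size)
            if (read < 0) break
            if (read > 0) onSamples(buffer, read)
        }
    }
}
```

Keeping the callback cheap matters here: any slow work inside it delays the next read and can produce the glitches mentioned above.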
In our example, we will convert the 16-bit PCM integer samples into a ByteArray to be written to disk, so we must keep track of the byte order (endianness) to be able to play the samples back properly later on.
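The conversion can be sketched in plain Kotlin with java.nio, making the byte order explicit. The function name and the little-endian default are assumptions; what matters is that the same order is used when the file is read back.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

/** Packs the first [count] 16-bit samples into bytes using the given byte order. */
fun shortsToBytes(
    samples: ShortArray,
    count: Int = samples.size,
    order: ByteOrder = ByteOrder.LITTLE_ENDIAN
): ByteArray {
    val buffer = ByteBuffer.allocate(count * 2).order(order)
    for (i in 0 until count) buffer.putShort(samples[i])
    return buffer.array()
}
```

For instance, the sample 0x0102 becomes the bytes 0x02, 0x01 in little-endian order and 0x01, 0x02 in big-endian order.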
Once we’re done with the capture, we stop the AudioRecord, release all heavy resources and stop our foreground service.
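A sketch of that teardown follows, assuming the read loop has already been told to stop and that the caller holds references to the recorder, the projection and the service. The function name is hypothetical.

```kotlin
import android.app.Service
import android.media.AudioRecord
import android.media.projection.MediaProjection

fun stopCapture(record: AudioRecord, projection: MediaProjection, service: Service) {
    // Assumption: the capture thread has exited (or is about to) before this is called.
    record.stop()
    record.release()      // frees the native recording resources
    projection.stop()     // ends the user-approved projection session
    service.stopForeground(Service.STOP_FOREGROUND_REMOVE) // removes the notification
    service.stopSelf()
}
```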
Once we have the data…
Playing back the captured PCM data on Android is possible, though not with the friendlier and more commonly used MediaPlayer API; it requires the lower-level AudioTrack instead. For that reason, and for visualization purposes, we suggest pulling the captured data off the device and processing it with a desktop application such as the free Audacity audio editor. There, we have fine-grained control when importing the raw data and can specify all of the parameters we defined earlier: encoding, sample rate, bit precision and even byte endianness.
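For completeness, on-device playback of the raw file could be sketched as below, assuming the lower-level API in question is AudioTrack and that the file holds mono 16-bit PCM in the device’s native byte order (little-endian on typical Android hardware). The function name is hypothetical, and it should be called off the UI thread because the writes block.

```kotlin
import android.media.AudioAttributes
import android.media.AudioFormat
import android.media.AudioTrack
import java.io.File

fun playRawPcm(file: File, sampleRateHz: Int) {
    val format = AudioFormat.Builder()
        .setEncoding(AudioFormat.ENCODING_PCM_16BIT)
        .setSampleRate(sampleRateHz)
        .setChannelMask(AudioFormat.CHANNEL_OUT_MONO)
        .build()
    val minBuffer = AudioTrack.getMinBufferSize(
        sampleRateHz, AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT
    )
    require(minBuffer > 0) { "Unsupported audio parameters (code $minBuffer)" }

    val track = AudioTrack.Builder()
        .setAudioAttributes(
            AudioAttributes.Builder().setUsage(AudioAttributes.USAGE_MEDIA).build()
        )
        .setAudioFormat(format)
        .setBufferSizeInBytes(minBuffer)
        .setTransferMode(AudioTrack.MODE_STREAM)
        .build()

    track.play()
    file.inputStream().use { input ->
        val chunk = ByteArray(minBuffer)
        while (true) {
            val read = input.read(chunk)
            if (read <= 0) break
            track.write(chunk, 0, read) // blocks until the chunk is queued
        }
    }
    track.stop()
    track.release()
}
```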
By using the demo app, we could then verify that capturing of audio samples only happened for the media we explicitly specified, and all of our content was protected according to our business requirements.
For more details on how the API is designed and insights into our experiment at SoundCloud, check out the public GitHub repository for the sample app we’ve built. You can use it to verify that your app reacts to audio capture as you expect. There, you will also find a full implementation, further clarification and code comments to help you adapt the code to your own recording use-case.