Your app and low-latency audio output

Júlio Zynger
7 min read · Jun 18, 2018


When building any software, we want to provide the best possible experience to our users, giving them the feeling of being in full control and helping them get the most value out of our applications. Audio software is no different: a very important metric for tracking playback performance is latency. Minimizing the time between pressing the ‘play’ button and hearing music is key to providing a seamless impression.

Today we will visit some concepts of digital audio, learn how part of it is handled on Android devices, and understand how buffers play a key role in providing users with the smoothest playback experience possible.

Sound travels in waves…

…and a very common method to represent analog signals (such as waves!) is pulse-code modulation (PCM). Basically, the wave amplitude is measured at a regular time interval. Each one of these values is called a sample. A lot of these samples are necessary to represent actual sound — for example, more than 44 thousand measurements are made every second!

Stereo sound is nothing more than two waves being transmitted at the same time (nowadays, surround systems provide even more than two channels!). To play stereo sound, then, we’ll need to sample the two channels and store them digitally. In PCM terms, a frame is a set of one sample per channel. So, for stereo sound, a PCM frame will contain data representing two samples.
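
To make these ideas concrete, here is a minimal Kotlin sketch of my own (the function name, the 440 Hz tone and the 16-bit format are illustrative assumptions, not from this article) that samples a sine wave and interleaves it into stereo PCM frames:

    import kotlin.math.PI
    import kotlin.math.sin

    // Generate interleaved 16-bit stereo PCM for a sine tone.
    // Each frame holds one sample per channel: [left, right, left, right, ...]
    fun sineStereoPcm(frequencyHz: Double, sampleRateHz: Int, seconds: Double): ShortArray {
        val frameCount = (sampleRateHz * seconds).toInt()
        val pcm = ShortArray(frameCount * 2) // two samples (channels) per frame
        for (frame in 0 until frameCount) {
            val amplitude = sin(2.0 * PI * frequencyHz * frame / sampleRateHz)
            pcm[frame * 2] = (amplitude * Short.MAX_VALUE).toInt().toShort() // left
            pcm[frame * 2 + 1] = pcm[frame * 2]                              // right
        }
        return pcm
    }

    // One second of a 440 Hz tone: 44,100 frames, 88,200 samples.
    val tone = sineStereoPcm(440.0, 44_100, 1.0)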

Every modern phone out there has, as part of its circuits, a component capable of converting digital data (0s and 1s) to analog data (in this case, a variable voltage that will be passed to the speakers). These are called digital-to-analog converters (DACs).

To process data, they require information to be fed at a constant rate, and also in blocks of a specific size. The DAC knows where it should look for data — a region of memory that the operating system specifies and to which we (application developers) should write, with a very fancy name: the buffer.

The DAC will reach out to the buffer at a fixed time interval to get one of these data blocks (or chunks), but if we fail to write to the buffer before the DAC visits it, the hardware will output silence. That is a very bad experience for users and the reason music stops playing or they hear crackles/noise (a.k.a. popcorn), so we should avoid it at all costs by constantly writing to the buffer, as soon as possible.
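
To ground that contract, here is a minimal sketch of a playback loop (my own illustration using the public AudioTrack API, not code from this article) that keeps writing PCM data so the sink never runs dry:

    import android.media.AudioFormat
    import android.media.AudioTrack

    // Play 16-bit stereo PCM, continuously refilling the track's buffer.
    fun playPcm(pcm: ShortArray, sampleRateHz: Int) {
        val format = AudioFormat.Builder()
            .setEncoding(AudioFormat.ENCODING_PCM_16BIT)
            .setSampleRate(sampleRateHz)
            .setChannelMask(AudioFormat.CHANNEL_OUT_STEREO)
            .build()
        val track = AudioTrack.Builder()
            .setAudioFormat(format)
            .build() // defaults to streaming mode (API 23+)
        track.play()
        // write() blocks until there is room, so this loop keeps the
        // buffer as full as possible and the DAC never finds it empty.
        var offset = 0
        while (offset < pcm.size) {
            val written = track.write(pcm, offset, pcm.size - offset)
            if (written < 0) break // negative values are error codes
            offset += written
        }
        track.stop()
        track.release()
    }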

This is where latency comes in — the time we take to feed the DAC (or, in other words, to fill the buffer). In Android, audio tasks are among the highest-priority ones in the entire system: completing one of these tasks will even interrupt other unrelated tasks at the OS level, so that the processing is as fast as it can be.

A special characteristic of Android devices…

… is that they come in many shapes and sizes, but also in many hardware architectures, and manufacturers ship different DACs within their boards. To get the most (or the least!) out of these latencies, we can also ‘ask’ the OS (on Android 4.1 and above) what the most suitable buffer configuration is, so we can work best with the embedded DAC.

By instrumenting SoundCloud’s app, we were able to collect some interesting statistics about how variable the hardware configurations were across this considerably large user base (data taken in mid-2018):

+------------------+-------+
| Sample Rate (Hz) | Ratio |
+------------------+-------+
| 48,000           | 87%   |
| 44,100           | 12%   |
| 8,000            | < 1%  |
| 96,000           | < 1%  |
| 88,200           | < 1%  |
| 32,000           | < 1%  |
+------------------+-------+
+-----------------------+-------+
| Buffer size (#frames) | Ratio |
+-----------------------+-------+
| 192                   | 30%   |
| 240                   | 27%   |
| 960                   | 22%   |
| 1024                  | 9%    |
| 256                   | 3%    |
| 480                   | 2%    |
| 2048                  | 2%    |
| 96                    | 1%    |
| 512                   | 1%    |
| 1920                  | 1%    |
+-----------------------+-------+

Sample rate

One of the things we can do is configure our application to fill the buffer with the optimal number of samples per second — the sample rate. Usual examples of sample rates are 44.1kHz (i.e. 44,100 samples per second) or 48kHz, but different DACs might have different optimal values.

If you don’t use the DAC’s expected sample rate, the OS will have to resample your data to match it, and the audio processing won’t go through the fast path (taking precious milliseconds away from our low-latency goal).

We can use an AudioManager API to query that value from the system and then pass it to the player in our application:
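
A minimal sketch of that query (the helper name and the 44,100 Hz fallback are my assumptions; AudioManager.getProperty and PROPERTY_OUTPUT_SAMPLE_RATE are the actual platform APIs, available since API 17):

    import android.content.Context
    import android.media.AudioManager

    // Ask the system which output sample rate the hardware prefers.
    fun nativeSampleRate(context: Context): Int {
        val audioManager = context.getSystemService(Context.AUDIO_SERVICE) as AudioManager
        val value = audioManager.getProperty(AudioManager.PROPERTY_OUTPUT_SAMPLE_RATE)
        // Fall back to 44,100 Hz when the device does not report the property.
        return value?.toIntOrNull() ?: 44_100
    }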

Buffer size

If we use a buffer that is too small and the CPU has a lot of tasks to perform, there might not be enough time to come back to our thread and fill the next chunk of the buffer, so we’d get popcorn, because the DAC would have to wait for the next chunk to be delivered.

On the other hand, if we use a buffer that is too large, we need to delay the output of audio for the duration of that buffer while we fill it completely before delivering it to the DAC — and there is latency again. For example, if we have a buffer that can hold 4096 samples and our sample rate is 44.1kHz, we would have a ~93 ms delay between the data entering the processing pipe and the DAC being able to consume it.
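
The delay is simply the buffer length divided by the sample rate; a quick sketch of the arithmetic (the helper name is illustrative):

    // Latency contributed by a full buffer, in milliseconds.
    fun bufferLatencyMs(bufferFrames: Int, sampleRateHz: Int): Double =
        bufferFrames * 1_000.0 / sampleRateHz

    // bufferLatencyMs(4096, 44_100) ≈ 92.9 ms, matching the example above.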

Yet again, we can ask the OS what the optimal buffer size is:
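
A sketch mirroring the sample-rate query (the helper name and the 256-frame fallback are my assumptions; PROPERTY_OUTPUT_FRAMES_PER_BUFFER is the actual platform property, also API 17+):

    import android.content.Context
    import android.media.AudioManager

    // Ask the system which buffer size (in frames) the hardware prefers.
    fun nativeBufferSizeInFrames(context: Context): Int {
        val audioManager = context.getSystemService(Context.AUDIO_SERVICE) as AudioManager
        val value = audioManager.getProperty(AudioManager.PROPERTY_OUTPUT_FRAMES_PER_BUFFER)
        // Fall back to 256 frames when the device does not report the property.
        return value?.toIntOrNull() ?: 256
    }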

We will then want to use a multiple of that value to determine how many samples we put in each of the buffer’s chunks, so that we minimize the number of times the DAC needs to call us back for more data (see the sketch after the example below).

For example, if the device reports a recommended buffer size of 208 frames and we make our chunks hold a non-multiple number of frames, like 160, we could have the following scenario:

Let’s say the first 160-frame chunk was consumed, and the DAC calls us before the buffer-filling thread is able to add more data. Since there is not enough data in the buffer to collect another full chunk, the DAC will have to call us yet again later. In the meantime, the OS scheduler might interrupt the DAC’s thread to give CPU access to another task. If that happens, we run the risk of surpassing the time deadline to output audio, causing a glitch.
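
A small sketch of the alignment rule described above (the helper is hypothetical, rounding a desired chunk size up to a multiple of the reported buffer size):

    // Round a desired chunk size up to the nearest multiple of the
    // device's native buffer size, so the DAC always finds full chunks.
    fun alignChunkSize(desiredFrames: Int, nativeBufferFrames: Int): Int {
        val multiples = (desiredFrames + nativeBufferFrames - 1) / nativeBufferFrames
        return multiples * nativeBufferFrames
    }

    // alignChunkSize(160, 208) == 208: instead of 160-frame chunks,
    // we'd deliver 208-frame chunks to match the device above.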

Interruption of the DAC’s consumer thread is more likely than we would expect, given that the platform uses locks around the buffer data structure to prevent concurrent reads and writes. So even if the DAC gets scheduled at the right time, if the filler thread is interrupted, the DAC will have to wait for the lock to be released — surpassing the time deadline (i.e. a glitch). It is something Google is aware of, though: they recently released a new audio API (AAudio) for Android Oreo and above that claims to improve that scenario.

In summary,

there are many factors that influence latency and the overall feeling we give users during audio playback, and some of them are even out of our control.

For the aspects we do control, it is beneficial to have a broad perspective of the elements in play, and learning how the underlying operating system behaves when we interact with it is certainly very helpful.
