Record, Replay and Visualize Raw Audio Data in Android

New Venture Software
8 min read · Oct 12, 2017


Introduction

Today I want to show you how to record, play and visualize raw audio data in Android. Recording in raw audio format gives you full control and lets you visualize the captured audio data. In this article I will show you how to leverage the low-level AudioRecord and AudioTrack APIs to record and play raw data, taking full control over the audio capture and playback hardware of your device. Finally, I will introduce a custom control that I have developed for visualizing this data.

Some theory

I have to touch on some signal processing basics for you to get the whole picture. Let's start with the obvious: in nature, audio is an analog signal. Turning an analog signal into digital data is done through a process called sampling. From Wikipedia:

In signal processing, sampling is the reduction of a continuous signal to a discrete signal. A common example is the conversion of a sound wave (a continuous signal) to a sequence of samples (a discrete-time signal). A sample is a value or set of values at a point in time and/or space.

The speed at which these samples are taken is called the sampling rate. For audio recording it's usually 44100 Hz, which means 44,100 samples are taken from the analog signal every second. The most common method for digitally representing the sampled values is called pulse-code modulation, or PCM. Lastly, I should mention that the most common PCM bit depth is 16 bits, i.e. one signed short per sample.

Each Android phone has different audio capture and playback hardware, with some allowing advanced modes such as stereo audio capture. If you are worried about compatibility, you will be pleased to know that all Android devices are guaranteed to support capturing a single channel of audio at 44100 Hz in 16-bit PCM encoding. You can find more information on the AudioFormat reference page.
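To put these numbers in perspective, the guaranteed format translates into a fixed data rate. A quick sketch (the constant names below are mine, not part of the Android APIs):

final int SAMPLE_RATE = 44100;    // samples per second
final int BYTES_PER_SAMPLE = 2;   // 16-bit PCM
final int CHANNELS = 1;           // mono

// 44100 samples/s * 2 bytes/sample * 1 channel = 88200 bytes, i.e. roughly 86 KB of raw audio per second.
final int BYTES_PER_SECOND = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS;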

Recording raw audio

Having covered the basics, it's time to write some code. Recording raw audio data is done with an AudioRecord object. Setting one up requires an audio source, a sample rate, a channel configuration, an encoding and a buffer size. The buffer size is in bytes and determines how much audio data we can take out at a time. There is a convenience method called getMinBufferSize() that calculates one for you based on the provided configuration and your phone's hardware.

int bufferSize = AudioRecord.getMinBufferSize(SAMPLE_RATE,
        AudioFormat.CHANNEL_IN_MONO,
        AudioFormat.ENCODING_PCM_16BIT);

AudioRecord record = new AudioRecord(MediaRecorder.AudioSource.DEFAULT,
        SAMPLE_RATE,
        AudioFormat.CHANNEL_IN_MONO,
        AudioFormat.ENCODING_PCM_16BIT,
        bufferSize);

On my Nexus 5, the buffer size is 3584 bytes. This means that the AudioRecord will give me 1792 samples (or around 41 ms of audio) at a time. I should point out that there is also an AudioRecord.Builder that you can use if the constructor doesn't suit your needs; in fact, behind the scenes the constructor uses builders as well. Here is how to set up the same instance, this time using the builder:

AudioRecord record = new AudioRecord.Builder()
        .setAudioSource(MediaRecorder.AudioSource.DEFAULT)
        .setAudioFormat(new AudioFormat.Builder()
                .setEncoding(AudioFormat.ENCODING_PCM_16BIT)
                .setSampleRate(SAMPLE_RATE)
                .setChannelMask(AudioFormat.CHANNEL_IN_MONO)
                .build())
        .setBufferSizeInBytes(bufferSize)
        .build();
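To relate the buffer size to time, you can compute how much audio one buffer holds. A quick sketch, assuming the 16-bit mono configuration from above:

// bufferSize is in bytes; 16-bit mono PCM means 2 bytes per sample.
int samplesPerBuffer = bufferSize / 2;
// e.g. 3584 bytes -> 1792 samples -> ~41 ms at 44100 Hz
double bufferMillis = samplesPerBuffer * 1000.0 / SAMPLE_RATE;
Log.v(LOG_TAG, "One buffer holds ~" + bufferMillis + " ms of audio");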

Now that we have created it, we can start recording. Obtaining the raw audio samples is done through polling. Like any continuous IO, this should happen on a separate, dedicated thread. Before the loop you start the capture by calling startRecording(), and you signal the end of the recording by calling stop(). Keep in mind that AudioRecord holds on to native resources, so you also have to call release() at some point.

final int SAMPLE_RATE = 44100; // The sampling rate
boolean mShouldContinue;       // Indicates whether recording / playback should continue

void recordAudio() {
    new Thread(new Runnable() {
        @Override
        public void run() {
            android.os.Process.setThreadPriority(android.os.Process.THREAD_PRIORITY_AUDIO);

            // buffer size in bytes
            int bufferSize = AudioRecord.getMinBufferSize(SAMPLE_RATE,
                    AudioFormat.CHANNEL_IN_MONO,
                    AudioFormat.ENCODING_PCM_16BIT);
            if (bufferSize == AudioRecord.ERROR || bufferSize == AudioRecord.ERROR_BAD_VALUE) {
                bufferSize = SAMPLE_RATE * 2;
            }

            short[] audioBuffer = new short[bufferSize / 2];

            AudioRecord record = new AudioRecord(MediaRecorder.AudioSource.DEFAULT,
                    SAMPLE_RATE,
                    AudioFormat.CHANNEL_IN_MONO,
                    AudioFormat.ENCODING_PCM_16BIT,
                    bufferSize);

            if (record.getState() != AudioRecord.STATE_INITIALIZED) {
                Log.e(LOG_TAG, "Audio Record can't initialize!");
                return;
            }
            record.startRecording();

            Log.v(LOG_TAG, "Start recording");

            long shortsRead = 0;
            while (mShouldContinue) {
                int numberOfShort = record.read(audioBuffer, 0, audioBuffer.length);
                shortsRead += numberOfShort;

                // Do something with the audioBuffer
            }

            record.stop();
            record.release();

            Log.v(LOG_TAG, String.format("Recording stopped. Samples read: %d", shortsRead));
        }
    }).start();
}

Each time we read audio data, the call blocks until enough samples have been captured to fill our buffer. In my case it blocks the thread for around 41 ms. Whatever you do in the polling loop, you should do it in less than those 41 ms, or you might miss some samples. It's pretty much like onDraw() in that respect. In particular, avoid doing IO on the audio recording thread. If you need to store the audio data, it's best to use the classic producer-consumer pattern and hand the buffers off to another thread.
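Here is a minimal sketch of that pattern: the recording thread only copies the buffer and hands it off, while a dedicated writer thread does the slow file IO. The queue, the file name and the overall wiring are my own illustration, not part of the demo project:

// A shared queue decouples the real-time recording thread (producer)
// from the slower file IO (consumer).
final BlockingQueue<short[]> audioQueue = new LinkedBlockingQueue<>();

// Producer: inside the recording loop, where the code above says
// "Do something with the audioBuffer", hand off a copy of the data.
int read = record.read(audioBuffer, 0, audioBuffer.length);
audioQueue.offer(Arrays.copyOf(audioBuffer, read));

// Consumer: a separate thread drains the queue and writes to disk.
new Thread(new Runnable() {
    @Override
    public void run() {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream("recording.pcm")))) {
            while (mShouldContinue || !audioQueue.isEmpty()) {
                short[] chunk = audioQueue.poll(100, TimeUnit.MILLISECONDS);
                if (chunk == null) continue;
                for (short sample : chunk) {
                    out.writeShort(sample); // writes big-endian; swap bytes if you need little-endian PCM
                }
            }
        } catch (IOException | InterruptedException e) {
            Log.e(LOG_TAG, "Failed to write audio data", e);
        }
    }
}).start();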

Playing raw audio

Playing raw audio is very similar. To do that we are going to need an AudioTrack object. Again we need a buffer size, and there is a similar getMinBufferSize() convenience method for that. The AudioTrack is set up much like the AudioRecord, except that this time we provide an output stream type and an output channel configuration instead of an audio source. We also have to decide how we are going to feed the samples to the AudioTrack: static or streaming mode. Static means the entire audio clip is loaded into the track's memory and can be played multiple times without reloading, which is useful for short sounds that are played frequently. For large files, streaming is more efficient.

int mBufferSize = AudioTrack.getMinBufferSize(SAMPLE_RATE,
        AudioFormat.CHANNEL_OUT_MONO,
        AudioFormat.ENCODING_PCM_16BIT);
if (mBufferSize == AudioTrack.ERROR || mBufferSize == AudioTrack.ERROR_BAD_VALUE) {
    // For some reason we couldn't obtain a buffer size
    mBufferSize = SAMPLE_RATE * CHANNELS * 2; // CHANNELS is 1 for mono
}

AudioTrack mAudioTrack = new AudioTrack(
        AudioManager.STREAM_MUSIC,
        SAMPLE_RATE,
        AudioFormat.CHANNEL_OUT_MONO,
        AudioFormat.ENCODING_PCM_16BIT,
        mBufferSize,
        AudioTrack.MODE_STREAM);
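For comparison, a static track is filled once up front and can then be replayed without re-feeding the data. A rough sketch, where samples is assumed to already hold the whole clip as 16-bit mono PCM:

// MODE_STATIC: the entire clip is written once into the track's buffer.
AudioTrack staticTrack = new AudioTrack(
        AudioManager.STREAM_MUSIC,
        SAMPLE_RATE,
        AudioFormat.CHANNEL_OUT_MONO,
        AudioFormat.ENCODING_PCM_16BIT,
        samples.length * 2,      // the buffer must fit the whole clip, in bytes
        AudioTrack.MODE_STATIC);
staticTrack.write(samples, 0, samples.length); // must be written before play()
staticTrack.play();
// To replay: stop(), then reloadStaticData() or setPlaybackHeadPosition(0), then play() again.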

Again, just like with AudioRecord, there is a builder for this. It gives you somewhat more control over the produced audio track.

AudioTrack audioTrack = new AudioTrack.Builder()
        .setAudioAttributes(new AudioAttributes.Builder()
                .setUsage(AudioAttributes.USAGE_MEDIA)
                .setContentType(AudioAttributes.CONTENT_TYPE_MUSIC)
                .build())
        .setAudioFormat(new AudioFormat.Builder()
                .setEncoding(AudioFormat.ENCODING_PCM_16BIT)
                .setSampleRate(SAMPLE_RATE)
                .setChannelMask(AudioFormat.CHANNEL_OUT_MONO)
                .build())
        .setBufferSizeInBytes(mBufferSize)
        .setTransferMode(AudioTrack.MODE_STREAM)
        .build();

The last step is feeding the AudioTrack with samples. I am going to show you how to stream the audio samples in a loop. Using a ShortBuffer for the source data is handy, because it keeps track of the current position and has methods for pulling out a given number of items.
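Before we get to the playback loop, here is how mSamples might be filled in the first place. A sketch where recordedShorts and pcmBytes stand in for data you already have:

// From a short[] collected during recording:
mSamples = ShortBuffer.wrap(recordedShorts);
mNumSamples = recordedShorts.length;

// Or as a view over raw little-endian 16-bit PCM bytes read from a file:
mSamples = ByteBuffer.wrap(pcmBytes)
        .order(ByteOrder.LITTLE_ENDIAN)
        .asShortBuffer();
mNumSamples = mSamples.limit();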

ShortBuffer mSamples; // the samples to play
int mNumSamples;      // number of samples to play

void playAudio() {
    new Thread(new Runnable() {
        @Override
        public void run() {
            int bufferSize = AudioTrack.getMinBufferSize(SAMPLE_RATE,
                    AudioFormat.CHANNEL_OUT_MONO,
                    AudioFormat.ENCODING_PCM_16BIT);
            if (bufferSize == AudioTrack.ERROR || bufferSize == AudioTrack.ERROR_BAD_VALUE) {
                bufferSize = SAMPLE_RATE * 2;
            }

            AudioTrack audioTrack = new AudioTrack(
                    AudioManager.STREAM_MUSIC,
                    SAMPLE_RATE,
                    AudioFormat.CHANNEL_OUT_MONO,
                    AudioFormat.ENCODING_PCM_16BIT,
                    bufferSize,
                    AudioTrack.MODE_STREAM);

            audioTrack.play();

            Log.v(LOG_TAG, "Audio streaming started");

            short[] buffer = new short[bufferSize];
            mSamples.rewind();
            int limit = mNumSamples;
            int totalWritten = 0;
            while (mSamples.position() < limit && mShouldContinue) {
                int numSamplesLeft = limit - mSamples.position();
                int samplesToWrite;
                if (numSamplesLeft >= buffer.length) {
                    mSamples.get(buffer);
                    samplesToWrite = buffer.length;
                } else {
                    // Zero out the part of the buffer past the end of the audio data.
                    for (int i = numSamplesLeft; i < buffer.length; i++) {
                        buffer[i] = 0;
                    }
                    mSamples.get(buffer, 0, numSamplesLeft);
                    samplesToWrite = numSamplesLeft;
                }

                totalWritten += samplesToWrite;

                audioTrack.write(buffer, 0, samplesToWrite);
            }

            if (!mShouldContinue) {
                audioTrack.release();
            }

            Log.v(LOG_TAG, "Audio streaming finished. Samples written: " + totalWritten);
        }
    }).start();
}

As you can see, the process is similar to what we did when we recorded audio. Again you have to call play() to signal the AudioTrack to begin playing. There is one important difference though: write() only blocks until the samples have been copied into the track's internal buffer, not until they have been played (the short[] overload only gained an explicit write-mode parameter in API level 23). In other words, when the call to write() returns, the samples are merely queued for later playback. This raises a complication: when should the audio track be released? The AudioTrack lets you place a marker at a given position and notifies you when it has been reached; it can also deliver periodic notifications at regular intervals. You can place a marker at the last sample, and the AudioTrack will notify you when it has finished playing, so that you can release it.

audioTrack.setPlaybackPositionUpdateListener(new AudioTrack.OnPlaybackPositionUpdateListener() {
    @Override
    public void onPeriodicNotification(AudioTrack track) {
        if (track.getPlayState() == AudioTrack.PLAYSTATE_PLAYING) {
            int currentFrame = track.getPlaybackHeadPosition();
            long elapsedMillis = currentFrame * 1000L / SAMPLE_RATE;
        }
    }

    @Override
    public void onMarkerReached(AudioTrack track) {
        Log.v(LOG_TAG, "Audio file end reached");
        track.release();
    }
});
audioTrack.setPositionNotificationPeriod(SAMPLE_RATE / 30); // ~30 notifications per second (the period is in frames)
audioTrack.setNotificationMarkerPosition(mNumSamples);

Note that calling release() implicitly stops the playback, so if you do it too early the playback will suddenly stop.

There are some other interesting AudioTrack methods that I want to point out here; a short usage sketch follows the list.

  • pause() will pause playback; audio data that has not been played yet is not discarded;
  • getPlayState() will give you the playback state (Playing / Paused / Stopped);
  • getPlaybackHeadPosition() will give you the number of frames that have been played;
  • flush() will discard all audio data that is queued for playback but not played yet;
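For example, a stop button that should cut the sound immediately, instead of letting the already queued samples drain out, could combine pause() and flush(). A small sketch using the audioTrack from above:

if (audioTrack.getPlayState() == AudioTrack.PLAYSTATE_PLAYING) {
    audioTrack.pause();  // stop consuming queued data
    audioTrack.flush();  // discard whatever was queued but not yet played
}
// Calling play() again would resume from silence until new data is written.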

Visualizing raw data

Audio data is typically displayed as a waveform, owing to its nature. I have developed a small control that can display audio data both while you are recording it and when you want to play it back.

I decided to split it into two modes because of the contextual difference. When recording, I draw the captured samples as a continuous line. Because audio data arrives in short bursts, I keep the last 6 bursts of audio data (around 240 ms of audio in total on my phone). This allows me to visualize more data, and it also produces a nice fading effect for the older data.

When playing audio, I draw all the data as a classical waveform, with a playback position indicator that shows the playback progress. The waveform supports separate stroke and fill brushes, which is handy.
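To give an idea of what the drawing boils down to, here is a heavily simplified sketch of rendering a buffer of samples as a polyline on a Canvas. It illustrates the general technique only and is not the actual library code:

// Map each 16-bit sample to a point and draw the connecting line segments.
void drawWaveform(Canvas canvas, short[] samples, Paint strokePaint) {
    if (samples.length < 2) return;
    float width = canvas.getWidth();
    float centerY = canvas.getHeight() / 2f;
    float xStep = width / (samples.length - 1);

    float[] points = new float[(samples.length - 1) * 4];
    for (int i = 0; i < samples.length - 1; i++) {
        // Normalize samples (-32768..32767) to the view height around the center line.
        points[i * 4]     = i * xStep;
        points[i * 4 + 1] = centerY - (samples[i] / 32768f) * centerY;
        points[i * 4 + 2] = (i + 1) * xStep;
        points[i * 4 + 3] = centerY - (samples[i + 1] / 32768f) * centerY;
    }
    canvas.drawLines(points, strokePaint);
}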

You can find the code on GitHub together with example usage. I have separated the control into a library project so that you can easily drop it into your own projects. There is also a demo app that leverages the code shown above for recording and playing audio, together with the waveform control I have developed. It's published under the MIT license, so anyone can use it and modify it to their liking.

Epilogue

Android has great tools for playing and recording audio. I have only scratched the surface here, so check out the links I have provided for more information. I also suggest reading the actual source code behind AudioRecord and AudioTrack, as it helps a lot. Finally, I introduced a custom control that you can use to visualize raw audio data in your own projects.

This post was originally shared on the New Venture Software blog.
