Priming CMSampleBuffer containing AAC-encoded data using Apple’s Core Media API

Published in

Fandom Engineering

3 min read · Feb 28, 2020

Lately, I’ve been spending a lot of my free time on a side project that focuses on converting real-time data obtained from AVCaptureSession into H.264 and AAC streams and saving them to an .mp4 file. While AVFoundation offers developers a pretty straightforward way of converting CMSampleBuffer into the desired format by means of AVAssetWriterInput and relevant output settings, I was unable to use that API because my use case required me to have direct access to converted raw bytes before they got appended to the target file.

In my case, I needed to utilize VTCompressionSession and AudioConverter in order to encode the data and then pass the resulting CMSampleBuffers to AVAssetWriter. I’m not going to get into the details of compression here; if you’re interested in learning more about it, you can have a look at the awesome open-source library, HaishinKit, and check out the source code there:

shogo4405/HaishinKit.swift

Camera and Microphone streaming library via RTMP, HLS for iOS, macOS, tvOS. Please write issues in English or Japanese! Please contains…

github.com

What I’d like to focus on is encoder delay and priming, a mechanism responsible for the correct decoding of audio samples. Here’s a link to Apple documentation describing the mechanism in detail:

Audio Priming - Handling Encoder Delay in AAC

This appendix describes temporal positioning of a source audio signal after AAC encoding into a sound track for…

developer.apple.com

Long story short, in order to begin writing CMSampleBuffers containing data encoded as AAC using AVAssetWriter, you need to attach priming information to the first sample buffer(s). This can be done by setting the kCMSampleBufferAttachmentKey_TrimDurationAtStart attachment with the help of the following Core Media function:

func CMSetAttachment(_ target: CMAttachmentBearer, 
                 key: CFString, 
               value: CFTypeRef?, 
      attachmentMode: CMAttachmentMode)
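
As a minimal sketch of how that call might be used (the helper name `attachTrimDurationAtStart` is made up for illustration, and it assumes the attachment value is a CMTime in dictionary form, as produced by `CMTimeCopyAsDictionary`):

```swift
import CoreMedia

/// Hypothetical helper: marks `trim` worth of audio at the start of `buffer`
/// as priming that should be discarded after decoding.
func attachTrimDurationAtStart(_ trim: CMTime, to buffer: CMSampleBuffer) {
    // The attachment value is a CMTime serialized to a dictionary,
    // not a raw CMTime struct.
    let value = CMTimeCopyAsDictionary(trim, allocator: kCFAllocatorDefault)
    CMSetAttachment(buffer,
                    key: kCMSampleBufferAttachmentKey_TrimDurationAtStart,
                    value: value,
                    attachmentMode: kCMAttachmentMode_ShouldPropagate)
}
```

The `kCMAttachmentMode_ShouldPropagate` mode is a choice made for this sketch; it asks Core Media to carry the attachment over when the buffer is copied or processed further.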

The conventional priming duration for AAC is 2112/44100 sec, while the duration of one sample I was getting from the AudioConverter was 1024/44100 sec. Based on the tribal knowledge available on Stack Overflow and the like, I was led to conclude that the priming info should be spread across the initial samples so that the trim on each CMSampleBuffer did not exceed that buffer’s duration. So, the first sample buffer was supposed to get a 1024/44100 trim duration, followed either by a second buffer with 1024/44100 and a third with 64/44100, or by just a second buffer with 1088/44100 (depending on the source).
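
To make the arithmetic of that spreading scheme concrete, here is a small sketch in plain Swift. The function name `spreadPriming` and its parameters are invented for this example, and it only illustrates the first variant of the splitting rule described above (the one that caps each trim at the packet length), not anything Core Media does for you:

```swift
/// Splits a priming length (in audio frames) across consecutive packets so
/// that no packet is asked to trim more frames than it contains.
func spreadPriming(primingFrames: Int, framesPerPacket: Int) -> [Int] {
    precondition(primingFrames >= 0 && framesPerPacket > 0)
    var remaining = primingFrames
    var trims: [Int] = []
    while remaining > 0 {
        // Trim as much as this packet can hold, carry the rest forward.
        let trim = min(remaining, framesPerPacket)
        trims.append(trim)
        remaining -= trim
    }
    return trims
}

// 2112 priming frames over 1024-frame AAC packets.
let trims = spreadPriming(primingFrames: 2112, framesPerPacket: 1024)
print(trims) // [1024, 1024, 64]
```

Dividing each entry by the 44100 Hz sample rate gives the per-buffer trim durations quoted above.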

It all seemed great until I tried reading the resulting .mp4 file back using AVAssetReader. It turned out that the first packet of the first sample buffer, which had been successfully appended using AVAssetWriterInput (or so it seemed), was missing! At first I thought it simply had not been appended, but then I managed to read it back from the file using ffmpeg on my Mac. I can’t really say how much time I spent trying to debug this issue or what weird ideas and workarounds I came up with while trying to fix it, but I was losing my mind and hope at the same time…

Me, having tried everything imaginable

Eventually, I did find the culprit and it’s kind of dumb in hindsight.

The reason the first packet of the first buffer was missing when reading the data with AVAssetReader was that the trim duration attached to the first audio buffer was as long as, or longer than, the first sample itself. The reader skipped it completely because, as far as it could tell, none of it was supposed to be played back (which makes total sense in hindsight). So instead of spreading the priming duration across multiple buffers, what I had to do was lengthen the duration of the first buffer by the duration of the priming. I ended up writing the following function:
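
As a rough sketch of that idea (a reconstruction under assumptions rather than the exact function), the code below copies the first buffer with a longer duration and attaches the whole priming duration to it. The name `primedCopy` is hypothetical, the buffer is assumed to hold a single AAC packet, and leaving the presentation time stamp untouched is also an assumption:

```swift
import CoreMedia

/// Hypothetical reconstruction, not the original function from this post.
/// Returns a copy of `buffer` whose duration is extended by `priming` and
/// which carries `priming` as its trim-duration-at-start attachment.
func primedCopy(of buffer: CMSampleBuffer, priming: CMTime) -> CMSampleBuffer? {
    // Read the timing of the first (assumed only) sample in the buffer.
    var timing = CMSampleTimingInfo()
    guard CMSampleBufferGetSampleTimingInfo(buffer, at: 0,
                                            timingInfoOut: &timing) == noErr else {
        return nil
    }

    // Lengthen the sample so the trim is strictly shorter than its duration.
    timing.duration = CMTimeAdd(timing.duration, priming)

    var copy: CMSampleBuffer?
    let status = CMSampleBufferCreateCopyWithNewTiming(
        allocator: kCFAllocatorDefault,
        sampleBuffer: buffer,
        sampleTimingEntryCount: 1,
        sampleTimingArray: &timing,
        sampleBufferOut: &copy)
    guard status == noErr, let primed = copy else { return nil }

    // Attach the full priming duration to this single buffer.
    CMSetAttachment(primed,
                    key: kCMSampleBufferAttachmentKey_TrimDurationAtStart,
                    value: CMTimeCopyAsDictionary(priming, allocator: kCFAllocatorDefault),
                    attachmentMode: kCMAttachmentMode_ShouldPropagate)
    return primed
}
```

With a 1024-frame first packet and 2112 frames of priming at 44100 Hz, the copy would report a duration of 3136/44100 sec, of which 2112/44100 sec is marked for trimming.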

Once I saved the first buffer primed using this function, everything worked like a charm: I was reading back the data exactly as it was written, and there was no desynchronization between video and audio playback. Great success!

Borat approves!

My hope in writing this post is that it will save the time and sanity of at least one poor soul with a use case and problem similar to mine. Hi pal! Hope I helped you, please leave a like.

DISCLAIMER: I’m far from being a certified audio conversion wizard, so please forgive me if I mixed anything up due to my ignorance. Cheers!

Originally published at https://dev.fandom.com.

Written by Grzegorz Aperliński