Accelerating H264 decoding on iOS with FFMPEG and VideoToolbox
At LIVEOP, we focus on providing first responders with the most relevant information in a concise manner, while at the same time not compromising on our seamless user experience. When we partnered up with Zepcam, a leading provider of wireless (body-worn) camera systems around the world, we wanted to make sure that we delivered an experience that conforms to our high standards, not compromising on performance or efficiency.
Camera streams hosted by Zepcam come in several different formats, most importantly HTTP Live Streaming (HLS), a first-class citizen in the iOS ecosystem with built-in support in AVFoundation, and RTSP, the Real Time Streaming Protocol. HLS streams are commonly used for live television and news broadcasts. HLS focuses on a seamless experience for the viewer: frames may not be dropped or played out of order, and a small buffer of upcoming frames is maintained to ensure smooth playback. The situations during which Zepcam streams are activated are often life-threatening. Officers could be live-streaming from their body-worn cameras while attempting to contain a riot, or a ladder engine with a camera mounted on top could be providing a bird's-eye view of a large building fire, including the position of firefighters on the ground. Our definition of a seamless user experience is different from the one prescribed by HTTP Live Streaming: in our case, it is important that the frames displayed to the user are as close to realtime as possible. They could arrive out of order, and a couple of frames could be dropped, as long as this benefits the realtime-ness of the stream. Adding up our requirements, we arrived at RTSP over UDP.
Apple does not provide support for playback of RTSP streams in any of its high-level frameworks. MPMoviePlayerController, AVPlayerItem and AVPlayer, all high-level system classes for playback of video streams, do not support RTSP. Fortunately, FFMPEG, the Swiss Army knife of audio/video processing, is equipped with the right tools to process and decode RTSP streams. FFMPEG has been around in the open source community for 17 years now, and has established itself as a reliable force behind a variety of end-user applications, such as VLC, Google Chrome, and Chromium¹.
Setting up FFMPEG
The RTSP streams served by Zepcam are encoded with the H264 codec. To prevent a massive increase in the binary size of our final iOS application file (.ipa), we chose to compile the latest release of FFMPEG (v4.0.1) from source, enabling only those features that we expect to use. We use the excellent build script found here, with a couple of adjustments:
- Change the FF_VERSION variable to 4.0.1
- Change the DEPLOYMENT_TARGET to the deployment target of your iOS application
- Change the CONFIGURE_FLAGS to enable bitcode and disable all features except those required for our stream:
CONFIGURE_FLAGS="--enable-cross-compile --disable-debug --disable-programs --disable-doc --extra-cflags=-fembed-bitcode --extra-cxxflags=-fembed-bitcode --disable-ffmpeg --disable-ffprobe --disable-avdevice --disable-avfilter --disable-encoders --disable-parsers --disable-decoders --disable-protocols --disable-filters --disable-muxers --disable-bsfs --disable-indevs --disable-outdevs --disable-demuxers --enable-protocol=file --enable-protocol=tcp --enable-protocol=udp --enable-decoder=mjpeg --enable-decoder=h264 --enable-parser=mjpeg --enable-parser=h264 --enable-parser=aac --enable-demuxer=rtsp --enable-videotoolbox"
In addition, a small change in the FFMPEG source file libswresample/arm/audio_convert_neon.S is required, as described here. Compilation should now succeed, yielding several different libraries. Drag the libraries into your Xcode project and make sure to link them with your application target (Build Phases > Link Binary With Libraries).
The global setup required to achieve video playback through FFMPEG is quite straightforward. Open the input URL pointing to the RTSP stream with avformat_open_input, find the streams from the input with avformat_find_stream_info, allocate a codec context with avcodec_alloc_context3 and avcodec_parameters_to_context, and finally open the codec with avcodec_open2. It is important to implement proper error handling and memory cleanup for all of these functions, as each of them can fail depending on the circumstances. In our application we also chose to implement an interrupt callback in order to exit the blocking functions early in certain situations, such as a lack of internet connection as signalled by the SCNetworkReachability APIs, or after a custom timeout timer has expired. Coupling with the reachability APIs in particular allows us to circumvent FFMPEG's built-in timeouts and fail early when no internet connection is detected.
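The steps above can be sketched as follows. This is a minimal outline under assumed names (the URL, the check_interrupt helper, and the negotiate_pixel_format callback introduced in the next section are placeholders); production code needs the full error handling and cleanup described above:

```c
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>

// Assumed interrupt helper: return non-zero to abort blocking FFMPEG calls,
// e.g. when reachability reports no connection or a custom timeout fires.
extern int check_interrupt(void *opaque);

// Minimal sketch of the FFMPEG setup sequence for an RTSP stream over UDP.
static int open_rtsp_stream(const char *url, AVFormatContext **out_fmt,
                            AVCodecContext **out_codec, int *out_stream_index)
{
    AVFormatContext *fmt_ctx = avformat_alloc_context();
    fmt_ctx->interrupt_callback.callback = check_interrupt;
    fmt_ctx->interrupt_callback.opaque = NULL;

    // Request RTSP over UDP, matching our latency requirements.
    AVDictionary *options = NULL;
    av_dict_set(&options, "rtsp_transport", "udp", 0);

    int result = avformat_open_input(&fmt_ctx, url, NULL, &options);
    av_dict_free(&options);
    if (result < 0)
        return -1;
    if (avformat_find_stream_info(fmt_ctx, NULL) < 0)
        return -1;

    int stream_index = av_find_best_stream(fmt_ctx, AVMEDIA_TYPE_VIDEO,
                                           -1, -1, NULL, 0);
    if (stream_index < 0)
        return -1;

    // Allocate a codec context and copy the stream parameters into it.
    AVCodecParameters *params = fmt_ctx->streams[stream_index]->codecpar;
    AVCodec *codec = avcodec_find_decoder(params->codec_id);
    AVCodecContext *codec_ctx = avcodec_alloc_context3(codec);
    if (avcodec_parameters_to_context(codec_ctx, params) < 0)
        return -1;

    codec_ctx->get_format = negotiate_pixel_format; // see "Decoding Frames"
    if (avcodec_open2(codec_ctx, codec, NULL) < 0)
        return -1;

    *out_fmt = fmt_ctx;
    *out_codec = codec_ctx;
    *out_stream_index = stream_index;
    return 0;
}
```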
Decoding Frames
The AVCodecContext struct exposes a get_format field which allows us to pick an output AVPixelFormat for the video frames delivered by the decoder, out of a list of available formats. If we leave this field empty, the video frames will be formatted as AV_PIX_FMT_YUV420P, a format automatically detected by the decoder based on the underlying stream. Images on iOS are formatted in RGB(A) (AV_PIX_FMT_RGB24), hence an extra step would be required to convert the frames from AV_PIX_FMT_YUV420P to AV_PIX_FMT_RGB24 before display. libswscale provides a function sws_scale that does exactly this, but unfortunately it is not implemented on the GPU, meaning we incur a performance hit for the extra conversion step from YUV420P to RGB24 on every frame.
Out of the list of available pixel formats we receive through the get_format function, one requires our special attention: AV_PIX_FMT_VIDEOTOOLBOX. Although poorly documented, this format tells the decoder to pass the incoming frames to Apple's VideoToolbox.framework, which decodes each incoming frame on the GPU and returns a CVPixelBufferRef holding the decoded data. This is much preferred over the default implementation, which requires the extra conversion from YUV420P to RGB24 on the CPU. Our callback assigned to the get_format field of the AVCodecContext now looks like this:
static enum AVPixelFormat negotiate_pixel_format(struct AVCodecContext *s, const enum AVPixelFormat *fmt)
{
    while (*fmt != AV_PIX_FMT_NONE) {
        if (*fmt == AV_PIX_FMT_VIDEOTOOLBOX) {
            if (s->hwaccel_context == NULL) {
                int result = av_videotoolbox_default_init(s);
                if (result < 0) {
                    return s->pix_fmt;
                }
            }
            return *fmt;
        }
        ++fmt;
    }
    return s->pix_fmt;
}
We make sure the VideoToolbox format is available before attempting to use it. If it is not available, or if initializing the VideoToolbox integration fails, we fall back to the format originally found by the decoder. Note that AV_PIX_FMT_VIDEOTOOLBOX is unavailable on the iOS Simulator. In the teardown method of our video player class, we check whether the codec context's hwaccel_context is not NULL and call av_videotoolbox_default_free if this is the case.
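A minimal teardown sketch along these lines (variable names are illustrative; the contexts are the ones allocated during setup):

```c
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
#include <libavcodec/videotoolbox.h>

// Release the VideoToolbox hardware decoder before freeing the codec context:
// av_videotoolbox_default_free() disposes of the hwaccel_context that
// av_videotoolbox_default_init() allocated in our get_format callback.
static void teardown_decoder(AVCodecContext *codec_ctx, AVFormatContext *fmt_ctx)
{
    if (codec_ctx) {
        if (codec_ctx->hwaccel_context != NULL) {
            av_videotoolbox_default_free(codec_ctx);
        }
        avcodec_free_context(&codec_ctx);
    }
    if (fmt_ctx) {
        avformat_close_input(&fmt_ctx);
    }
}
```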
Individual video frames are received with avcodec_receive_frame. On return of this function, the AVFrame output parameter is filled with data describing a single video frame, laid out according to the pixel format in use. If the VideoToolbox format was successfully negotiated, a CVPixelBufferRef holding the frame data can be found at AVFrame.data[3]. Although this is not explicitly documented, it is apparent from the comments placed behind the definitions of other hardware-accelerated formats in pixfmt.h. The CVPixelBufferRef cannot immediately be displayed. We first convert it to a CIImage with +[CIImage imageWithCVPixelBuffer:], and then turn the CIImage into a UIImage with +[UIImage imageWithCIImage:]. The final UIImage describes one video frame and can now be displayed within your video player, e.g. implemented as a plain UIImageView.
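Putting the receive side together, the decode loop can be sketched as follows (fmt_ctx, codec_ctx, and video_stream_index come from the setup step; display_pixel_buffer is a hypothetical helper that performs the CVPixelBufferRef → CIImage → UIImage conversion described above):

```c
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
#include <CoreVideo/CoreVideo.h>

// Assumed UI-layer hand-off wrapping +[CIImage imageWithCVPixelBuffer:].
extern void display_pixel_buffer(CVPixelBufferRef pixel_buffer);

AVPacket packet;
AVFrame *frame = av_frame_alloc();

// Read packets from the RTSP stream and feed them to the decoder.
while (av_read_frame(fmt_ctx, &packet) >= 0) {
    if (packet.stream_index == video_stream_index &&
        avcodec_send_packet(codec_ctx, &packet) == 0) {
        // One packet may yield zero or more decoded frames.
        while (avcodec_receive_frame(codec_ctx, frame) == 0) {
            if (frame->format == AV_PIX_FMT_VIDEOTOOLBOX) {
                // With VideoToolbox, data[3] carries a CVPixelBufferRef
                // instead of raw plane pointers.
                CVPixelBufferRef pixel_buffer = (CVPixelBufferRef)frame->data[3];
                display_pixel_buffer(pixel_buffer);
            }
        }
    }
    av_packet_unref(&packet);
}
av_frame_free(&frame);
```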
In our tests comparing the default pixel format, including the extra conversion step to RGB24, with the VideoToolbox format, we noticed a significant performance difference. With CPU decoding, we experienced overall sluggish playback and a relatively large number of frame drops. With GPU decoding, close to zero frames were dropped. Although the VideoToolbox integration is not documented too well, digging deeper into the internals of FFMPEG to achieve GPU-accelerated decoding of video streams on iOS is definitely worth it.
We cannot wait to see how our Zepcam integration will improve the workflow of the men and women working every day to keep our society safe.