Async Support for TensorFlow Backend in FFmpeg

Shubhanshu Saxena
Published in CodeX · Aug 19, 2021

This blog post summarises my Google Summer of Code 2021 project with Intel Video and Audio for Linux. This summer was filled with a lot of hands-on learning with something I couldn’t have imagined doing otherwise.

Google Summer of Code 2021 with Intel Video and Audio for Linux. Editing Credits: Kenny Patel

The Project

The project mainly focuses on implementing an asynchronous inference mechanism in the backends of the FFmpeg Deep Neural Networks (DNN) module, though it also has other optional deliverables. You can view the original proposal here.

The DNN module has three primary filters:

  1. vf_dnn_processing for applying filters using deep learning models
  2. vf_dnn_detect for object detection
  3. vf_dnn_classify for image classification

There are two other filters, vf_sr (super resolution) and vf_derain (de-rain), though vf_dnn_processing should be preferred since it exposes the full functionality.

Technical Stack

The first thing anyone asks when they see a project is what kind of technical stuff it builds on. In my case, the project was written entirely in the C language and used the pthread library for multithreading.

Deliverables

  1. Preparation for Async Support (Required) — We switched to a task-based mechanism, where each task corresponds to one input frame, to prepare for the async mode. This approach is now common across all three backends (a simplified sketch of such a task follows this list).
  2. Async Support in TensorFlow backend (Required) — Initially, the TensorFlow backend supported only the synchronous mode of model inference.
  3. Unification of Execution Modes from Filters’ perspective (Optional) — Currently, the backend exposes separate functions for the async and sync modes to the filters. With this deliverable, the choice of execution mode moves into the hands of the backend. It also paves the way for extending batch execution to the sync mode.
  4. Async Support in the Native Backend (Optional) — The native backend is used for model inference when the target system does not support the OpenVINO or TensorFlow backend. This backend also supports only synchronous model execution, but async support can be extended to it using the same mechanism.
  5. Support for Batch Mode in TensorFlow backend (Optional) — Loading multiple frames as a single batch and inferring them at once is less expensive on the system than processing the frames one by one. Enabling batch inference will significantly boost the TensorFlow backend’s performance when combined with the async mode.
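
To make the task-based mechanism from the first deliverable more concrete, here is a minimal sketch of what a per-frame task might carry. The struct and field names are illustrative simplifications, not the exact FFmpeg definitions.

```c
#include <stdint.h>

/* Illustrative sketch of a per-frame task (not the actual FFmpeg struct).
 * One task is created for each input frame; the backend fills out_frame
 * once inference for that frame completes. */
typedef struct TaskItemSketch {
    void        *in_frame;       /* input frame (an AVFrame in FFmpeg, opaque here) */
    void        *out_frame;      /* output frame, filled after inference            */
    const char  *input_name;     /* name of the model's input tensor                */
    const char **output_names;   /* names of the model's output tensors             */
    uint32_t     nb_output;      /* number of model outputs                         */
    int          async;          /* whether this task runs asynchronously           */
    uint32_t     inference_todo; /* sub-inferences still pending                    */
    uint32_t     inference_done; /* sub-inferences already finished                 */
} TaskItemSketch;
```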

Work Done on the Project

The following pull requests contain the work related to this project. Each pull request contains the list of commits relevant to the patchset in its description.

  1. Async Support for TensorFlow Backend
  2. Unification of Async and Sync Modes
  3. Async Support for Native Backend
  4. Batch Execution in TensorFlow Backend

What’s complete? As of submission, all required deliverables had been merged, and the optional deliverables were ready for review.

Besides these major patchsets, I also contributed some documentation for the Native Backend layer functions and fixed some minor memory leaks in the backends, which can be viewed here. To improve error handling, we return specific error codes from the DNN backends in this patchset.

My Merged Patches to FFmpeg DNN Module. Src: https://bit.ly/3mdeluB

Why are the pull requests closed instead of being merged?
That’s because the mentors reviewed the PRs on GitHub, where the Intel CI tests were also run to check that the patches function correctly. The patches were then sent to the FFmpeg mailing list for final review and were merged from there.

The Idea Behind Asynchronous Inference

Let me give a bit of background here. The DNN backends use the TensorFlow C API and the OpenVINO Inference Engine to load and execute deep learning models.

In the synchronous mode, the filters call the ff_dnn_execute_model function with an input frame and expect an output frame in return. In the async mode, the filters send the input frame using ff_dnn_execute_model_async, and the backend starts the inference and returns success immediately. The process repeats until all frames have been sent to the backend. At any given time, at most nireq asynchronous requests can execute simultaneously.

The filters then call ff_dnn_get_async_result for the output frames. This function returns the frames in the order the backend received them, regardless of when each frame’s inference finished.
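
To make this flow concrete, here is a minimal sketch of the filter-side pattern. The functions below are simplified stand-ins for ff_dnn_execute_model_async and ff_dnn_get_async_result, which in FFmpeg take backend-specific context and AVFrame arguments; treat this as an illustration of the idea, not the actual API.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the async entry points. */
static int submit_frame(void *ctx, void *in_frame)  { (void)ctx; (void)in_frame; return 0; }
static int next_result(void *ctx, void **out_frame) { (void)ctx; *out_frame = NULL; return 1; /* 1 = nothing left */ }

/* Filter-side pattern: submit every frame without waiting for its inference,
 * then drain the outputs, which arrive in submission order. */
static int filter_flow_sketch(void *ctx, void **frames, size_t nb_frames)
{
    for (size_t i = 0; i < nb_frames; i++)
        if (submit_frame(ctx, frames[i]) != 0)   /* backend queues the frame and returns */
            return -1;

    void *out_frame;
    while (next_result(ctx, &out_frame) == 0) {
        /* push out_frame downstream here */
    }
    return 0;
}
```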

Now, since the TensorFlow C API does not provide asynchronous functions the way OpenVINO does, we had to implement a mechanism so that model inference runs asynchronously to the main FFmpeg filter thread.

For this purpose, we added DNNAsyncExecModule, which executes the backend’s RequestItem on a separate thread. This thread is joined before the subsequent inference of the same RequestItem starts. If the last inference failed, the exit status is caught, and all further execution for the current session is cancelled.
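
Here is a minimal sketch of that idea using pthreads. The struct and function names are illustrative and do not match the actual FFmpeg identifiers; the point is only to show how joining the previous run of a request surfaces its exit status before the request is reused.

```c
#include <pthread.h>

/* Illustrative sketch of a per-request execution module. */
typedef struct AsyncRequestSketch {
    pthread_t  thread;              /* worker thread running the inference */
    int        thread_started;      /* is there a previous run to join?    */
    void      *args;                /* per-request inference parameters    */
    void     *(*infer)(void *args); /* backend inference routine           */
} AsyncRequestSketch;

static int request_start(AsyncRequestSketch *req)
{
    /* Join the previous run of this request before reusing it, so its exit
     * status can be checked and a failure stops further execution. */
    if (req->thread_started) {
        void *ret;
        pthread_join(req->thread, &ret);
        req->thread_started = 0;
        if (ret != NULL)            /* previous inference reported an error */
            return -1;
    }

    /* Launch the next inference of this request on a fresh worker thread. */
    if (pthread_create(&req->thread, NULL, req->infer, req->args))
        return -1;
    req->thread_started = 1;
    return 0;
}
```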

Why join threads? Can’t we detach the threads?
Earlier, the plan was to use detached threads, but to support the Windows build and to have better error handling, we shifted to joinable threads.

Results

The TensorFlow backend showed a performance gain once the async patchset was applied. For a common GPU with 2 GB of memory, the improvement is greater on the CPU variant of the TensorFlow C API than on the GPU variant.

The performance gain from the async mode alone, measured on a 10-second video with the CPU variant on a quad-core CPU, is documented in this patch.

Credits

I want to thank Google and the GSoC team for providing me with this excellent opportunity. I would like to sincerely thank my mentor, Yejun Guo, for guiding me throughout the project and helping me clear my doubts. Special thanks to my mentor Ting Fu for testing the patchset and to the Intel Media team for their support.
