GStreamer Media Foundation Video Encoder Is Now Faster — Direct3D11 Awareness

Seungha Yang
Jun 24, 2021

TL;DR

GStreamer MediaFoundation video encoders (H.264, HEVC, and VP9 if the GPU supports it) gained the ability to accept Direct3D11 textures, which brings noticeable performance improvements.

As of the GStreamer 1.18 release, hardware-accelerated Direct3D11/DXVA video decoding and MediaFoundation-based video encoding have landed.

These native Windows video APIs can be very helpful for application development and deployment, since they are hardware-agnostic on the Windows platform. The question is whether they are sufficiently competitive with hardware-specific APIs such as the NVIDIA NVCODEC SDK or the Intel Media SDK.

With GStreamer 1.18, the answer was probably … “NO”

How much faster than before are things?

One simple way to compare performance is to measure the time spent transcoding. Of course, encoded file size and visual quality are also very important factors. However, in my experiments, the resulting video file sizes and visual quality (in terms of PSNR) were very close to each other. So our remaining interest is speed!

Let’s take a look at my measurements. I ran them on a single 4K H.264 video with an NVIDIA RTX 3060 GPU and an Intel Core i7-1065G7 integrated GPU. For reference, the NVCODEC and Intel Media SDK plugins were tested with GStreamer 1.19.1 as well. Each test used performance (speed) oriented encoding options for a fair comparison.

- NVIDIA RTX 3060

GStreamer 1.18: 2 min 1 sec
GStreamer 1.19.1: 1 min 9 sec
NVCODEC plugin (nvh264dec/nvh264enc pair): 1 min 19 sec

- Intel Core i7-1065G7 integrated GPU

GStreamer 1.18: 3 min 8 sec
GStreamer 1.19.1: 2 min 45 sec
Intel Media SDK plugin (msdkh264dec/msdkh264enc pair): 3 min 10 sec

So, is it true that the Direct3D11/DXVA and MediaFoundation combination can be faster than hardware-specific APIs? Yes, as you can see.

Note that such results depend heavily on the system environment and encoding options, so you will likely see different numbers.
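To put the timings in relative terms, here is a small sketch that converts them to seconds and divides each one by the GStreamer 1.19.1 time on the same GPU. The numbers are copied from the measurements above; the names `TIMES`, `slowdown_vs_new`, and `report` are made up for this illustration.

```python
# Transcoding times from the measurements above, converted to seconds.
TIMES = {
    "NVIDIA RTX 3060": {
        "GStreamer 1.18": 2 * 60 + 1,
        "GStreamer 1.19.1": 1 * 60 + 9,
        "NVCODEC plugin": 1 * 60 + 19,
    },
    "Intel Core i7-1065G7": {
        "GStreamer 1.18": 3 * 60 + 8,
        "GStreamer 1.19.1": 2 * 60 + 45,
        "Intel Media SDK plugin": 3 * 60 + 10,
    },
}


def slowdown_vs_new(device, baseline="GStreamer 1.19.1"):
    """Return how many times longer each setup took than the baseline."""
    base = TIMES[device][baseline]
    return {
        name: seconds / base
        for name, seconds in TIMES[device].items()
        if name != baseline
    }


def report():
    """Build one human-readable line per (device, setup) pair."""
    lines = []
    for device in TIMES:
        for name, ratio in slowdown_vs_new(device).items():
            lines.append(
                f"{device}: {name} took {ratio:.2f}x as long as GStreamer 1.19.1"
            )
    return lines


for line in report():
    print(line)
```

On the RTX 3060 the GStreamer 1.18 run took roughly 1.75 times as long as the 1.19.1 run, while on the Intel integrated GPU the gap is closer to 1.14 times.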

Why the MediaFoundation plugin got faster

In GStreamer 1.18, because the MediaFoundation plugin lacked Direct3D11 integration, each decoded frame (a Direct3D11 texture) had to be downloaded into system memory first, which is usually a very slow path. That memory was then copied into another system memory buffer allocated by MediaFoundation. On top of that, the GPU driver would likely upload the data to GPU memory again. Well, two visible redundant copies and another potential copy per frame!? Hrm…

In GStreamer 1.19.1, thanks to the Direct3D11 integration, MediaFoundation can accept Direct3D11 textures directly, which means we no longer need to download the GPU texture and re-upload it.
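To get a feeling for what those copies cost, here is a rough back-of-the-envelope sketch. It assumes a 3840x2160 frame in 8-bit NV12 layout (12 bits per pixel) and ignores any stride padding; the post does not state the exact resolution or pixel format, so treat both as assumptions. The copy counts come from the description above, and all names are made up for this illustration.

```python
# Assumption: "4K" means 3840x2160 and decoded frames are 8-bit NV12.
WIDTH, HEIGHT = 3840, 2160


def nv12_frame_bytes(width, height):
    """Size of one NV12 frame without padding.

    NV12 stores a full-resolution 8-bit luma plane followed by an
    interleaved chroma plane at half resolution in both directions,
    which adds up to 1.5 bytes per pixel.
    """
    return width * height * 3 // 2


# Download/copy/re-upload steps per frame, as described in the text.
COPIES_PER_FRAME = {
    "GStreamer 1.18 (visible copies)": 2,
    "GStreamer 1.18 (plus potential driver re-upload)": 3,
    "GStreamer 1.19.1 (no download and re-upload)": 0,
}


def copied_bytes_per_frame(width=WIDTH, height=HEIGHT):
    """Return the bytes moved per frame by those copies for each path."""
    frame = nv12_frame_bytes(width, height)
    return {path: count * frame for path, count in COPIES_PER_FRAME.items()}


for path, nbytes in copied_bytes_per_frame().items():
    print(f"{path}: {nbytes / (1024 * 1024):.1f} MiB per frame")
```

Under these assumptions a single frame is about 11.9 MiB, so the GStreamer 1.18 path moves two to three times that amount per frame before the encoder even starts working.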

More details

Since Direct3D11/DXVA, MediaFoundation, NVCODEC, and Intel Media SDK all work with the same underlying GPU hardware, performance should not differ much in theory, unless there is visible overhead in the GPU vendor’s driver implementation.

Then, the remaining factor is API consumer-side optimization. And yes, from the GStreamer plugin implementation point of view, the Direct3D11/DXVA and MediaFoundation plugins are more optimized than the NVCODEC and MSDK plugins in terms of GPU memory transfer on Windows.

That doesn’t mean Direct3D11/DXVA and MediaFoundation themselves are superior to the hardware-specific APIs at all. The difference is just the result of more or less optimized plugin implementations.

You can try this enhancement right now!

Install the official GStreamer 1.19.1 release, and just run this command.

gst-launch-1.0.exe filesrc location=where-your-h264-file-located ! parsebin ! d3d11h264dec ! queue ! mfh264enc ! h264parse ! mp4mux ! filesink location=my-output.mp4
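If you would like to time the transcode yourself, a minimal Python sketch could wrap the same pipeline. It assumes `gst-launch-1.0.exe` is on your `PATH`; the function names and the file paths in the usage note are hypothetical. The measured wall-clock time also includes process startup and pipeline setup, so it is only a rough number.

```python
import subprocess
import time


def build_transcode_command(src, dst, launcher="gst-launch-1.0.exe"):
    """Build the argument list for the pipeline shown above."""
    return [
        launcher,
        "filesrc", f"location={src}", "!",
        "parsebin", "!",
        "d3d11h264dec", "!",
        "queue", "!",
        "mfh264enc", "!",
        "h264parse", "!",
        "mp4mux", "!",
        "filesink", f"location={dst}",
    ]


def time_transcode(src, dst):
    """Run the transcode and return the elapsed wall-clock seconds."""
    command = build_transcode_command(src, dst)
    start = time.perf_counter()
    # check=True raises CalledProcessError if gst-launch exits with an error.
    subprocess.run(command, check=True)
    return time.perf_counter() - start
```

For example, `time_transcode("input-4k.mp4", "my-output.mp4")` returns the elapsed time in seconds, which you can compare between GStreamer versions on your own machine.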

You will likely be able to see the improvement for yourself :)

There are still a lot of interesting topics for better Windows support in GStreamer. Specifically, nicer text support via DirectWrite and fine-tuned GPU scheduling via Direct3D12 are on our radar. Beyond video features, we will keep improving various Windows-specific features, including audio capture/render device support.

If you see any bugs, please contact me. Even better would be a bug report on GitLab. I’m watching it most of the time 😊
