Bringing Microsoft Media Foundation to GStreamer

Seungha Yang
Jul 14, 2020


The Microsoft Media Foundation plugin has finally landed as part of GStreamer 1.17!

Currently it supports the following features:

  • Video capture from webcam (and UWP support)
  • H.264/HEVC/VP9 video encoding
  • AAC/MP3 audio encoding

NOTE: Strictly speaking, the UWP video capture implementation is not part of the Media Foundation API; internally it is based on the Windows.Media.Capture API.
However, because Media Foundation and the WinRT media API are structurally similar, it makes sense to include the UWP video capture implementation in this plugin.

Media Foundation is known as the successor of DirectShow.

Like DirectShow, Media Foundation provides a wide range of media-related functionality, but most of its features (muxing, demuxing, capturing, rendering, decoding/encoding, and pipelining of the related processing steps) can be replaced with GStreamer.

Then why do we need Media Foundation on Windows? Isn’t GStreamer enough?

Why do we need Media Foundation then?

When it comes to software implementations, there are several alternatives, such as the well-known x264 software encoder. But what is the situation with hardware implementations?

A very important point here is that hardware vendors such as Intel, Nvidia, AMD and Qualcomm are abstracting their hardware video encoding API via Media Foundation. Therefore, device-agnostic, hardware-accelerated media processing can be achieved using Media Foundation (more specifically, as a Media Foundation Transform API) without any external library dependencies.

Moreover, MFT (Media Foundation Transform) encoders can be used in UWP applications (although some codecs might be blocked by the OS in this case).

Well, then what is the difference between the well-known MSDK (Intel Media SDK) and NVCODEC (Nvidia Codec SDK) implementations and a Media Foundation implementation?

From my perspective, a Media Foundation implementation could be as powerful as the vendor-specific APIs, because Media Foundation uses the vendor implementations underneath (e.g., libmfxhw64.dll for Intel and nvEncodeAPI64.dll for Nvidia). That is not the case at the moment, though: GStreamer’s Media Foundation integration still has some overhead and limitations.

To make the Media Foundation plugin as performant as the vendor-specific APIs, some work remains to be done. For example, Direct3D support in the Media Foundation plugin is one such potential improvement.

Media Foundation plugin details

The gst-inspect-1.0 example below lists the elements that belong to the Media Foundation plugin. Similar to the GStreamer D3D11 plugin, the Media Foundation plugin first enumerates the available encoder MFTs and then registers each MFT as a separate element. You might therefore see a different list of elements on your system (or their descriptions might be different).

[gst-master] PS C:\Work\gst-build> gst-inspect-1.0.exe mediafoundation
Plugin Details:
Name mediafoundation
Description Microsoft Media Foundation plugin
Filename C:\Work\GST-BU~1\build\SUBPRO~1\GST-PL~3\sys\MEDIAF~1\gstmediafoundation.dll
Version 1.17.1.1
License LGPL
Source module gst-plugins-bad
Binary package GStreamer Bad Plug-ins git
Origin URL Unknown package origin

mfmp3enc: Media Foundation MP3 Encoder ACM Wrapper MFT
mfaacenc: Media Foundation Microsoft AAC Audio Encoder MFT
mfvp9device1enc: Media Foundation VP9VideoExtensionEncoder
mfvp9enc: Media Foundation Intel® Hardware VP9 Encoder MFT
mfh265device1enc: Media Foundation HEVCVideoExtensionEncoder
mfh265enc: Media Foundation Intel® Hardware H265 Encoder MFT
mfh264device1enc: Media Foundation H264 Encoder MFT
mfh264enc: Media Foundation Intel® Quick Sync Video H.264 Encoder MFT
mfdeviceprovider: Media Foundation Device Provider
mfvideosrc: Media Foundation Video Source

10 features:
+-- 9 elements
+-- 1 device providers
  • mfvideosrc: This element is a source element which will capture video from your webcam. Note that you can use this element in your UWP application.
  • mfdeviceprovider: Available video capture devices can be enumerated by this device provider implementation, and it can provide corresponding mfvideosrc elements.
  • mf{h264,h265,vp9,aac,mp3}enc: Each element is responsible for encoding raw video/audio data into compressed data. In the above example, you can see two H.264 encoders, mfh264enc and mfh264device1enc. This happens when a hardware MFT approved by Microsoft is available on your system: the hardware MFT is registered first (as mfh264enc), and the software MFT is then registered with a lower rank.
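
To illustrate the registration order described above, here is a minimal Python simulation. It is not the plugin's code: the `Mft` class, the `register_encoders` function, the two rank values, and the `device<N>` naming rule are assumptions inferred from the gst-inspect-1.0 listing, chosen only so that the result mirrors the two H.264 entries shown there.

```python
from dataclasses import dataclass

# Illustrative rank values only; the ranks the real plugin assigns may differ.
RANK_HARDWARE = 256
RANK_SOFTWARE = 128


@dataclass
class Mft:
    """Stand-in for one enumerated encoder MFT (not a real Media Foundation type)."""
    friendly_name: str
    codec: str       # e.g. "h264", "h265", "vp9"
    hardware: bool


def register_encoders(mfts):
    """Return (element_name, rank, description) tuples, hardware MFTs first."""
    registered = []
    count_per_codec = {}
    # sorted() is stable, so MFTs of the same kind keep their enumeration order.
    for mft in sorted(mfts, key=lambda m: not m.hardware):
        index = count_per_codec.get(mft.codec, 0)
        count_per_codec[mft.codec] = index + 1
        # Assumed naming rule: the first MFT of a codec gets the plain name,
        # later ones get a "device<N>" infix.
        if index == 0:
            name = f"mf{mft.codec}enc"
        else:
            name = f"mf{mft.codec}device{index}enc"
        rank = RANK_HARDWARE if mft.hardware else RANK_SOFTWARE
        registered.append((name, rank, f"Media Foundation {mft.friendly_name}"))
    return registered


elements = register_encoders([
    Mft("H264 Encoder MFT", "h264", hardware=False),
    Mft("Intel® Quick Sync Video H.264 Encoder MFT", "h264", hardware=True),
])
for name, rank, description in elements:
    print(f"{name} (rank {rank}): {description}")
```

Because the hardware MFT ends up with the higher rank, auto-plugging elements that pick encoders by rank would prefer it, while the software MFT stays available under its own element name.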

NOTE: To build the Media Foundation GStreamer plugin, you should use the MSVC compiler, as some symbols might be missing in the MinGW toolchain.

Wait, where are audio sources?

Audio capture sources are not implemented in this plugin; use the wasapi or wasapi2 plugin instead. In general, audio processing requires more complicated timing information and control. Unfortunately, Media Foundation doesn’t provide such low-level control to users, but WASAPI does.

A short note on the wasapi2 plugin: it was introduced in GStreamer 1.17 to support UWP, though it should work in Win32 applications as well. Because of the UWP support, however, the wasapi2 plugin requires Windows 10, as it uses very recent Windows APIs (it might work on Windows 8, but I have only tested it on Windows 10).
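
To show how the video and audio pieces could fit together, here is a small Python sketch that only assembles a gst-launch-1.0 command line. It captures video with mfvideosrc and audio from the wasapi2 plugin, encodes with mfh264enc and mfaacenc, and muxes into MP4. The helper name `capture_command`, the output file name, and the pipeline layout are my own assumptions. The element name wasapi2src and the helper elements (videoconvert, h264parse, audioconvert, audioresample, queue, mp4mux, filesink) do not come from this post, and the pipeline is untested here, so whether it runs depends on which elements are registered on your system.

```python
def capture_command(output="capture.mp4", with_audio=True):
    """Build a gst-launch-1.0 argument list for webcam (and microphone) capture."""
    args = [
        "gst-launch-1.0", "-e",  # -e: send EOS on Ctrl+C so the MP4 is finalized
        "mp4mux", "name=mux", "!", "filesink", f"location={output}",
        # Video branch: Media Foundation capture source and H.264 encoder.
        "mfvideosrc", "!", "videoconvert", "!", "mfh264enc", "!",
        "h264parse", "!", "queue", "!", "mux.",
    ]
    if with_audio:
        # Audio branch: capture comes from the wasapi2 plugin, not from
        # Media Foundation; only the AAC encoder is a Media Foundation element.
        args += [
            "wasapi2src", "!", "audioconvert", "!", "audioresample", "!",
            "mfaacenc", "!", "queue", "!", "mux.",
        ]
    return args


# Print the command instead of running it, so the sketch works without GStreamer.
print(" ".join(capture_command()))
```

The resulting list can be passed to `subprocess.run` on a machine where GStreamer and both plugins are installed.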

Some codecs and software decoders are not implemented in this plugin yet, but I expect they should be added soon!
For hardware video decoder implementations, please refer to my previous DXVA2 blog post.
