How does the OBS virtual camera plugin work on Windows?

Dmitry Kiselev
Deelvin Machine Learning
6 min read, Mar 22, 2022

This article explains how the OBS virtual camera plugin works on the Windows operating system.

Windows media back-ends

When developing any standard media application for Windows, one has to choose a media back-end. There are usually three options: Video for Windows, DirectShow, and Media Foundation. DirectShow normally wins over the others. Why?

Why DirectShow?

Video for Windows is the oldest of the three. It is limited and not extensible, so the real choice has always been between the older DirectShow and the newer Media Foundation. Usually the most recent technology is the go-to option, but not in this case. Media Foundation lacks a way to add a virtual camera that applications can discover naturally, in the same way as other (non-virtual) devices. A Windows 11 API allows this, but it is available only on that operating system, which is not as widespread as the earlier ones.

Thus, DirectShow is the only remaining option. See discussion details here.

OBS API Overview


OBS has extensive documentation, including a module API reference.

Plugins can implement four types of modules: sources, outputs, encoders, and services. Sources render video and audio. Encoders are video and audio encoders implemented specifically for OBS. Outputs deliver the rendered video and audio as raw or encoded data. Services are custom implementations for particular streaming services and are used in conjunction with the outputs for those services.

To implement a module of any of these types, a plugin defines the corresponding structure and fills it in with information and callbacks related to that module. You can find more information here.
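The "structure filled with callbacks" pattern can be sketched in plain C++ as follows. This is a simplified, hypothetical stand-in and not the OBS API: the names `sketch_output_info`, `sketch_register_output`, `make_sketch_output` and every callback are assumptions made for illustration, and the real OBS structures carry more fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a module-info structure.
// The host calls these function pointers; the plugin only fills them in.
struct sketch_output_info {
    const char *id;
    const char *(*get_name)();
    void *(*create)();
    void (*destroy)(void *data);
    bool (*start)(void *data);
    void (*stop)(void *data);
    void (*raw_video)(void *data, const uint8_t *frame, size_t size);
};

// Per-instance state owned by the plugin and passed back as `data`.
struct sketch_output_state {
    bool active = false;
    size_t frames_seen = 0;
};

const char *sketch_get_name() { return "Sketch Virtual Output"; }
void *sketch_create() { return new sketch_output_state(); }
void sketch_destroy(void *data) { delete static_cast<sketch_output_state *>(data); }

bool sketch_start(void *data) {
    static_cast<sketch_output_state *>(data)->active = true;
    return true;
}

void sketch_stop(void *data) {
    static_cast<sketch_output_state *>(data)->active = false;
}

// Called once per rendered frame; frames are counted only while started.
void sketch_raw_video(void *data, const uint8_t *frame, size_t size) {
    auto *state = static_cast<sketch_output_state *>(data);
    if (state->active && frame != nullptr && size > 0)
        ++state->frames_seen;
}

// Hypothetical registry standing in for the host application.
std::vector<sketch_output_info> &sketch_registry() {
    static std::vector<sketch_output_info> registry;
    return registry;
}

void sketch_register_output(const sketch_output_info &info) {
    assert(info.id != nullptr && info.create != nullptr && info.destroy != nullptr);
    sketch_registry().push_back(info);
}

// What a plugin's load function would do: fill the structure and register it.
sketch_output_info make_sketch_output() {
    sketch_output_info info{};
    info.id = "sketch_virtual_output";
    info.get_name = sketch_get_name;
    info.create = sketch_create;
    info.destroy = sketch_destroy;
    info.start = sketch_start;
    info.stop = sketch_stop;
    info.raw_video = sketch_raw_video;
    return info;
}
```

The host never needs to know the plugin's types: it keeps the opaque `data` pointer returned by `create` and hands it back to every other callback.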

DirectShow Overview

Brief summary of the Component Object Model

COM is a binary-interface standard for software components developed by Microsoft. Because the standard is defined at the binary level, COM could potentially be implemented on any platform.

Its documentation is very extensive, almost bloated.

What matters here is that COM objects talk to each other through interfaces, and only through interfaces that follow the binary standard. This means that components can interact even when they are located in different processes or on different machines.
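The idea of reaching an object only through interfaces can be sketched in portable C++. The code below is an analogue written for illustration, not real COM: `ISketchUnknown`, `SketchIid` and `SketchCamera` are hypothetical names, and integer identifiers stand in for the 128-bit GUIDs that real COM uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical interface identifiers; real COM uses 128-bit GUIDs.
enum class SketchIid : uint32_t { Unknown = 1, FrameSource = 2, Other = 3 };

// Minimal analogue of IUnknown: interface discovery plus reference counting.
struct ISketchUnknown {
    virtual bool QueryInterface(SketchIid iid, void **out) = 0;
    virtual uint32_t AddRef() = 0;
    virtual uint32_t Release() = 0;

protected:
    ~ISketchUnknown() = default; // objects are destroyed via Release() only
};

// A second interface that clients can ask for.
struct ISketchFrameSource : ISketchUnknown {
    virtual int NextFrameNumber() = 0;

protected:
    ~ISketchFrameSource() = default;
};

class SketchCamera final : public ISketchFrameSource {
public:
    // The object starts with one reference, owned by the caller of Create().
    static ISketchUnknown *Create() { return new SketchCamera(); }

    bool QueryInterface(SketchIid iid, void **out) override {
        assert(out != nullptr);
        if (iid == SketchIid::Unknown) {
            *out = static_cast<ISketchUnknown *>(this);
        } else if (iid == SketchIid::FrameSource) {
            *out = static_cast<ISketchFrameSource *>(this);
        } else {
            *out = nullptr; // interface not supported
            return false;
        }
        AddRef(); // every interface pointer handed out holds a reference
        return true;
    }

    uint32_t AddRef() override { return ++refs_; }

    uint32_t Release() override {
        const uint32_t left = --refs_;
        if (left == 0)
            delete this;
        return left;
    }

    int NextFrameNumber() override { return frame_++; }

private:
    SketchCamera() = default;
    ~SketchCamera() = default;

    uint32_t refs_ = 1;
    int frame_ = 0;
};
```

A client holds only interface pointers and never the concrete class, which is what lets the real COM runtime place a proxy between the two when they live in different processes.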

Another notable point is that COM has servers and clients. Servers contain COM objects, which are loaded and handed to clients on demand. On Windows, servers are usually executables (EXE) or dynamic-link libraries (DLL). These files can be placed anywhere on the host system, which is why COM has the concepts of a registry and a Service Control Manager (SCM). The SCM locates components on local and remote hosts and connects servers to clients. The registry tracks where components are deployed, both on local and remote systems.
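The role of the registry can be pictured as a lookup table from a class identifier to the location of its server. The sketch below is a toy model only: `SketchComRegistry` and its methods are hypothetical names, and the identifier and path used with it are made-up values rather than anything stored by a real system.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model of a component registry: class identifier -> server file path.
class SketchComRegistry {
public:
    // What registering a server conceptually records.
    void RegisterServer(const std::string &clsid, const std::string &server_path) {
        servers_[clsid] = server_path;
    }

    // What a locator does before loading a server for a client.
    bool Locate(const std::string &clsid, std::string *server_path) const {
        assert(server_path != nullptr);
        const auto it = servers_.find(clsid);
        if (it == servers_.end())
            return false; // component was never registered
        *server_path = it->second;
        return true;
    }

private:
    std::map<std::string, std::string> servers_;
};
```

An unregistered identifier simply fails the lookup, which mirrors why a freshly copied server cannot be used by clients until it has been registered.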

On Windows, the role of the COM registry is usually fulfilled by the Windows registry. Before any freshly acquired component can be used, its server must be registered. On Windows, registration involves invoking the regsvr32 utility with the name of the server file.

Brief summary of DirectShow

DirectShow is a multimedia framework and API. It is based on COM, so anyone who wants to use DirectShow must know at least the basics of working with the Component Object Model. There are several important things to know about it.

DirectShow architecture diagram

DirectShow was designed for C++ only. It is built around the concept of a data processing graph whose basic building blocks are “filters”, implemented as COM objects. Each filter performs some kind of operation: it generates, transforms, or consumes video, audio, and other data. Filters are chained into pipelines to perform complex data processing.

Each filter has one or more “pins”, objects that are responsible for the connections between filters. When an output pin is connected to an input pin, the two negotiate media types and connection parameters, and determine whether they can be connected at all.
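The negotiation between two pins can be sketched as the output pin offering media types in order of preference and the input pin accepting or rejecting each one. This is a deliberately simplified model under stated assumptions: `SketchMediaType`, `SketchInputPin` and `SketchOutputPin` are hypothetical types, and the real DirectShow media type structure and connection protocol carry far more detail.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified media type description.
struct SketchMediaType {
    std::string subtype; // for example "NV12" or "YUY2"
    int width;
    int height;
};

// Downstream side: decides whether a proposed type is acceptable.
struct SketchInputPin {
    std::vector<std::string> accepted_subtypes;
    int max_width = 0;

    bool Accepts(const SketchMediaType &mt) const {
        for (const std::string &subtype : accepted_subtypes) {
            if (subtype == mt.subtype)
                return mt.width <= max_width;
        }
        return false;
    }
};

// Upstream side: proposes its types in order of preference.
struct SketchOutputPin {
    std::vector<SketchMediaType> offered;

    // Returns true and stores the agreed type if the pins can be connected.
    bool Connect(const SketchInputPin &input, SketchMediaType *agreed) const {
        assert(agreed != nullptr);
        for (const SketchMediaType &mt : offered) {
            if (input.Accepts(mt)) {
                *agreed = mt;
                return true;
            }
        }
        return false; // no common media type: the pins cannot be connected
    }
};
```

If no offered type is acceptable, the connection fails as a whole, which corresponds to the case where two pins cannot be connected at all.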

Filters are always free-threaded and loaded from in-process COM servers only.

Baseclasses

The base class library is a sample provided with the Windows SDK (or at least it was provided not long ago). It contains an implementation of common DirectShow filter functionality. The base classes can be ignored entirely, but this is not recommended. DirectShow exposes interfaces that can be implemented “from scratch” with enough work, but not every project needs such fine control over filter behavior, not to mention the astounding difficulty of the task.

Step-by-step description of how data is handled in virtual camera

The virtual camera plugin in question is this one. To be more specific, this particular commit.

Preparations

The first thing to note is that the virtual camera project produces several artifacts instead of one. The OBS plugin must be copied into the OBS plugin folder. The DirectShow server must be copied into an appropriate location (the readme recommends the obs-studio install folder) and registered. Otherwise, the virtual camera plugin will not work.

OBS Side

The first thing that happens to every OBS plugin is initialization via the obs_module_load function. It registers two OBS plugin objects: one output here and one source (which is named “filter”) here.

The main interest is the output, because the virtual camera follows the sequence “OBS output → queue → DirectShow source”. When the output starts (with virtual_output_start), a queue is created. Then, while the virtual camera is running, frames are pushed to the queue.

Once started, the plugin sets a callback, which keeps pushing video data into the queue until the output is stopped.

Queue

The OBS plugin runs inside OBS, while the DirectShow filter is loaded into whichever application opens the camera. Since the two reside in different processes, they need some form of interprocess communication to transfer frame data from the former to the latter. This is where the queue comes into play.

The queue manages a file mapping, which is opened for writing on the OBS side and for reading on the DirectShow side. The mapping is created with INVALID_HANDLE_VALUE instead of a real file handle, so no file on disk is involved: the result is a shared memory region for the two processes. That memory is still backed by the system paging file, though.
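How a frame queue might be laid out inside such a shared region can be sketched as a ring buffer: a small header followed by fixed-size frame slots. This is a portable illustration and not the plugin's actual layout. `SketchQueueHeader` and the `sketch_queue_*` functions are hypothetical, the region is just a caller-provided buffer (on Windows it would be a view obtained from the file mapping), and the cross-process synchronization that a real queue needs is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>

// Hypothetical header placed at the start of the shared region.
struct SketchQueueHeader {
    uint32_t capacity;    // number of frame slots
    uint32_t frame_size;  // bytes per frame
    uint64_t write_count; // total number of frames written so far
};

// Each slot holds a 64-bit timestamp followed by the frame bytes.
inline size_t sketch_queue_bytes(uint32_t capacity, uint32_t frame_size) {
    return sizeof(SketchQueueHeader) +
           static_cast<size_t>(capacity) * (sizeof(uint64_t) + frame_size);
}

// Producer side: initialise the header inside a suitably aligned region.
inline SketchQueueHeader *sketch_queue_init(void *region, uint32_t capacity,
                                            uint32_t frame_size) {
    assert(region != nullptr && capacity > 0 && frame_size > 0);
    return new (region) SketchQueueHeader{capacity, frame_size, 0};
}

// Address of the slot that frame number `index` maps to.
inline uint8_t *sketch_queue_slot(SketchQueueHeader *h, uint64_t index) {
    uint8_t *base = reinterpret_cast<uint8_t *>(h) + sizeof(SketchQueueHeader);
    return base + static_cast<size_t>(index % h->capacity) *
                      (sizeof(uint64_t) + h->frame_size);
}

// Producer side: overwrite the oldest slot with a new frame.
inline void sketch_queue_push(SketchQueueHeader *h, uint64_t timestamp_ns,
                              const uint8_t *frame) {
    uint8_t *slot = sketch_queue_slot(h, h->write_count);
    std::memcpy(slot, &timestamp_ns, sizeof(timestamp_ns));
    std::memcpy(slot + sizeof(timestamp_ns), frame, h->frame_size);
    ++h->write_count;
}

// Consumer side: copy out frame number `index` if it is still available.
inline bool sketch_queue_read(SketchQueueHeader *h, uint64_t index,
                              uint64_t *timestamp_ns, uint8_t *out) {
    if (index >= h->write_count)
        return false; // not produced yet
    if (h->write_count - index > h->capacity)
        return false; // already overwritten by newer frames
    const uint8_t *slot = sketch_queue_slot(h, index);
    std::memcpy(timestamp_ns, slot, sizeof(*timestamp_ns));
    std::memcpy(out, slot + sizeof(*timestamp_ns), h->frame_size);
    return true;
}
```

Storing only offsets and counters in the region, never pointers, matters because the two processes may map the same memory at different addresses.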

More information about file mappings can be found here.

DirectShow Side

The DirectShow side is represented by the DirectShow filter CVCam and its pin CVCamStream. The latter is indeed a pin, because it inherits from CSourceStream in the base classes. CSourceStream (as shown here) inherits from CBasePin, which in turn implements the IPin interface along with many other things.

CSourceStream class inheritance diagram

This kind of complexity is precisely the reason why coding a new DirectShow filter without the base classes is an awful pain. Without them, one would need to implement all of those things oneself, relying solely on the DirectShow interfaces.

The main point of interest on the DirectShow side is CVCamStream::FillBuffer. Once graph setup is complete and all media types are agreed upon, this function does the actual work of pushing virtual camera frames down the graph.

FillBuffer receives a COM object implementing the IMediaSample interface and fills it with frame data. First, it opens the queue if it has not been opened already. Before doing anything with the data, the function synchronizes with the queue by waiting (if it is too far ahead of schedule) or by skipping frames (if it is too far behind). Then, if it is time to get a frame, FillBuffer pulls it from the queue. If there are no frames available, the function waits for 5 milliseconds and then checks again. As a minor note, FillBuffer also takes care of converting OBS timestamps to DirectShow timestamps.
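The timing side of this can be sketched in isolation. OBS frame timestamps are expressed in nanoseconds, while DirectShow reference time counts 100-nanosecond units, so the conversion is a division by 100. The wait/deliver/skip policy below is an assumption chosen for illustration (one frame interval of tolerance in either direction) and is not taken from the plugin's source; `SketchAction`, `sketch_decide` and `sketch_ns_to_reference_time` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// DirectShow reference time counts 100-nanosecond units.
using SketchReferenceTime = int64_t;

// Convert a nanosecond timestamp to 100-nanosecond units.
inline SketchReferenceTime sketch_ns_to_reference_time(uint64_t ns) {
    return static_cast<SketchReferenceTime>(ns / 100);
}

enum class SketchAction { Wait, Deliver, Skip };

// Assumed policy: tolerate up to one frame interval of drift either way.
inline SketchAction sketch_decide(uint64_t frame_ts_ns, uint64_t now_ns,
                                  uint64_t frame_interval_ns) {
    assert(frame_interval_ns > 0);
    if (frame_ts_ns > now_ns + frame_interval_ns)
        return SketchAction::Wait; // reader is ahead of schedule
    if (frame_ts_ns + frame_interval_ns < now_ns)
        return SketchAction::Skip; // reader fell behind, frame is stale
    return SketchAction::Deliver;
}

// Start and end times that would be stamped on a delivered sample.
struct SketchSampleTimes {
    SketchReferenceTime start;
    SketchReferenceTime end;
};

inline SketchSampleTimes sketch_sample_times(uint64_t frame_ts_ns,
                                             uint64_t frame_interval_ns) {
    const SketchReferenceTime start = sketch_ns_to_reference_time(frame_ts_ns);
    return SketchSampleTimes{start,
                             start + sketch_ns_to_reference_time(frame_interval_ns)};
}
```

With a frame interval of 33,333,333 ns (about 30 frames per second), a frame stamped 100 ms in the future leads to `Wait`, and one stamped 100 ms in the past leads to `Skip`.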

Summary

Although the inner workings of the virtual camera plugin are relatively simple, they may be difficult to grasp at first glance. Many things here happen behind the scenes, simply because the base class library is designed that way. We hope that this short description will help anyone who wants to understand the inner workings of the OBS virtual camera plugin on Windows or to implement something similar.

