Mixed Reality Studios — A primer for producers: Part 2

Nils Porrmann
Published in frame:work
Oct 26, 2020

In 2020, Mixed Reality has become a much-discussed topic in media production. This article is part of a series, taken from a larger release at dandelion + burdock and inspired by everybody’s efforts at frame:work.

One View, Many Perspectives

Up to this point we have looked at the combination of real + virtual space and the composition stack for a single point of view. Now we need to explore key components and technical integration.

Most XR stages have multiple cameras. These are necessary for live events, which rely on calling cameras to direct attention and narrative in real time. XR for film and TV, by contrast, can be edited in post and therefore concentrates on fewer cameras for the benefit of higher quality, resolution and colour reproduction (as well as deeper lighting integration).

The first essential technical element for XR is tracking. Camera tracking systems work in different ways. To point out a few: encoder-based (Mo-Sys), wireless trackers with infrared base stations (Vive), markerless (N-Cam), and IR camera sensors reading passive tracking markers (Stype, N-Cam).

Stype tracking points (floor), tracking monitor and sensor attached to the camera.

In XR the latter are most commonly used, although bigger productions will inevitably combine systems. The individual benefits depend on the application: camera considerations, lighting conditions and set design determine the best match.

The tracking system provides hardware and software to calibrate the stage. The scanned real space provides the transformations to subsequently align and scale the virtual space, so that both form a unit. The camera information is pooled and transmitted to the media server and render engines. The engine’s virtual camera receives the tracking information, and because of the alignment, the perspective within the render engine matches the shot defined by the real-world camera.
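As a rough illustration of that data flow, the sketch below shows a simplified tracking packet being applied to a virtual camera through a calibration transform. The field names, the `stage_to_scene` transform and the helper function are hypothetical placeholders, not taken from any specific tracking protocol or engine API.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class TrackingPacket:
    # Hypothetical, simplified fields; real protocols (FreeD, Stype, Mo-Sys)
    # differ in layout, units and timing information.
    position: Tuple[float, float, float]   # camera position in stage space, metres
    rotation: Tuple[float, float, float]   # pan, tilt, roll in degrees
    focal_length: float                    # current zoom, millimetres
    focus_distance: float                  # metres
    timestamp: float                       # needed later for delay management

def apply_to_virtual_camera(packet, stage_to_scene, virtual_cam):
    """Align the virtual camera with its tracked real counterpart.

    stage_to_scene is the transform produced during stage calibration;
    it maps the scanned real space into the virtual scene so that both
    form one coordinate system.
    """
    virtual_cam.position = stage_to_scene.transform_point(packet.position)
    virtual_cam.rotation = stage_to_scene.transform_rotation(packet.rotation)
    virtual_cam.focal_length = packet.focal_length
    virtual_cam.focus_distance = packet.focus_distance
```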

Real time rendering is the second prerequisite for XR. Render engines host the virtual scenery and are designed to provide real time interactivity, lighting and dynamics. While Unity and Unreal come from a games background, Notch has been in this field for art and events.

The design process is similar in all of the engines, and it requires creative teams to consider computer performance, event logic and hardware integration. The essential interactivity is the freedom of the viewport, i.e. the camera. Real time lighting is a second one (see below).

Games and effects engines can integrate tracking data directly and therefore provide standalone solutions. This is beneficial for minimising latency whilst employing the very latest hardware. A downside to this approach is sequencing media: real time rendering is at an advantage when it does not need to load and buffer video.

Modern media servers are comprehensive 3d tools. They typically manage video, images, audio and more, and send them to display surfaces via processes of chunking, transformation and pixel mapping. The distribution of the mapped results is the major argument for managing real time engines through media servers. They furthermore handle control protocols and connect to show-related periphery. As a consequence, solid integration between media server and render engine opens up the production toolset and allows teams to mix and match the respective strengths.
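To make the idea of pixel mapping concrete, here is a deliberately simplified sketch that slices one composited canvas into the regions individual LED processors expect. The region layout and names are invented for illustration; real media servers add rotation, scaling, colour processing and per-tile addressing on top of this basic idea.

```python
import numpy as np

# Hypothetical output map: each LED processor receives a rectangular
# crop of the full canvas (pixel coordinates are placeholders).
OUTPUT_MAP = {
    "led_wall_left":  {"x": 0,    "y": 0, "w": 1920, "h": 1080},
    "led_wall_right": {"x": 1920, "y": 0, "w": 1920, "h": 1080},
    "led_floor":      {"x": 3840, "y": 0, "w": 1056, "h": 1056},
}

def map_canvas_to_outputs(canvas: np.ndarray) -> dict:
    """Cut the composited canvas into per-processor feeds."""
    feeds = {}
    for name, r in OUTPUT_MAP.items():
        feeds[name] = canvas[r["y"]:r["y"] + r["h"], r["x"]:r["x"] + r["w"]]
    return feeds
```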

With the above information we can consider two types of setup:

  1. In a parallel setup each tracked camera has its own dedicated render computer and transmission hardware. While all renderers hold the same virtual scene, each computer provides one of the multiple views. The active view is selected by a switch downstream of the renderers. Media servers provide data upstream of the renderers.
  2. In a switched layout an XR media server functions as a host. It pools the tracking data and either detects or determines the single active camera, which is rendered exclusively. The active view switch is done in the XR server. To provide more than one composited view the host can manage multiple render nodes. As a benefit the XR server can provide typical media sequencing and further control integration, as well as bring in signals from additional sources.

In either situation the LED wall can only display one camera backplate at a time.

A parallel setup has one or more tracked cameras, each forwarded to a dedicated rendering server. Each server receives its respective tracking information to render the 3d scene from the corresponding perspective. While each composite and camera bundle can be forwarded to the downstream AV system, the single active XR scene view for the LED screen is selected by a switch downstream of the renderers.

Upstream of the rendering servers, a media server can provide traditional video, graphics streams and presentations. These are displayed as textures inside the virtual scene.
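To make the parallel routing described above tangible, here is a schematic sketch in code. The classes, methods and device names are invented placeholders and do not correspond to any particular product.

```python
# Schematic of the parallel setup: one renderer per tracked camera,
# with an AV switcher downstream selecting which XR view feeds the LED wall.

class RenderNode:
    def __init__(self, scene, camera_id):
        self.scene = scene          # every node holds the same virtual scene
        self.camera_id = camera_id  # but follows one tracked camera only

    def render(self, tracking):
        pose = tracking[self.camera_id]
        return self.scene.render_from(pose)  # hypothetical engine call

def frame_tick(render_nodes, tracking, downstream_switcher, active_camera_id):
    # Every renderer produces its own view of the shared scene.
    views = {node.camera_id: node.render(tracking) for node in render_nodes}
    # The switch to the single active view happens downstream of the renderers.
    downstream_switcher.send_to_led_wall(views[active_camera_id])
    # All composites remain available to the downstream AV system.
    return views
```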

The switched setup has one or more tracked cameras. Video signals are sent to the media server and switched to an active view; the active view is the camera currently used. The media server manages the virtual scene renderer and passes the active camera information to it. The returned render is mapped onto a representation of the LED screens and output accordingly. Separately, the media server can composite all layers or forward them discretely into other AV infrastructure. This kind of setup can utilise Unreal’s nDisplay feature, for example.
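By contrast, a minimal sketch of the switched layout’s per-frame control flow, again with invented object and method names rather than any real product API:

```python
# Schematic of the switched setup: the XR media server pools tracking
# data, determines the single active camera and drives the renderer.

def frame_tick(xr_server, renderer, cameras, tracking):
    # Detect or designate the active camera (e.g. via tally or operator cue).
    active = xr_server.active_camera(cameras)

    # Only the active camera's perspective is rendered.
    backplate = renderer.render(scene_camera=tracking[active.id])

    # Map the render onto the representation of the LED screens and output it.
    xr_server.output_to_led(xr_server.map_to_screens(backplate))

    # Separately, composite all layers or forward them to other AV infrastructure.
    composite = xr_server.composite(active.video_feed, backplate)
    xr_server.forward(composite)
```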

Both of the above examples are schematics to illustrate the signal and control flow; they are not comprehensive wiring diagrams. We have omitted synchronisation and networking architecture, which are further critical constituents.

While single-active-view XR may benefit from an engine-only setup, the advantage of higher-level integrations and mapping tools indicates that media server platforms are advisable for XR in live streaming and broadcast. This suggestion gains weight as we consider the particular quality challenges of integrating cameras, renderers, lighting and recording.

New Sophistication, New Minutiae

Media servers such as disguise, VYV-Photon, Pixera, ioversal-Vertex, LightAct, Green-Hippo, Smode and Touchdesigner have supported Notch for a number of years, while support for Unreal and Unity has been, and remains, product dependent, though it is generally on the rise.

With a focus on XR setup and operation, the forerunners appear to be vyv and disguise, since they provide a platform to resolve particular challenges above and beyond mapping, switching and hosting of render engines. Products like Pixotope, Ventuz, ZeroDensity and VizRT offer similar solutions for green screen environments. In our discussion we focus on events, where we feel that performers and scenery need to relate, act and present in a unified space. Avoiding additional keying steps and further colour post-processing is therefore important. A further merit of the XR format is that it includes the production team and live audiences in a common visual framework.

To achieve this inclusion at high quality, two features are of particular interest in lifting the media server to a platform:

Firstly, addressing the data roundtrip cost. Since the active camera is moving, being tracked, passed to the renderer and then displayed on the LED, any XR-rendered frame on the stage is late. This delay needs to be managed, for instance by buffering and passing results to the compositing stack we discussed earlier. As a result the media server also becomes a timing master.
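A back-of-the-envelope way to reason about this delay budget is sketched below. The individual frame counts are made-up placeholders; every stage needs to be measured on site.

```python
# Illustrative delay budget for one XR frame, counted in frames at 60 fps.
FPS = 60

pipeline_delays = {
    "tracking_capture_and_transport": 2,  # sensor to media server
    "render_engine":                  2,  # scene render and transfer
    "led_processing":                 1,  # LED processor latency
    "camera_exposure_and_output":     2,  # camera sees the wall, outputs video
}

total_frames = sum(pipeline_delays.values())
total_ms = total_frames * 1000 / FPS
print(f"Backplate is ~{total_frames} frames ({total_ms:.0f} ms) behind live")

# To keep the camera feed and the set extension coherent, the media server
# buffers the data used for compositing by the same number of frames,
# which is what makes it the timing master of the stage.
```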

Secondly, colour is a multi-factor problem. On one hand, LED tiles are tough to get right. They only truly have uniform colour when they come from the same production batch, and even then they need recalibration to a common level after some time. What is more, in the physical setup XR stages and volumes have to resort to a different tile product for the walk-on floor that carries props and performers. This leads to a more robust choice, which commonly has lower resolution and a different colour output profile, perhaps even brightness. A final complication with LED tiles is the arrangement of the red, green and blue diodes that combine into a single pixel. Without going into too much detail, this arrangement changes the colour perception of the LED screen depending on the viewing angle of the camera or observer.

On the other hand, the image of the virtual scene received by the camera has been altered by the physical limits of LED reproduction described above, and has furthermore been overlaid with the lighting around the XR stage that makes the scenery and performers fit the virtual environment. This colour roundtrip can be observed when comparing the camera picture and the set extension.

Colour-calibrated set, lit scene backplate.
Transmission with added set extension and colouring difference (purposefully unedited set extension from disguise; no post applied to either image).

To manage the colour reproduction, the media server platform needs to calibrate cameras, tiles and residual ambient lighting. This is an extensive process, but the time invested is critical to the overall quality. As this calibration time has not previously featured in most production schedules, please review the XR team’s requirements for camera calibration.
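As a conceptual illustration only, the sketch below applies a 3x3 colour correction matrix of the kind such a calibration step might produce. The matrix values are invented, and real platforms such as disguise use far more elaborate, measured lookups rather than a single matrix.

```python
import numpy as np

# Invented example matrix: in practice it would be derived by showing
# known colour patches on the wall, sampling them through the tracked
# camera, and solving for the transform that maps rendered values to
# what the camera actually sees.
camera_to_content = np.array([
    [ 1.07, -0.03, -0.02],
    [-0.04,  1.05, -0.01],
    [-0.02, -0.06,  1.10],
])

def correct_frame(frame: np.ndarray) -> np.ndarray:
    """Apply the correction to an RGB frame with values in 0..1."""
    h, w, _ = frame.shape
    corrected = frame.reshape(-1, 3) @ camera_to_content.T
    return np.clip(corrected, 0.0, 1.0).reshape(h, w, 3)
```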

So far our team has tested disguise’s delay and colour calibration process. It allows different quality levels that come at different speeds: the more colour accuracy, the longer the lookup takes.

The visual results are worth the wait. With regard to inter-tile calibration and the camera’s viewing angle on the set, the process achieves a very homogeneous picture. Matching the set extension is reasonable: since the virtual scene in the extension is not exposed to all the real-world factors, matching it remains complex because of the roundtrip mentioned above, although it can be edited separately in the server.

Matching the colour balance of the set extension to the backplate is a sophisticated task. The set lights for props and performers need to be fed back into the virtual space to match shadows, specularity, colour and brightness. The media server platform can assist this process, so let’s look at lighting in more detail.
