Keeping Track: A Look at Cameras and their Influence on Matchmoving


The stress levels are rising, the deadline is looming and the shot I’m working on is taking far longer to matchmove than I first thought… We’ve all been there.

In this article I will be talking about a couple of camera acquisition types commonly used for film, television and VR, as well as discussing factors and limitations which make a seemingly easy matchmove take a whole lot longer.

Matchmoving is a technique used to track how the camera moves through a shot so that an identical virtual camera can be reproduced inside a software package. It is a crucial process in visual effects for integrating CGI elements with live action plates and exactly matching their perspective.


Super 35mm Digital Cinema Camera with S4/i Cooke lens

Camera Acquisition

Cinema cameras, or cine style cameras, usually have a high resolution, high dynamic range, large format sensor with RAW data recording and the ability to capture high frame rates for slow motion. Commonly used for feature films, television dramas and commercials, this type of camera offers the very peak in acquisition technology. The industry standard positive lock (PL) lens mount enables the use of the same cinema primes and zooms on different manufacturers’ cameras. Nearly all cine style cameras record to common UHD broadcast and DCI spec film standards, along with non-standard raw frame sizes beyond 4K.

The Super 35 sized sensor inside the Canon C300

In the past 5 years “Super 35” sensors have become the standard for high end acquisition and will be the format you’re most likely to come across when matchmoving. One observation I do have is the loose definition manufacturers use when describing the size of the sensor. Commonly you will see “Super 35” on a lot of their advertising, referring to the motion picture film format size of 24.89mm x 18.66mm. However, if we delve deeper into the actual specifications we will see that this description is only an approximation of the actual physical size of the sensor plane. Whilst small differences in field of view are not hugely important for camera operators, they are very important for VFX professionals such as matchmovers, compositors and 3D artists.

Slow motion can, in certain circumstances, cause problems for matchmovers. In order to achieve high frame rates some camera systems have to window the sensor, effectively cropping it to increase the sensor’s readout performance, which results in a reduced FOV (field of view).

This means the measurements given in the manufacturer’s specifications describe the whole sensor rather than the imaging area actually used to capture a given format. The same thing can happen when selecting a different recording standard. For example, DCI 2K might use more of the imaging area of the sensor than HD, meaning HD effectively has a narrower field of view.
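To get a feel for how much a windowed sensor changes the field of view, the standard angle of view relationship for a rectilinear lens can be sketched in a few lines of Python. The Super 35 width is the film format figure quoted above; the windowed width and the focal length are purely hypothetical values chosen for illustration, not figures from any particular camera.

```python
import math


def horizontal_fov_degrees(active_width_mm, focal_length_mm):
    """Horizontal angle of view of a rectilinear lens focused at infinity."""
    return math.degrees(2.0 * math.atan(active_width_mm / (2.0 * focal_length_mm)))


full_width_mm = 24.89      # Super 35 film format width quoted in the article
windowed_width_mm = 12.45  # hypothetical windowed readout for high frame rates
focal_length_mm = 35.0     # hypothetical prime lens

full_fov = horizontal_fov_degrees(full_width_mm, focal_length_mm)
windowed_fov = horizontal_fov_degrees(windowed_width_mm, focal_length_mm)

print(f"Full sensor: {full_fov:.2f} degrees")
print(f"Windowed:    {windowed_fov:.2f} degrees")
```

The takeaway is that the width entered into matchmoving software should be the active imaging area for the recorded format, which may be smaller than the headline sensor size.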


Factors to consider when handling footage

Resolution

Image resolution defines the amount of detail in footage or a still image. Modern high end cine camera systems such as those from Red and Arri have resolutions of 6K and beyond. However, optics and sensor characteristics do play a part in the fidelity of the final recorded footage. Not all HD and 4K cameras are born equal. Some use pixel binning and interpolation to arrive at a given resolution. Whilst it’s not important to know how this works, it is important to know that it can dramatically affect the overall quality.

In the example below I shot a scene in 4K and in 720p HD simultaneously. Notice how fine details in the stonework are very visible in the 4K version whilst they have disappeared in the 720p HD footage.

How does this affect matchmoving?

With good quality footage, high resolution plates can be a joy to work with. Fine details in the scene that would otherwise have been completely lost in lower resolution formats suddenly become a rich array of trackable features. High resolution is not without its downsides. Apart from the obvious increase in processing time, you’ve actually got to increase your feature sizes accordingly, because otherwise you end up with very small feature windows with not much useful data inside them. We can see this in the example below, where the left hand image is the feature window from the HD clip and the right is from UHD.

On the left, the feature window of an HD clip; on the right, UHD.

Ultimately, increasing resolution does not always lead to an increase in tracking accuracy. Soft or poorly calibrated optics can rob footage of fine detail in much the same way as a lower resolution format.
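As a rough illustration of increasing feature sizes along with resolution, the sketch below scales a feature window in proportion to the plate width. The function name and the 15 pixel starting size are hypothetical and are not taken from PFTrack or any other package; it simply shows the arithmetic.

```python
def scale_feature_window(window_px, source_width_px, target_width_px):
    """Scale a square feature window so it covers the same area of the scene."""
    scale = target_width_px / source_width_px
    scaled = round(window_px * scale)
    # Keep the size odd so the window has a single centre pixel.
    return scaled if scaled % 2 == 1 else scaled + 1


hd_window_px = 15  # hypothetical window size that works well on an HD plate
uhd_window_px = scale_feature_window(hd_window_px, 1920, 3840)

print(f"HD window:  {hd_window_px} px")
print(f"UHD window: {uhd_window_px} px")
```

A window left at its HD size on a UHD plate would cover only a quarter of the scene area, which is why it can end up with very little useful data inside it.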

Dynamic Range

One area where there is a lot of variance is dynamic range. Dynamic range in its simplest terms is the range of light / brightness that a camera can see. Have you ever taken a photo with your mobile phone on a bright sunny day and wondered why the sky looks so bright and the clouds have disappeared? This is caused by a limitation in the sensor’s ability to reproduce the brightest and darkest parts of the scene at the same time.

Some sensors are better at reproducing a range of brightness than others. In the example below I have shot with the same exposure settings, once with an HD cine camera and again with a mobile phone in HD video mode. Ignoring the lack of sharpness and depth of field differences for the moment, we can see the phone footage has a complete lack of detail in the sky and the roof, both of which are present in the cine camera’s footage. Additionally, all the detail in the foreground blinds is absent where they intersect with the sky.

The left image was exposed using a Sony F3 cinema camera, the right image using a mobile phone

This is because the cine camera sensor is able to capture ⅔ of the total brightness range in the scene, whereas the phone camera sensor is only able to capture ¼ at best. Detail which is not captured by the sensor will rapidly clip to white in the highlights and crush to black in the shadows. It’s important to note that incorrect handling of recorded footage can also result in a loss of dynamic range.

Let’s have a look at another example below. Notice the lack of trackable detail in the shadow portion of the image on the right.

How does this affect matchmoving?

Good contrast is important to matchmoving but not at the expense of detail. Put simply, it’s the difference between a few trackable features and many trackable features. Whilst tracking a low dynamic range scene is far from impossible and could still yield great results, having a feature rich high dynamic range scene can make your life a whole lot easier and get you to the results you desire more quickly.
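The clipping and crushing described above can be mimicked with a toy model. The sketch below clamps a handful of made-up scene brightness values to the range a sensor can record and counts how many distinct levels survive. The brightness values and the two sensor ranges are hypothetical and are not measurements from the cameras used in the examples.

```python
import math


def clip_to_sensor_range(scene_values, black_level, white_level):
    """Clamp linear scene brightness to what the sensor can record."""
    return [min(max(value, black_level), white_level) for value in scene_values]


def range_in_stops(black_level, white_level):
    """Dynamic range expressed in stops (each stop is a doubling of light)."""
    return math.log2(white_level / black_level)


# Hypothetical linear brightness samples, from deep shadow to bright sky.
scene = [0.01, 0.02, 0.5, 4.0, 8.0, 16.0]

narrow = clip_to_sensor_range(scene, black_level=0.1, white_level=2.0)
wide = clip_to_sensor_range(scene, black_level=0.01, white_level=16.0)

narrow_levels = len(set(narrow))  # shadows and highlights collapse together
wide_levels = len(set(wide))      # every sample keeps its own value

print(f"Narrow sensor: {range_in_stops(0.1, 2.0):.1f} stops, {narrow_levels} distinct levels")
print(f"Wide sensor:   {range_in_stops(0.01, 16.0):.1f} stops, {wide_levels} distinct levels")
```

Samples that collapse to the same value no longer contain any contrast, so there is nothing left in those regions for a tracker to lock on to.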

Rolling Shutter

There are many articles online which explain the causes of rolling shutter much better than I can, but we can see the effects of rolling shutter for ourselves. Using a mobile phone set to video mode, point the camera towards a vertical surface like a door frame. Record with the phone held steady for a few seconds, then gradually pan left and right, slowly increasing the rate at which you pan. Playing back the footage you will notice that the door frame tilts as you increase the pan speed, rather than staying perfectly vertical as it should. Below are some stills taken from the footage I recorded of a brick wall with my phone camera, demonstrating the issue.

Most cameras, especially consumer and semi professional cameras, will suffer from rolling shutter, sometimes to quite a severe level.

In simple terms it’s caused by the image being read off the sensor row by row, and by the time the readout has got to the bottom, the camera orientation has changed slightly. In effect the top of the image is from a slightly different point in time to the bottom. High end cameras from Red and Arri do suffer from the effects of rolling shutter but reduce it dramatically by increasing the speed at which the image is read off the sensor.
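The row by row timing can be put into rough numbers. The sketch below assumes a constant pan speed and a readout that sweeps evenly from the top row to the bottom row. The readout times and pan speed are hypothetical values for illustration and do not describe any specific camera.

```python
def row_capture_delay_s(row, image_height_px, readout_time_s):
    """Delay between the top row being read and the given row being read."""
    return readout_time_s * row / (image_height_px - 1)


def horizontal_skew_px(pan_speed_px_per_s, readout_time_s):
    """Horizontal offset between the top and bottom rows during a steady pan."""
    return pan_speed_px_per_s * readout_time_s


image_height_px = 1080
pan_speed_px_per_s = 1000.0  # hypothetical pan speed measured in the image
slow_readout_s = 0.030       # hypothetical slow sensor readout
fast_readout_s = 0.008       # hypothetical fast sensor readout

slow_skew = horizontal_skew_px(pan_speed_px_per_s, slow_readout_s)
fast_skew = horizontal_skew_px(pan_speed_px_per_s, fast_readout_s)
bottom_row_delay = row_capture_delay_s(image_height_px - 1, image_height_px, slow_readout_s)

print(f"Slow readout skew: {slow_skew:.1f} px")
print(f"Fast readout skew: {fast_skew:.1f} px")
```

Under these assumptions the skew grows in direct proportion to both pan speed and readout time, which is why a faster readout reduces the effect without removing it entirely.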

How does this affect matchmoving?

Rolling shutter is movement where there should be no movement, and this in turn will lead to false results when we matchmove the footage. Rolling shutter is a complex problem to fix, as quite often foreground elements skew to a greater degree than the background. However, advanced matchmoving software like The Pixel Farm’s PFTrack does offer a solution to correct or minimise this.


Image Noise

When taking a photo in a dimly lit environment using your camera phone, the pictures can look a bit noisy and lacking in fidelity. This is because the camera is amplifying the signal by increasing the ISO in order to reach an adequate exposure level. Lower ISO values will generally mean lower noise levels, whilst higher ISOs increase the noise levels. High end cinema and stills cameras will perform a lot better in this regard than consumer grade camera systems. They are not immune to excessive noise when using a high ISO, but they are normally able to reach a higher ISO before noise becomes a limiting factor. Underexposure of footage can have the same effect as a high ISO, revealing more of the noise floor when correcting the image back to its proper exposure level.

In the example below we can see a crop from the shot exposed firstly at 800 ISO then at 3200 ISO. Notice how quickly fine details are obscured and lose micro contrast as we increase through the range.

Clip exposed using a Canon C300

How does this affect matchmoving?

Noise can be a big problem during the matchmoving process, especially if tracking footage from cameras with smaller sensors in less than adequate lighting conditions. Fine details are lost due to interpolation errors in the de-bayering process, and we can see this clearly in the 3200 ISO sample above. Excessive noise can affect how tracking points are located (e.g. when auto-tracking) and how accurately they are tracked. However, noise has to be quite pronounced before it becomes a real problem when it comes to matchmoving.
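The earlier point about underexposure revealing the noise floor can be shown with a deliberately simplified model. The sketch assumes a fixed noise floor that is amplified by the same gain used to push the image back to its proper exposure, and it ignores every other noise source. The numbers are hypothetical.

```python
def exposure_gain(stops_underexposed):
    """Linear gain needed to correct footage back to its proper exposure."""
    return 2.0 ** stops_underexposed


def corrected_noise_floor(noise_floor, stops_underexposed):
    """Noise floor after the corrective gain has been applied (simplified)."""
    return noise_floor * exposure_gain(stops_underexposed)


noise_floor = 1.0  # hypothetical noise floor in arbitrary units

for stops in (0, 1, 2, 3):
    lifted = corrected_noise_floor(noise_floor, stops)
    print(f"{stops} stops under: noise floor x{lifted:.0f} after correction")
```

In this simplified picture each stop of underexposure doubles the visible noise floor once the image has been corrected.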

Compression

Have you ever streamed your favourite series and then all of a sudden the internet connection drops and you are left with a mess of blocks and squares, making it difficult to even make out people’s faces? This is the result of compression.

A similar type of effect can happen during a shoot when there are large amounts of camera movement and a highly compressed codec is being used to record the footage. Most cameras will offer an option of recording to a compressed codec to save space on memory cards when longer recording durations are required. POV (point of view) cameras frequently use highly compressed codecs for recording.

Modern high end broadcast codecs will deliver images almost indistinguishable from the uncompressed version by compressing the footage just enough that it throws away information we are not likely to need and maintains the bits that we do. However, there are some gotchas. Whilst the footage may look great when the camera is still, this “might” not be the case when it is moving.

In the example below I recorded a handheld panning shot to a highly compressed AVCHD codec at 28Mbps (3.5MB/s). I simultaneously recorded uncompressed with the same camera as a comparison. Notice on the right how some of the fine details have completely disappeared with the compressed recording. Additionally, edges have become unrefined and, when viewed in motion, appear to dance around and jitter.
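To put the 28Mbps figure into context, the sketch below converts it to megabytes per second and compares it with the data rate of an uncompressed stream. The frame size, frame rate and 16 bits per pixel (8-bit 4:2:2) are assumptions made for illustration, since the recording format of the uncompressed clip is not stated here.

```python
def mbps_to_megabytes_per_s(megabits_per_s):
    """Convert a bitrate in megabits per second to megabytes per second."""
    return megabits_per_s / 8.0


def uncompressed_mbps(width_px, height_px, fps, bits_per_pixel):
    """Data rate of an uncompressed video stream in megabits per second."""
    return width_px * height_px * fps * bits_per_pixel / 1_000_000


avchd_mbps = 28.0
avchd_megabytes_per_s = mbps_to_megabytes_per_s(avchd_mbps)

# Assumed format: 1920 x 1080, 25 fps, 8-bit 4:2:2 (16 bits per pixel on average).
raw_mbps = uncompressed_mbps(1920, 1080, 25, 16)
compression_ratio = raw_mbps / avchd_mbps

print(f"AVCHD:        {avchd_mbps:.0f} Mbps = {avchd_megabytes_per_s:.1f} MB/s")
print(f"Uncompressed: {raw_mbps:.2f} Mbps")
print(f"Ratio:        roughly {compression_ratio:.0f}:1")
```

Under these assumptions the codec has to discard or predict the vast majority of the data in every second of footage, which leaves little headroom when the whole frame is changing during a pan.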

How does this affect matchmoving?

Camera movement is everything in matchmoving, and to give software the best chance of finding an accurate solution we will want to give it the highest quality footage. Unfortunately camera movement, or any kind of movement, is the worst enemy of compression.

This will present itself as mosquito noise around fine detail and macroblocking around areas of movement, as we have seen in the example above. Some video codecs group frames together and compare them with one another, storing and interpolating only the information that has changed between frames and averaging any detail that hasn’t. Matchmoving with compressed footage is still possible and will still provide adequate results, but it can take a lot longer due to errors created by false detail from interpolation and compression artifacts. In any situation RAW data recording is always preferable to compression.


To find out how RAW recording can improve some of these issues, why not read my article on working with RAW here.


Spherical 360 video

360 video consists of real world video shot with a 360 degree camera that allows the viewer to change their viewing angle at any point during playback. These videos can be enhanced further with CG in the postproduction process in the same way we would a conventional 2D production, but this does require some specialist matchmoving software and toolsets like The Pixel Farm’s PFTrack.

VR 360 cameras commonly involve two or more cameras recording at least HD. The clips from each of the cameras are then stitched together, either internally or in post, to form a 360 degree spherical panorama that can be viewed in a desktop viewer or VR headset.

Ricoh Theta V, a typical back to back 360 VR camera

There are two main types of VR camera system in common use: back to back rigs and multi cam rigs. Back to back rigs are simply two optics and sensors in one housing, or two separate cameras placed back to back, with combined optics that cover 360 degrees. The benefits of these systems are low parallax, ease of use and a small footprint, making them perfect for situations where a larger 360 rig would not be practical. The downside is that the somewhat limited resolution, combined with the extreme nature of the optics, can lead to aberrations and fairly soft results.

Multi cam rigs share many of the same principles as the back to back systems but add more cameras to achieve better quality results. These rigs can be made up of multiple cinema cameras or built as a single housing with many integrated sensors and optics. The distinct advantage of multi camera systems is that, with a larger number of higher quality cameras, the optics don’t have to cover such an extreme angle of view. This makes them less subject to complex distortions, aberrations, flaring and softening towards the extreme edges. This is good news, because clearer, higher resolution images with greater dynamic range will always have the potential to provide better results during the matchmoving process.

Unique factors with 360 video

360 camera systems can run into the same issues as we discussed above but also have a couple of other issues unique to this acquisition format.

Parallax

Both back to back and multi camera systems share a common problem, and that is parallax. This presents itself as misregistered or overlapping detail along the stitch line, and it worsens with objects closer to the camera rig. In order to achieve a perfect stitch line all cameras must rotate around the entrance pupil of the optics. Unfortunately this is physically impossible, as all cameras would have to occupy the same space at the same time. We can see the effect in the frame below, where the wall is close enough to the rig for parallax to be an issue and its detail is misregistered along the stitch line.

The effects of parallax can however be minimised by making sure the cameras are as close to the central axis of the rig as possible and the rig is not too close to the subject you wish to track. This is achieved very successfully in systems where both optics and sensors are built into the same unit. However, image quality compromises have to be made in order to shrink the cameras and sensors enough to do this. Parallax errors can be a problem as they can cause camera registration errors and create accuracy problems when positioning tracking points in 3D space.
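The way parallax grows as the subject gets closer can be estimated with some simple trigonometry. The sketch below works out the angle that the gap between two lenses subtends at a subject sitting midway between them, then converts that angle into pixels on an equirectangular panorama, assuming the stitch was aligned for very distant objects. The lens separation, distances and panorama width are hypothetical values, and the model is a rough approximation rather than how any stitching software measures the error.

```python
import math


def parallax_angle_degrees(lens_separation_m, subject_distance_m):
    """Angle subtended by the lens separation at a subject midway between the lenses."""
    return math.degrees(2.0 * math.atan(lens_separation_m / (2.0 * subject_distance_m)))


def parallax_px(angle_degrees, equirect_width_px):
    """Convert an angular error to pixels on a 360 degree equirectangular image."""
    return angle_degrees / 360.0 * equirect_width_px


lens_separation_m = 0.03  # hypothetical gap between the two entrance pupils
panorama_width_px = 3840  # hypothetical stitched panorama width

near_px = parallax_px(parallax_angle_degrees(lens_separation_m, 0.5), panorama_width_px)
far_px = parallax_px(parallax_angle_degrees(lens_separation_m, 5.0), panorama_width_px)

print(f"Wall at 0.5 m: about {near_px:.1f} px of misregistration")
print(f"Wall at 5.0 m: about {far_px:.1f} px of misregistration")
```

With these example numbers, moving the wall ten times further away cuts the error along the stitch line by roughly a factor of ten.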

Camera Synchronisation

Camera synchronisation is also a big problem with some VR 360 camera rigs. During our testing we used a back to back VR system made up of two separate cameras. Despite large amounts of experimentation we struggled to get sufficient synchronisation between the front and rear cameras. Whilst it was still possible to track the clip, we could never get a perfect sync between the stitched clips due to slight variances in the sensor timing. This ultimately led to errors in accuracy during the tracking process due to independent movement between cameras. In the example below we can see a 360 clip manually adjusted for correct sync on the left, and the incorrect sync as recorded, visible along the stitch line, on the right.

Larger single housing multi cam rigs and rigs made up of professional cinema cameras solve this by using a locking signal and timecode to sync the clips together during recording, but even then they do on occasion still fall out of synchronisation.


Into The Future

This is definitely an exciting time in post production. In the last 10 years we have seen a switchover from film to digital as the preferred acquisition method for features and television dramas, which has made visual effects a lot easier and more accessible.

Resolution, dynamic range and sensitivity of digital acquisition formats have all increased year on year to the point where they have surpassed the film that they replaced. All of these aspects help make the matchmoving process easier.

So where next..?

It’s not a huge leap to suggest that cameras will continue to get smaller and lighter and to improve in imaging performance, and we are already seeing a push towards even larger imaging formats such as 65mm. But where I think it will be most interesting for matchmoving is in the areas of metadata and RAW data capture…


Matchmoving doesn’t need expensive camera acquisition systems to work well. Just by being mindful of a camera’s limitations, you can create amazing work.