EMU-ML: Data Capture Methodology

Errick Jackson
May 8, 2023

Color Model

This project effectively amounts to a color space transformation. While the Python script’s ingest procedure measures RGB values from the footage and scans, RGB is not a very useful model to describe the perceptual experience of color. Conversely, a perceptual model is very useful for translating the characteristics of a subtractive physical representation to an additive digital capture. So for the purposes of this project, the CIELAB color model is being used as an intermediate for calculating the transformation function.
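As a rough illustration of what the RGB-to-CIELAB step involves, here is a minimal sketch. It is not the project's ingest script: it assumes 8-bit sRGB input and a D65 reference white, whereas real footage and scans (log-encoded camera files, for example) would need their own decoding first. The function names are hypothetical.

```python
# Sketch only. Assumptions: 8-bit sRGB input, D65 reference white.
# Camera or scanner encodings would need a different decode step.

D65_WHITE = (0.95047, 1.00000, 1.08883)  # XYZ of the D65 reference white


def srgb_to_linear(c8):
    """Undo the sRGB transfer curve for one 8-bit channel value."""
    c = c8 / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_rgb_to_xyz(r, g, b):
    """Linear sRGB primaries to CIE XYZ (D65)."""
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    return x, y, z


def xyz_to_lab(x, y, z, white=D65_WHITE):
    """CIE XYZ to CIELAB, relative to the given reference white."""
    delta = 6 / 29

    def f(t):
        # Cube root above the threshold, linear segment below it.
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / white[0]), f(y / white[1]), f(z / white[2])
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def srgb_to_lab(r8, g8, b8):
    """Convenience wrapper: 8-bit sRGB triplet to (L*, a*, b*)."""
    r, g, b = (srgb_to_linear(v) for v in (r8, g8, b8))
    return xyz_to_lab(*linear_rgb_to_xyz(r, g, b))
```

With these assumptions, `srgb_to_lab(255, 255, 255)` lands at approximately L* = 100 with a* and b* near zero, which is a quick sanity check that the reference white is handled consistently.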

Figure: (A) 3D representation of CIELAB; (B) cross-section view of CIELAB

CIELAB describes the visible color spectrum in an approximately perceptually uniform way along three axes: L* (lightness), a* (approximately the red to green axis), and b* (approximately the blue to yellow axis). As L* approaches 0 or 100, the maximum spread between a* and b* decreases; in this simplified picture, the spread is widest around L* = 50. While the actual shape of the [a*, b*] 2D cross-section is horseshoe-shaped, for ease, one can broadly conceptualize the 3D shape of the CIELAB color model as a sphere, with its poles at L* = 0 and L* = 100. Because the model describes the entire visible color spectrum, it can serve as a device-independent color space, assuming a constant reference white point.
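One way to make the "spread" of a* and b* concrete is to look at a color in cylindrical form: chroma is the distance from the neutral L* axis, and hue is the angle around it. The sketch below uses the standard CIELAB-to-LCh relationship; the function name is just for illustration.

```python
import math


def lab_to_lch(L, a, b):
    """Convert (L*, a*, b*) to (L*, chroma, hue angle in degrees)."""
    chroma = math.hypot(a, b)                       # distance from the neutral axis
    hue = math.degrees(math.atan2(b, a)) % 360.0    # 0 = +a*, 90 = +b*
    return L, chroma, hue
```

In this form, a neutral gray has zero chroma at any lightness, and the sphere analogy amounts to saying that the maximum available chroma shrinks toward the L* = 0 and L* = 100 poles.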

This matters not just for the intermediate model, but also as a guide for the data capture methodology. The success of this entire process depends on data: the larger and higher quality the dataset, the better the model can describe the behavior of both formats and connect the two in a robust manner. Additionally, because the calculation is being performed by a powerful neural network, the only things holding me back from capturing as much data as I want are time and money.


This project uses a ColorChecker Digital SG 140-patch chart as our reference, but the light source is the key. A proper profile of the format needs to describe how the sensor or film behaves under different illuminants, including highly saturated sources and various brightnesses. This is what it means to use the CIELAB color model as a guide for capture. By capturing the chart lit by illuminants spread all throughout the 3D CIELAB space, we can get a vast amount of information describing the behavior of the formats from the middle to the extremes of the gamut.

The light currently in the planning spec is an RGBWW light that allows color to be set via RGB, XY, or HSI. HSI will likely be the most intuitive since we can easily set evenly distributed hue angles and saturation levels. The current plan is to capture frames at 30-degree hue intervals (12 hues), at 50% and 100% saturation per hue, with exposure sets at EV = [-5, -3, 0, 3, 5] for each hue/saturation combination. This is in addition to exposure sets from EV = -5 to +5 in 1-stop increments for Daylight and Tungsten illuminants. That comes to 12 × 2 × 5 = 120 saturated-light exposures plus 2 × 11 = 22 white-light exposures, for 142 exposures in total. With 140 patches per exposure, that is just under 20,000 data points (19,880) from each format, which should populate much of the CIELAB space. This will massively aid the accurate modeling of how different film stocks bend in response to highly saturated colors and extreme luminance values.
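The capture plan above can be enumerated in a few lines to double-check the totals. This is only a planning sketch; the dictionary keys and variable names are made up for illustration and are not part of the project's script.

```python
from itertools import product

PATCHES_PER_FRAME = 140  # ColorChecker Digital SG patch count

hues = range(0, 360, 30)           # 30-degree hue intervals -> 12 hues
saturations = (50, 100)            # percent saturation per hue
hs_exposures = (-5, -3, 0, 3, 5)   # EV offsets for each hue/saturation pair
white_exposures = range(-5, 6)     # EV -5..+5 in 1-stop steps -> 11 values

# Saturated-light exposures: every hue x saturation x EV combination.
plan = [
    {"source": "HSI", "hue": h, "saturation": s, "ev": ev}
    for h, s, ev in product(hues, saturations, hs_exposures)
]

# White-light exposures for the two reference illuminants.
plan += [
    {"source": name, "hue": None, "saturation": None, "ev": ev}
    for name, ev in product(("Daylight", "Tungsten"), white_exposures)
]

total_frames = len(plan)
total_points = total_frames * PATCHES_PER_FRAME
print(total_frames, total_points)
```

A list like `plan` could also double as a shot list on set, so each captured frame can be matched back to its intended light setting during ingest.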

Capture Devices

The final element of note in this project is the capture devices. At the time of this writing, the planned device for digital capture is the ARRI Alexa with the ALEV III sensor, recording in ProRes 4444. The capture capability of this sensor is robust enough in color information and dynamic range to cleanly relay color and luminance data throughout the entire desired dataset range without compromise. The film stocks being profiled will be captured through a film camera on 35mm format stock. The rolls will be developed and scanned through a Blackmagic Design Cintel scanner in 2-pass HDR mode, with either .dpx or .cri files as the scanned media for ingest. To ensure the most controlled dataset possible, both cameras will use the same lens for capture. The current plan is to use the Zeiss Otus 55mm f/1.4, heralded for its neutrality and sharpness. However, that may change when the project is underway.

This methodology should result in the most widespread, robust, and well-controlled data capture session we can manage for the experiment, within reason. It is almost certain that the film’s maximum dynamic range will exceed the range proposed by this project for testing, especially with negative film. That is why the ultimate goal is to provide so much data that the resultant model is able to easily extrapolate an accurate transformation for values not directly measured, both outside and in between the captured color information.