Designing an efficient, user-friendly image data structure

Published in

Zero Equals False

8 min readJul 4, 2019

Every single image editing program out there has some internal representation of the images it operates on. This can be incredibly simple — like for example a naked linear array — or rather more involved.

The trick in creating one of these structures lies in laying out the data such that it can be accessed with as little overhead as possible while still providing an expressive computational basis (the set of procedures that can be applied to a type, see chapter 1.4 of Elements of Programming by Alexander Stepanov and Paul McJones for more details).

Prior Art

People smarter than me have provided their solutions to this problem long before I was born. Here is a quick look at some I am familiar with.

It has to be said that I have not inspected the internals of these implementations, so what I am providing here is merely a guess as to what the intentions behind the design might have been.

OpenCV for Python

The newer versions of OpenCV bindings for Python provide an interface based on Numpy arrays. This makes for excellent usability in that Numpy arrays are familiar to most number crunching Python programmers and also in that they are insanely flexible.

Some of this flexibility comes from the malleability of Python and with that comes a hit on performance. I have not yet made extensive use of the native OpenCV package, but I imagine that a non-trivial (but still impressively short) amount of time is lost converting from one interface to another (I know, I know, imagining is very unscientific and some day I will do my homework and benchmark this).

OpenImageIO

The wonderful OpenImageIO library provides its own image type. For one as an output/input to its file I/O procedures, but also to run calculations on.

From reading the documentation, I would surmise that the primary motivation for its design was compatibility with a wide range of data types and layouts. Literally the first thing treated in the 400+ page document is TypeDesc — OIIO’s way of defining the semantics of the image data.

Nuke

The Foundry’s Nuke software is the behemoth of VFX packages in the motion picture industry. In the documentation of the program’s C++ API, they explain that their image type operates on lines. The idea being that they could optimise performance by skipping a number of lines when that resolution was not needed.

Cherry picking

All of these approaches contain interesting ideas:

Python OpenCV shines with an extremely expressive set of operations
OpenImageIO can handle a near-unlimited amount of different data layouts
Nuke has optimisation built into its data layout

To some (limited) extent, these benefits are mutually exclusive. Thus, we will hardly be able to create a structure that outclasses all of these contenders (besides, they have all been written by engineers with much more experience than myself). However, I think a new image type can still bring something interesting to the table: a C++ image type that aligns with the style set out by the STL¹, is extraordinarily expressive, accommodates a wide range of data layouts and makes speed a priority (that last one is arguably a must in any contender).

Measures of success

As hinted at before, these are my design goals (this time ordered by priority):

Speed
Expressiveness
Flexible data layout
STL-like interface

Number one on that list — speed — is by far the easiest to measure. If the same algorithm is applied to my image structure and any other, I’d like my own implementation to come out on top in most cases (being realistic, I would also be thrilled about a draw).

Expressiveness is more elusive a metric. Here, I’ll use my own judgement as to whether my structure allows performing a larger number of operations in a more readable fashion than its competitors.

The data layout’s flexibility shall be measured by the number of distinct data semantics that can be represented by an image type. For example 32-bit float data in RGB channel order would be one layout, the same 32-bit float data in BGR order would be another, 16-bit unsigned integer data in RGBA channel order another and so forth.

Finally, the STL-like-ness will once again be graded according to my own judgement and, eventually, that of the library’s users (unless I can get a committee member to look my code over — in which case their verdict trumps all ;-)).

The Interface

Taking all of this into consideration, I came up with the interface described in the documentation of this class. Let’s go over the different aspects one by one:

The data itself

It has become established practice to represent images as one-dimensional arrays. Here, I opted for std::vector as a convenience. Often this linear list organises the data such that one row follows the next. In the case of my structure, columns are contiguous and listed one after the other. Part of the reason why I opted for this strategy is because I wanted the optimal two-dimensional iteration strategy to be as intuitive as possible (x in the outer loop, y inside, matching the natural way of speaking “Iterate over X and Y”):

for (size_t x = 0; x < my_image.width()< ++x) {
    for (size_t y = 0; y < my_image.height()< ++y) {
        // do something with the pixel at (x, y)
    }
}

If the data were organised row-wise, this would cause a lot of cache-misses whereas adjacent pixels are accessed one after another in this column-wise layout.

I will elaborate on another benefit of this design in the section on the subscript operator.

To be able to access the data for linear operations, image::data returns that vector “in the nude”.

Data semantics

Since the image is effectively just a list of numbers at this point, a practical image structure needs to store information on what this data actually means. Obviously width and height fall into this category, but also the number of channels and, if the structure is to support multiple channel orders, their order.

For this particular class, width and height are simply represented by size_t member variables with read-only accessors. Since I want to support arbitrary channel semantics, this information is represented in the form of a std::vector containing the names of the image’s channels. It is then up to any channel-dependent operation to make sure it operates on the correct one.

In the future, I would like to have a mapping from a channel’s name to its offset within the pixel so that a particular channel’s data could be quite simply accessed with something like my_image(42, 47, my_image.channels.RED).

More convenient data access via the call operator

To avoid having to calculate the exact offset of a pixel’s data manually, there are two sets of overloads for the operator(): one taking three arguments and returning a single channel value and another returning a pixel_view (more on this in the following section).

Overloading the subscript operator and defining views into the data

Intuitively, programmers might expect an image structure to work like a two-dimensional array. Thus, image[42][47] would be assumed to return the 43rd pixel in the 48th line.

As a consequence of C++’s operator overloading rules, one cannot simply introduce a 2D subscript operator with a simple overload. To provide this functionality, the structure must overload the subscript operator to return another type that has its own overload of the operator[] which returns the desired data.

For my image type, I decided to introduce the concept of views. A view is a simple wrapper for a pointer into image::data that provides its own accessor methods (e.g. another subscript operator).

Thus, image::operator[] returns a column_view which in turn has its own operator[] returning a pixel_view which provides another operator[] to finally get to the actual data.

This is a big part of the reason data is laid out column-wise in the internal vector. I wanted the physical layout of the data to closely match the behaviour implied by the operator.

I designed these views to be a lean as possible, but since they have to be created and deleted, using the subscript notation can never be as fast as using image::operator().

Bounds-checked data access

In keeping with the STL’s conventions, image::at will throw an exception if the supplied coordinates are outside the image’s dimensions. Otherwise, this function behaves like image::operator().

Mathematical operators

Similar to OpenCV’s Python bindings, this image structure provides common mathematical operators. At the moment they are missing overloads for the primitive type the structure is base on, but where appropriate these will be added in the near future.

Comparison

For now, only operator== and operator!= are defined such that two images with the same width, height, channel semantics and pixel values compare as equal.

Inequality comparisons are missing since there is no intuitive order of images. One could say that the average intensity is a candidate for an order of images, but this feels meaningless to me and would also not be able to define a strict order (!(image_a < image_b) && !(image_b < image_a) would not imply image_a == image_b).

Opportunities for Generalisation

In the abstract, a pixel image is a three-dimensional collection of discrete values. The first two of the dimensions are the X and Y axes and the third is the dimension of colour. In the interface proposed above, we are effectively creating views into the data vector that represent a certain dimension “cut out” of the whole (the column_view). This view provides a way of accessing the lower-dimensional view representing a pixel (fittingly called a pixel_view) which simply provides an accessor for the single element (which represents a single channel’s value at one pixel coordinate).

This idea can be generalised to any discrete, multidimensional field of values by having the intermediate view’s subscript operator (the column_view that might in this more general case be called intermediate_dimension_view) return a lower-order intermediate_dimension_view. Finally, the second-to-lowest dimension’s view would then return the value at that particular coordinate. This could be achieved with template programming (e.g. by having a template argument to the view’s class that tracks its dimensionality and then switching between two overloads of the operator[] with std::enable_if).

Once that generalisation was implemented, creating the image type and other matrix types would be a simple matter of creating an alias or inheriting from this general class if functions are to be overridden.

Next Steps

First and foremost: implementing the structures described in this article. Once this is done, I would like to benchmark my implementation against some of the contenders described above. Lastly, I plan to refactor my code into the generalised version described above.

[1] In this article I use the abbreviation STL as a synonym for the C++ standard library even though it technically refers to Alexander Stepanov’s Standard Template Library (which the C++ standard library was based on but is not identical to).