# What is OpenCV’s INTER_AREA Actually Doing?

Recently I’ve been studying computer vision, and I came across the `resize` function in OpenCV. Resizing an image needs a way to calculate pixel values for the new image from the original one. The five interpolation methods provided with OpenCV are `INTER_NEAREST`, `INTER_LINEAR`, `INTER_AREA`, `INTER_CUBIC`, and `INTER_LANCZOS4`.

Among those five methods, it is quite easy to guess how four of them do the interpolation: `INTER_NEAREST` uses nearest-neighbour interpolation; `INTER_LINEAR` is bilinear; `INTER_CUBIC` is bicubic; and `INTER_LANCZOS4` is a Lanczos (windowed-sinc) method. However, `INTER_AREA` is relatively mysterious, as the OpenCV documentation describes it in the following way:

> resampling using pixel area relation. It may be a preferred method for image decimation, as it gives moire’-free results. But when the image is zoomed, it is similar to the INTER_NEAREST method.

What does “using pixel area relation” mean? Perhaps, as a beginner, I’m missing common knowledge in the field of computer vision that every insider just knows. So I googled the term, hoping to find articles on Wikipedia, blog posts, or maybe a paper. It turns out that most of the time when people mention these interpolation methods, they just rephrase the brief explanation above, or copy it directly. And this includes answers on sites like Stack Overflow.

In the end, I decided to read the source code myself. It turns out that the “true area” method is a very intuitive one, and that depending on some conditions, `INTER_LINEAR` could be `INTER_AREA`, and vice versa.

So I wrote this article to help others who may also be wondering what `INTER_AREA` does. Depending on how much we want to scale the original image, `INTER_AREA` may do different things.

To help with the explanations, let us first define some variables used in the source code:

`inv_scale_x` and `inv_scale_y` are the output image width over the original image width, and the output image height over the original image height, respectively. `scale_x` and `scale_y`, conversely, are the original image width over the output image width, and the original image height over the output image height. In other words:

`double scale_x = 1./inv_scale_x, scale_y = 1./inv_scale_y;`

There are also two integer versions, `iscale_x` and `iscale_y`, which are `saturate_cast<int>` of `scale_x` and `scale_y`, respectively.

# Shrinking images

If we are shrinking the original image, i.e., the final image is smaller than the original in both width and height, then the algorithm checks whether the size of the original image is an integer multiple of that of the final one. In other words, it checks whether the original width and height divided by the output width and height, respectively, are integers.

If so, a boolean variable `is_area_fast` will be set to `true`. Following this:

- If the interpolation method used is `INTER_LINEAR_EXACT`, *and* we want to shrink the original image to half its size, i.e., `iscale_x == 2 && iscale_y == 2`, *and* the number of channels of the image is *not* 2, then the `resize` function is actually using `INTER_AREA`.
- If the interpolation method used is `INTER_LINEAR`, *and* we want to shrink the original image to half its size, i.e., `iscale_x == 2 && iscale_y == 2`, then the `resize` function is actually using `INTER_AREA`.

The “true area” method is only implemented for the cases where we are not enlarging the image, so the two cases above qualify as well. This “true area” method works in the following way.

First, we explain the case when `is_area_fast` is `true`. The variable is named so because a family of functions utilising parallel computing will be called. We can imagine an RGB image as a cube. Figure 1 shows such a schematic illustration of a row of an image. Each small cube represents a pixel, and the numbers are the indices.

A pointer to an array of ints, `xofs`, points to an array that contains the starting indices of pixels on a row to be averaged in a later stage. For example, suppose we want to shrink the image to a third of its width and height. Then `iscale_x = 3` and `iscale_y = 3`. What does `xofs` contain? It contains 0, 1, 2, 9, 10, 11, etc. Notice the pattern? We take all the indices in the channel direction every `iscale_x` columns. As we will see next, pixels with indices 3, 4, 5, 6, 7, 8 will be combined with those of 0, 1, 2 to calculate a weighted average. So `xofs` marks the starting boundaries of all such averaging blocks.

We then calculate a variable `area`, which is equal to `iscale_x * iscale_y`. In our example, `area = 9`.

Another pointer to an array of ints, `ofs`, points to an array of `area` indices. These indices are the offsets in the *x* and *y* directions, such that they form a window for each channel, and the number of pixels in this window equals `area`. In our example, `ofs` points to an array of 9 elements. Suppose our original image has a width of 9 pixels. Then each row of the image has 27 values, taking the 3 channels into account. Therefore, the first three indices of `ofs` are 0, 3, 6; the next three are 27, 30, 33; and the last three are 54, 57, 60. This is illustrated in Figure 2 (of course, the width of the image in the figure is not 9…).
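For this example, the construction of `xofs` and `ofs` can be sketched in plain Python (an illustration of the indexing scheme, not OpenCV’s actual code):

```python
# A sketch of how the xofs/ofs arrays from the text can be built for
# our example: a 3-channel image, iscale_x = iscale_y = 3, input width 9.
channels = 3
iscale_x = iscale_y = 3
src_width = 9
row_step = src_width * channels  # 27 values per input row

# xofs: starting index within a row of each averaging block,
# one entry per channel of each output pixel.
xofs = [out_x * iscale_x * channels + c
        for out_x in range(src_width // iscale_x)
        for c in range(channels)]

# ofs: offsets of the `area` pixels inside one iscale_x-by-iscale_y
# window, stepping by `channels` horizontally and a full row vertically.
ofs = [dy * row_step + dx * channels
       for dy in range(iscale_y)
       for dx in range(iscale_x)]

print(xofs)  # [0, 1, 2, 9, 10, 11, 18, 19, 20]
print(ofs)   # [0, 3, 6, 27, 30, 33, 54, 57, 60]
```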

With this window, we sum the values of the pixels within it, and then divide by `area`. The result is the pixel value of the output image. In other words, the algorithm simply calculates the average value of the boxed pixels. This averaging is done for each index in `xofs`, covering each channel.

Quite intuitive and straightforward, isn’t it?
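For integer scales, the whole procedure can be condensed into a few lines of NumPy. This is a reimplementation of the idea for a single-channel image, not OpenCV’s code, and the rounding mode is an assumption here:

```python
import numpy as np

def area_shrink_int(img, k):
    """Shrink a single-channel image by an integer factor k by
    averaging each k-by-k block of pixels (the 'fast area' idea)."""
    h, w = img.shape
    # Group the pixels into k-by-k blocks...
    blocks = img[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k)
    # ...then average each block and round to the nearest integer.
    return (blocks.mean(axis=(1, 3)) + 0.5).astype(np.uint8)

a = np.arange(36, dtype=np.uint8).reshape(6, 6)
print(area_shrink_int(a, 3))
# [[ 7 10]
#  [25 28]]
```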

That’s it for integer scales. For non-integer shrinking, the idea is still the same, but the weight changes depending on how much of a pixel in the original image is included in a window (note that this time the window size has a fractional part). For a row of pixels, this can be seen in Figure 3, where different colors represent which pixels are included each time the window moves. The same applies in the *y* direction.

The weight of each pixel is the proportion of it included in the window times `1/area`. For example, if 0.3 of a pixel is contained in the window, then its weight is `0.3/area`.
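The fractional coverage weights can be sketched in 1D (an illustration of the idea, not OpenCV’s code; in one dimension the window length `scale_x` plays the role of `area`):

```python
import math

def area_weights_1d(dx, scale_x):
    """Coverage weights of the input pixels contributing to output
    pixel dx when shrinking by a non-integer factor scale_x (1D).
    The window spans [dx*scale_x, (dx+1)*scale_x) in input coordinates."""
    left = dx * scale_x
    right = left + scale_x
    weights = {}
    for sx in range(math.floor(left), math.ceil(right)):
        # Fraction of input pixel sx covered by the window,
        # normalised by the window length (the 1D "area").
        overlap = min(right, sx + 1) - max(left, sx)
        weights[sx] = overlap / scale_x
    return weights

print(area_weights_1d(1, 2.5))  # {2: 0.2, 3: 0.4, 4: 0.4}
```

Note that the weights for each output pixel sum to 1, as they should for an average.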

We can check it with a simple example:

```python
import numpy as np
import cv2

# Create a numpy array, pretending it is our image
img = np.array([[3, 106, 107, 40, 148, 112, 254, 151],
                [62, 173, 91, 93, 33, 111, 139, 25],
                [99, 137, 80, 231, 101, 204, 74, 219],
                [240, 173, 85, 14, 40, 230, 160, 152],
                [230, 200, 177, 149, 173, 239, 103, 74],
                [19, 50, 209, 82, 241, 103, 3, 87],
                [252, 191, 55, 154, 171, 107, 6, 123],
                [7, 101, 168, 85, 115, 103, 32, 11]],
               dtype=np.uint8)

# Resize the width and height to half (integer division, since
# cv2.resize expects an integer size)
resized = cv2.resize(img, (img.shape[1] // 2, img.shape[0] // 2),
                     interpolation=cv2.INTER_AREA)
print(resized)
# Result:
# [[ 86  83 101 142]
#  [162 103 144 151]
#  [125 154 189  67]
#  [138 116 124  43]]
```

# Enlarging images

When we are enlarging an image with `INTER_AREA`, it actually calls the bilinear routine, but with a different way of calculating the interpolation coefficients.

For the bilinear method, assuming we consider a 1D image for simplicity, the interpolated pixel value is a weighted average of the two neighbouring original pixels. The weights depend on the distances to the two pixels: closer means higher weight, and the weights vary linearly with the distances. But before we continue, I’d like to talk about how the coefficients (weights) are actually calculated in OpenCV.

As a beginner, I would have thought that scaling a row like `[0, 1]` to 4 columns would give `[0, 1/3, 2/3, 1]`. However, this is not the behaviour of OpenCV’s `resize`, nor of MATLAB’s `imresize`.

The code for calculating the coefficients in the *x* direction for the bilinear method in OpenCV is, for each pixel index `dx` of the output image:

```cpp
fx = (float)((dx+0.5)*scale_x - 0.5); // (1)
sx = cvFloor(fx);                     // (2)
fx -= sx;                             // (3)
```

Recall that `scale_x` is the ratio of the input image width to the output image width. In (1), `fx` is the “floating-point” pixel index of the input image in the *x* direction, whereas after (3), `fx` is the interpolation coefficient for the right pixel in the *x* direction (i.e., the interpolation coefficients are the pair `1-fx` and `fx`). So where does the equation in (1) come from?

One can picture a 1D image in the way shown in Figure 4. In Figure 4, we show the coordinate systems for a 1D image. We assume the pixels are located in the middle of the pixel boxes, and the boundaries have their own coordinates. Note that in MATLAB, indexing starts from 1.

Now what (1) does is map the coordinates from the output image to the original input image. Since this is linear interpolation, to get the linear function we need two equations. The first comes from the requirement that the left boundaries of the input and output images have the same coordinate: -0.5 in the output image coordinate system should be mapped to -0.5 in the input image coordinate system. The second is that a distance of `inv_scale_x` in the output image coordinate system should map to a distance of 1 in the input image coordinate system. Solving for the linear function, we obtain the expression in (1). The MATLAB code for this mapping is written as

`u = x/scale + 0.5 * (1 - 1/scale);`

which can be found in `images.internal.resize.contributions`. Note that in the MATLAB code, `x` is the output image coordinate, `u` the input image coordinate, and `scale` is the ratio of output image width over input image width, the same as `inv_scale_x` in OpenCV. So `[0, 1]` would actually become `[0, 0.25, 0.75, 1]`. Figure 5 shows the coefficient of the left pixel for 100 values of `dx` when the output image width is twice that of the input image.
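We can verify this mapping with a short sketch (plain Python applying equation (1) with replicated borders; an illustration, not OpenCV internals):

```python
import math

src = [0.0, 1.0]          # a two-pixel input row
inv_scale_x = 2.0         # output width / input width
scale_x = 1.0 / inv_scale_x

out = []
for dx in range(4):
    fx = (dx + 0.5) * scale_x - 0.5  # source coordinate, eq. (1)
    sx = math.floor(fx)              # index of the left pixel
    fx -= sx                         # weight of the right pixel
    # Replicate the border pixels at the image edges.
    left = src[min(max(sx, 0), len(src) - 1)]
    right = src[min(max(sx + 1, 0), len(src) - 1)]
    out.append((1 - fx) * left + fx * right)

print(out)  # [0.0, 0.25, 0.75, 1.0]
```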

However, `INTER_AREA` has a different strategy for calculating the weights, and its behaviour also differs slightly depending on whether the scale factor is an integer or not.

## If the scale factor is an integer

The code for calculating the interpolation coefficient (the weight) in the *x* direction is, for each pixel position `dx` of the output image:

```cpp
sx = cvFloor(dx*scale_x);
fx = (float)((dx+1) - (sx+1)*inv_scale_x);
fx = fx <= 0 ? 0.f : fx - cvFloor(fx);
```

and

```cpp
cbuf[0] = 1.f - fx;
cbuf[1] = fx;
```

The calculation in the *y* direction is the same. Here, `cbuf` contains the actual coefficients, where the first applies to the left pixel and the second to the right pixel.

Since we are assuming `inv_scale_x` is an integer, `fx` is essentially equivalent to

```cpp
int inv_scale_x_integer = saturate_cast<int>(inv_scale_x);
fx = dx % inv_scale_x_integer + 1 - inv_scale_x_integer;
```
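As a quick check (a plain-Python sketch, not OpenCV code), the two forms of `fx` agree, and the value indeed never exceeds 0; for example with an integer enlargement factor of 3:

```python
import math

inv_scale_x = 3            # integer enlargement factor
scale_x = 1 / inv_scale_x

values = []
for dx in range(9):
    sx = math.floor(dx * scale_x)
    fx = (dx + 1) - (sx + 1) * inv_scale_x           # OpenCV's expression
    assert fx == dx % inv_scale_x + 1 - inv_scale_x  # the modulo form
    values.append(fx)

print(values)  # [-2, -1, 0, -2, -1, 0, -2, -1, 0]
```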

We can see that `fx` can never be positive: the largest value of `dx % inv_scale_x_integer` is `inv_scale_x_integer - 1`, which makes the largest possible value of `fx` exactly 0. Add to this the statement:

`fx = fx <= 0 ? 0.f : fx - cvFloor(fx);`

This means that when `inv_scale_x` is an integer, `fx` is always 0, and the coefficients are always 1 and 0. In other words, the interpolated value is simply a copy of the left pixel. We can see this with a simple test:

```python
img = np.array([[86, 83, 101, 142],
                [162, 103, 144, 151],
                [125, 154, 189, 67],
                [138, 116, 124, 43]], dtype=np.uint8)

enlarged = cv2.resize(img, (8, 8), interpolation=cv2.INTER_AREA)
print(enlarged)
# Result:
# [[ 86  86  83  83 101 101 142 142]
#  [ 86  86  83  83 101 101 142 142]
#  [162 162 103 103 144 144 151 151]
#  [162 162 103 103 144 144 151 151]
#  [125 125 154 154 189 189  67  67]
#  [125 125 154 154 189 189  67  67]
#  [138 138 116 116 124 124  43  43]
#  [138 138 116 116 124 124  43  43]]
```

## If the scale factor is not an integer

If `inv_scale_x` or `inv_scale_y` is not an integer, then the interpolation coefficients are no longer just 1 and 0. The idea is still similar to the integer case; it is just that the “modulo” is now taken with a real number. Although that is not a well-defined operation, we can still make sense of it. Figure 6 shows the coefficients for the left pixel, calculated at 100 `dx` positions, with `inv_scale_x` being 5.6:

Observe the pattern. Our scale factor is 5.6. Imagine that we are assigning values to each position, and at each position the maximum amount we can assign is 1. From the beginning, we take 5 ones, and then there is only 5.6 - 5 = 0.6 left to take, so the next coefficient is 0.6. After that, we have another 5.6 to distribute. Last time, 0.4 was left to be filled up; we take that amount from the new 5.6, as if topping the previous 0.6 up to 1. We keep assigning values so that each position receives the full amount of 1. Then, after assigning another 5 ones, we have only 0.2 left, so the next coefficient is 0.2.
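This filling pattern can be reproduced directly from the coefficient code above (a plain-Python sketch of the same formula):

```python
import math

inv_scale_x = 5.6          # non-integer enlargement factor
scale_x = 1 / inv_scale_x

coeffs = []
for dx in range(17):
    sx = math.floor(dx * scale_x)
    fx = (dx + 1) - (sx + 1) * inv_scale_x
    fx = 0.0 if fx <= 0 else fx - math.floor(fx)
    coeffs.append(round(1 - fx, 2))  # coefficient of the left pixel

print(coeffs)
# five ones, 0.6, five ones, 0.2, four ones, 0.8
```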

# Summary

To sum up:

- When the output image is not larger than the input image in both width and height:
  - If the input/output scales in both width and height are integers:
    - If width and height are shrunk by half, and the number of channels is not 2, then `INTER_LINEAR_EXACT` is `INTER_AREA`;
    - If width and height are shrunk by half, then `INTER_LINEAR` is `INTER_AREA`.
  - `INTER_AREA` is the boxed/window resampling.
- When the output image is larger than the input image in width or height (or both):
  - If the output/input scales in both width and height are integers, `INTER_AREA` is a bilinear interpolation with coefficients (1, 0), i.e., a copy of the left/top pixel.
  - Otherwise, `INTER_AREA` is a bilinear interpolation with slightly more complicated coefficient values.