Fast Averaging of High Color (16 bit) Pixels

5 min readMar 13, 2018

How did computers in the 90s manage to quickly process color channels that aren’t powers of two? To find out, I set out to write code to quickly average High Color pixels.

(variously spelled **Highcolor**, **Hicolor**, **Hi-color**, or **Thousands of colors** on a Macintosh)

Following my previous post about Arm Neon Intrinsics Add Functions, Ryan from Intel responded with a little challenge:

Write code to calculate the average of two pixels in RGB format with 16-bit, in which R, G, B takes 5, 6, 5 bits. You have to calculate them together without extracting each channel and get average separately.

Challenge accepted!

I was immediately intrigued and after a bit of research, found out that this format is referred to as Hi-Color graphics, which was common in Super VGA video cards in the 90s. Quick side note, adding 1 bit to green makes a lot of sense, since it covers the widest range in the human perception of color.

Ref: http://hyperphysics.phy-astr.gsu.edu/hbase/vision/colcon.html

The reason you don’t want to extract each color is that important performance gains can be achieved by operating directly on the 16-bit pixels. A Single 32-bit Instruction can operate over Multiple Data values (in this case 2), this is referred to has SIMD. Now there’s a catch, with great speed up comes great complexity, as the PC Graphics handbook describes:

the fact that the individual colors are not located at a byte boundary introduces some programming complications. To handle this uneven mapping code must perform bitwise operations to set the corresponding fields to the required values
— PC Graphics handbook

Now that’s my kind of a puzzle!

First things first, the fastest way to compute an average between a and b is:

avg = (a + b) >> 1

For those not familiar, shifting to the right by one (>>1) is equivalent to a division by 2 floored. Note: We need to add 1 to a and b to offset the sum so that we end up with the same result as a rounded division by 2, but let’s not worry about that for now.

A bit of theory about the addition

Let’s get the obvious out of the way, computing the average on the 16-bit value directly does not work.

To understand why, we need talk about how computers add (in theory at least). The following animation is a half-adder, it illustrates that an addition is composed of two parts: the Sum which is computed using an eXclusive OR, and the Carry which is the result of an AND operation.

By Marble machine — Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=48017305

Notice that adding two n-bit(s) values requires n + 1 bits. Since we want to add 3 distinct colors, we need 3 extra bits, one after each color.

   0 R R R R R 0 G G G G G G 0 B B B B B
+  0 R R R R R 0 G G G G G G 0 B B B B B ----------------------------------------
   0 R R R R R 0 G G G G G G 0 B B B B B

Failing to do so will cause the carries to collide into the next color. Simply put, to perform the addition required for the average we need 19 bits and we only have 16.

The average fits in the same number of bits, so there’s got to be a way!

Let’s go back to our half-adder and try to express it in C:

(a xor b) + ((a and b) << 1)

The more picky amongst you might be shocked that I use a + operator inside my definition of the addition, but I don’t mind. Keep your eyes on that left shift, I’m going to make it disappear.

Let’s replace the addition in the average operation with our C half-adder:

avg = (a + b) >> 1avg = ((a xor b) + ((a and b) << 1)) >> 1

Next, we distribute the right shift:

avg = ((a xor b) >> 1) + (a and b)

and the left shift has disapeared! Great, we don’t need the extra bit anymore, but we still need to take care of the possible collision between colors.

We’re almost there, you’ve read this far in so why stop now?

Let’s focus on the ((a xor b) >> 1), notice that when we perform a right shift, we throw out the rightmost bit. So let’s zero it out:

     R R R R R  G G G G G G  B B B B B
and  1 1 1 1 0  1 1 1 1 1 0  1 1 1 1 0
---------------------------------------
     R R R R 0  G G G G G 0  B B B B 0

A simple way to express the value of this mask is via hexadecimal. All we need to do, is compute the value of every four bits:

          Mask: 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 0                      
               +-------+-------+-------+-------+
     Hex Value:    F       7       D       E

If there’s a carry bit between colors it will go in the first zero, so it won’t collide with the next color and it will get right shifted back into its original color:

         R R R R 0  G G G G G 0  B B B B 0
(>> 1)   0 R R R R  0 G G G G G  0 B B B B

Let’s put it all together

Here you have it, code that calculates the average of two Hi-Color pixels without extracting each channel.

This is the most commonly used version, but if you are like me, truncating instead of rounding just does not feel right. So Let’s go the extra mile and add in the rounding.

Extra credit: Now, about that +1

As previously mentionned, the correct code to compute the average is in fact:

avg = (a + b + 1) >> 1

We need to add one to every color in the pixel

+ 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1

Now here’s the interesting part, the +1 will also get shifted out by the right shift. Hang on! It’s not that easy, we can’t just ignore it, because the carry of the +1 could have an impact on the second bit.

If the first bit of a xor b is 1, then the +1 would carry into the seconde bit. As it turns out there’s a mask for that:

     Mask: 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1                      
          +-------+-------+-------+-------+
Hex Value:    0       8       2       1

This version will set you back an extra add and an extra and per pixel, but you get the proper rounding.

In conclusion, I streamlined this post as much as I could (hopefully not too much) and its still roughly 1000 words to explain two lines of code.