Going the Distance
The goal of this project is to be able to create a *good* representation of an image using only N triangles.
What is Good?
Defining what *good* and *good enough* means is important.
If we don’t know what a good image is
- We can’t tell the difference between a bad representation and a good representation. We wouldn’t be able to search in the correct direction, or search at all.
If we don’t know what good enough is
- We have to enumerate every possible representation and take the best one. While possible, this is intractable for almost all image sizes and triangle counts.
Good is also relative. After all, a list of triangles that is a good representation of an image of a fruit bowl is not a good representation of an image of the Eiffel Tower.
So the first step is to convert our list of triangles into an image through a process called *rasterization.* How I do this in my project will be the subject of a future blog post. Now we have two images, the *original* and an image created from our representation of it.
Distance
Before we talk about the ‘goodness’ of an image, let’s talk about the distance between two images. A good distance function should take two images as inputs and output a number such that the less similar the images, the higher the number. If the images are identical, the distance should be 0, as there is no way to make them ‘closer’.
An image can be thought of as a 2-dimensional grid, with each ‘square’ on the grid representing a pixel. Each pixel can be thought of as 3 colors, Red, Green, and Blue, that are mixed in different proportions to create the appearance of a single color to your brain.
For example, here is a simple 3 pixel by 3 pixel image of a green and red checkerboard pattern.


There are many different ways to represent the color of a pixel; however, in this project I will be using RGB (Red, Green, Blue). Each color channel can take on a value from 0 to 255 for a total of $256^3 = 16,777,216$ possible colors per pixel.
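As a quick sketch in Python, here is that 3x3 checkerboard as an array of RGB values. I am assuming pure green and pure red with green corners here; the actual swatch colors and arrangement in the image may differ.

```python
import numpy as np

# The 3x3 checkerboard as a (height, width, channel) array.
# Assumed colors: pure green (0, 255, 0) and pure red (255, 0, 0).
G = [0, 255, 0]
R = [255, 0, 0]
checkerboard = np.array([
    [G, R, G],
    [R, G, R],
    [G, R, G],
], dtype=np.uint8)

print(checkerboard.shape)  # (3, 3, 3): 3 rows, 3 columns, 3 color channels
```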
Simple Distance Function
A simple way to compute the distance between two images is to sum the distances between each pair of corresponding pixels. Define pixel distance as the sum of the absolute values of the differences of two pixels’ color channels.
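In Python, that looks roughly like this (a sketch; the function names are mine, not the project’s):

```python
def pixel_distance(p1, p2):
    """Sum of the absolute differences of the R, G, and B channels."""
    return sum(abs(a - b) for a, b in zip(p1, p2))

def image_distance(img1, img2):
    """Sum of the pixel distances over every pair of corresponding pixels.

    Each image is a list of rows, and each row is a list of (R, G, B)
    tuples. Both images must have the same dimensions.
    """
    return sum(
        pixel_distance(p1, p2)
        for row1, row2 in zip(img1, img2)
        for p1, p2 in zip(row1, row2)
    )
```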

Using these formulas we can calculate the distance between these two images.

To start we compare each corresponding pixel of the images. We’ll go left-right, top-bottom and start with the upper left pixel.

The distance between pixels 1:

The distance between pixels 2:

Since all the other pixels are comparisons between the same colors, the final distance looks like this:


Mean Squared Error Distance
The simple distance we used above is a good start, but we can make an improvement by switching our **pixelDistance** function from using the sum of the absolute differences to using the sum of the squared differences.
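In code, only the pixel distance changes; the image distance stays the same and simply calls the squared version.

```python
def pixel_distance_squared(p1, p2):
    """Sum of the squared differences of the R, G, and B channels."""
    return sum((a - b) ** 2 for a, b in zip(p1, p2))
```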

The distance between pixels 1:

The distance between pixels 2:

Why is this better?
First, it’s important to note that although the distance appears to have increased from 160 to 14,020, you cannot compare distances derived from different distance functions.
Second, take a look at the color swatches below. You’ll notice that although both swatches on the right differ from the base color by 48, the center swatch distributes it evenly amongst all 3 channels, while the far right swatch has all the difference in the red channel.

The simple pixel distance function would assign both pixels a distance of 48. The mean squared error pixel distance function would assign the center swatch a distance of 768 and the right swatch a distance of 2,304. This is an improvement because, to the human eye, a difference spread evenly across all three channels looks like the same image, only lightened or darkened, while a difference concentrated in one channel shifts the hue and throws the colors off.
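To see where those numbers come from, note that the center swatch spreads its 48 evenly as 16 per channel. Under the simple pixel distance both swatches score $16 + 16 + 16 = 48 + 0 + 0 = 48$, but under the squared version the evenly spread swatch scores $16^2 + 16^2 + 16^2 = 768$ while the all-red swatch scores $48^2 + 0^2 + 0^2 = 2,304$.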
Here’s our red panda friend again. In the first image, every pixel had its red channel lowered by 75. In the second, every pixel had all channels lowered by 25.


Colors and Shapes
There is still a weakness in our image distance functions: they only take into account the distance between colors, not the difference in the overall form of the image.
To illustrate, the below image is the red panda image where every color channel has been darkened by 35.

Whereas this plain brown image, where every pixel has been set to the average RGB color of the original, scores about 8% better.

However, I hope you would agree with me when I assert that the first image is more similar to the original than the latter. The latter could be almost anything, whereas the first just needs to be lightened a bit to be an exact match.
Grayscale
One attempt would be to convert the image to grayscale, removing color data and leaving only brightness.
The 3 color channels can be converted into a single brightness value using the following formula.

The reason each channel isn’t represented in equal proportion is that the human eye doesn’t see each color as equally bright.
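As a sketch, a brightness function using the common Rec. 601 luma weights looks like this (other weightings exist, and the exact coefficients in the formula above may differ slightly):

```python
def brightness(pixel):
    """Approximate perceived brightness of an (R, G, B) pixel.

    Uses the Rec. 601 luma weights; green counts for the most because
    the eye is most sensitive to it. The formula in the post may use
    slightly different coefficients.
    """
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b
```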
However, our darkened image still loses out to the average color.


In fact, the gap has widened in the average color’s favor, beating the darkened grayscale by 13%.
This result makes sense if you remember that the brightness of each pixel is directly derived from the 3 color channels. If the channels are, on average, farther apart, their brightnesses will be too.
Compression
In a similar vein, we could try compressing the image. In this case we will transform the image into an 8x8 square of pixels. Each pixel in the 8x8 image is the average color of the pixels it subsumed.
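Here is a sketch of that compression step in Python, assuming the image is stored as a NumPy array whose dimensions divide evenly by 8; the project’s actual resizing code may differ.

```python
import numpy as np

def compress(img, size=8):
    """Shrink an RGB image to size x size by averaging blocks of pixels.

    img is a (height, width, 3) array whose dimensions are assumed to be
    divisible by size; each output pixel is the mean color of the block
    of source pixels it replaces.
    """
    h, w, channels = img.shape
    blocks = img.reshape(size, h // size, size, w // size, channels)
    return blocks.mean(axis=(1, 3)).astype(np.uint8)
```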



Unintuitively, but unsurprisingly, this also fails to rank the darkened image as closer to the original than the flat average color.
Hamming Distance and Image Hashes
Even though the two approaches above fail to properly consider the form of the image, there is a way to combine them into an approach that consistently ranks the darkened image as closer to the original than the average color image.
Image Hash Algorithm
1. Compress each image into an 8x8 pixel image.
2. Convert each 8x8 image into grayscale.
3. Calculate the average brightness of each image.
4. For each pixel in the images, if the pixel is darker than average, set it to white; if it is brighter than average, set it to black.
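Putting those four steps together, here is a self-contained sketch in Python, reusing the block-averaging and Rec. 601 brightness assumptions from earlier:

```python
import numpy as np

def average_hash(img):
    """64-bit average hash of an RGB image, following the steps above.

    img is a (height, width, 3) array with dimensions divisible by 8.
    Returns a string of 64 bits read left-right, top-down, where a 1
    means the pixel is brighter than the image's average brightness
    (a black pixel in the illustrations) and a 0 means darker (white).
    """
    h, w, _ = img.shape
    # 1. Compress to 8x8 by averaging blocks of pixels.
    small = img.reshape(8, h // 8, 8, w // 8, 3).mean(axis=(1, 3))
    # 2. Convert to grayscale with the assumed Rec. 601 weights.
    gray = small @ np.array([0.299, 0.587, 0.114])
    # 3. Calculate the average brightness.
    mean = gray.mean()
    # 4. Brighter than average -> 1 (black), darker -> 0 (white).
    return "".join("1" if px > mean else "0" for px in gray.flatten())
```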



The 64 pixels of the third image can be thought of as a string of 1’s and 0’s, where each black pixel represents a 1 and each white pixel represents a 0. Going left-right, top-down, we can construct a string that represents each image. The original image’s hash has been converted below as an example.

Then we can compare each string of 1’s and 0’s to the original and count the number of times they differ. The number of places where two strings differ is known as the Hamming Distance.
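Counting those differences takes only a couple of lines of Python (the function name is mine):

```python
def hamming_distance(hash1, hash2):
    """Count the positions at which two equal-length bit strings differ."""
    return sum(b1 != b2 for b1, b2 in zip(hash1, hash2))

hamming_distance("10110", "10011")  # 2: the strings differ in two places
```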
The Hamming Distance between the original and the darkened image is 3, while the Hamming Distance between the original and the average color is 24 or 40, depending on whether you count pixels equal to the average as black or white. In either case, it’s clear that this approach ranks the darkened image as closer to the original than a flat brown.
I picked 8x8 pixels because its 64 bits fit nicely in my blog’s column width, and 64 bits also fits nicely into most computer architectures’ word size. If you need a more fine-grained approach you can always bump up the compressed image’s resolution. For instance, here is a 1,024-bit hash (32x32) representation of our red panda friend.

Discrete Cosine Transformation
I have also considered another approach, similar to the one above, but instead of using the mean brightness, I would compute the Discrete Cosine Transformation. DCT translates an image into a matrix of coefficients for a series of frequencies. Similar images will have similar coefficients in their DCT matrix.
DCT Image Hash Algorithm
1. Compress each image into a 32x32 pixel image.
2. Convert each 32x32 image into grayscale.
3. Compute the Discrete Cosine Transform for each 32x32 image. Round very small coefficients to 0.
4. Save the first 64 coefficients (top-left 8x8 sub-matrix) of each image. These represent the lowest frequencies and are the most important to the image’s structure.
5. Calculate the average coefficient of the 8x8 matrix, but don’t include the first term as it contributes no detail to the image.
6. Compare each coefficient in the DCT matrix against the mean coefficient value and assign it a 1 if it is above the mean and a 0 if it is below.
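Here is a sketch of that algorithm in Python, using SciPy’s DCT; my actual implementation and its normalization choices may differ.

```python
import numpy as np
from scipy.fftpack import dct

def dct_hash(img):
    """64-bit DCT hash of an RGB image, following the steps above.

    img is a (height, width, 3) array with dimensions divisible by 32.
    A sketch in the style of pHash, not the project's exact code.
    """
    h, w, _ = img.shape
    # 1-2. Compress to 32x32 by block averaging, then convert to grayscale.
    small = img.reshape(32, h // 32, 32, w // 32, 3).mean(axis=(1, 3))
    gray = small @ np.array([0.299, 0.587, 0.114])
    # 3. 2-D DCT: apply the 1-D transform along rows, then along columns.
    coeffs = dct(dct(gray, axis=0, norm="ortho"), axis=1, norm="ortho")
    # 4. Keep the top-left 8x8 block of low-frequency coefficients.
    low = coeffs[:8, :8]
    # 5. Average them, skipping the DC term at [0, 0].
    mean = (low.sum() - low[0, 0]) / 63
    # 6. Above the mean -> 1, below -> 0.
    return "".join("1" if c > mean else "0" for c in low.flatten())
```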

Again you can see that this approach fares quite well. The darkened image’s hash appears quite similar to the original’s, and the average color’s hash appears quite different. The darkened panda’s image has a Hamming Distance of 6 from the original, while the average color’s Hamming Distance is 32. I will be comparing both approaches over multiple images during real trials to get a handle on which one is most effective.
Computing DCT
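For reference, the standard two-dimensional DCT-II of an $N \times N$ grayscale image $p$ (here $N = 32$) is

$$X_{u,v} = \alpha(u)\,\alpha(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} p_{x,y} \cos\left[\frac{\pi u}{N}\left(x + \tfrac{1}{2}\right)\right] \cos\left[\frac{\pi v}{N}\left(y + \tfrac{1}{2}\right)\right]$$

where $\alpha(0) = \sqrt{1/N}$ and $\alpha(k) = \sqrt{2/N}$ for $k > 0$. The $X_{0,0}$ entry is the DC coefficient, the overall brightness of the image, which is why it is excluded from the mean in step 5 above.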

Fitness
There is a fundamental issue with using distance as the sole measure of a representation’s “goodness.” Although all images have a minimum distance of 0, their maximum distance depends on their size and coloring. A distance of 123,456 might be relatively close for a large high contrast image, but relatively far for a small image. We need a way to compare the closeness of representations among images of different sizes and palettes.
Enter the fitness function. The fitness function normalizes the distance to a value between 0 and 1. This is important because the program will be used on images of many different dimensions and colors and we need a way to rate the success of the output that is consistent among all of them.
Here’s my current fitness function.

Max distance represents the furthest possible image (according to the mean squared error distance formula) from the original. For our red panda friend, that’s the frightful image below.

The distance is almost 10x greater than that of the average color or darkened images. To create this image, every color channel was set to the value furthest from the source, which means every channel is either 0 or 255, making the image essentially 8-color.
The left term divides the difference between the max distance and the representation’s distance by the max distance. If the distance is 0, the max distance term divides itself, giving 1, the highest possible fitness.
The right term represents the Hamming Distance between the two image hashes, whether computed with the mean brightness method or the discrete cosine transformation method. If the hashes are identical, the term contributes its highest possible value.
Both terms are multiplied by one-half to give them equal weight.
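Here is a rough sketch of that in Python; the variable names are illustrative, not the project’s.

```python
def fitness(distance, max_distance, hamming, hash_bits=64):
    """Normalize distance and hash similarity to a fitness between 0 and 1.

    Half the weight comes from how close the MSE distance is to 0, and
    half from how close the hash Hamming Distance is to 0.
    """
    distance_term = (max_distance - distance) / max_distance
    hash_term = (hash_bits - hamming) / hash_bits
    return 0.5 * distance_term + 0.5 * hash_term
```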
Ideas I have planned for the future are:
1. Determine the best weights for each term.
2. Determine whether the brightness mean hash or the DCT mean coefficient hash is more effective.
3. Set max distance to the distance from a flat background color (like white), and if this would ever make the term negative, set it to 0 instead.
Good Enough
So now that we have a way to judge the goodness, or *fitness* of an image’s representation, how do we know when the fitness is *good enough?*
Honestly, it’s a subjective/arbitrary threshold. For this project, I will define ‘good enough’, the minimum acceptable fitness, as 0.985, or 98.5%.
If you remember the red pandas from the previous blog post, you may have noticed a small black box with white numbers in the lower left-hand corner. That number was the representation’s fitness score. The third red panda, which used 500 triangles, had a fitness of just over 98.5% using only the mean squared error distance function as a measure of fitness. I feel that representation is sufficiently detailed that you would recognize it instantly as a representation of the original image, even among other similar images.
Prototype Fitness Function
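In essence, the prototype uses only the distance term and skips the hash term:

$$\text{fitness} = \frac{\text{maxDistance} - \text{distance}}{\text{maxDistance}}$$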

This fitness function is less strict than the one we derived above. So if 98.5% fitness was sufficient then, it should continue to be *good enough* now.
Final Thoughts
There is still room for improvement in this fitness function. For instance, it scores this image very poorly, despite it looking a great deal closer to the red panda image than the average color does.

This is not helped by taking the image’s hash. Take a look at the image hashes produced below (32x32).

Although they appear to be very similar, their bits are inverted! This gives the max distance image a Hamming Distance of **59**, which is very large despite the similarities in form. I only have two ideas on how to improve this.
1. Take the minimum of the Hamming Distance and (64 - Hamming Distance), the idea being that if almost all of your bits are different, then your arrangement must still be very similar.
2. Run an edge detection algorithm and compare the edges as a third term in the fitness function.
That’s all! Thanks for reading, and if you want to read formulas rendered beautifully by MathJax instead of screenshotted and inserted, try my main blog at: triangles.ghost.io