Undefined Category Classification
Written by Joshua Clement, Machine Learning Engineer at Skalar
Photos by Andreas Wiig
I have always been fascinated by the image recognition subsection of machine learning. Image recognition, or more correctly image classification, is the process of taking an image and outputting a class. The accuracy with which a trained program can accomplish this task intrigues me. Just think of what it takes to recognize objects in the real world. Somehow the computer is able to identify objects from an image, represented as a list of numbers, by recognising the different shapes, textures and patterns in the changes of values.
In order to do this, each image is assigned a number associated with the desired label, and the weights applied to the pixel values are adjusted to match that assigned number. In this way the computer gives value to certain combinations of pixels in order to ‘see’ the contents of the image. (Notice that the computer doesn’t actually learn what the label is; rather, it learns the correlation between the pixels in images and the value assigned to the label.) This requires a predefined (unaltered) list of classes. But my question was: would it be possible to create an algorithm that classifies images but doesn’t require pre-assigned classes? And if so, what would that look like? The answer to the latter came to me when I stumbled across the idea of grouping images based on their exposure. The intended result would be a design that works in a variety of situations.
The dataset for this contains images with different exposures, and the goal is to detect which images were taken at the same location. Upon brief visual inspection, I found it difficult to match images because my eyes were more interested in the color changes. Of course, the location of windows, wall edges, and outlines of furniture helped me. If I could write a program that finds those kinds of outlines, then I would expect two pictures taken in the same location, but with different exposures, to have exactly the same outlines.
The method for finding these outlines is called edge detection. In essence, the algorithm looks, both horizontally and vertically, for places where the values of a grayscale image change significantly.
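As a rough sketch of the idea (not the code from this project), the classic Sobel kernels approximate those horizontal and vertical changes; the combined gradient magnitude is large wherever the image brightness shifts abruptly:

```python
import numpy as np

def sobel_edges(gray):
    """Approximate the edge magnitude of a 2-D grayscale array with Sobel kernels."""
    kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)  # responds to horizontal changes
    ky = kx.T                                 # responds to vertical changes
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = gray[i:i + 3, j:j + 3]
            gx = np.sum(patch * kx)
            gy = np.sum(patch * ky)
            out[i, j] = np.hypot(gx, gy)      # combined gradient magnitude
    return out

# A vertical step in brightness produces strong responses along that step.
img = np.zeros((5, 6))
img[:, 3:] = 1.0
edges = sobel_edges(img)
```

A real implementation would use a library's optimized convolution instead of this explicit loop, but the arithmetic is the same.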
There are libraries to help with this, as well as detailed explanations of edge detection. After finding the edges of all the images, the algorithm can compute the pixel-wise difference between each pair of edge images and sum it. We could safely assume that images taken in the same place should have the same edges, and therefore the difference should be minimal, if not zero. In practice, many correct pairs can be found this way. Even though the sum of differences was never close to zero, the gap between the difference for matched images and the difference for unmatched images was significant. In other words, when comparing one image with all the others, the correct match was significantly closer to zero and could usually be identified.
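A minimal sketch of that comparison, assuming the edge maps are already available as equally sized arrays (the function name is mine, not from the project):

```python
import numpy as np

def best_match(target_edges, candidate_edges):
    """Return the index of the candidate whose edge map differs least from the target.

    Each input is a 2-D array of edge magnitudes; the score is the sum of
    absolute pixel-wise differences, so identical edge maps score zero.
    """
    scores = [np.abs(target_edges - c).sum() for c in candidate_edges]
    return int(np.argmin(scores)), scores

a = np.array([[0., 4., 0.], [0., 4., 0.]])   # edge map of the reference image
b = a + 0.1                                  # same scene, slightly different exposure
c = np.array([[4., 0., 0.], [4., 0., 0.]])   # a different scene
idx, scores = best_match(a, [c, b])          # idx == 1: the near-identical map wins
```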
Unfortunately, this naive solution has one main problem: edges are harder to find in darker images. This leads to darker images taken at different locations appearing more similar than their correctly matched counterparts. I reasoned that this problem arose because I was summing the differences over all pixels, which manifested far more in darker images, where edge detection may not have been as precise.
My next approach was to use a clustering algorithm. The basic idea behind clustering algorithms is to view the data as points on a graph and group them into categories. In my case, the graph has as many dimensions as there are pixels in the edge-detection image, so every individual pixel matters when deciding which images are “close” to each other. The individual algorithms are unique and offer different options. Most of them divide the data into a preset number of categories, which again wouldn’t solve my original problem. I could have gotten around this by sequentially increasing the number of categories and then comparing their ratings (as is commonly done with the K-means clustering algorithm), but I opted for the hierarchical clustering algorithm. Instead of grouping data points outright, this algorithm builds a hierarchy of all points, with the intention of letting the programmer decide where the correct cutoff is.
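As a hedged illustration of hierarchical clustering (using SciPy and synthetic stand-in data, not the project's actual edge images), each flattened edge map becomes one point, the full merge hierarchy is built, and a distance cutoff decides which points count as one group:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Stand-in data: three well-separated "scenes", each flattened to a
# 16-value vector, plus a near copy of each (the same scene at a
# slightly different exposure).
base = np.eye(3, 16) * 10.0
points = np.vstack([base, base + 0.01])

# Build the full hierarchy; Ward's method merges the closest groups first.
tree = linkage(points, method='ward')

# Cut the hierarchy at a small distance so only near copies end up together.
labels = fcluster(tree, t=1.0, criterion='distance')
```

Points 0 and 3 (the first scene and its copy) receive the same label, and likewise for the other two pairs, because their merge distance sits far below the cutoff while the cross-scene merges sit far above it.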
This approach did incredibly well when the group size was two, and surprisingly well even with three. Unfortunately, it has some drawbacks. It is essentially impossible to draw a single line on the graph dividing the connections that are valid (those that belong to the same group) from those that are irrelevant. This means that if the group size is not two, it is difficult to be certain of the group cutoff. How can I tell whether the hierarchy shows the connection between two different groups, or between two sections within one group?
At this point, some human intervention is required. A decision must be made (one which hopefully remains correct across all trials) to determine the cutoff between images that are unique and multiple pairings that should be grouped together.
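One heuristic for making that decision, sketched here with SciPy on synthetic data (this is my illustration of the cutoff problem, not the project's actual rule), is to look at the heights at which merges happen and cut at the largest jump between consecutive merge distances:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Stand-in data: four well-separated "scenes" plus one near copy of each.
base = np.eye(4, 16) * 10.0
points = np.vstack([base, base + 0.01])

tree = linkage(points, method='ward')
heights = tree[:, 2]                             # distance at which each merge happened
gaps = np.diff(heights)
cut = heights[np.argmax(gaps)] + gaps.max() / 2  # midpoint of the biggest jump
labels = fcluster(tree, t=cut, criterion='distance')
```

Here the within-pair merges all happen at tiny distances and the cross-scene merges at much larger ones, so the biggest gap separates the two regimes cleanly; with real, messier data that gap can shrink until this heuristic becomes exactly the judgment call described above.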
In the next iteration, I will attempt to substitute the edge detection algorithm with a Convolutional Neural Network and train it, using the hierarchical clustering algorithm to measure accuracy.
Originally published at https://www.skalar.no on November 29, 2021.