The Fundamentals of Image Recognition

Jack McDonald · Analytics Vidhya · Nov 26, 2019

What is this? These are just strange symbols placed in a particular order that hold no meaning if not digested correctly, yet your mind is continuously making sense of this collection of nonsense.

Cool. How?

Well, the most common method thought to facilitate this understanding is the steady movement of the eyes from left to right, deciphering each word by analyzing the order of its symbols and their pronunciation.

Strange as it sounds, this is also the basis for most efficient image recognition algorithms. Let’s get into it.

A Practical Example

Imagine that you are walking through an art museum. Additionally, imagine that you take five minutes to analyze the meaning of, and the objects in, a given painting. The only catch: you must understand the painting to that extent while observing only one square inch of the canvas.

What would the result be? Not much: a lone observer cannot decipher any true meaning from a single square inch. If, however, 30 observers were given the same challenge, each notified of where the others were looking and what each square inch revealed, could they collectively decipher the meaning of the already mysterious piece of art? While the answer is subjective, it would most commonly be ‘yes’. This model of isolating small regions, analyzing each one, and identifying the whole through their joint ‘effort’ is the approach many of the most popular algorithms take.

Abnormalities and similarities in pixel shade and position are taken as input to compute a value for each pixel. Patterns in these values suggest an object or unifying characteristic in the image, which can then be grouped as an independent entity.
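To make that concrete, here is a minimal numpy sketch of the idea (my own illustration, not code from this post, with an invented toy image): each pixel carries a value, and a small filter responds strongly wherever neighboring values form a pattern, in this case a dark-to-bright vertical edge.

```python
import numpy as np

# A tiny 5x5 grayscale "image": dark on the left, bright on the right.
image = np.array([
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
], dtype=float)

# A vertical-edge kernel: responds when values jump from left to right.
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

# Slide the kernel over every 3x3 patch and record its response.
h, w = image.shape
k = kernel.shape[0]
response = np.zeros((h - k + 1, w - k + 1))
for i in range(h - k + 1):
    for j in range(w - k + 1):
        patch = image[i:i + k, j:j + k]
        response[i, j] = np.sum(patch * kernel)

print(response)  # large values mark where the dark-to-bright pattern sits
```

The high responses cluster exactly where the pixel values change character, which is the kind of ‘pattern in values’ the paragraph above describes.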

Pixel Isolation and Analysis

The principal way algorithms detect objects in a given image is through pixel isolation and subsequent analysis of the values each pixel yields. A computer cannot simply be given an image and expected to make sense of it through collective observation alone. That is similar to asking a baby to look at a block of text from The Odyssey and make sense of it.

That approach cannot work, but a systematic one can. Much as the eye naturally scans from left to right to understand words on a page, the computer can use a variety of methods to isolate a given region of pixels and analyze it for patterns.

This technique allows for a greater depth of analysis of the image, and a higher level of precision for the software, as seen in the workflow below.

Although a larger ‘pixel grid’ is used to isolate specific parts of a photo in more conventional use cases, for our purposes the 7-by-7 grid above is sufficient for explanation. The image on the far left of the workflow model shows a 7-by-7 grid being overlaid onto the original image. Within each grid box, a small analysis takes place. In this case, the analyses yielded a ‘common consensus’, if you will, on the location of unique items in the image. Overlaying this with additional data, the algorithm is able to output, with precision, the approximate locations of various objects in the image whose locations had not been fully estimated in the initial analysis.
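As a rough sketch of that grid step (again my own illustration on a synthetic image; real detectors predict far richer values per cell), the following splits an image into a 7-by-7 lattice and scores each cell by its mean brightness, so the cells covering an object stand out from their neighbors:

```python
import numpy as np

GRID = 7

def grid_scores(image: np.ndarray, grid: int = GRID) -> np.ndarray:
    """Overlay a grid x grid lattice on the image and score each cell.

    Here the per-cell 'analysis' is just mean brightness; a real
    detector would predict object presence and location per cell.
    """
    h, w = image.shape
    cell_h, cell_w = h // grid, w // grid
    scores = np.zeros((grid, grid))
    for r in range(grid):
        for c in range(grid):
            cell = image[r * cell_h:(r + 1) * cell_h,
                         c * cell_w:(c + 1) * cell_w]
            scores[r, c] = cell.mean()
    return scores

# Demo on a synthetic 28x28 image with a bright "object" in one corner.
img = np.zeros((28, 28))
img[2:10, 18:26] = 1.0
print(np.round(grid_scores(img), 2))  # high-scoring cells flag the object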

Once the locations of objects in an image are rendered, the objects themselves can be referenced against a larger dataset of labeled images to receive a clear, plain-English label for what each object is.
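One crude way to picture that reference step, sketched below with placeholder data, is a nearest-neighbor lookup: compare a cropped object’s pixels against a labeled gallery and borrow the label of the closest match. Real systems compare learned features rather than raw pixel distances, but the ‘look it up against labeled examples’ idea is the same.

```python
import numpy as np

# Hypothetical labeled gallery: flattened images paired with English labels.
gallery_images = np.random.rand(100, 28 * 28)                   # placeholder data
gallery_labels = np.random.choice(["shirt", "shoe", "bag"], size=100)

def label_object(crop: np.ndarray) -> str:
    """Return the label of the gallery image closest in pixel space."""
    distances = np.linalg.norm(gallery_images - crop.ravel(), axis=1)
    return gallery_labels[np.argmin(distances)]

print(label_object(np.random.rand(28, 28)))
```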

My Program

To test the convenience and accuracy of this image analysis method, I created my own program using many of the same techniques.

Unlike the above model, my program was trained using the labeled Fashion-MNIST dataset. The distinction ‘labeled’ means essentially what it sounds like: data that comes with a predetermined value. 50,000 of these images were selected for training the algorithm, while the remaining 10,000 were used for testing its accuracy.
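For reference, that dataset loads in a single call with Keras (a sketch assuming Fashion-MNIST; its stock split is 60,000 training and 10,000 test images, so a 50,000-image training selection would be a subset of the training set):

```python
import tensorflow as tf

# Load the labeled Fashion-MNIST dataset (28x28 grayscale clothing images).
(train_images, train_labels), (test_images, test_labels) = \
    tf.keras.datasets.fashion_mnist.load_data()

# Scale pixel values from 0-255 down to 0-1 before training.
train_images = train_images / 255.0
test_images = test_images / 255.0

print(train_images.shape, test_images.shape)  # (60000, 28, 28) (10000, 28, 28)
```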

To arrive at the desired functionality, I implemented three sequential layers in my network: the first a ‘flatten’ layer, the second a dense layer with the ‘ReLU’ activation, and the third a dense layer with the ‘softmax’ activation.
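In Keras terms, a network matching that description might look like the sketch below. This is my illustration rather than the post’s exact code, and the 128-unit width of the middle layer is a common default that the post does not specify.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    # Layer 1: flatten each 28x28 image into a 784-value vector.
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    # Layer 2: a fully connected layer with ReLU activation.
    tf.keras.layers.Dense(128, activation="relu"),
    # Layer 3: 10 output units with softmax, one probability per class.
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Typical usage, given the data loaded earlier:
# model.fit(train_images, train_labels, epochs=5)
# model.evaluate(test_images, test_labels)
```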

The first layer’s function is to format the image so that its pixel dimensions are ready for subsequent analysis and identification by the following two layers.
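Concretely, that formatting step just unrolls the 2-D pixel grid into one long vector:

```python
import numpy as np

image = np.arange(28 * 28).reshape(28, 28)  # stand-in for one 28x28 image
flat = image.reshape(-1)                    # what the Flatten layer produces
print(image.shape, "->", flat.shape)        # (28, 28) -> (784,)
```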

While this exact structure is not consistent across all programs, altering the image format is vital to creating an image recognition program with even a minor degree of functionality.

What does this mean?

The exact repercussions of advancement in this domain are still somewhat unknown. While the potential for positive impact is quite high, the potential for negative impact seems drastically darker than initially anticipated.

We have seen, and will continue to see, an increased level of hyper-surveilled government rule if we allow image recognition technology to advance at its current rate. In China, the technology is being used to identify and target religious minorities for deportation and harsh treatment. Protesters in the infamous Hong Kong protests are being identified, and legally targeted, through facial recognition posts that have found a convenient home in the administrative region. This activity is a perfect example of policy control gone wrong. If the world wants to experience the positive effects of this technology to even a minor degree, measures must be taken to prevent its extreme manipulation.

From a conventional use standpoint, the effects of algorithmic advancement are tremendous. Imagine a world in which doctors no longer need to spend hours analyzing patients’ scans for abnormalities, but can instead delegate that task to a narrowly designed image recognition tool that understands the root of the medical outlier and proposes a diagnosis in response. I believe that with the proper policy in place, this technology, partnered with more generalized AI applications, has the ability to change the world on a grand scale.

Contact:

Email: mmcd.jack@gmail.com

LinkedIn: https://www.linkedin.com/in/jack-mcdonald-a960ab194/

Twitter: https://twitter.com/jackmmcd123
