How do computers see?

It is important to have a basic understanding of how computers see. This will help you, for instance, when creating or curating your own datasets for building computer vision models. Even though there are some similarities between humans and computers, they do not see the same way.

Let me do a small experiment. Consider the image below.

What can you see? Chances are you cannot tell what it is.

Let’s add colors!

Now, if you go back to the black-and-white image, most likely you can see the snake. This is because I have given you the answer, and your brain automatically learnt it. See another example: a Hopfield network with four patterns. Neural systems, which we are about to learn to build, work in a similar way.
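Why does color matter so much to the experiment above? To a computer, an image is just an array of numbers: a color image carries three values per pixel, while a grayscale one keeps only brightness. Here is a minimal NumPy sketch of that collapse (the tiny 2x2 image is made up for illustration; the weights are the standard ITU-R BT.601 luma coefficients):

```python
import numpy as np

# A tiny 2x2 "image": each pixel holds (red, green, blue) values in 0-255.
color_image = np.array([
    [[210, 40, 30], [30, 180, 60]],
    [[20, 30, 200], [250, 250, 250]],
], dtype=np.float64)

# ITU-R BT.601 luma weights: collapse the 3 color channels to 1 brightness value.
weights = np.array([0.299, 0.587, 0.114])
grayscale = color_image @ weights

print(grayscale.shape)  # one number per pixel: the color information is gone
```

Once the three channels are merged, two pixels with very different colors can end up with nearly the same brightness, which is exactly why the snake disappears in black and white.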

Let me tell you a story. While I was building a model for identifying birds near my home, I found one I had never seen before. I took several photos from a distance. Slowly, I started to get closer, taking photos along the way. Even my camera detected something, drawing a square around the object. As I got closer, the object would not move. Birds move when you get close. It was a leaf!

I had to get closer (which, in computer vision, corresponds to cropping the image or taking a closer photo). I had to consider additional information: birds move, and they sing. All this information, which we call features, led me to a conclusion: it is not a bird, it is a leaf.
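"Getting closer" maps naturally onto a center crop. A minimal sketch with NumPy (the image shape, crop fraction, and function name are illustrative assumptions, not part of any particular library):

```python
import numpy as np

def center_crop(image, fraction=0.5):
    """Keep the central `fraction` of an image along height and width.
    This is the computer-vision analogue of stepping closer to the subject."""
    h, w = image.shape[:2]
    ch, cw = int(h * fraction), int(w * fraction)
    top, left = (h - ch) // 2, (w - cw) // 2
    return image[top:top + ch, left:left + cw]

# A hypothetical 100x100 RGB image; cropping keeps the central 50x50 region.
image = np.arange(100 * 100 * 3).reshape(100, 100, 3)
closeup = center_crop(image, fraction=0.5)
print(closeup.shape)  # the subject now fills a larger share of the frame
```

The crop does not add pixels; it simply removes surrounding context so the object of interest dominates the image, much as walking toward the leaf did.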

I have seen this scene more than once. A young child calls a dove a chicken, and the adults find it bizarre, funny. How do you know a dove is not a chicken? Technically, they are alike. You know because you have additional features: size, sounds, whether it flies, and how it walks. Those are all features extracted from the image you see. This is how a computer vision model works, more or less the same as humans.

Humans perform object segmentation and identification, and it is done automatically. We also make mistakes, as I did with the leaf. If an object is not clear, we match it to patterns we already know, like the snake in black and white once you know what the image is. Similar to the hallucinations we see in MobileNet or the iNaturalist model, we humans can also hallucinate. As I was testing MobileNet for my latest paper, it mistook a snake on water for a boat. iNaturalist mistook the same image for a duck.

Why does that happen?

It happens because computer vision models are sensitive to background. I have seen this happen several times. You might be tempted to train your model on images with no background, but that is a bad idea. Instead, as you train the model, make sure to expose it to a rich set of possible backgrounds. This also includes a rich set of possible variations of the object: in one image the object may be alone, in another it may appear with other objects, similar or not. Expose the model to different scenarios. This will make your model more general, even though it will make your validation curve harder to converge. It is a tradeoff: accept a worse validation curve, but get a model trained on more diversity. There are tips out there for handling the validation curve; most of them did not work for me.
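One simple way to expose a model to many backgrounds is to composite the same object patch onto different background images before training. Here is a rough NumPy sketch of the idea, assuming you already have an object patch with a binary mask (all sizes, names, and the random backgrounds are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def paste_on_background(obj, mask, background, top, left):
    """Place an object patch onto a background wherever mask == 1."""
    out = background.copy()
    h, w = obj.shape[:2]
    region = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = np.where(mask[..., None] == 1, obj, region)
    return out

# A hypothetical 8x8 object patch with a square mask marking its pixels.
obj = np.full((8, 8, 3), 200, dtype=np.uint8)
mask = np.zeros((8, 8), dtype=np.uint8)
mask[2:6, 2:6] = 1

# The same object pasted onto four different random backgrounds.
augmented = [
    paste_on_background(obj, mask,
                        rng.integers(0, 256, (32, 32, 3), dtype=np.uint8),
                        top=10, left=10)
    for _ in range(4)
]
```

In practice you would vary the position, scale, and background source as well, so the model learns the object rather than the scenery behind it.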
