How Does a Computer See the World?
Now you're probably wondering: how does a computer see the world? After all, a computer can only read bits (0s and 1s), while an image is full of color, objects, and gradients!
Simply put, the first step in every Computer Vision task is to translate what the computer sees, whether an image or a video, into bits. This works by converting each image (a video frame or a photo captured from a camera) from a 2-dimensional picture into a 3D array of numbers, where each value encodes part of the image's information, for example the intensity of each primary color channel (red, green, and blue).
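As a minimal sketch of this idea (using NumPy and a tiny synthetic image rather than a real photo), an RGB image is just a 3D array of shape (height, width, 3):

```python
import numpy as np

# A synthetic 4x4 RGB "image": height x width x 3 channels,
# each value an 8-bit intensity from 0 (none) to 255 (full).
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[:, :, 0] = 255          # maximum red everywhere
image[0, 0] = (12, 34, 56)    # one pixel with mixed RGB values

print(image.shape)            # (4, 4, 3)
print(image[0, 0])            # [12 34 56]
```

Loading a real file (e.g. with Pillow's `Image.open` followed by `np.asarray`) produces exactly this kind of array, just much larger.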
But here is the catch: in Computer Vision, every image consumes a lot of storage, time, and compute power. To train a reliable Computer Vision model, researchers usually sift through thousands, hundreds of thousands, or even millions of images per training run. If each image consists of three full arrays of color values, the computing power needed is absurd for your average researcher. So, how do we solve this?
Preprocessing in Computer Vision
One way to reduce the huge requirements of building a Computer Vision model is to preprocess the images, that is, to compress the information contained in an image while preserving as much of it as possible. Here are some common techniques:
1. Grayscaling
One of the easiest preprocessing methods is simply removing the colors from the image. Ask yourself: what is the object in the image above? You can still see the building, right? This means your model can be trained on one array of numbers instead of three, with values ranging from 0 (completely dark) to 255 (completely light).
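A small sketch of grayscaling with NumPy (the 0.299/0.587/0.114 weights are the common ITU-R BT.601 luminance coefficients; a plain average of the three channels would also work):

```python
import numpy as np

def to_grayscale(rgb):
    """Collapse an (H, W, 3) RGB array into a single (H, W) grayscale
    array using ITU-R BT.601 luminance weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return np.rint(rgb @ weights).astype(np.uint8)

rgb = np.full((2, 2, 3), 255, dtype=np.uint8)  # a pure white image
gray = to_grayscale(rgb)
print(gray.shape)   # (2, 2) -- one array of numbers instead of three
print(gray[0, 0])   # 255 -- white stays white
```

The weighted sum reflects that human eyes are more sensitive to green than to red or blue, so the grayscale image looks more natural than a naive average would.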
2. Resizing
A 1024x1024 image consists of 1024 rows and 1024 columns of pixels, and that is only one array of one image. Computer Vision involves a lot of math, and even at computer speed it can take hours or days to process thousands to millions of images. But what if we resized the image to be smaller? For example, averaging every 2x2 block of pixels into just one would reduce the size of each image by 75% (the exact savings depend on how you resize it).
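One simple way to resize like this is block averaging. Here is a sketch with NumPy on a grayscale array (the `downsample` helper is illustrative, not from any particular library; real pipelines typically use a library resize such as OpenCV's `cv2.resize`):

```python
import numpy as np

def downsample(gray, block=2):
    """Shrink an (H, W) grayscale image by averaging each
    block x block patch of pixels into a single pixel."""
    h, w = gray.shape
    h, w = h - h % block, w - w % block  # crop to a multiple of block
    patches = gray[:h, :w].reshape(h // block, block, w // block, block)
    return patches.mean(axis=(1, 3)).astype(np.uint8)

gray = np.arange(16, dtype=np.uint8).reshape(4, 4)
small = downsample(gray, block=2)
print(small.shape)  # (2, 2): 4 values stored instead of 16, a 75% reduction
```

With `block=2` each image keeps only a quarter of its pixels, which is exactly the 75% saving described above.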
3. Normalization
Another approach is to standardize the values in an image. An image usually uses a scale of 0–255, with 0 being completely dark and 255 being completely light. Rescaling the values to a 0–1 range keeps the numbers small and consistent, which makes the downstream math much better behaved. Which calculation would you rather do, 255 times 241 or 1 times 0.95?
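In its simplest form (dividing by the maximum value, 255), normalization is a one-liner with NumPy; a minimal sketch:

```python
import numpy as np

# Rescale 8-bit pixel values from the [0, 255] range to [0, 1].
pixels = np.array([0, 128, 255], dtype=np.uint8)
normalized = pixels / 255.0

print(normalized)  # roughly [0.0, 0.502, 1.0]
```

Other schemes exist too, such as subtracting the mean and dividing by the standard deviation, but simple division by 255 is a very common first step.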
Summary
In this brief article, we covered some of the most commonly used image-preprocessing techniques in Computer Vision. There are many more techniques out there, such as segmentation and sigmoid stretching. Hopefully this article helps you enter the world of Computer Vision!