How Does a Computer See the World?
Now you're probably wondering: how does a computer see the world? After all, a computer can only read bits (0s and 1s), while an image is full of color, objects, and gradients!
Simply put, the first step in every Computer Vision task is to translate what the computer sees, whether an image or a video, into bits. This works by converting each image (a video frame or a photo captured from a camera) from a 2-dimensional picture into a 3D array of numbers, where each value encodes part of the image's information, for example the intensity of each primary color channel (red, green, and blue).
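As a minimal sketch of this idea (using NumPy and a tiny synthetic image rather than a real photo), an RGB image is just a 3D array of shape (height, width, 3):

```python
import numpy as np

# A synthetic 4x4 RGB "image": height x width x 3 channels,
# each value an 8-bit intensity from 0 (none) to 255 (full).
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[:, :, 0] = 255          # maximum red everywhere
image[0, 0] = (12, 34, 56)    # one pixel with mixed RGB values

print(image.shape)            # (4, 4, 3)
print(image[0, 0])            # [12 34 56]
```

Loading a real file (e.g. with Pillow's `Image.open` followed by `np.asarray`) produces exactly this kind of array, just much larger.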
But here is the catch: in Computer Vision, every image consumes a lot of storage, time, and compute power. To train a reliable Computer Vision model, researchers usually sift through thousands, hundreds of thousands, or even millions of images per training run. If each image consists of three full arrays of color values, the computing power needed is absurd for your average researcher. So, how do we solve this?
Preprocessing in Computer Vision
One way to reduce the huge requirements of building a Computer Vision model is to preprocess the images, that is, to compress the information contained in an image while preserving as much of it as possible. Here are some common techniques:
1. Grayscaling
One of the easiest preprocessing methods is simply removing the colors from the image. Ask yourself: what is the object in the image above? You can still see the building, right? This means your model can be trained on one array of numbers instead of three, with values ranging from 0 (completely dark) to 255 (completely light).
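A small sketch of grayscaling with NumPy (the 0.299/0.587/0.114 weights are the common ITU-R BT.601 luminance coefficients; a plain average of the three channels would also work):

```python
import numpy as np

def to_grayscale(rgb):
    """Collapse an (H, W, 3) RGB array into a single (H, W) grayscale
    array using ITU-R BT.601 luminance weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return np.rint(rgb @ weights).astype(np.uint8)

rgb = np.full((2, 2, 3), 255, dtype=np.uint8)  # a pure white image
gray = to_grayscale(rgb)
print(gray.shape)   # (2, 2) -- one array of numbers instead of three
print(gray[0, 0])   # 255 -- white stays white
```

The weighted sum reflects that human eyes are more sensitive to green than to red or blue, so the grayscale image looks more natural than a naive average would.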
2. Resizing
A 1024x1024 image consists of 1024 rows and 1024 columns of pixels, and that is only one array of one image. Computer Vision involves a lot of math, and even at computer speed it can take hours or days to process thousands to millions of images. But what if we resized the image to be smaller? For example, averaging every 2x2 block of pixels into just one would reduce the size of each image by 75% (the exact savings depend on how you resize it).
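One simple way to resize like this is block averaging. Here is a sketch with NumPy on a grayscale array (the `downsample` helper is illustrative, not from any particular library; real pipelines typically use a library resize such as OpenCV's `cv2.resize`):

```python
import numpy as np

def downsample(gray, block=2):
    """Shrink an (H, W) grayscale image by averaging each
    block x block patch of pixels into a single pixel."""
    h, w = gray.shape
    h, w = h - h % block, w - w % block  # crop to a multiple of block
    patches = gray[:h, :w].reshape(h // block, block, w // block, block)
    return patches.mean(axis=(1, 3)).astype(np.uint8)

gray = np.arange(16, dtype=np.uint8).reshape(4, 4)
small = downsample(gray, block=2)
print(small.shape)  # (2, 2): 4 values stored instead of 16, a 75% reduction
```

With `block=2` each image keeps only a quarter of its pixels, which is exactly the 75% saving described above.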
3. Normalization
Another approach is to standardize the values in an image. An image usually uses a scale of 0–255, with 0 being completely dark and 255 being completely light. Rescaling the values to a 0–1 range keeps the numbers small and consistent, which makes the downstream math much better behaved. Which calculation would you rather do, 255 times 241 or 1 times 0.95?
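In its simplest form (dividing by the maximum value, 255), normalization is a one-liner with NumPy; a minimal sketch:

```python
import numpy as np

# Rescale 8-bit pixel values from the [0, 255] range to [0, 1].
pixels = np.array([0, 128, 255], dtype=np.uint8)
normalized = pixels / 255.0

print(normalized)  # roughly [0.0, 0.502, 1.0]
```

Other schemes exist too, such as subtracting the mean and dividing by the standard deviation, but simple division by 255 is a very common first step.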
Summary
In this brief article, we covered some of the most commonly used image-preprocessing techniques in Computer Vision. There are many more techniques out there, such as segmentation and sigmoid stretching. Hopefully this article helps you enter the world of Computer Vision!