Image Processing Fundamentals

Harshit Jain
Published in Analytics Vidhya
6 min read · Jul 6, 2020

The wonders of image processing have no limits: combined with machine learning, its applications stretch all the way from agriculture to the cosmos. Watch this amazing video from Katie Bouman.

HADO Worldcup

Yes, those are real people playing with AR. Futuristic, isn't it?
Here is a video of the sport above, HADO:

HADO — AR × Sports

Image processing has come a long way, yet with AI it is still at its earliest stages; GANs (Generative Adversarial Networks) have a lot more to do.
Let's begin the journey with the very basics of it. Here we go.

Image processing is basically manipulating the values of an image (arranged pixels), or of a video (a collection of images), using a computer so that the resulting image is more useful to us or has some desirable property, e.g. removal of blur, or feature extraction.

blurred vs processed

Not just removal of blur: a lot more is done with image processing, and to learn all of it you'll have to hang on with me. It can be overwhelming to learn so many new terms at first, and many of the concepts may leave you wondering why on earth you are reading this, but you never know: even a shunned concept of image processing can bring phenomenal results. There is a camera that can see through solids; that is not the magic of the camera, but the logic of image processing that made it possible, merely because of a filter on the OnePlus 8 Pro.

shot with the OnePlus 8 Pro

The basics you need start with analog and digital signals. Computers can only understand digital signals (0s and 1s), while the images we capture come from an analog source (i.e. most image sensors), so we need an analog-to-digital converter to convert them into digital form.

An analog-to-digital conversion can be done in two steps:
1. Sampling
2. Quantization

Analog to digital conversion

Sampling converts a time-varying signal into a discrete-time signal: a sequence of real numbers.
Quantization replaces each real number with an approximation from a finite set of discrete values.
The definitions above are the general ones from Wikipedia, modified a bit for the context of image processing.

An image is continuous along both axes: x (the coordinates, i.e. the pixel positions) and y (the amplitude of the pixels).
Digitizing the coordinates is sampling, and digitizing the amplitude is quantization.
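To make the two steps concrete, here is a minimal Python (NumPy) sketch on a 1D signal; the sine source, sample count, and bit depth are all illustrative choices, not anything prescribed above:

```python
import numpy as np

# "Continuous" source: a 1 Hz sine evaluated very finely (stand-in for analog)
t_fine = np.linspace(0, 1, 10000)
analog = np.sin(2 * np.pi * t_fine)

# Sampling: digitize the time axis by keeping N evenly spaced samples
N = 32
t_samp = np.linspace(0, 1, N)
samples = np.sin(2 * np.pi * t_samp)

# Quantization: digitize the amplitude to one of 2**bits discrete levels
bits = 3
levels = 2 ** bits
codes = np.round((samples + 1) / 2 * (levels - 1)).astype(int)  # codes 0..7
quantized = codes / (levels - 1) * 2 - 1  # the approximated amplitudes

print(codes[:8], quantized[:4])
```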

sampling example

Here we are digitizing the x axis, i.e. sampling. It is further divided into two parts: up-sampling and down-sampling.

In the above example we have down-sampled the image by a factor of 8 both horizontally and vertically, i.e. its size drops from 2⁸ to 2⁵ pixels per side; you can think of it as keeping 1 pixel out of every 64 in each 8x8 block of the image.

down-sampled image (factor 8)
zoomed (down-sampled) image, up-sampled with zero-order hold

The image after down-sampling has lost detail and is now only 2⁵ x 2⁵ = 32x32 pixels; to zoom it back up we have used up-sampling with a zero-order hold.
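In code, that down-sampling can be sketched in a couple of NumPy lines (the random array here is just a placeholder standing in for the 256x256 image):

```python
import numpy as np

# Placeholder 256x256 grayscale image (2^8 pixels per side)
img = np.random.randint(0, 256, (256, 256), dtype=np.uint8)

factor = 8
down = img[::factor, ::factor]  # keep 1 pixel out of every 8x8 block
print(down.shape)               # (32, 32), i.e. 2^5 per side
```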

The zero-order hold (ZOH) is a mathematical model of the practical signal reconstruction done by a conventional digital-to-analog converter (DAC). That is, it describes the effect of converting a discrete-time signal to a continuous-time signal by holding each sample value for one sample interval.

— Source Wikipedia.

What it is simply doing is replacing every single down-sampled pixel with an 8x8 block, putting that pixel's value into all 64 pixels of the newly created block, which can be clearly seen in the zoomed image.
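A minimal NumPy sketch of that zero-order hold, assuming the 32x32 down-sampled image from above:

```python
import numpy as np

down = np.random.randint(0, 256, (32, 32), dtype=np.uint8)  # placeholder

factor = 8
# Zero-order hold: repeat each pixel into an 8x8 block of identical values
zoomed = np.repeat(np.repeat(down, factor, axis=0), factor, axis=1)
print(zoomed.shape)  # (256, 256), blocky like the zoomed figure above
```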

Now coming to quantization, which digitizes the y axis, i.e. the amplitude:
a pixel has 2⁸ possible values, i.e. 0–255, and here we digitize those values.

Quantization example

The image above illustrates quantization: the first image uses 256 possible values, i.e. 8 bits per pixel; the second uses 4 bits per pixel, i.e. 16 possible values; and the third uses 2 bits per pixel, i.e. 4 possible values per pixel.
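A small sketch of this gray-level quantization in NumPy (the input image is again a placeholder):

```python
import numpy as np

img = np.random.randint(0, 256, (256, 256), dtype=np.uint8)  # placeholder

def quantize(img, bits):
    levels = 2 ** bits            # number of allowed gray values
    step = 256 // levels          # width of each quantization bin
    return (img // step) * step   # snap each pixel to its bin's value

img_4bit = quantize(img, 4)  # 16 possible values per pixel
img_2bit = quantize(img, 2)  # 4 possible values per pixel
```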

Depending on the number of independent variables, signals can be grouped into:
1D: tones, speech, audio, biomedical, remote sensing, etc. (typically a function of time, f(t))
2D: text, grayscale, color, multispectral, hyperspectral images, etc. (typically a function of space, f(x, y))
3D: video, 3D volume, etc. (f(x, y, t), f(x, y, z))
MD: video of a volume, etc. (f(x, y, z, t) and more)
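As a quick illustration, here is how those groupings look as NumPy array shapes (all the sizes are placeholder choices):

```python
import numpy as np

audio  = np.zeros(16000)               # 1D: f(t), one second of 16 kHz audio
image  = np.zeros((480, 640))          # 2D: f(x, y), a grayscale image
video  = np.zeros((30, 480, 640))      # 3D: f(x, y, t), 30 frames
volume = np.zeros((30, 64, 480, 640))  # MD: f(x, y, z, t), video of a volume
```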

Disparity map refers to the apparent pixel difference or motion between a pair of stereo images. To experience this, try closing one of your eyes and then rapidly close it while opening the other. Objects that are close to you will appear to jump a significant distance while objects further away will move very little.

— Source Stack Overflow

The disparity map is the difference between the images captured by the left and right cameras.
Disparity relates to depth in the scene, so fusing the two views gives us depth perception, and the fusion can be done through a red and a blue channel (the 3D movies you watch with red-and-blue glasses).
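One common way to compute a disparity map from a stereo pair is OpenCV's block matcher; here is a minimal sketch, assuming the pair is already rectified (the file names are placeholders):

```python
import cv2

# Rectified stereo pair; the file names are placeholders
left  = cv2.imread("left.png",  cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Block matcher: numDisparities must be a multiple of 16, blockSize odd
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right)  # larger disparity = closer object

# Scale to 0..255 for viewing/saving
view = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
cv2.imwrite("disparity.png", view)
```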

Types of images:
* Based on the EMS (electromagnetic spectrum)
* Acoustic/ultrasonic
* Electronic
* Synthetic

Another classification of images, on the basis of light:
* Reflection images
* Absorption images
* Emission images

The classifications above are self-explanatory.

Now that we have seen images as signals, we can go deeper into them with signals and systems.

2D and 3D Discrete Signals

Discrete Unit Impulse

Discrete Unit Impulse function

Discrete Unit Step

Discrete Unit Step function
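The impulse δ[m, n] is 1 at the origin and 0 everywhere else, while the step u[m, n] is 1 wherever both m ≥ 0 and n ≥ 0. Both are easy to write down in NumPy; a minimal sketch on a small grid centered at the origin (the grid size is arbitrary):

```python
import numpy as np

size = 5
m, n = np.meshgrid(np.arange(-size, size + 1),
                   np.arange(-size, size + 1), indexing="ij")

impulse = ((m == 0) & (n == 0)).astype(int)  # 1 only at the origin
step    = ((m >= 0) & (n >= 0)).astype(int)  # 1 in the first quadrant
```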

We still have much to learn; in the next article we will move on to convolutions.

To read more image processing and visualization material, click here.
