Introduction to Computer Vision

Published in Analytics Vidhya · 6 min read · Jun 10, 2020

“To see is to know what is where by looking.”

What is Computer Vision?

Well, the name defines itself. It is the ability of a computer to gain vision. When I say ‘vision’, I don't mean that eyes will pop out of the system to haunt us, nor that we simply attach a camera to the computer.

Vision is the ability to extract details and features from an input image or video. But how would a computer perceive these characteristics when it only deals with 1s and 0s? The field of Computer Vision tackles these questions and has been around for the last 70 years!

Our main objective is to answer any question about an image that a human could answer. This goal is sometimes described as a visual version of the ‘Turing Test’, the test of machine intelligence named after Alan Turing.

Computer Vision has been an integral part of the rise of artificial intelligence in the last couple of decades. The field has numerous applications, and the primary ones can be grouped as follows:

1. Object Detection: It answers the basic question, “what and where?”.
What does the image/video show, and where in the image/video is the object located? It includes applications like ‘optical character recognition’, ‘object tracking’, ‘activity recognition’ etc.

The model detects various objects along with their locations.

2. Object properties and attributes: After we have detected the object, what next? How about we extract its features and details? This makes the model much more powerful, as it now not only detects the object but also perceives information about it. It has some very important applications like ‘Medical Imaging’, ‘Face Detection and Verification’, ‘Image Retrieval’ etc.

The model detects the object along with its colour.

3. Metric vision: If we want details that even humans couldn’t possibly guesstimate, we turn to metric vision, which includes many advanced applications like ‘Photogrammetry’ (the science of making measurements from photographs), ‘Image-based 3D modelling’ and many more.

The input to photogrammetry is photographs, and the output is typically a map, a drawing, a measurement, or a 3D model of some real-world object or scene.

What are optical Images?

Optical images are defined as 2D representations of a physical object, obtained by the propagation of light through some optical system, depicting the object's contours and features. Put simply, an optical image is an image or video frame captured by a camera.

A digital colour image is a 3D array: height × width × 3 channels. Each channel is a 2D array of pixel values that represent the intensity of one colour, typically in the range [0, 255]. The three channels correspond to the predefined colours ‘RED’, ‘GREEN’, and ‘BLUE’. This representation is termed ‘RGB’, and all other colours are obtained by combining these primary colours, as per the ‘trichromatic colour theory’. A digital ‘GRAYSCALE’ image is a single 2D array whose values range from ‘BLACK’ to ‘WHITE’.

The tricolour filters (left) are combined to give optical images (right).
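To make the array view of an image concrete, here is a minimal sketch in plain Python. The tiny 2×2 image, the helper name `to_grayscale` and the luminance weights (0.299, 0.587, 0.114) are assumptions chosen for illustration; in practice a library such as NumPy or OpenCV would hold the arrays.

```python
# A tiny 2x2 RGB image as nested lists: image[row][col] = (R, G, B),
# with every channel value in the range [0, 255].
rgb_image = [
    [(255, 0, 0), (0, 255, 0)],      # red pixel, green pixel
    [(0, 0, 255), (255, 255, 255)],  # blue pixel, white pixel
]


def to_grayscale(image):
    """Collapse the three colour channels into one 2D intensity array.

    The weights below are a commonly used luminance approximation;
    they are an assumption, not something prescribed by the article.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]


height = len(rgb_image)
width = len(rgb_image[0])
channels = len(rgb_image[0][0])
gray_image = to_grayscale(rgb_image)

print((height, width, channels))  # shape of the colour image
print(gray_image)                 # one intensity value per pixel
```

The colour image carries three numbers per pixel, while the grayscale version keeps a single intensity per pixel, which is why it fits in a 2D array.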

Various defects occurring in images

Our images can be defective in many ways, which impairs the system's ability to extract useful features and attributes. Some common defects are:
Low contrast: Light and dark areas blend together, giving a flatter photo.
Wrong colours: Images depict objects in colours that differ from reality.
Noise: Random variation of brightness or colour information.
Blur: It occurs due to sudden movements of the objects or the camera.
Non-uniform lighting: Light is spread unevenly over the image.

To overcome these problems we apply image processing.
Image processing is a method of performing operations on an image in order to get an enhanced image or to extract some useful information from it. It is a type of signal processing in which the input is an image and the output may be an image or the characteristics/features associated with that image.
It is therefore an important part of computer vision.
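As one small example of an image-processing operation, the sketch below addresses the low-contrast defect by linearly stretching pixel values to cover the full [0, 255] range. The function name `stretch_contrast` and the sample values are hypothetical, and min-max stretching is only one of several possible techniques.

```python
def stretch_contrast(gray):
    """Linearly rescale a 2D grayscale image so its values span [0, 255]."""
    flat = [p for row in gray for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A perfectly flat image has no contrast to stretch; return a copy.
        return [row[:] for row in gray]
    scale = 255 / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in gray]


# A low-contrast patch: every value sits in the narrow band 100..140.
low_contrast = [
    [100, 110, 120],
    [110, 120, 130],
    [120, 130, 140],
]

enhanced = stretch_contrast(low_contrast)
for row in enhanced:
    print(row)
```

Here the input is an image and the output is an enhanced image; the darkest pixel is mapped to 0 and the brightest to 255.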

Why image processing?

Image improvement for human perception. It improves subjective image quality.
Image improvement for machine perception. It simplifies the subsequent image analysis and recognition.
Image transformation for technical purposes such as change of image resolution or aspect ratio for display on the device.
To get artistic impressions from cool visual effects.

To get started in the field of computer vision, we should try to extract the most basic feature of an image, i.e. its edges. Edge detection is very useful because most semantic and shape information can be encoded in the edges.

Edge detection

Goal: Identify sudden changes or discontinuities in the image, referred to as edges. Edges are a more compact representation than raw pixels, yet they retain most of the shape information.

Intensity function: It gives the intensity of a channel at every position or pixel; the darker the pixel, the lower the intensity value.

Edges: An edge can thus be defined as a place of rapid change in the image intensity function. We can locate edges using image gradients.

Image gradient: The gradient is the vector of first-order partial derivatives of the intensity function, and it points in the direction of the most rapid increase in intensity.
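It is denoted as follows, where f is the intensity function:

$$\nabla f = \left[ \frac{\partial f}{\partial x}, \; \frac{\partial f}{\partial y} \right]$$

The partial derivatives of the intensity function along both axes.

$$\lVert \nabla f \rVert = \sqrt{\left( \frac{\partial f}{\partial x} \right)^2 + \left( \frac{\partial f}{\partial y} \right)^2}$$

The gradient magnitude.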
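The gradient and its magnitude can be sketched in a few lines of plain Python. This is an illustrative approximation: the partial derivatives are estimated with central differences, border pixels are simply left at zero, and the names `image_gradient` and `gradient_magnitude` are hypothetical.

```python
import math


def image_gradient(gray):
    """Estimate the partial derivatives of a 2D intensity array.

    Central differences are used on interior pixels; border pixels
    are left at 0.0 to keep the sketch short.
    """
    h, w = len(gray), len(gray[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx[y][x] = (gray[y][x + 1] - gray[y][x - 1]) / 2
            gy[y][x] = (gray[y + 1][x] - gray[y - 1][x]) / 2
    return gx, gy


def gradient_magnitude(gx, gy):
    """Combine the two partial derivatives into one strength value per pixel."""
    return [
        [math.hypot(dx, dy) for dx, dy in zip(row_x, row_y)]
        for row_x, row_y in zip(gx, gy)
    ]


# Dark left half, bright right half: a vertical edge down the middle.
step_image = [[0, 0, 255, 255] for _ in range(4)]

gx, gy = image_gradient(step_image)
magnitude = gradient_magnitude(gx, gy)
for row in magnitude:
    print(row)
```

For this vertical edge the derivative along y is zero everywhere, so the magnitude comes entirely from the derivative along x, and it is largest on the pixels next to the dark-to-bright transition.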

Noise in images is a problem for edge detection. It causes unwanted spikes in the first derivative of the image intensity, which can drown out the real edges. So it becomes very important to smooth the image intensity curve through image averaging to reduce noise before differentiating.
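The effect of smoothing on the derivative can be seen on a single image row. The sketch below is a toy example under stated assumptions: the noise is an artificial alternating pattern chosen so the numbers can be checked by hand, the averaging uses the weights (0.25, 0.5, 0.25), and the function names are hypothetical. Real noise would be reduced rather than removed entirely.

```python
def smooth(signal, kernel=(0.25, 0.5, 0.25)):
    """Weighted moving average of a 1D signal; indices are clamped at the ends."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - half, 0), n - 1)
            acc += weight * signal[j]
        out.append(acc)
    return out


def first_difference(signal):
    """Discrete first derivative: difference between neighbouring samples."""
    return [b - a for a, b in zip(signal, signal[1:])]


# One image row: a step edge from 50 to 200, plus alternating +/-10 noise.
clean = [50] * 6 + [200] * 6
noisy = [v + (10 if i % 2 == 0 else -10) for i, v in enumerate(clean)]

d_noisy = first_difference(noisy)
d_smooth = first_difference(smooth(noisy))

print(d_noisy)   # spikes of +/-20 everywhere, plus the edge
print(d_smooth)  # flat regions settle to 0, the edge remains as one clear peak
```

After smoothing, the derivative in the flat regions drops to zero while the response around the step remains, so the edge is much easier to pick out with a simple threshold.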

Gradient magnitude estimation alone is not complete edge detection. We also need precision: the gradient is strong along thick ridges around each edge, so only the most significant points on each ridge should be kept. In addition, the resulting edge points should be connected to one another.

This can be done with the ‘Canny edge detector’, which estimates the gradient through ‘convolution’ with derivative filters, but the idea remains the same: we can approximate each partial derivative with a finite difference.
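To illustrate how a finite difference can be written as a convolution, here is a minimal sketch in plain Python. It is not the Canny detector itself, only the derivative step; the helper `convolve2d`, the kernels and the sample image are assumptions made for the example.

```python
def convolve2d(image, kernel):
    """'Valid' 2D convolution: flip the kernel, then slide it over the image."""
    kh, kw = len(kernel), len(kernel[0])
    flipped = [row[::-1] for row in kernel[::-1]]
    h, w = len(image), len(image[0])
    out = []
    for y in range(h - kh + 1):
        out_row = []
        for x in range(w - kw + 1):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += flipped[i][j] * image[y + i][x + j]
            out_row.append(acc)
        out.append(out_row)
    return out


# Central difference (f(x+1) - f(x-1)) / 2 written as a convolution kernel.
# The order looks reversed because convolution flips the kernel before sliding.
dx_kernel = [[0.5, 0.0, -0.5]]
dy_kernel = [[0.5], [0.0], [-0.5]]

image = [
    [10, 10, 10, 90, 90],
    [10, 10, 10, 90, 90],
    [10, 10, 10, 90, 90],
]

dx = convolve2d(image, dx_kernel)  # responds to the vertical edge
dy = convolve2d(image, dy_kernel)  # no change along y, so all zeros

print(dx)
print(dy)
```

Each output value of `dx` equals half the difference between the right and left neighbours of a pixel, which is exactly the central finite-difference estimate of the partial derivative along x.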

Convolution

This is regarded as one of the most important mathematical operations in image processing. We convolve the input image with a kernel (or mask), which is a small matrix, and this can produce effects such as sharpening, blurring, embossing and even edge detection. Different kernels are used for different tasks. The mathematical formula for convolution is given as: