Computer Vision through OpenCV

Published in

Analytics Vidhya

5 min read · Sep 28, 2019

Vision is the highest-bandwidth sense, and it provides a firehose of information about your surroundings. For tasks such as understanding and analyzing the world, the human brain relies more on visual information than on any other sense.

Digital cameras can capture your best moments and even enhance those images if you need them to. But just as hearing is not the same as listening, taking a picture is not the same as seeing. For this reason, many algorithms have been developed to improve what computers can do with images.

What is computer vision?

In simple words, computer vision means enabling computers to see, identify and process images in a way that mimics human vision, and to gain a high-level understanding from them.

How does computer vision work?

As you might be aware, computers store images in the form of pixels. A pixel is the smallest addressable element of an image. Each pixel holds intensity values, and in a color image it is stored as a combination of the three primary colors red, green and blue.

This phase is called image acquisition where an image is converted to machine-understandable binary data.

Usually, these images are converted to grayscale before processing, which reduces each pixel to a single intensity value and makes the data simpler to process and visualize.

**Conversion of grayscale images to pixel numbers based on the intensity of the color spectrum.**
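
The grayscale conversion described above can be sketched with NumPy. This is a minimal illustration, not the article's own method: the 2x2 image is made up, and the channel weights (0.299, 0.587, 0.114) are the common ITU-R BT.601 luma weights, which is an assumption since other weightings exist.

```python
import numpy as np

# A hypothetical 2x2 color image in RGB order, one uint8 triple per pixel.
rgb = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [255, 255, 255]]],
    dtype=np.uint8,
)

# Luma weights for R, G, B (ITU-R BT.601); assumed here, other choices exist.
weights = np.array([0.299, 0.587, 0.114])

# A weighted sum over the channel axis collapses (2, 2, 3) into (2, 2).
gray = np.rint(rgb @ weights).astype(np.uint8)

print(rgb.shape)   # (2, 2, 3)
print(gray.shape)  # (2, 2)
print(gray)
```

Each color pixel becomes a single intensity between 0 and 255, so the grayscale image is a plain 2D array of numbers.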

When we look at an object, say a flower, our brain can recognize it immediately. Does that mean we can compute faster than a high-end computer? No! Our brains are cheating, since we have millions of years' worth of evolutionary context passed down through generations. A computer doesn’t have that advantage. For a computer, an image is merely a 2D array of numbers.

Processing the pixels

Once the pixel values are ready, we can move on to the next step, i.e. classifying images based on their content, color, type, etc. One of the most popular families of algorithms for this task is the convolutional neural network, or CNN.

A CNN takes the matrix of pixels as input and slides small matrices of weights, called filters, across it, comparing each region of the image against the patterns the network is looking for. It builds up its understanding layer by layer: the early layers pick up edges and simple shapes, and the final layers identify the object.
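
To make the filtering step concrete, here is a minimal NumPy sketch of a single filter sliding over a small image. The helper name `convolve2d_valid`, the 5x5 image, and the hand-written vertical-edge filter are all assumptions for illustration; a real CNN learns its filter weights during training and stacks many such layers.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide kernel over image (no padding) and return the response map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float64)
    for y in range(out_h):
        for x in range(out_w):
            patch = image[y:y + kh, x:x + kw]
            # Element-wise product and sum, as CNN layers do (no kernel flip).
            out[y, x] = np.sum(patch * kernel)
    return out

# Hypothetical 5x5 grayscale image: two dark columns, three bright columns.
image = np.array([[0, 0, 10, 10, 10]] * 5, dtype=np.float64)

# Hand-written vertical-edge filter; a trained CNN would learn such weights.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=np.float64)

response = convolve2d_valid(image, kernel)
print(response)
```

The response is large where the window straddles the dark-to-bright boundary and zero where the window covers a uniform region, which is how an early layer can signal "there is an edge here".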

No matter how deep these networks are, we cannot trust any algorithm without adequate data. We need to train the model on large amounts of varied data for it to perform well.

But a CNN on its own is restricted to a single frame: it processes one image at a time. This can lead to ambiguity and misunderstanding.

Consider an image of a person on a swing. If we look at a single frame, we do not know whether that person is moving forward or backward at that moment, since there is no prior context. This is where CNNs come up short: they take into account spatial features but not temporal ones. Resolving this kind of ambiguity is much easier with a video.

A video is nothing but a sequence of frames. The output of the CNN for each frame can be taken and fed, in order, into a second network. This type of model is called a recurrent neural network, or RNN. An RNN can use the data it has already processed in its decision making.

The most important feature of an RNN is the hidden state, which remembers some information about a sequence. This gives the RNN a kind of “memory” that summarizes what has been calculated so far. It uses the same parameters for each input, as it performs the same task at every step of the sequence to produce the output. This keeps the number of parameters small compared with using separate weights for every step.
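
The hidden-state idea can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions: the sizes, the random weights, the `rnn_step` helper, and the random "frame features" are all hypothetical stand-ins, and a real model would learn `W_x`, `W_h` and `b` by training.

```python
import numpy as np

rng = np.random.default_rng(0)

feature_size, hidden_size = 4, 3  # hypothetical sizes

# One set of parameters, reused at every time step.
W_x = rng.normal(scale=0.5, size=(hidden_size, feature_size))
W_h = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """Combine the current input with the previous hidden state."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# Stand-in for the per-frame CNN features of a 5-frame video.
frames = rng.normal(size=(5, feature_size))

h = np.zeros(hidden_size)  # empty memory before the first frame
for x_t in frames:
    h = rnn_step(x_t, h)   # the same W_x, W_h, b are used for every frame

print(h.shape)  # (3,)
```

Because each new hidden state depends on the previous one, the final `h` reflects the order of the frames, which is the temporal context a single-frame CNN lacks.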

Beginner steps towards OpenCV using Python

OpenCV is a library of programming functions mainly aimed at real-time computer vision, initially developed by Intel. It can be used to capture and process images.

To install OpenCV use the command:

pip install opencv-python

Now open any editor or Python's IDLE and import the cv2 module:

import cv2

To read an image from a file we use cv2.imread(path, flag):

img_color = cv2.imread('stephen_hawking.jpg', cv2.IMREAD_COLOR)  # flag 1: reads a 3-channel BGR color image
img_gray = cv2.imread('stephen_hawking.jpg', cv2.IMREAD_GRAYSCALE)  # flag 0: reads a single-channel grayscale image

To display the image in a window use:

cv2.imshow('image', img_color)  #displays image in a window
cv2.waitKey(0)  #will keep displaying till any key is pressed
cv2.destroyAllWindows()

To see the pixel values of the image, just use print(). We can also see that the grayscale image has 2 dimensions (rows and columns) while the color image has 3, since an extra dimension is added to store the BGR color intensities.

print(img_color.shape) #prints the dimensions 
print(img_color) #prints 3d array representing color intensities

**Left: 3d pixel array Right: Grayscale image**

To get the BGR values of the pixel at (x, y), print the value at [y][x]: the first index selects the row and the second selects the column.

print(img_color[0][0]) #output: [105  86  59]
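
The row-then-column indexing is easy to get backwards, so here is a small sketch on a made-up array rather than a real photo. The 2x3 image and the chosen coordinates are assumptions for illustration; the array layout (rows, columns, channels) and the BGR channel order match what cv2.imread returns for a color image.

```python
import numpy as np

# Hypothetical 2-row by 3-column image in OpenCV's BGR channel order.
img = np.zeros((2, 3, 3), dtype=np.uint8)

x, y = 2, 1              # column 2, row 1
img[y, x] = (255, 0, 0)  # pure blue, because the order is B, G, R

print(img.shape)         # (2, 3, 3): rows, columns, channels
blue, green, red = img[y, x]
print(blue, green, red)  # 255 0 0
```

Writing `img[y, x]` is equivalent to `img[y][x]` and is the more common NumPy idiom.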

Finally, to save an image:

cv2.imwrite('stephen_hawking_grayscale.jpg',img_gray)

Written by Tarun Acharya