Getting started with Computer Vision in Python

By Vignesh
Published in DataX Journal, Jul 24, 2020 (8 min read)

“IF WE WANT MACHINES TO THINK, WE NEED TO TEACH THEM TO SEE!”

What comes first to your mind when you hear the term Computer Vision? A computer that can look around? Or a computer that sees objects and can process them? If so, you have almost got it right! Formally, Computer Vision is an interdisciplinary scientific field that deals with how computers can gain a high-level understanding of digital images or videos.

OpenCV is an open-source software library that is widely used for computer vision applications, ranging from simple ones like color identification to much more complex ones like human pose estimation and image segmentation. In this blog, we will cover the basics of OpenCV using Python and look at a few basic techniques that can be used in vision applications and image processing.

Installation

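You can simply run pip install opencv-python to install the library on your machine (the package name is lowercase, and it is imported in code as cv2), or, depending on your operating system, download and install it manually. Alternatively, you could use Google Colab, a free cloud-based Jupyter notebook environment, to skip the installation procedure altogether. Note that cv2.imshow() does not work inside Colab notebooks; Colab provides cv2_imshow from google.colab.patches instead.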

Let's Code!

Here is a list of all the topics covered in this blog:

  • Reading and displaying an image
  • Resizing and cropping an image
  • Canny Edge detection
  • Color detection
  • Contour Detection
  • Processing videos and live camera feeds
  • A quick exercise to recap

Reading and displaying an image

Import the OpenCV library (import cv2) in the first line of the code.
Then read the image using the cv2.imread() method. We pass in the path to the image as its argument and it returns the image as a NumPy array (or None if the file cannot be read). We store this NumPy array in a variable 'img'.
To display the image we simply write cv2.imshow('<name for the image window>', <the image array>).
cv2.waitKey() takes an integer argument that denotes the number of milliseconds for which the image is displayed. If 0 is passed, the image stays on screen until any key is pressed.
cv2.destroyAllWindows() simply closes all the open windows.

The following image will be displayed as output.

Resizing and cropping an image

The cv2.resize() method lets you resize an image in a single line of code. We pass in the image and the target dimensions, given as (width, height), and store the resized image in a new variable.

Since the image is nothing but a NumPy array consisting of the pixel values at each coordinate, we can use basic array slicing to crop it: the row (y) range comes first, then the column (x) range.
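Cropping by slicing might look like the sketch below. The image and the crop coordinates are made up for illustration.

```python
import numpy as np

img = np.zeros((300, 400, 3), dtype=np.uint8)
img[50:150, 100:300] = (0, 255, 0)  # paint a green (BGR) rectangle

# Crop with slicing: rows (y) first, then columns (x).
cropped = img[50:150, 100:300]
print(cropped.shape)  # (100, 200, 3)

# A slice is a view on the original array; copy it to get an independent image.
cropped_copy = cropped.copy()
```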

Original Image
Cropped Image

Canny Edge detection

Canny Edge detection is a very popular edge detection algorithm developed by John F. Canny in 1986. To learn how it works and the steps involved, go through the official OpenCV documentation.
OpenCV puts all the steps involved in this edge detection algorithm in a single function, cv2.Canny().

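The first argument is the input image and the next two arguments are the minimum and maximum values used for hysteresis thresholding of the detected edges. Gradients above the maximum are kept as edges, those below the minimum are discarded, and those in between are kept only if they are connected to a strong edge.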

Output image displaying the edges

Color detection

Color detection involves detecting the pixels whose values fall within a given color range. We first convert the image to an HSV image (HSV is an alternative representation of the RGB color model, designed in the 1970s by computer graphics researchers to more closely align with the way human vision perceives color-making attributes). Note that OpenCV loads images in BGR order, so the conversion flag to use is cv2.COLOR_BGR2HSV.

The challenging task here is to define the lower_blue and upper_blue arrays. The lower_blue array holds the lower bound of the HSV values and upper_blue holds the upper bound, and we want to keep the pixels whose values fall in between. The HSV values for any color can be found using an online tool or a color map, but keep in mind that for 8-bit images OpenCV stores hue on a 0 to 179 scale, so a hue given in degrees has to be halved. Using these two arrays and the HSV image, the cv2.inRange() method gives us a color mask, which is nothing but a binary image with white pixels in the blue regions and black elsewhere. Then, to get the resultant image as shown below, we compute a bitwise AND (cv2.bitwise_and()) between the mask and the original image.

Original image, mask and the resultant image

Contour Detection

Contours are the curves joining all the continuous points along a boundary that have the same color or intensity. They are very useful in shape detection, estimating the dimensions of objects, object detection, etc. Contour detection works best on binary images, so here we first apply Canny edge detection to the original image and then use the resultant image to find contours.

The cv2.findContours() method is used for detecting all the contours in the binary image. It takes three arguments here: the first is the source image, the second is the contour retrieval mode, and the third is the contour approximation method. In OpenCV 3.x it returns the image, contours, and hierarchy; in OpenCV 4.x it returns only the contours and hierarchy. 'contours' is a Python list of all the contours in the image. Each contour is a NumPy array of (x, y) coordinates of boundary points of the object.

The contour approximation method defines how many coordinates of the shape we need to store. For example, for a straight line we would only want the coordinates of the endpoints to be stored. In such a case, we may pass cv2.CHAIN_APPROX_SIMPLE. If we want to store all the boundary points, we would use cv2.CHAIN_APPROX_NONE, which is what this post's example does. The contour retrieval mode defines how we want to retrieve the generated contours, for example in the form of a tree or a list. The official OpenCV documentation gives a detailed and well-explained theory of hierarchy and the contour retrieval modes; I would recommend going through it once.

cv2.drawContours() draws the boundaries of the shapes, i.e. the contours, on an image. The first parameter is the image to draw on. The second parameter is the list of contours returned by the cv2.findContours() method. The third parameter is an integer index that selects which contour in that list to draw. If we want only the first contour to be drawn we may pass in 0 (-1 means draw all contours). The next parameter is the BGR color for the contours, followed by their thickness.

The input and output images are shown below.

Processing videos and live camera feeds

Till now we have seen a few operations on a single image. What if we need to do such processing on a video or a live camera feed? Only a few changes to the code above are needed to make that work.

Videos are nothing but a series of images. So to process a video, we just need to loop through the series of images constituting the video. The below code opens the camera and displays the live feed continuously and closes the frame when the ESC key is pressed.

If a secondary camera is connected to the PC, we can pass 1 to cv2.VideoCapture() to access it (0 represents the default camera). Calling the read() method on the resulting capture object returns a boolean and the current frame; the boolean is True if the frame was read correctly. We then display the frames continuously using the cv2.imshow() method. All the above techniques, such as color and contour detection, can be applied to videos as well by treating each frame as a single image inside the while loop.

If we need to process a video from a file we can pass in the path to the file in the cv2.VideoCapture() method as shown below.

In the above code, we are also resizing each frame and converting it to a grayscale image.

Resized grayscaled video

A quick exercise to recap

In the image below we need to find the number of blue dots (including the incomplete dots or semi-circles at the edges). Counting them by hand gives 30. Let's find this number using OpenCV.

We will solve this problem by combining the techniques we discussed earlier.

The steps are as follows:

  1. Read the image
  2. Resize the image to an appropriate size if required so that it is easier to view.
  3. Apply blue color detection and generate a mask.
  4. Apply contour detection on the mask and count the number of contours generated.
  5. Display the results.
CODE SOLUTION

On running the above code you will get the following output.

We have now seen quite a few techniques to get started with computer vision in Python. I would suggest you try all the above code yourself and see the outputs. I will try to come up with a next part covering some more advanced functions and techniques. Till then, stay tuned!
