Introduction to Computer Vision & OpenCV in Python

Shashwat Tiwari
Published in Analytics Vidhya · 8 min read · Dec 15, 2019
Photo by niko photos on Unsplash

Hello Everyone,

Computer vision is one of the most exciting divisions of computer science. A lot of research has been carried out in this field for decades. Processing of images has become faster and more efficient thanks to cloud technologies and powerful GPUs and TPUs. Cars, robots, and drones are starting to understand what we see in pictures and videos. The "computer vision" interface between machines and humans will gain much more importance within the next few years.

Computer vision is considered to be the hottest field in the era of artificial intelligence. It can be daunting for newbies, as there are some challenges that most people face while making the transition into computer vision:

  • Is a computer vision model trainable in the absence of GPUs and TPUs?
  • Image preprocessing: how do we clean an image dataset?
  • Why do we use deep learning instead of classical machine learning for computer vision?
  • Should we collect more images before building our computer vision model?

You got it right: I too faced these challenges, so I came up with this guide to help you out in the field of computer vision.

Hang Tight!

What is Computer Vision?

According to Wikipedia

Computer vision is an interdisciplinary scientific field that deals with how computers can be made to gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to automate tasks that the human visual system can do.

In simple words, computer vision is a field of deep learning that allows a machine to identify and process images just as humans do. Humans perform extremely well at parsing images, but for machines, detecting objects involves multiple complex steps, including feature extraction (edge detection, shapes, etc.) and feature classification.

OpenCV — Evolution in Computer Vision

According to OpenCV:

“OpenCV is released under a BSD license and hence it’s free for both academic and commercial use. It has C++, C, Python and Java interfaces and supports Windows, Linux, Mac OS, iOS and Android. OpenCV was designed for computational efficiency and with a strong focus on real-time applications. Written in optimized C/C++, the library can take advantage of multi-core processing. Enabled with OpenCL, it can take advantage of the hardware acceleration of the underlying heterogeneous compute platform.”

OpenCV contains implementations of more than 2500 algorithms! It is freely available for commercial as well as academic purposes. The library has interfaces for multiple languages, including Python, Java, and C++.

Setting up OpenCV

It should be noted that you can find many tutorials on the internet for installing OpenCV on your Ubuntu or Windows machine. Just follow this link, which helped me a lot in setting up everything on the fly.
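As a quick alternative, the prebuilt wheels on PyPI are usually enough to follow along with everything in this post (assuming you already have a recent Python and pip available):

```shell
# Install the prebuilt OpenCV bindings for Python, plus numpy
pip install opencv-python numpy

# Verify the installation by printing the OpenCV version
python -c "import cv2; print(cv2.__version__)"
```

The `opencv-python` package covers the main modules used below; `opencv-contrib-python` adds the extra contributed modules if you ever need them.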

Now hopping on to the fun part!

Reading, Writing and Displaying Images

An image can be represented as a multidimensional array. This is because a machine represents everything as numbers: in Python, numpy arrays are used for this, while in C++ OpenCV represents images with the Mat class.

For images, the generic term used is pixels, or pixel values. In the case of color images, we have three color channels, so a colored image has multiple values for a single pixel. Depending on the resolution and color depth, these arrays can vary in size. The color values go from 0 to 255, and the channels are generally ordered as Red, Green, Blue (RGB).

Reading an image in OpenCV is simple. Note that by default, the imread function reads images in BGR (Blue-Green-Red) format. We can read images in different formats using extra flags in the imread function:

  • cv2.IMREAD_COLOR: Default flag for loading a color image.
  • cv2.IMREAD_GRAYSCALE: Loads images in grayscale format.

The image is loaded by OpenCV as a numpy array, but the color channels of each pixel are ordered as BGR. Matplotlib's plot expects an RGB image, so for a correct display of the image it is necessary to swap those channels. This operation can be done either with the OpenCV conversion function cv2.cvtColor() or by working directly with the numpy array.

Resizing Images

In general, most computer vision models work on fixed input shapes, and the true pain arises when we perform web scraping to collect image datasets. Resizing is really helpful when training deep learning models. OpenCV also provides different interpolation and downsampling functions, selected with the following flags:

  • INTER_LINEAR
  • INTER_AREA
  • INTER_CUBIC
  • INTER_LANCZOS4

Image Rotation/Flipping

Data augmentation allows us to generate more samples for training our deep learning model. It uses the available data samples to produce new ones by applying image operations like rotation, scaling, translation, etc.

It also helps our model become robust and generalize well. Within data augmentation, rotation and flipping play a significant role: the image is rotated by a specified angle while the labels are kept the same.

Blending Images

With the magic of OpenCV, we can add or blend two images with the help of the cv2.addWeighted() method, which returns a numpy array containing the pixel values of the resulting image.

Blending is nothing but the addition of two image matrices, so to blend two images we simply add the corresponding matrices. For this addition, the two images must be the same size.

Creating a Region of Interest ROI

In OpenCV, an ROI is a rectangular sub-region of the image, selected with simple numpy slicing. The basic idea is that we can map the position of an object in the image to a new location in the final output image. This also adds shift-invariance: by changing object positions, the model can learn patterns better, which improves its generalizability. ROIs are used extensively in the image preprocessing stage.
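Because OpenCV images are numpy arrays, selecting and relocating an ROI needs no special API at all (the coordinates below are arbitrary):

```python
import numpy as np

img = np.random.randint(0, 256, (300, 400, 3), dtype=np.uint8)

# An ROI is just a numpy slice: rows (y) first, then columns (x)
roi = img[50:150, 100:250]

# Paste the ROI at a new position in a copy of the image
out = img.copy()
out[0:100, 0:150] = roi
print(roi.shape)  # (100, 150, 3)
```

Keep in mind that a slice is a view into the original array; call .copy() on the ROI if you want to modify it without touching the source image.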

Image Thresholding

Thresholding is a technique in which each pixel value is compared with a threshold value. If the pixel value is smaller than the threshold, it is set to 0; otherwise, it is set to a maximum value, i.e. 255. It helps in segmenting an object from its background, and it revolves around two outcomes: below the threshold or above it.

Note that thresholding is done on grayscale images.

Besides simple thresholding, there is adaptive thresholding, where the threshold value is calculated for smaller regions, so different regions get different threshold values. For more info, please refer to this link.

Blurring and Smoothing

Blurring is one of the most popular and common techniques for reducing noise in an image. It removes high-frequency content, such as edges, from the input image and is a commonly used image processing operation.

In general, blurring is obtained by convolving the input image with a low-pass filter kernel. There are basically three types of blurring:

  • Average Blurring
  • Gaussian Blurring
  • Median Blurring

In average blurring, the image is convolved with a box filter: each pixel is replaced by the average of all the pixels in the kernel area.

In Gaussian blurring, the image is convolved with a Gaussian filter, which is a low-pass filter that removes high-frequency content from the image.

Edge Detection

Edges in images are the points where brightness changes drastically; they arise from a number of discontinuities, such as:

  • Depth Discontinuities
  • Orientation Discontinuities

Edge detection has become very useful for extracting features of images for different image recognition applications like the classification of objects.

Image Contours

A contour is a closed curve of points or line segments that represents the boundary of an object in an image. Contours are essentially the shapes of objects in an image, described as a collection of points or line segments.

Face Detection

OpenCV is awesome at detecting faces using a Haar cascade based object detection algorithm. Haar cascades are trained machine learning classifiers that compute different features like lines, contours, edges, etc.

These trained models for detecting faces, eyes, etc. are open-sourced in the OpenCV repos on GitHub. You can also train your own Haar cascade for any object.

Check out this link for a beautiful explanation of face detection using OpenCV from scratch.

End Notes

So, guys, OpenCV is truly a wonderful and power-packed library for computer vision tasks. I highly recommend that you run the sample code above on your own machine, as the best way to learn anything is to apply it yourself.

Moreover, there are a lot of other methods and techniques available for image manipulation in OpenCV. I encourage you to check out its GitHub repository and official documentation for their implementations.


If you like this post, please follow me. If you notice any mistakes in the way of thinking, formulas, animations, or code, please let me know.

Cheers!

Shashwat Tiwari, Senior Applied Data Scientist at EY || Machine Learning and Deep Learning Ardent ||