Neha Singh
4 min read · Mar 31, 2020

Computer Vision: “The Most In-Demand Area”

Computer Vision is a subfield of AI and a branch of Computer Science. Applications of computer vision are becoming very popular in today’s era. The reason for this popularity is its use cases, which are delivering huge benefits to industries as well as individuals.

Zuckerberg said, “If we are able to build computers that could understand what’s in an image and tell a blind person who otherwise couldn’t see that image, that would be pretty amazing as well.”

This technology gives machines the immense power to visualize the world, just like humans. Human-computer interaction, image retrieval in digital libraries, medical image analysis, and the realistic rendering of synthetic scenes in computer graphics are some of the applications of computer vision.

In this article we will focus on how computer vision applications work and on the methods we need to know to understand the inner workings of these applications.

Before jumping into the technical aspects, let’s take a glimpse at how computer vision works compared to human vision. The image below shows a pictorial overview of computer versus human vision:

Now let’s dive deeper into computer vision. When we think about computer vision applications, the list is long. We use some of them every day, like unlocking a phone with face recognition or scanning documents with a camera scanner app: Computer Vision is everywhere.

Now we will prepare ourselves to work on these applications, starting from the basics of computer vision. For that, we need to understand the terms below; they will help us understand how an image is processed and analysed by a computer:

  1. Semantic Segmentation
    Semantic segmentation is the task of assigning a class to every pixel in a given image. This is significantly different from classification.
  2. Classification
    Classification refers to a type of labeling where an image/video is assigned certain concepts, with the goal of answering the question, “What is in this image/video?” Classification assigns a single class to the whole image whereas semantic segmentation classifies every pixel of the image to one of the classes.
  3. Localization
    The task of object localization is to predict the object in an image as well as its boundaries. In other words, localization returns the location of the detected object.
  4. Object Detection
    Object detection is used for locating instances of objects in images or videos. The difference between object localization and object detection is subtle: object localization aims to locate the main (or most visible) object in an image, while object detection tries to find all the objects and their boundaries.
    Object detection algorithms typically leverage machine learning or deep learning to produce meaningful results. When humans look at images or video, we can recognize and locate objects of interest within a matter of moments. The goal of object detection is to replicate this intelligence using a computer.
  5. Instance Segmentation
    Instance segmentation identifies the boundaries of objects at the detailed pixel level; in other words, it identifies each instance of each object featured in the image instead of categorizing each pixel into a class, as in semantic segmentation. For example, instead of classifying five sheep as one instance, it identifies each individual sheep.

The image below further clarifies these terms:

Now, moving forward: if we are discussing computer vision and we haven’t discussed OpenCV, then we have a major omission. Let me tell you why 💁‍♀️

OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. OpenCV was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products. Being a BSD-licensed product, OpenCV makes it easy for businesses to utilize and modify the code.

The library has more than 2500 optimized algorithms, which includes a comprehensive set of both classic and state-of-the-art computer vision and machine learning algorithms. These algorithms can be used to detect and recognize faces, identify objects, classify human actions in videos, track camera movements, track moving objects, extract 3D models of objects, produce 3D point clouds from stereo cameras, stitch images together to produce a high resolution image of an entire scene, find similar images from an image database, remove red eyes from images taken using flash, follow eye movements, recognize scenery and establish markers to overlay it with augmented reality, etc.

To perform all these operations, we first need to understand how to use OpenCV, and we will start from the basics: reading, writing and displaying images/videos, changing color spaces, resizing images, image rotation, image translation, simple image thresholding, adaptive thresholding, image segmentation, bitwise operations, edge detection, image filtering, image contours, feature matching, face detection, etc.

To understand all the above terms, you can visit the link below:

https://github.com/Neha609/OpenCV

This link contains all the basic operations performed with OpenCV using the Python language, along with explanations.

I hope this article helped you sort out your basic queries on computer vision. I believe it will point you in the right direction to get started with computer vision applications and tools. If you liked this article, don’t forget to give it a clap. And if you didn’t like it, or you expected something else, please mention it in the comments so that I can work on it.

Thanks a lot for devoting your time. Happy Reading ☺️.