Computer Vision — Making Machines See

EDGENeural.AI
6 min read · Jan 13, 2022


Image credit: https://commons.wikimedia.org/wiki/File:Detected-with-YOLO--Schreibtisch-mit-Objekten.jpg

Human beings perceive their visual surroundings so easily that it takes no conscious effort: we interpret what we see instantly, without ever wondering how we understand it. Making machines see, however, is not easy and requires a great deal of work. A large body of research has been, and continues to be, devoted to making systems understand visual inputs the way human beings do.

Computer vision (CV) is a field of AI that gives devices the capability to interpret visual inputs such as images and videos. With the growing use of intelligent devices, and with images and videos forming a large share of all data, CV is one of the fastest-growing areas of AI. According to a Grand View Research report, the global computer vision market is expected to grow at a CAGR of 7.3% from 2021 to 2028.

Why is Computer Vision even required?

The popularity and significance of computer vision lie in the numerous ways it can be applied to real-world scenarios. CV capabilities embedded in devices help them gain insights from what they see and take real-time action.

In healthcare, for example, CV helps with the analysis and interpretation of medical images. It can precisely identify a critical condition even from subtle signs that might be missed by human vision. It saves doctors time in analysis and interpretation, and the timely input it provides can help save patients' lives, especially with critical diseases like cancer.

It is also useful for autonomous vehicles, robots, and drones. These devices need to interact with their surroundings, interpret them, and act accordingly. CV serves them in various ways: self-navigation, surveillance, counting objects, agricultural tasks such as scattering seeds, and detecting forest fires and helping to contain them.

Other applications of computer vision include facial recognition, augmented reality, number plate detection, delivery drones, traffic analysis, defect detection, and analyzing customer traffic. CV has thus become popular across industries and everyday life, improving operational efficiency, security, and convenience.

Computer Vision Tasks

Computer vision involves many tasks for understanding visual inputs such as images and videos. An algorithm interprets an image as an array of pixels and performs calculations on those values to make sense of the input. There are many types of computer vision tasks, such as image classification, object detection, and image segmentation, and they are used according to the needs of an application.
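As a minimal sketch of this pixel representation, the snippet below (using Pillow and NumPy, with a hypothetical image file) loads an image and exposes it as an array of RGB values:

```python
# Minimal sketch: an image as an array of pixel values.
from PIL import Image
import numpy as np

# "cat.jpg" is a placeholder path; any RGB image works.
image = Image.open("cat.jpg").convert("RGB")
pixels = np.asarray(image)   # shape: (height, width, 3)

print(pixels.shape)          # e.g. (480, 640, 3)
print(pixels[0, 0])          # RGB values of the top-left pixel, e.g. [123  87  54]
```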

Image Classification

Image classification assigns an image to a particular class. The system interprets the pixels in the image and labels the image as a whole with that class. The main image classification techniques are supervised classification and unsupervised classification.

a. Image Classification Technique — Supervised

In the supervised technique, the algorithm is trained on a labeled dataset: an annotated dataset with the chosen categories, which the model then uses to assign unseen data to those classes. Popular algorithms here include K-Nearest Neighbors, Support Vector Machines, and Decision Trees, while commonly used neural networks include AlexNet, ResNet, and DenseNet.
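As a rough sketch of the supervised approach, the snippet below classifies an image with a ResNet-18 that was pretrained on labeled ImageNet data. The weight and transform names assume a recent torchvision release, and "cat.jpg" is a placeholder path:

```python
# Sketch: supervised image classification with a pretrained ResNet-18 (torchvision).
# The model was trained on labeled ImageNet data, so it can only assign images
# to the 1,000 classes it was taught.
import torch
from torchvision import models
from PIL import Image

weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model.eval()
preprocess = weights.transforms()             # resizing + normalization used in training

image = Image.open("cat.jpg").convert("RGB")  # placeholder image path
batch = preprocess(image).unsqueeze(0)        # add a batch dimension

with torch.no_grad():
    logits = model(batch)
predicted = logits.argmax(dim=1).item()
print(weights.meta["categories"][predicted])  # human-readable class label
```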

b. Image Classification Technique — Unsupervised

In the unsupervised technique, the algorithm is trained on a dataset without labels and learns by recognizing patterns: it analyzes the data and discovers structure on its own. Common algorithms here include k-means clustering and ISODATA (Iterative Self-Organizing Data Analysis Technique).
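A minimal sketch of the unsupervised idea: grouping the pixels of an image into k color clusters with scikit-learn's k-means, with no labels involved ("field.jpg" is a placeholder path):

```python
# Sketch: unsupervised grouping of image pixels with k-means (scikit-learn).
# No labels are used; the algorithm simply finds k clusters of similar colors.
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

image = Image.open("field.jpg").convert("RGB")   # placeholder image path
pixels = np.asarray(image).reshape(-1, 3)        # one row per pixel (R, G, B)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(pixels)
labels = kmeans.labels_.reshape(image.size[1], image.size[0])  # back to (H, W)

print(np.unique(labels))   # each pixel now belongs to one of 4 discovered clusters
```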

Image classification is used in many practical applications, such as medical imaging, traffic control systems, and surveillance.

Object Detection

Object detection determines where an object is located in the image (localization) and what it is (classification).

The main difference between classification and detection is that classification considers the image as a whole and assigns it a single class, whereas detection identifies the individual objects in the image and classifies each of them. In detection, bounding boxes are drawn around the objects and each box is labeled with its class.

Some of the algorithms used for object detection are:
R-CNN (Region-Based Convolutional Neural Network)
R-FCN (Region-Based Fully Convolutional Network)
YOLO (You Only Look Once)
SSD (Single Shot Detector)
HOG (Histogram of Oriented Gradients)
SIFT (Scale-Invariant Feature Transform)
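As a hedged illustration of what detection output looks like in practice, the sketch below runs a Faster R-CNN model pretrained in torchvision and prints a class label, bounding box, and confidence score for each detection. The weight names assume a recent torchvision release, and "street.jpg" is a placeholder path:

```python
# Sketch: object detection with a pretrained Faster R-CNN (torchvision).
# Each detection comes with a bounding box, a class label, and a confidence score.
import torch
from torchvision import models
from torchvision.io import read_image

weights = models.detection.FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = models.detection.fasterrcnn_resnet50_fpn(weights=weights)
model.eval()
preprocess = weights.transforms()

image = read_image("street.jpg")            # placeholder image path
batch = [preprocess(image)]                 # the model takes a list of images

with torch.no_grad():
    detections = model(batch)[0]

for box, label, score in zip(detections["boxes"], detections["labels"], detections["scores"]):
    if score > 0.8:                         # keep only confident detections
        print(weights.meta["categories"][int(label)], box.tolist(), float(score))
```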

Object detection is also used in numerous use cases like object tracking, face and person detection, self-driving cars, anomaly detection, surveillance, and medical diagnosis.

Image segmentation

Image segmentation creates a mask over pixels that share similar characteristics and identifies their class in the input image. It helps to understand the image at a more granular level: every pixel is assigned a class, and a pixel-wise mask is created for each object in the image. This makes it easy to tell each object apart from the others. There are different types of image segmentation; two popular ones are:

a. Semantic Segmentation
It classifies pixels as belonging to a particular class, without differentiating between objects of the same class. In an image of several animals, for example, all of the animal pixels would be labeled simply as "animal", with no distinction between the individual animals.

b. Instance Segmentation
It classifies pixels as belonging to a particular instance, so every object in the image is differentiated even when the objects share a class. In the same animal image, for example, each animal would receive its own separate mask even though they all belong to the same class.
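To make the semantic case concrete, here is a minimal sketch using a DeepLabV3 model pretrained in torchvision: it assigns every pixel a class index, so all objects of one class share the same label. The names assume a recent torchvision release, and "animals.jpg" is a placeholder path:

```python
# Sketch: semantic segmentation with a pretrained DeepLabV3 (torchvision).
# Every pixel gets a class index; objects of the same class share one label.
import torch
from torchvision import models
from torchvision.io import read_image

weights = models.segmentation.DeepLabV3_ResNet50_Weights.DEFAULT
model = models.segmentation.deeplabv3_resnet50(weights=weights)
model.eval()
preprocess = weights.transforms()                # resizing + normalization used in training

image = read_image("animals.jpg")                # placeholder image path
batch = preprocess(image).unsqueeze(0)

with torch.no_grad():
    output = model(batch)["out"][0]              # shape: (num_classes, H, W)

class_map = output.argmax(0)                     # per-pixel class indices
print(class_map.shape, torch.unique(class_map))  # which classes appear in the image
```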

Different approaches can be taken for image segmentation, broadly a similarity approach or a discontinuity approach. Various techniques are also available, such as the threshold method, edge-based segmentation, region-based segmentation, clustering-based segmentation, the watershed method, and artificial-neural-network-based segmentation.
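As a small illustration of the simplest technique listed above, the sketch below applies Otsu's threshold method with OpenCV to split a grayscale image into a binary foreground/background mask ("coins.jpg" is a placeholder path):

```python
# Sketch: threshold-based segmentation with Otsu's method (OpenCV).
# Pixels are split into foreground/background purely by intensity.
import cv2

gray = cv2.imread("coins.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder image path
_, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# mask is a binary image: 255 where the pixel crossed the Otsu threshold, 0 elsewhere.
cv2.imwrite("coins_mask.png", mask)
```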

Many other tasks can be performed with computer vision, such as pose estimation and depth estimation. Computer vision is proving useful across use cases that call for real-time action, and this is one of the factors driving CV applications to the edge. CV at the edge can be used on drones, autonomous cars, assembly lines, or robots, enabling them to act on visual inputs in real time without depending entirely on the cloud. With advances in machine learning and neural networks, these algorithms are becoming more accurate and are helping to solve real-world problems. At the same time, however, neural networks are becoming computationally heavy, which makes deploying them on resource-constrained edge devices a challenge.

EDGENeural.ai has designed a software-defined platform, ENAP Studio, that manages the entire end-to-end workflow of taking an AI model to the edge. It can be used to train, optimize, and deploy AI models for edge devices.

Are you a developer, researcher, or AI enthusiast curious to learn and explore new tools? The beta version of ENAP Studio is available for free early access.

Sign up and explore ENAP Studio: https://edgeneural.ai/early-access/

ENAP Studio can be used for computer vision tasks through the following:

  1. Training an AI model
    An AI model can be trained easily: select a model to train (this prefills default parameters such as epochs and batch size for good training performance), upload the dataset, and begin training.
  2. Optimizing an AI model
    The model from the previous step can be optimized for your target hardware through ENAP without dropping below an acceptable level of accuracy. Our proprietary optimization techniques improve model performance and accelerate inference.
  3. Deploying a model
    Seamlessly deploy the trained and optimized models with a click.
  4. Model Zoo
    A collection of pre-built and pre-optimized models for tasks such as hard-hat detection, face detection, person detection, and vehicle detection, ready to use.



EDGENeural.AI

EDGENeural.AI is a deep-tech startup with the vision to decentralize AI and make AI available for all.