Advancements in Computer Vision

Hang Xu
Published in data-surge
3 min read · May 3, 2022

Computer vision is the area of modern machine learning concerned with extracting information from images; two of its core tasks are object detection and segmentation. One of its key building blocks is the Convolutional Neural Network (CNN). There are several CNN-based architectures, and today I will go over the high-level process each one follows, working up to a model that generates high-quality instance masks for objects within an image.

Standard CNN

To get started, we must first define and understand what a CNN is. In its most basic form, the image passes through three kinds of layers. The convolution layer slides small filters over sections of the input image to extract features; the earliest layers pick up low-level features such as edges, color, and gradient orientation. Next comes the pooling layer, which downsamples the feature maps to reduce the computational power required to process the data. Finally, the data is flattened and fed through a fully connected (FC) layer, a neural network that weighs the extracted features and classifies the image based on them.
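
The three stages above can be sketched in plain Python. This is only an illustration, not how a real framework implements them: the 6x6 image, the edge-detecting kernel, and the FC weights are all made-up values chosen so the numbers are easy to follow, and nothing here is trained.

```python
def conv2d(image, kernel):
    """Slide one kernel over a grayscale image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses, zero out the rest."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the strongest value per window."""
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]


def fully_connected(features, weights, biases):
    """One score per class: weighted sum of the features plus a bias."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]


# Toy 6x6 image: dark left half, bright right half (a vertical edge).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hypothetical hand-written kernel that responds to vertical edges.
edge_kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

feature_map = relu(conv2d(image, edge_kernel))  # 4x4 map of edge responses
pooled = max_pool(feature_map)                  # downsampled to 2x2
flat = [v for row in pooled for v in row]       # flatten for the FC layer

# Made-up weights for two classes: 0 = "has edge", 1 = "no edge".
weights = [[0.25] * 4, [-0.25] * 4]
biases = [0.0, 1.0]
scores = fully_connected(flat, weights, biases)
predicted = scores.index(max(scores))
print(pooled, scores, predicted)
```

In a real CNN the kernels and FC weights are learned from data, and many kernels run in parallel at each layer rather than the single one used here.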

R-CNN

R-CNN uses selective search to merge regions of the image based on their similarity, reducing the number of candidate regions to approximately 2,000. Each of these regions is warped to a fixed size and fed into a convolutional neural network, which outputs a feature vector. That vector is then fed into a Support Vector Machine (SVM) that classifies whether an object is present in the region.
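
To make the warping step concrete, here is a minimal sketch of cropping a region and resizing it to a fixed size with nearest-neighbor sampling. The function name `crop_and_warp`, the toy image, and the two hard-coded boxes are my own assumptions for illustration; in R-CNN the boxes would come from selective search, which is not implemented here, and the interpolation used in practice may differ.

```python
def crop_and_warp(image, box, out_size):
    """Crop box = (x0, y0, x1, y1), end-exclusive, and resize the crop to
    out_size x out_size using nearest-neighbor sampling."""
    x0, y0, x1, y1 = box
    box_h, box_w = y1 - y0, x1 - x0
    warped = []
    for r in range(out_size):
        src_r = y0 + (r * box_h) // out_size      # source row for output row r
        row = []
        for c in range(out_size):
            src_c = x0 + (c * box_w) // out_size  # source column for output column c
            row.append(image[src_r][src_c])
        warped.append(row)
    return warped


# Toy 6x8 "image" whose pixel value encodes its position (10 * row + column).
image = [[10 * r + c for c in range(8)] for r in range(6)]

# Hypothetical region proposals standing in for selective search output.
proposals = [(0, 0, 4, 2), (2, 1, 8, 5)]

# Regions of different shapes all come out at the same fixed size.
warped = [crop_and_warp(image, box, 2) for box in proposals]
print(warped)
```

Because every region ends up the same size, the same CNN can turn each one into a feature vector of the same length, which is what the SVM expects.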

Faster R-CNN

R-CNN had multiple drawbacks, chiefly the speed at which it could be executed, so Faster R-CNN came about to resolve those issues. Instead of relying on a separate selective search algorithm, it lets the network itself take responsibility for region proposals through a Region Proposal Network (RPN). Once the proposals have been generated, each one is reshaped to a fixed size by a Region of Interest (RoI) pooling layer, and the pooled features are used to classify the object inside each bounding box and refine the box itself.
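
RoI pooling is easier to picture with a small example. The sketch below divides a region of a feature map into a fixed grid of bins and keeps the maximum of each bin, so regions of any size come out with the same shape. The helper names, the toy feature map, and the two regions are assumptions for illustration, and the bin-edge rounding is one simple choice rather than the exact rule used by any particular library.

```python
def bin_edges(start, length, n_bins):
    """Split [start, start + length) into n_bins non-empty (lo, hi) ranges."""
    edges = []
    for i in range(n_bins):
        lo = start + (i * length) // n_bins          # floor
        hi = start + -(-(i + 1) * length // n_bins)  # ceiling division
        edges.append((lo, hi))
    return edges


def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool the region roi = (x0, y0, x1, y1), end-exclusive and in
    feature-map coordinates, into a fixed out_h x out_w grid."""
    x0, y0, x1, y1 = roi
    row_bins = bin_edges(y0, y1 - y0, out_h)
    col_bins = bin_edges(x0, x1 - x0, out_w)
    return [
        [
            max(feature_map[r][c]
                for r in range(r_lo, r_hi)
                for c in range(c_lo, c_hi))
            for (c_lo, c_hi) in col_bins
        ]
        for (r_lo, r_hi) in row_bins
    ]


# Toy 6x6 feature map whose value encodes position (10 * row + column).
feature_map = [[10 * r + c for c in range(6)] for r in range(6)]

# Two hypothetical proposals of different sizes, as an RPN might produce.
large_roi = (1, 0, 6, 4)  # 5 wide, 4 tall
small_roi = (0, 0, 2, 2)  # 2 wide, 2 tall

pooled_large = roi_max_pool(feature_map, large_roi, 2, 2)
pooled_small = roi_max_pool(feature_map, small_roi, 2, 2)
print(pooled_large, pooled_small)
```

Both results are 2x2 even though the regions differ in size, which is the property that lets the later classification layers work with a fixed input shape.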

Mask R-CNN

Mask R-CNN is one of the more recent developments in computer vision and builds on various aspects of its predecessors. Its initial stages are practically identical to those of Faster R-CNN, but in addition to the bounding box and the class label for each detected object, it predicts an object mask: a pixel-level outline that captures the object's shape far more finely than a box can. This is what allows it to tell individual objects apart when generating instance segmentations of an image.
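
The difference between a box and a mask can be shown with a few lines of Python. In this sketch the per-pixel mask probabilities, the box, and the 0.5 threshold are made-up values, and the mask is assumed to already be at the box's resolution (a real model predicts a small fixed-size mask and resizes it to the box first).

```python
def binarize_mask(mask_probs, threshold=0.5):
    """Turn per-pixel probabilities into a 0/1 mask."""
    return [[1 if p >= threshold else 0 for p in row] for row in mask_probs]


def paste_mask(mask, box, image_h, image_w):
    """Place a box-sized binary mask at box = (x0, y0, x1, y1), end-exclusive,
    on an otherwise empty full-image canvas."""
    x0, y0, x1, y1 = box
    canvas = [[0] * image_w for _ in range(image_h)]
    for r in range(y1 - y0):
        for c in range(x1 - x0):
            canvas[y0 + r][x0 + c] = mask[r][c]
    return canvas


# Hypothetical detection: a 4x3 box with per-pixel mask probabilities.
box = (1, 1, 5, 4)
mask_probs = [
    [0.1, 0.8, 0.9, 0.2],
    [0.7, 0.9, 0.9, 0.6],
    [0.2, 0.6, 0.7, 0.1],
]

instance_mask = paste_mask(binarize_mask(mask_probs), box, image_h=5, image_w=6)

box_area = (box[2] - box[0]) * (box[3] - box[1])
mask_area = sum(sum(row) for row in instance_mask)
for row in instance_mask:
    print(row)
print(box_area, mask_area)
```

The box covers 12 pixels but the mask marks only the 8 that belong to the object, which is the finer spatial detail the mask output adds over a bounding box.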

Summary

Computer vision has come a long way in the short time that it has been available to the general public. There will still be advancement and improvement, whether in how objects are searched for and bounded, in how richly each individual feature is detected, or in areas that have not been thought of yet. One way to experience many of these features and advancements is through Detectron2, a library from Facebook AI Research (now Meta AI) whose introductory tutorial runs in Google Colab. Here is a link to that introductory notebook, with sample code snippets and a guide to the Detectron2 API: 📓

If you would like us to help, please email us at info@datasurge.com or complete the form on our contact us page.
