Computer Vision Approaches

Dashanka Nadeeshan · Published in AI Geeks · 6 min read · Jan 22, 2018

A brief overview

Computer vision is, in essence, humans trying to teach computers how to see. "Seeing" here can mean understanding scenes, recognizing objects, constructing 3D models, avoiding obstacles while navigating, and so on. There are plenty of applications of computer vision around us today, such as robotics, augmented reality, virtual reality, bar-code and QR-code scanning, fingerprint scanning and taking panoramas with your smartphone, and the field has become one of the major topics in technology. Today, machine learning techniques, computer algorithms and graphics-processing hardware have made computer vision successful in real-world applications, from consumer electronics to industrial systems. According to the common definition, any complete computer vision system combines two main components: the technical means, i.e. the hardware, and the applied mathematics for information processing, i.e. the algorithms. Efficient performance has undoubtedly been achieved thanks to the rapid growth of hardware such as 3D graphics accelerators with powerful processing capabilities, while in parallel the development of high-level, versatile algorithms powered by machine learning techniques such as artificial neural networks has driven the progress further.

Comparing the human visual system with a computer's: for a computer, an image is just data, an array of numbers (a colour image is three arrays of numbers, each value ranging from 0 to 255). We call these numbers pixel values, and the combination of pixels builds the image. The human visual system, in contrast, is capable of far more sophisticated and semantic interpretation of what it captures. In order to interpret what they capture, computers generally rely on four main approaches.
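To make the pixel-array idea concrete, here is a minimal sketch (assuming Python with NumPy and Pillow installed, and a hypothetical image file "photo.jpg") of how a computer actually receives an image: as an array of numbers between 0 and 255.

```python
# A minimal sketch of how a computer "sees" an image: arrays of pixel values.
# The file name "photo.jpg" is an assumption for illustration.
import numpy as np
from PIL import Image

img = Image.open("photo.jpg").convert("RGB")   # load a colour image
pixels = np.asarray(img)                       # shape: (height, width, 3)

print(pixels.shape)    # e.g. (480, 640, 3) -> three colour channels
print(pixels.dtype)    # uint8 -> every value lies between 0 and 255
print(pixels[0, 0])    # the RGB values of the top-left pixel
```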

Object recognition and captioning

Recognition is essentially the technology of detecting and understanding objects in images (and videos). The human recognition system handles multiple objects with little effort, even when the objects are seen from different points of view, at different scales, or partially occluded, but this is a very challenging task for computers. Many methods have now been developed to understand and recognize objects at different levels of difficulty. Thanks to machine learning, vision capabilities have extended from recognizing the objects in a photo to deriving semantic and geometric relationships between them, which can then be used for tasks like relational reasoning and visual understanding. A powerful subcategory of machine learning, deep learning with artificial neural networks, is heavily used to achieve this near-human performance. Network types such as convolutional neural networks, recurrent neural networks and combinations of the two are used to build these remarkably capable models and algorithms. Over the last decade a large number of outstanding research works have been published on object recognition and its variants, and in the near future the field is expected to outperform human vision capabilities.
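As a rough illustration of CNN-based recognition, the sketch below uses a pretrained convolutional network from torchvision to classify an image. The model choice (ResNet-18) and the file name "photo.jpg" are illustrative assumptions, not something from the article itself.

```python
# A minimal sketch of object recognition with a pretrained CNN.
# Assumes PyTorch, torchvision and Pillow are installed.
import torch
from torchvision import models, transforms
from PIL import Image

# Standard ImageNet preprocessing: resize, crop, convert to tensor, normalise.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Newer torchvision versions prefer the weights=... argument instead.
model = models.resnet18(pretrained=True)   # CNN trained on ImageNet
model.eval()

img = Image.open("photo.jpg").convert("RGB")
batch = preprocess(img).unsqueeze(0)       # add a batch dimension

with torch.no_grad():
    logits = model(batch)                      # scores for 1000 object classes
    class_id = logits.argmax(dim=1).item()     # most likely class index

print("Predicted ImageNet class index:", class_id)
```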

3D reconstruction

Reconstruction is building visual (3D) models out of visual data captured from various perspectives. This is easily explained with Google Maps or your smartphone camera app. In Google Maps, spherical panoramas and 3D models of places are built from 2D image data taken from multiple viewpoints and angles. The captured data are fed into algorithms that match features across the images and then reconstruct the visual model. Motion capture is another application of reconstruction techniques: it is the process of recording different types of motion data, using visual sensors such as various types of cameras along with other sensors such as IMUs. The development of vision hardware like depth cameras, stereo vision rigs and 360° cameras has allowed these methods to be applied in many different fields. Model reconstruction is also used to create models of terrain and environments, known as 3D maps, which are mainly used in robotics and autonomous navigation tasks such as simultaneous localization and mapping (SLAM).
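A core step in this kind of reconstruction is matching features between views. The following sketch (using OpenCV, with hypothetical file names "view1.jpg" and "view2.jpg" for two overlapping photos of the same scene) shows only that matching step; full pipelines go on to estimate camera poses and triangulate 3D points from the correspondences.

```python
# A minimal sketch of the feature-matching step that reconstruction builds on.
# Assumes OpenCV (cv2) is installed and two overlapping photos exist.
import cv2

img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create()                         # keypoint detector + descriptor
kp1, des1 = orb.detectAndCompute(img1, None)   # features in the first view
kp2, des2 = orb.detectAndCompute(img2, None)   # features in the second view

# Match descriptors between the two views; good matches are candidate
# correspondences used later for pose estimation and triangulation.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

print(f"{len(matches)} candidate correspondences between the two views")
```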

How self-driving cars see

Registration is the process of transforming different sets of data into one coordinate system. The data could be multiple images, different visual media, or other sensory data (e.g. LIDAR) captured from various perspectives. The basic steps of image registration can be listed as feature detection, feature matching, mapping-function design, transformation and resampling. Several families of algorithms appear in the literature, such as intensity-based vs. feature-based and spatial-domain vs. frequency-domain methods (according to Wikipedia). The registration approach has been successfully applied in many diverse fields. Medical image registration is an important application and a valuable assistant to medical experts: medical vision data are combined to analyze and detect changes or to monitor conditions (for example, tumor monitoring). Images obtained from a single modality such as MRI or CT may not provide all the required information, so information from other modalities is combined to improve what is acquired. Self-driving cars are another huge topic these days. A self-driving car must track pedestrians, understand the behaviour of other vehicles while moving around, read traffic signs and traffic lights, and detect lane markings in order to drive safely and accurately, and computer vision with registration plays a large part in achieving this. Such sophisticated systems rely mainly on machine learning techniques for processing and decision making. In mobile apps, Snapchat's selfie filters are a more familiar everyday application of computer vision registration.
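The basic steps listed above (feature detection, feature matching, mapping-function design, transformation and resampling) can be sketched in a feature-based form as follows. The OpenCV-based code, the file names "fixed.jpg" and "moving.jpg", and the choice of a homography as the mapping function are all illustrative assumptions.

```python
# A minimal sketch of feature-based image registration:
# detect features, match them, estimate a transform, then resample.
import cv2
import numpy as np

fixed = cv2.imread("fixed.jpg", cv2.IMREAD_GRAYSCALE)
moving = cv2.imread("moving.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create()
kp_f, des_f = orb.detectAndCompute(fixed, None)
kp_m, des_m = orb.detectAndCompute(moving, None)

matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des_m, des_f)
matches = sorted(matches, key=lambda m: m.distance)[:50]   # keep best matches

src = np.float32([kp_m[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp_f[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# Estimate the mapping function (here a homography) and resample the
# moving image into the coordinate system of the fixed image.
H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
registered = cv2.warpPerspective(moving, H, (fixed.shape[1], fixed.shape[0]))
```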

Robot grasping using vision data

Reorganization can be described as mimicking the perceptual organization ability of human vision. In computer vision, reorganization generally means grouping and segmentation of visual data. In the context of machine learning this is approached through unsupervised learning: learning from unlabeled data, much like children learn things by themselves without being explicitly taught. Traditionally, computer vision models learn from huge amounts of pre-labelled data, but in unsupervised learning they receive data without labels, and the model clusters or reorganizes them in a way that makes sense. This can be illustrated with recent work by the Google robotics lab, published on their research blog, in which computer vision is used to perform a grasping task with a robot arm. Grasping tasks are very common in industrial applications, but those industrial robots are explicitly programmed to perform specific jobs (grasp specific objects in specific ways) and are supplied with good sensory data. It is quite difficult for a robot to grasp objects without being explicitly programmed, using only visual input the way humans do. In the Google robotics lab, robot arms are trained to pick up objects of types and shapes the robot has never seen before. The robot does not recognize the object as an apple or a cup; the program describes nothing about the object, and the robot simply learns how to grasp it through the experience of grasping. With experience, it learns how to handle many sorts of objects: if an object is spongy, for example, it learns that it cannot be picked up by grasping and has to be pinched instead. Likewise, the program learns other actions, and with enough training and experience it can approach near-human performance. Computer vision works here in the form of convolutional neural networks, but it does not work alone: reinforcement learning algorithms drive the learning process. There are many other important and useful applications as well.
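As a toy illustration of unsupervised grouping (not Google's actual grasping system), the sketch below clusters the pixels of an image by colour with k-means: no labels are given, and the grouping emerges from the data alone. The file name "scene.jpg" and the number of clusters are assumptions for illustration.

```python
# A minimal sketch of unsupervised grouping: k-means clusters pixels by
# colour with no labels, a simple form of the "reorganization" idea above.
import cv2
import numpy as np

img = cv2.imread("scene.jpg")
pixels = img.reshape(-1, 3).astype(np.float32)   # one row per pixel (B, G, R)

k = 4                                            # number of groups (assumed)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, centers = cv2.kmeans(pixels, k, None, criteria, 10,
                                cv2.KMEANS_RANDOM_CENTERS)

# Replace every pixel with its cluster centre to visualise the groups.
segmented = centers.astype(np.uint8)[labels.flatten()].reshape(img.shape)
cv2.imwrite("segmented.jpg", segmented)
```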

This has been a very basic and brief overview of computer vision and its main approaches. Looking at recent developments, the contribution of machine learning techniques, including deep learning methodologies, has led to huge successes. With continuous research and applications in many diverse fields, computer vision keeps developing and improving every day.


Dashanka Nadeeshan, AI Geeks

Student at Hamburg University of Technology, studying Mechatronics, Robotics and Intelligent Systems. Visit me at https://www.linkedin.com/in/dashankadesilva/