Chapter 1: History and Introduction of Computer Vision

Anshuman Patel


Published in Computer Vision 101 with Deep Learning, Feb 14, 2019


History: How it really began

It was the summer of 1966. Seymour Papert, who had recently joined MIT's Artificial Intelligence group, assigned a summer project to a group of 10 undergraduate students. [Source of original paper]

The project aimed to analyze and recognize the objects in a scene by dividing the picture into regions such as:
1. Likely Objects
2. Likely Background Areas
3. Chaos

The end goal was OBJECT IDENTIFICATION: actually naming objects by matching them against a vocabulary of known objects.

There is no resource that gives us insight into the outcome of the project. Most likely it did not succeed, but it laid the foundation for a new field within Artificial Intelligence.

We have come a long way since then. Artificial Intelligence is now deeply integrated into computer vision, and a large number of researchers spend day and night pushing the field forward.

But what really is Computer Vision?

According to Wikipedia,

“Computer vision is an interdisciplinary scientific field that deals with how computers can be made to gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to automate tasks that the human visual system can do”

In simple terms, Computer Vision is the science of giving machines a human-like ability to process images and videos and to infer information from them directly, just as we humans do.

Typical use cases of computer vision include Barcode Scanning, Facial Recognition, Google Image Search, Pinterest Lens, Amazon Go, Augmented Reality apps like Pokémon Go, Optical Character Recognizers, the Panorama mode in our cameras, line-following robots and the state-of-the-art humanoid robot Sophia. If you are a techie, you might also know about Jian Yang’s “Not Hotdog” app from the TV series Silicon Valley and Pranav Mistry’s “Sixth Sense”.

With new advances in this field, the possibilities are nearly endless. We might soon see new applications become part of our daily lives for good: number plate recognition at every traffic signal with automatically issued challans (traffic tickets) for breaking traffic rules, machines we can ask to look for our keys around the house, robots delivering our groceries, driverless cars, or even a robo-pet.

Disciplines of Computer Vision

1. Recognition
2. Motion Analysis
3. Reconstruction
4. Restoration

1. Recognition

Recognition deals with determining whether the image data contains specific objects or gestures and, sometimes, where in the image they are located.

Some of the subtasks under Recognition include:
i. Object Classification:- Classifying objects into one of a predefined set of classes/categories.

**A snapshot from the “Not Hotdog” app from the TV series Silicon Valley, demonstrating Object Classification.**

ii. Localization:- Finding where the objects are present in the image.

**The red box signifies the region of the input image where the cat is.**

iii. Pose Estimation:- Estimating the position or orientation of a specific object relative to the camera.

**This picture shows the input (left) and the outputs (middle: 2D, right: 3D) of the pose estimation software.**

iv. Facial Recognition:- Identifying a face based on its facial features, such as the distances between the nose and the eyes or between the nose and the ears.

**This picture shows facial points which are tracked in order to recognize a face.**

v. Optical Character Recognition:- Identifying printed or handwritten characters in images.

**The three stages of a character recognizer. The author has also built one similar to the one pictured above; you can try it at** https://blado-runnero.github.io/char.html
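The Facial Recognition idea above, describing a face by distances between landmarks such as the nose, eyes and ears and then matching it against known faces, can be sketched in a few lines of Python. This is a minimal, hypothetical illustration: it assumes the landmark coordinates have already been detected, and the landmark names, the `feature_vector`/`identify` helpers, the coordinates and the 0.1 threshold are all made up for this example. Real systems use many more points or learned features.

```python
import math

# Hypothetical landmark pairs whose distances describe a face.
PAIRS = [("nose", "left_eye"), ("nose", "right_eye"),
         ("nose", "left_ear"), ("nose", "right_ear")]


def feature_vector(points):
    """Turn (x, y) landmark positions into a scale-invariant list of distances."""
    # Divide by the eye-to-eye distance so the same face photographed
    # closer or farther away gives the same features.
    eye_span = math.dist(points["left_eye"], points["right_eye"])
    return [math.dist(points[a], points[b]) / eye_span for a, b in PAIRS]


def identify(probe, gallery, threshold=0.1):
    """Return the name of the closest known face, or None if none is close enough."""
    probe_vec = feature_vector(probe)
    best_name, best_dist = None, float("inf")
    for name, points in gallery.items():
        d = math.dist(probe_vec, feature_vector(points))
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= threshold else None


# Made-up landmark coordinates (in pixels) for two known people.
gallery = {
    "alice": {"left_eye": (30, 40), "right_eye": (70, 40), "nose": (50, 60),
              "left_ear": (10, 50), "right_ear": (90, 50)},
    "bob": {"left_eye": (30, 40), "right_eye": (70, 40), "nose": (50, 75),
            "left_ear": (5, 55), "right_ear": (95, 55)},
}
# Alice photographed at twice the size: distances double, ratios stay the same.
probe = {k: (2 * x, 2 * y) for k, (x, y) in gallery["alice"].items()}
stranger = {"left_eye": (30, 40), "right_eye": (70, 40), "nose": (50, 50),
            "left_ear": (20, 45), "right_ear": (80, 45)}

print(identify(probe, gallery))     # alice
print(identify(stranger, gallery))  # None
```

Normalizing by the eye span is one simple design choice that makes the features independent of how large the face appears in the picture; the threshold decides when a face counts as "unknown" instead of being forced onto the nearest match.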

2. Motion Analysis

Unlike Recognition, motion analysis takes an image sequence (or a video) as input and produces an estimate of velocity as output.

One application is tracking software that follows the velocity and relative position of a target object in a video.

**The figure shows motion analysis in action.**
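As a simplified illustration of turning an image sequence into a velocity estimate, the sketch below tracks a single object by the centroid (average position) of its pixels in each frame. It assumes the object has already been separated from the background as one binary mask per frame, a hypothetical input format chosen for brevity; the helper names `centroid`, `estimate_velocity` and `make_frame` are made up for this example. Real motion analysis, such as optical flow, estimates motion at every pixel and has to cope with noise and occlusion.

```python
def centroid(mask):
    """Mean (x, y) position of the pixels set to 1 in a binary mask (a list of rows)."""
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                xs.append(x)
                ys.append(y)
    if not xs:
        raise ValueError("object not found in this frame")
    return sum(xs) / len(xs), sum(ys) / len(ys)


def estimate_velocity(frames, fps):
    """Average (vx, vy) in pixels per second between the first and last frame."""
    if len(frames) < 2:
        raise ValueError("need at least two frames to measure motion")
    x0, y0 = centroid(frames[0])
    x1, y1 = centroid(frames[-1])
    frame_gaps = len(frames) - 1  # number of frame intervals elapsed
    return (x1 - x0) * fps / frame_gaps, (y1 - y0) * fps / frame_gaps


def make_frame(left, width=6, height=4):
    """Hypothetical test frame: a 2x2 object whose left edge is at column `left`."""
    return [[1 if 1 <= y <= 2 and left <= x <= left + 1 else 0 for x in range(width)]
            for y in range(height)]


# A 3-frame clip at 10 frames per second: the object moves right by 1 pixel per frame.
frames = [make_frame(left) for left in range(3)]
print(estimate_velocity(frames, fps=10))  # (10.0, 0.0)
```

Converting pixels per second into real-world speed would additionally require knowing the scale of the scene, which this sketch does not attempt.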

3. Reconstruction

Reconstruction is the process of building a digital version of a real-world object from pictures or scans of it, preferably in 3D, though sometimes in 2D.

Its applications include scanners, panoramas and 3D modeling.

**A figure showing the inputs (on the left) and a 3D model generated from those inputs**
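To give a feel for how 3D structure can be recovered from pictures, here is a sketch of one classic technique, stereo triangulation, which this article does not itself discuss. It assumes two identical cameras side by side whose images are rectified (the same scene point lies on the same row in both), a known focal length and camera spacing (baseline), and a point already matched between the two images. The function name `triangulate` and all camera numbers are hypothetical. Depth follows from how far the point shifts between the views (the disparity): nearer points shift more.

```python
def triangulate(x_left, x_right, y, focal_px, baseline_m, cx, cy):
    """Recover the 3D position (metres, left-camera frame) of a point seen by two rectified cameras.

    x_left, x_right: column of the same scene point in the left and right image (pixels)
    y:               its row, identical in both images after rectification
    focal_px:        focal length in pixels; baseline_m: distance between the cameras
    cx, cy:          principal point (optical centre of the image) in pixels
    """
    disparity = x_left - x_right  # how far the point shifts between the two views
    if disparity <= 0:
        raise ValueError("a point in front of the cameras needs positive disparity")
    z = focal_px * baseline_m / disparity  # depth is inversely proportional to disparity
    x = (x_left - cx) * z / focal_px       # back-project through the left camera (pinhole model)
    y3d = (y - cy) * z / focal_px
    return x, y3d, z


# Hypothetical setup: 700 px focal length, cameras 10 cm apart, 640x480 images.
point = triangulate(x_left=390, x_right=355, y=240,
                    focal_px=700, baseline_m=0.1, cx=320, cy=240)
print(point)  # (0.2, 0.0, 2.0): 20 cm to the right of the left camera, 2 m away
```

Repeating this for many matched points yields a 3D point cloud, which is one way a model like the one in the figure can be built up.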

4. Restoration

Restoration is the process of removing noise, motion blur and camera misfocus from an image or video, or regenerating lost parts of it, in order to obtain a clearer picture.

One prominent example of restoration is described in the following video.

A video from Two Minute Papers about a paper on Image Restoration
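Restoration methods range from simple filters to learned models like the one in the video. As a minimal illustration of the noise-removal part only, the sketch below applies a median filter, a classic technique (not the method from the video) that removes isolated "salt-and-pepper" noise by replacing each pixel with the median of its neighbourhood. It assumes a grayscale image stored as a list of rows of integers; the function name `median_filter` and the test image are hypothetical.

```python
def median_filter(image, radius=1):
    """Replace each pixel with the median of its neighbourhood (clipped at the image border)."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = sorted(
                image[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            )
            # For even-sized border windows this picks the upper of the two middle values.
            out[y][x] = window[len(window) // 2]
    return out


# A flat grey 5x5 image corrupted by one white ("salt") and one black ("pepper") pixel.
noisy = [[100] * 5 for _ in range(5)]
noisy[2][2] = 255
noisy[0][4] = 0

clean = median_filter(noisy)
print(clean[2][2], clean[0][4])  # 100 100
```

A median is used instead of an average because a single extreme pixel cannot pull the median far, whereas an average would smear the noise into its neighbours.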

What to expect in our next Chapter?

In this series of articles, we will go through different concepts and hands-on tutorials on various Computer Vision topics.

The next one will be an Introduction to Deep Learning. It will contain something both for programmers and for a general audience curious to know what happens behind the scenes and to gain a general understanding of AI and Computer Vision.