Introduction to Computer Vision

Sanskruti Dube
Published in GDSCVITBhopal
Mar 10, 2021

If I asked you to look at the picture below and list the objects you see in the frame, what would they be?

Times Square (NYC)

You probably came up with a long list of objects without giving it a second thought. It might include cars, colorful billboards, vibrant stores, bollards on the road, tall buildings, potted plants along the road, and many other things.

If I asked you to describe the picture in just one sentence, you would probably say, “It’s Times Square in New York!”, again without giving it a second thought. These tasks are extremely easy for you; even a six- or seven-year-old child can do the same.

But do you know how you did it? Behind the scenes, a very intricate process takes place. Human vision is a highly complex biological system that involves our eyes and the visual cortex of the brain. It draws on mental models of objects and on our abstract understanding of concepts, and it is shaped by personal experience gained through countless interactions with the world.

Today, digital devices can take pictures at a resolution that exceeds that of human vision, and computers can detect objects and their colors with high accuracy.

What is “Computer Vision”?

An illustration of Computer Vision

The name “Computer Vision” speaks for itself: it means “the sight of a computer.”

In technical terms, it refers to the field of computer science that studies how computers see and understand digital images. Computer vision covers both “seeing”, that is, sensing a visual stimulus, and understanding it by extracting complex information. The extracted information can then be used by other processes.

This interdisciplinary field simulates and automates elements of the human visual system using sensors and machine learning algorithms. It is a core component of many artificial intelligence systems, giving them the ability to see and understand their surroundings.

It draws inspiration from the human visual system, enabling computers to identify and process objects in images and videos much as humans do.

How does a digital device interpret an image?

A computer doesn’t see an image the way we do. Instead, it interprets the entire image as numbers. For a machine, an image is simply a 2D matrix of numbers, where each entry, called a pixel, represents the intensity of light or color at that position. In an 8-bit grayscale image, each pixel value ranges from 0 (black) to 255 (white), with the values in between representing shades of gray.
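To make this concrete, here is a minimal Python sketch (plain lists, no image library) that stores a tiny grayscale image as a 2D matrix and reads one pixel. The 4×4 size, the pixel values, and the helper `describe` are made up for illustration; a real image would normally be loaded from a file with an imaging library.

```python
# A tiny 4x4 8-bit grayscale "image" written as a 2D matrix (a list of rows).
# The values are made up for illustration: 0 is black, 255 is white.
image = [
    [0,   0,   0,   0],
    [0, 128, 255,   0],
    [0, 255, 128,   0],
    [0,   0,   0,   0],
]

height = len(image)      # number of rows
width = len(image[0])    # number of columns


def describe(value: int) -> str:
    """Map an 8-bit intensity to a rough description (hypothetical helper)."""
    if not 0 <= value <= 255:
        raise ValueError("8-bit pixel values must lie in 0..255")
    if value == 0:
        return "black"
    if value == 255:
        return "white"
    return "gray"


# A pixel is addressed by (row, column): image[1][2] is row 1, column 2.
pixel = image[1][2]
print(height, width, pixel, describe(pixel))  # 4 4 255 white
```

Real 8-bit grayscale images work the same way, just with many more rows and columns.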

What we see vs what the computer reads

For example, consider a handwritten digit 8 stored as a grayscale image: a 28×28 matrix containing 784 pixels. In the figure above, the first panel shows the digit as we see it, and the second and third panels show how a computer reads it, as pixel values. Here 0 denotes the completely black parts, values closer to the maximum of 255 denote the whiter parts, and the intermediate numbers represent the shades between black and white (notice the border of the digit). To feed such an image to a neural network, the matrix is often flattened: each row is concatenated after the previous one, as shown in the illustration below:

formation of a 1D array of pixels

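The second row is appended to the first, the third to the second, and so on, creating a single 1D array of 784 pixels that can be fed to the input layer of a fully connected neural network. (Convolutional neural networks usually take the 2D matrix as is and flatten only their intermediate feature maps, just before their final fully connected layers.) This is why it is often said that “a picture is simply an array of pixels.”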
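The flattening step described above can be sketched in Python as follows. This is an illustration under simple assumptions: the 28×28 "digit" is a placeholder matrix of zeros with a single white pixel rather than a real handwritten 8, and `flatten` and `unflatten` are hypothetical helpers written for this example.

```python
def flatten(matrix):
    """Concatenate the rows of a 2D matrix into one 1D list (row-major order)."""
    flat = []
    for row in matrix:
        flat.extend(row)  # each row is appended right after the previous one
    return flat


def unflatten(flat, width):
    """Split a 1D list back into rows of length `width`."""
    if len(flat) % width != 0:
        raise ValueError("length is not a multiple of width")
    return [flat[i:i + width] for i in range(0, len(flat), width)]


# Placeholder 28x28 "digit": every pixel is 0 except one white pixel.
SIZE = 28
digit = [[0] * SIZE for _ in range(SIZE)]  # independent rows, no aliasing
digit[10][5] = 255

vector = flatten(digit)
print(len(vector))            # 784
# Pixel (row, col) lands at index row * width + col in the flat array.
print(vector[10 * SIZE + 5])  # 255
assert unflatten(vector, SIZE) == digit  # flattening loses no information
```

Because rows are concatenated in order, pixel (row, col) always ends up at index `row * width + col`, so the original matrix can be recovered exactly; in practice, libraries such as NumPy provide the same operation through `reshape`.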

Colors in digital images are typically represented as combinations of three primary colors: red, green, and blue (RGB). If the image is a color (RGB) image instead of a grayscale one, we can represent it as three of these 2D matrices stacked on top of each other, one for each channel: a matrix for red, another for green, and one for blue. Stacked together, they form a single 3D array of shape height × width × 3, as shown below. When such an image is flattened, the resulting 1D array holds three times as many values as a grayscale image of the same size.
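A similar sketch for color images, again in plain Python with made-up values: each pixel is stored as an (R, G, B) triple, so the image is a height × width × 3 array, and `channel` (a hypothetical helper) extracts one of the three 2D matrices.

```python
# A 2x2 RGB image stored as rows of pixels, each pixel an (R, G, B) triple.
# The values are illustrative 8-bit intensities.
rgb_image = [
    [(255, 0, 0), (0, 255, 0)],      # red, green
    [(0, 0, 255), (255, 255, 255)],  # blue, white
]


def channel(image, index):
    """Extract one color channel (0 = red, 1 = green, 2 = blue) as a 2D matrix."""
    return [[pixel[index] for pixel in row] for row in image]


red, green, blue = (channel(rgb_image, i) for i in range(3))

height, width, depth = len(rgb_image), len(rgb_image[0]), len(rgb_image[0][0])
print(height, width, depth)  # 2 2 3
print(red)                   # [[255, 0], [0, 255]]

# Flattening the whole 3D array gives height * width * 3 values.
flat = [value for row in rgb_image for pixel in row for value in pixel]
print(len(flat))             # 12
```

A 28×28 color image flattened this way would contain 28 × 28 × 3 = 2,352 values, three times as many as its grayscale counterpart.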

RGB image

What are the applications of Computer Vision?

1. Facial Recognition:

Facial recognition is a powerful security tool used to “see” who is trying to gain access to something, for example, when unlocking a mobile phone screen. Computer vision algorithms detect facial features in images and compare them with a database of stored face profiles. Law enforcement agencies also use facial recognition technology to identify criminals in video feeds.

2. Autonomous vehicles:

Self-driving cars need information about their environment to decide how to behave. Cameras capture video from different angles around the car, and this video is fed to computer vision software. The images are processed in real time to find the edges of the road, read traffic signs, and detect other vehicles, objects, and pedestrians. The car can then steer its way, ideally without accidents.

3. Image search and object recognition:

Computer vision (CV) is used to identify objects within images, search through catalogs of images, and extract information from images.

4. Robotics:

Many robots, especially in manufacturing, need to see their surroundings to perform their tasks.

5. Health tech:

CV algorithms can help detect cancerous moles on the skin and assist in analyzing X-rays and MRI scans. They can also support more accurate and timely diagnosis of illnesses and speed up medical processes.
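The matching step in facial recognition, comparing a detected face against stored profiles, can be sketched as a nearest-neighbor search using cosine similarity. This is one common approach, not necessarily the one any particular product uses. It assumes each face has already been turned into a numeric feature vector (real systems typically compute these with a trained neural network), and the vectors, names, the `identify` function, and the 0.95 threshold below are all hypothetical.

```python
import math

# Hypothetical face "profiles". In a real system these feature vectors
# would be computed from face images by a trained model; these numbers
# are made up purely for illustration.
DATABASE = {
    "alice": [0.9, 0.1, 0.3],
    "bob":   [0.2, 0.8, 0.5],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(query, database, threshold=0.95):
    """Return the best-matching name, or None if no profile is similar enough."""
    best_name, best_score = None, -1.0
    for name, profile in database.items():
        score = cosine_similarity(query, profile)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


print(identify([0.88, 0.12, 0.31], DATABASE))  # alice
print(identify([0.5, 0.5, -0.7], DATABASE))    # None
```

The threshold trades off false matches against missed matches: raising it makes the system stricter about declaring a match. The same nearest-neighbor idea also underlies many image search systems.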

Conclusion:

This was only a basic introduction to computer vision. Computer vision is a vast field that has grown rapidly thanks to the artificial intelligence revolution and advances in deep learning and neural networks, and it has made great leaps in recent years. On some benchmark tasks related to detecting and labeling objects, models now match or even surpass human performance.


Content Writer at Google Developer Student Club VIT Bhopal | Pursuing B. Tech CSE in Artificial Intelligence and Machine Learning