Computer Vision: Introduction

Ryan Anthony J. de Belen
Published in The Startup · 5 min read · Sep 13, 2020

What is Computer Vision?

Computer vision can generally be understood from two perspectives: (1) computer science and (2) computer engineering.

In computer science, computer vision is the interdisciplinary field that develops theories and methods that allow computers to extract relevant information from digital images or videos. In computer engineering, it is the interdisciplinary field that develops algorithms and tools to automate perceptual tasks normally performed by the human visual system.

Every image tells a story

One of the goals of computer vision is to perceive the “story” behind the picture. It includes developing mathematical techniques for recovering the three-dimensional shape and appearance of objects in imagery, recognising objects and people in a scene, or even finding out what is happening in an image.

Human perception has its shortcomings

I will show you some optical illusions and tease what they might tell us about the human visual system.

Which side of the object is brighter?

You may perceive the right side to be brighter. However, notice that both sides appear to have the same colour if you cover the vertical line as shown below.

In the image below, are the cells popping in or out?

What happens when the image is rotated?

Which tile is darker in the image below? A or B?

Tiles A and B actually have the same absolute intensity value (as shown below). Your perception is due to brightness constancy: the visual system attempts to discount illumination when interpreting colours.
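To make the idea concrete, here is a minimal sketch of how two tiles can end up with identical pixel values even though they are different surfaces. It assumes a simple multiplicative model (pixel intensity = surface reflectance × illumination); the model, the `render` helper and all numbers are illustrative assumptions, not measurements from the illusion image.

```python
# A hypothetical 1-D "scanline" across a checkerboard that is partly in shadow.
# Assumed model: pixel intensity = surface reflectance * illumination.

def render(reflectance, illumination):
    """Return 8-bit pixel intensities for each position on the scanline."""
    return [round(255 * r * i) for r, i in zip(reflectance, illumination)]

# Alternating dark (0.4) and light (0.8) tiles (assumed values).
reflectance = [0.4, 0.8, 0.4, 0.8, 0.4, 0.8]
# The last three tiles sit in a shadow that halves the light (assumed values).
illumination = [1.0, 1.0, 1.0, 0.5, 0.5, 0.5]

pixels = render(reflectance, illumination)

tile_a = pixels[0]  # dark tile in full light
tile_b = pixels[3]  # light tile in shadow

print(f"Tile A intensity: {tile_a}, Tile B intensity: {tile_b}")
print("Same pixel value:", tile_a == tile_b)
```

A camera records only the product, so the two tiles are indistinguishable at the pixel level. The visual system, by contrast, tries to factor out the shadow and report the underlying surface, which is why tile B looks lighter.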

A variation of the Hermann grid illusion from Hany Farid is shown below. As you move your eyes over the figure, gray spots appear at the intersections.

Perceptual psychologists have spent decades trying to understand how the visual system works. More examples of unbelievable optical illusions can be found here.

Computer Vision: Applications

You may ask, “Have I ever used computer vision? How? Where?”

Over the last two decades, computer vision research has seen rapid development and is being used today in a wide range of real-world applications, which include:

Optical Character Recognition (OCR) — it converts images of text into machine-readable text. It also lets you search for handwritten characters in the Notes app on your iPhone.

Object Detection — it allows you to search for a specific person or object (e.g. a cat) in your iPhone’s photo library.

iPhone Portrait Mode — it uses the phone’s camera to create a depth-of-field effect. It lets you compose a photo that keeps your subject sharp while blurring the background.

iPhone FaceTime Attention Correction — it corrects your gaze to the camera while you give attention to the screen.

Amazon Go “Just Walk Out” Technology — it automates many of the checkout and payment steps associated with a purchase transaction.

Vision-based Biometrics — read the story “How the Afghan Girl was Identified by Her Iris Patterns”.

3D Human Face Capture — learn about the latest research in digital human technology.

Motion Capture for Visual Effects — discover the technology behind shooting the motion-capture scenes on location versus on a sound stage.

Self-driving Cars — they are capable of sensing their environment and moving safely with little or no human input.

Image Synthesis — Playform is a new way for visionary artists, creators, and designers to experiment, explore and inspire with AI. I played around with the app, drew a landscape and generated the images shown below. My sketch has been transformed into different images based on reference styles (e.g. Baroque) or artists (e.g. Vincent van Gogh). A similar application is GauGAN from NVIDIA Research.

Image-guided Surgery — learn how image-guided surgery (IGS) technology is used in the operating room.

Augmented/Mixed/Virtual/Extended Reality — Philips is piloting mixed reality in the domain of image-guided minimally invasive procedures.

Although not exhaustive, the list shows how widely computer vision is used across application areas. This is a very active research area where breakthroughs happen almost every year.
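As a rough illustration of the depth-of-field effect described under Portrait Mode, the sketch below keeps "subject" pixels untouched and blurs everything else. It is a toy under stated assumptions, not how any phone actually implements the feature: the subject mask is supplied by hand (a real system would have to estimate it), the blur is a plain box filter, and the names `box_blur`, `portrait_effect` and `subject_mask` are hypothetical.

```python
def box_blur(image, radius=1):
    """Average each pixel with its neighbours in a (2*radius+1)-wide window,
    clipping the window at the image border."""
    height, width = len(image), len(image[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(sum(window) / len(window))
        blurred.append(row)
    return blurred


def portrait_effect(image, subject_mask, radius=1):
    """Keep pixels where subject_mask is 1; use blurred values elsewhere."""
    blurred = box_blur(image, radius)
    return [
        [image[y][x] if subject_mask[y][x] else blurred[y][x]
         for x in range(len(image[0]))]
        for y in range(len(image))
    ]


# Tiny grayscale test image with a high-contrast pattern (assumed data).
image = [
    [0, 255, 0, 255],
    [255, 0, 255, 0],
    [0, 255, 0, 255],
    [255, 0, 255, 0],
]
# Hand-made mask: the central 2x2 block is treated as the subject.
subject_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

result = portrait_effect(image, subject_mask)
for row in result:
    print(row)
```

The subject pixels keep their sharp 0/255 contrast, while the background pixels are pulled towards the local average. The hard part in practice is obtaining the mask, which is a computer vision problem in itself.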

Conclusion

Computer vision tasks generally include:

(a) obtaining simple inferences from individual pixel values

(b) grouping pixels to separate object regions or infer shape information

(c) recognising objects using geometric or statistical pixel information

(d) combining information from multiple images into a coherent whole
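Tasks (a) and (b) above can be sketched in a few lines. The example below makes a per-pixel inference by thresholding intensity, then groups the bright pixels into regions with a breadth-first flood fill over 4-connected neighbours. The threshold level, the test image and the helper names `threshold` and `label_regions` are assumptions chosen for illustration; this is one simple way to do it, not the only one.

```python
from collections import deque


def threshold(image, level):
    """(a) Per-pixel inference: 1 if the pixel is brighter than level, else 0."""
    return [[1 if pixel > level else 0 for pixel in row] for row in image]


def label_regions(binary):
    """(b) Group foreground pixels into 4-connected regions.

    Returns a label image (0 = background, 1..n = region id) and n.
    """
    height, width = len(binary), len(binary[0])
    labels = [[0] * width for _ in range(height)]
    region_count = 0
    for y in range(height):
        for x in range(width):
            if binary[y][x] and not labels[y][x]:
                # Found an unlabelled foreground pixel: start a new region.
                region_count += 1
                labels[y][x] = region_count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = region_count
                            queue.append((ny, nx))
    return labels, region_count


# Small grayscale image with two bright blobs on a dark background (assumed data).
image = [
    [200, 210, 10, 10, 10],
    [205, 220, 10, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 10, 190, 180],
    [10, 10, 10, 185, 195],
]

binary = threshold(image, 128)
labels, region_count = label_regions(binary)

print("Regions found:", region_count)
for row in labels:
    print(row)
```

Once pixels are grouped, each region can be measured (area, bounding box, shape), which is the kind of information that task (c) builds on.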

Critical issues in computer vision:

(a) sensing — how do sensors obtain images of the world?

(b) encoded information — how do images yield information about the scene, such as colour, texture, shape, motion…?

(c) representation — what representations are appropriate?

(d) algorithms — what algorithms are appropriate to process image information and construct scene descriptions?

At a low level, computer vision is almost entirely digital image processing; at a high level, it is about knowledge construction, representation and inference, which includes:

(a) recognition — identify objects based on low-level information

(b) interpretation — assign meaning to groups of recognised objects

(c) scene analysis — complete understanding of the captured scene

Acknowledgements

This is part of a series of articles that I am writing about Computer Vision. Some included images were taken from books (by Richard Szeliski, Ballard and Brown, Shapiro and Stockman) and lecture notes from Brown University, Cornell University, University of Michigan, University of New South Wales and University of California, Berkeley. Original sources are credited where possible.

Ryan Anthony J. de Belen
PhD Candidate at UNSW Sydney | Human-Computer Interaction | Artificial Intelligence | Translational Psychiatric Research