Beginner’s Guide to Computer Vision

Connectedreams’ Resources
Nov 18, 2016


“If We Want Machines to Think, We Need to Teach Them to See.” - Fei-Fei Li, Director of Stanford AI Lab and Stanford Vision Lab

Computer Vision is the field that enables machines such as computers and mobile phones to see their surroundings. Serious work on re-creating the human eye started back in the 1950s, and we have come a long way since then. Computer vision has already made its way onto our mobile phones through e-commerce and camera apps.

Think of what more machines could do if they could see as accurately as the human eye. The human eye is a complex structure, and understanding the environment through it is an even more complex process. Likewise, making machines see things, figure out what they are seeing, and then categorize it is still a pretty tough job.

Working on Computer Vision means performing millions of calculations in the blink of an eye with almost the same accuracy as the human eye. It is not just about converting a picture into pixels and then trying to make sense of what’s in the picture through those pixels. You first have to understand the bigger picture of how to extract information from those pixels and what they represent.

So, how do machines see?

B. Image Segmentation: Computers identify groups of similar colors and then segment the image, i.e. distinguish the foreground from the background. Color gradients are used to find the edges of different objects.
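
To make the segmentation step concrete, here is a minimal sketch in plain Python. It assumes a tiny grayscale image stored as a list of lists, a simple brightness threshold for the foreground, and central differences for the gradient; the function names and the threshold value are illustrative choices, and a real project would typically rely on a library such as OpenCV instead.

```python
import math


def gradient_magnitude(image):
    """Return the per-pixel gradient magnitude using central differences."""
    height, width = len(image), len(image[0])
    magnitude = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Clamp neighbours at the border so every pixel gets a value.
            left = image[y][max(x - 1, 0)]
            right = image[y][min(x + 1, width - 1)]
            up = image[max(y - 1, 0)][x]
            down = image[min(y + 1, height - 1)][x]
            gx = (right - left) / 2.0
            gy = (down - up) / 2.0
            magnitude[y][x] = math.hypot(gx, gy)
    return magnitude


def segment_foreground(image, threshold):
    """Label a pixel 1 (foreground) if it is brighter than threshold, else 0."""
    return [[1 if value > threshold else 0 for value in row] for row in image]


# Hypothetical 5x5 grayscale image: a bright square on a dark background.
image = [
    [10, 10, 10, 10, 10],
    [10, 200, 200, 200, 10],
    [10, 200, 200, 200, 10],
    [10, 200, 200, 200, 10],
    [10, 10, 10, 10, 10],
]

mask = segment_foreground(image, threshold=100)
edges = gradient_magnitude(image)
```

The gradient is zero inside the square and in the flat background, and large where the brightness changes, which is how the edges of the object show up.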

C. Finding corners: After segmentation, the image is searched for certain features, also known as corners. In simple words, algorithms search for lines that meet at an angle and enclose a part of the image with a single color shade. These features are the building blocks that help find more detailed information contained in the image.
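
One common way to score "lines that meet at an angle" is a Harris-style corner measure. The article does not name a specific detector, so treat the sketch below as one possible approach: it sums gradient products over a 3x3 window and computes det - k * trace^2, with k = 0.05 chosen as an illustrative value. The test image and all names are hypothetical.

```python
def harris_response(image, k=0.05):
    """Harris-style corner score per pixel: positive at corners, negative on edges."""
    height, width = len(image), len(image[0])

    def pixel(y, x):
        # Clamp coordinates so border pixels reuse their nearest neighbour.
        return image[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]

    gx = [[(pixel(y, x + 1) - pixel(y, x - 1)) / 2.0 for x in range(width)]
          for y in range(height)]
    gy = [[(pixel(y + 1, x) - pixel(y - 1, x)) / 2.0 for x in range(width)]
          for y in range(height)]

    response = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            sxx = syy = sxy = 0.0
            # Accumulate gradient products over the in-bounds 3x3 neighbourhood.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        sxx += gx[ny][nx] * gx[ny][nx]
                        syy += gy[ny][nx] * gy[ny][nx]
                        sxy += gx[ny][nx] * gy[ny][nx]
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            response[y][x] = det - k * trace * trace
    return response


# Hypothetical 10x10 image: a bright 6x6 square (rows and columns 2..7).
image = [[1.0 if 2 <= y <= 7 and 2 <= x <= 7 else 0.0 for x in range(10)]
         for y in range(10)]

response = harris_response(image)
corner_score, corner_y, corner_x = max(
    (response[y][x], y, x) for y in range(10) for x in range(10)
)
```

The score is positive where gradients point in two different directions (a corner), negative where they point in only one (a straight edge), and zero in flat regions.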

D. Find textures: Another important aspect of identifying an image correctly is determining the textures in it. The difference in texture between two objects makes it easier for a machine to categorize each object correctly.
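
As a rough illustration of comparing textures, the sketch below uses local binary patterns (LBP), a simple texture descriptor that the article itself does not mention. Each interior pixel gets an 8-bit code describing which neighbours are at least as bright as it is, and two patches are compared through the histograms of those codes. The patches and function names are made up for the example.

```python
from collections import Counter

# Neighbour offsets (dy, dx), clockwise starting from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(image):
    """Count the local binary pattern codes of all interior pixels."""
    height, width = len(image), len(image[0])
    histogram = Counter()
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            code = 0
            for bit, (dy, dx) in enumerate(OFFSETS):
                # Set the bit when the neighbour is at least as bright as the centre.
                if image[y + dy][x + dx] >= image[y][x]:
                    code |= 1 << bit
            histogram[code] += 1
    return histogram


def histogram_distance(a, b):
    """Sum of absolute differences between two LBP histograms."""
    return sum(abs(a[code] - b[code]) for code in range(256))


# Two hypothetical 4x4 patches: a flat surface and a checkerboard.
flat = [[5] * 4 for _ in range(4)]
checker = [[(x + y) % 2 for x in range(4)] for y in range(4)]

flat_hist = lbp_histogram(flat)
checker_hist = lbp_histogram(checker)
distance = histogram_distance(flat_hist, checker_hist)
```

A distance of zero would mean the two patches have the same mix of local patterns; the larger the distance, the more their textures differ.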

E. Make a guess: After the above steps, the machine makes a nearly-right guess by matching the image with those present in its database.

F. Finally, see the bigger picture! At last, the machine sees the bigger, clearer picture and checks whether it identified the image correctly, as per the algorithmic instructions it was fed. Accuracy has improved a lot in the past years, but machines still make mistakes when asked to handle images with mixed objects.
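
The matching step can be pictured as a nearest-neighbour lookup. The sketch below assumes every image has already been reduced to a short feature vector; the labels, the numbers, and the choice of Euclidean distance are all hypothetical and only meant to show the idea.

```python
import math


def best_match(query, database):
    """Return the (label, distance) of the stored vector closest to the query."""
    best_label, best_distance = None, float("inf")
    for label, features in database.items():
        distance = math.dist(query, features)  # Euclidean distance (Python 3.8+)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label, best_distance


# Hypothetical feature vectors for three known objects.
database = {
    "apple": [0.9, 0.1, 0.3],
    "banana": [0.8, 0.9, 0.1],
    "plum": [0.4, 0.1, 0.6],
}

query = [0.85, 0.15, 0.25]
label, distance = best_match(query, database)
```

The guess is only "nearly right" because the closest stored example wins even when nothing in the database is a good match, so real systems usually also check that the distance is small enough.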

2. Universities That Have Computer Vision Research Groups:

USA Universities

University of California Los Angeles

University of North Carolina at Chapel Hill

University of Washington

University of California Berkeley

Stanford University

Massachusetts Institute of Technology

Cornell University

University of Pennsylvania

University of California Irvine

Columbia University

University of Illinois at Urbana-Champaign

University of Southern California

University of Michigan

Princeton University

University of Rochester

University of Texas at Austin

University of Maryland College Park

Brown University

University of Central Florida

New York University

Michigan State University

University of Massachusetts, Amherst

Northwestern University

University of California San Diego

Universities in Canada:

University of Toronto

University of British Columbia

Simon Fraser University

Universities in Europe:

University of Oxford

ETH Zurich

Max Planck Institutes, Germany

University of Edinburgh

University of Surrey

University of Freiburg

KTH Sweden

TU Dresden

TU Darmstadt

EPFL, Switzerland

KU Leuven

Computer Vision Center Barcelona

IDIAP Switzerland

Imperial College London

HCI Heidelberg

University of Manchester

University of Bonn

RWTH Aachen University

University of Amsterdam

TU Munich

Czech Technical University

University of Cambridge

TU Graz

IST Austria

Queen Mary University of London

University of Zurich

TU Delft

University of Leeds

University of Bern

Lund University

University of Trento, Italy

University of Florence, Italy

University of Stuttgart

Saarland University

Ecole Centrale Paris

Ecole des Ponts ParisTech

University of Oulu

Karlsruhe Institute of Technology

3. If you’re starting out in the field of Computer Vision, below is a list of topics you should know.

Mathematics:

  1. Linear Algebra
  2. Singular Value Decomposition
  3. Introductory level Pattern Recognition
  4. Principal Component Analysis
  5. Kalman filtering
  6. Fourier Transform
  7. Wavelets

Image Processing:

  1. Online Course offered by Duke University on Coursera
  2. Digital Image Processing by Gonzalez and Woods

To gain practical knowledge of how things work, especially the algorithms, start learning OpenCV from a Computer Vision perspective:

Tip: When programming in C, C++, or Python, we use the OpenCV library for computer vision. When programming in MATLAB, we use the Computer Vision System Toolbox. Similarly, there are more open source libraries for other languages.

You should also know the keywords and key works in the field, such as:

  1. SIFT: classic descriptor for general-purpose vision
  2. HOG: well-known descriptor that is particularly good for human detection
  3. Viola-Jones: great face detector
  4. Shape Contexts
  5. Deformable Part Models
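
To give a feel for what a descriptor like HOG builds on, here is a minimal sketch of its core idea only: a histogram of gradient orientations for one small patch, weighted by gradient magnitude. It is not the full HOG pipeline, which also divides the image into a grid of cells and normalizes over blocks; the 9 unsigned bins, the patch contents, and the function name are assumptions made for this example.

```python
import math


def orientation_histogram(patch, bins=9):
    """Histogram of unsigned gradient orientations (0-180 degrees), weighted by magnitude."""
    height, width = len(patch), len(patch[0])
    bin_width = 180.0 / bins
    histogram = [0.0] * bins
    for y in range(height):
        for x in range(width):
            # Central differences with clamping at the patch border.
            gx = (patch[y][min(x + 1, width - 1)] - patch[y][max(x - 1, 0)]) / 2.0
            gy = (patch[min(y + 1, height - 1)][x] - patch[max(y - 1, 0)][x]) / 2.0
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            histogram[int(angle // bin_width) % bins] += magnitude
    return histogram


# Hypothetical 4x4 patches: one vertical edge and one horizontal edge.
vertical_edge = [[0, 0, 1, 1] for _ in range(4)]
horizontal_edge = [[0] * 4, [0] * 4, [1] * 4, [1] * 4]

vertical_hist = orientation_histogram(vertical_edge)
horizontal_hist = orientation_histogram(horizontal_edge)
```

The two patches put all their weight into different bins, which is what lets a classifier tell a vertical contour, such as the side of a standing person, from a horizontal one.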

A list of must-read books includes:

1. Computer Vision: Algorithms and Applications by Richard Szeliski

2. Computer Vision: A Modern Approach by David A. Forsyth and Jean Ponce

3. Multiple View Geometry in Computer Vision by Richard Hartley and Andrew Zisserman

Advanced level — Towards Deep Learning

4. Michael Nielsen’s online book “Neural Networks and Deep Learning”; it’s a really great, gentle introduction

5. Deep Learning book by Ian Goodfellow, Yoshua Bengio, and Aaron Courville

What will happen when machines can sense your emotions?

TED Talks to watch:

2. Blaise Agüera y Arcas: How PhotoSynth can connect the world’s images

3. Chieko Asakawa: How new technology helps blind people explore the world

4. Jennifer Healey: If cars could talk, accidents might be avoidable

5. Golan Levin: Art that looks back at you

6. Paul Debevec: Animating a photo-real digital face

7. Golan Levin: Software (as) art

Online Courses to go for:

  1. Udacity: Introduction to Computer Vision
  2. Stanford’s CS231n: Convolutional Neural Networks for Visual Recognition
  3. University of Central Florida — Prof. Mubarak Shah’s Video lectures
  4. Apply all your knowledge on concepts and algorithms gained from aforementioned resources to solve a few assignments and do a project on your own.

Advanced Level — Towards Deep Learning

  1. Geoff Hinton’s Neural Net lectures on Coursera
  2. Stanford course: Deep Learning for Natural Language Processing
  3. Stanford course: Convolutional Neural Networks for Visual Recognition

Seminar Courses:

  1. Deep Learning in Computer Vision (Prof. Sanja Fidler)
  2. Advanced Computer Vision (Prof. James Hays)

4. Projects Around The Globe

a. Microsoft computer scientists and researchers are working to “solve” cancer

b. Project Tokyo — deliver AI-enabled prototypes that augment awareness of social, physical and textual environment for people who are blind or have vision impairments.

c. Teaching machines to predict the future

The left-most column shows the frame before the action begins, with the algorithm’s prediction below it. The right columns show the next frames of the video.

Another way to keep yourself aware of the research being done in Computer Vision is to follow authors and read their papers from top conferences such as CVPR, ICCV, ECCV, BMVC.

5. Conversation with Experts

A. Conversation with Prof. Devi Parikh | Visiting Researcher at Facebook AI Research | Assistant Professor at Georgia Tech (Previously at Virginia Tech)

Computer Vision is a subfield of Artificial Intelligence where the goal is to build a computer that replicates the visual intelligence of the human brain. Machine Learning is a generic term for teaching machines anything, but Computer Vision specifically deals with visual data. In Machine Learning, we deal more with statistical tools, whereas Computer Vision can include both statistical and non-statistical tools. For instance, 3D reconstruction tends to use machine learning tools less frequently than, say, image classification and object recognition. Many computer vision tasks have their own needs, for which we develop specific machine learning tools.

For any student starting to learn about the field, I’d advise going through researchers’ web pages and picking one problem they find interesting. Most people are working on cutting-edge problems for which standard datasets are publicly available. Students can select a problem, a dataset, and a library they might want to use, and get their hands dirty.

When taking master’s or PhD students, what I usually look for is accountability, proactiveness, and determination. Have your basic concepts about the field clear. Try to read research papers. Try to get a sense of the problems at the frontiers of AI that researchers worldwide are working on. And get your hands dirty.

B. Conversation with Richa Agrawal | University of Pennsylvania Alumnus | Computer Vision Research Engineer at Whodat

I graduated from MNIT Jaipur, and while studying there I got in touch with the Robotics group. We did a few projects and went on to participate in a national-level competition at IIT Roorkee. We won the competition, and that boosted my morale. After completing my bachelor’s, I started working at Yahoo. I realized that this was not what I wanted to do, and hence went for my master’s at the University of Pennsylvania. I explored different research areas during that time by taking different courses and finally settled on Computer Vision as my main research interest. After graduating, I worked at a startup in the US and was looking for a similar opportunity in India, as the field had started growing here too. At Whodat, a Computer Vision startup based out of Bangalore, we work on Augmented Reality and Visualization. For instance, say you’re planning to buy furniture for your home; you go to a shop and choose a piece after imagining how it would look in your home. After the furniture gets delivered, you realize that it is either too big or too small, but nothing can be done about it now. We are trying to help by building a solution that lets you visualize furniture in your home before you buy. This will enable you to make better decisions and purchase items hassle-free.

While studying, I often reached a point where I was not able to give my best and felt demotivated, but then a piece of advice from a friend came to the rescue. He told me: ‘There are only a few people (less than 0.1%) who are able to make it to this point (doing a master’s abroad, and that too in a technical field like Computer Vision), and you have already proved that you’re one of them. You just need to push a little harder. Only you can do it for yourself; nobody else will do it. And in the end, your learning is what matters the most.’

My suggestion for students getting started is to talk to their peers in other colleges and ask what kind of projects they do. Then they can form a team with a leader and start experimenting. I’d also recommend participating in competitions and hackathons. It is highly important to find your interests and go with them instead of working in an area you don’t like. Computer Vision, for instance, is a great area with huge scope for development in India because all you need is a camera, and cameras have started reaching even smaller cities now. So, the future of Computer Vision is definitely bright.

Apoorva Bhalla| Content & Marketing Fellow at

Connectedreams Blog Data-Driven Networking Platform, Bridging The Mentorship & Role Model Gap.
