Seeking more in Artificial Sight
Research and write about its history and what the open or current problems are in using machine learning for this project.
Computer vision traces its origins to MIT in 1963, when Roberts (then a PhD student) analyzed a computer’s perception of various three-dimensional models (see image below) in his thesis. Roberts limited his research to basic geometric shapes (e.g., triangles and cylinders). Clearly, objects in real life are not basic geometric shapes. Rather, they are often occluded by ornamentation, other objects, and the like.

Machine learning is an interesting technique to apply to computer vision problems, in particular to recognizing various objects with a computer. However, there are several challenges:
- A single object can appear very different depending on imaging conditions (e.g., the position of the object relative to the camera or the lighting).
- A single image may include many different objects. Think of a “Where’s Waldo?” page and how many sub-images our brain has to sift through to find Waldo. Even in that case we know what we are looking for, which may not always be true in a computer vision problem.
- There is a large number of categories for images. For example, the University of North Carolina provides a sample flowchart for categorizing different objects. Notably, this flowchart doesn’t bother categorizing different types of plants.

Additionally, objects within a single category may look different from one another, which is referred to as “within-class variation.” Take, for example, different types of chairs. They might have a different number of legs, different contours, and different features (e.g., armrests or backrests). Yet they are all still chairs.
In 1987, Biederman attempted to recognize objects by their components: a door may include only a single rectangular component, but a watering can may include a cylindrical portion (where the water is stored), a curved cylindrical portion (the spout), and another curved portion (the handle). Some objects, such as plants, are harder to break into components. Even when an object is broken into components, the relative position of the components may matter more in some cases than in others.

In 1995, Zisserman instead tried to capture the general shape of objects and to identify each object from that shape. This may work for objects that have distinctive outlines (like a guitar), but objects with a more generic outline (a cereal box vs. a jewelry box) become hard to distinguish. The image to the left illustrates the general outlines studied by Zisserman.
In the late 1990s, researchers also began using sliding window approaches to determine what an image contains. A sliding window approach “slides a box” around the image and classifies each resulting image crop. Researchers also began looking at various features (color, texture, etc.) and focusing on those features to get an idea of what the object might be.
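The “sliding a box” idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the image is a plain list of rows of brightness values, the function names (`sliding_windows`, `mean_brightness`, `best_window`) are made up for this example, and `mean_brightness` is a toy stand-in for whatever classifier a real system would apply to each crop.

```python
def sliding_windows(image, win_h=64, win_w=64, stride=32):
    """Yield (row, col, crop) for every window that fits fully inside the image.

    `image` is assumed to be a list of equal-length rows of numbers.
    """
    height, width = len(image), len(image[0])
    for row in range(0, height - win_h + 1, stride):
        for col in range(0, width - win_w + 1, stride):
            crop = [line[col:col + win_w] for line in image[row:row + win_h]]
            yield row, col, crop


def mean_brightness(crop):
    """Toy stand-in for a classifier: the average pixel value of the crop."""
    total = sum(sum(line) for line in crop)
    return total / (len(crop) * len(crop[0]))


def best_window(image, score_fn, win_h=64, win_w=64, stride=32):
    """Score every crop and return (score, row, col) of the highest-scoring one."""
    best = None
    for row, col, crop in sliding_windows(image, win_h, win_w, stride):
        score = score_fn(crop)
        # Keep the first window seen with the highest score.
        if best is None or score > best[0]:
            best = (score, row, col)
    return best
```

Calling `best_window(image, mean_brightness)` returns the score and top-left corner of the brightest crop. Swapping in a different `score_fn` changes what the window is “looking for” without touching the sliding logic.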
In the early 2000s, researchers began exploring images as a “bag of features.” An image would essentially be broken into various parts, and each part would be associated with a few words. In some cases, the words could then be used to determine which object they referred to.
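One common way to make the “bag of features” idea concrete is to map each image part to its nearest entry in a fixed vocabulary of “words” and then count how often each word occurs. The sketch below assumes that each part has already been turned into a small numeric descriptor and that a vocabulary is given; how the descriptors and the vocabulary are obtained is outside this example, and the names `nearest_word` and `bag_of_features` are hypothetical.

```python
from collections import Counter


def nearest_word(descriptor, vocabulary):
    """Return the index of the vocabulary entry closest to the descriptor.

    Closeness is squared Euclidean distance; this metric is an assumption.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(vocabulary)),
               key=lambda i: sq_dist(descriptor, vocabulary[i]))


def bag_of_features(descriptors, vocabulary):
    """Summarize an image as a normalized histogram of word occurrences.

    The positions of the parts are discarded; only the counts remain.
    """
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    total = len(descriptors)
    return [counts[i] / total if total else 0.0
            for i in range(len(vocabulary))]
```

Two images can then be compared through their histograms alone, which is what makes the representation a “bag”: the layout of the parts is ignored and only their frequencies matter.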
Later in the 2000s, Hays and Efros began exploring image completion, in which the system attempts to match the scene with another image it has access to. Image parsing, which aims to determine the different types of objects present within a single image, was also studied during that time.
The biggest challenges remain (a) determining exactly which object within the image is of interest and (b) deciding what we want to know about that object: its color, its texture, its overall appearance, its use, or something else entirely. While face recognition has continued to improve, we need to find a better way of categorizing the rest of the objects we encounter in our daily lives.
Note: images were found at the following link: http://www.cs.unc.edu/~lazebnik/spring10/lec16_recognition_intro.pdf