Computer Vision: The Reinvention of the Eye

Image: TechCrunch, 2017

Humans use their senses to observe the world around them. Computer vision can, in some respects, outdo human sight: it can carry out a deep analysis of an image, or a series of images, in just a few seconds. Vision is one of the most complex processes we have ever attempted to comprehend, and building a machine whose senses rival or even exceed our own is something close to magic. The most striking part is that we do not know exactly by what complex process we created these remarkable systems (BMVA, 2016).

Computer vision can automatically analyze, extract, and understand useful information from a glimpse of a single image or a sequence of images. The mechanism rests on a theoretical framework of trained algorithms, and it works roughly the same way as human vision. The algorithms are trained to run quickly through a classification of images, objects, colours, sizes, and so on, all within a tiny fraction of a second. This classification is similar to our own visual cortex, which contains a framework of references to things “we already know.” The path the algorithm follows is in fact already determined, yet it contains an almost infinite number of possibilities. There is almost no conscious effort involved, because the system has been programmed precisely. In theory, then, the system cannot fail, and today's re-creation of human vision rests on trust in preprogrammed series and sets of patterns that rely on one another (Coldewey, 2016).
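The idea of classifying an input by matching it against a stored “framework of references” can be sketched in miniature. The following is a hypothetical toy nearest-prototype classifier, not any specific production system; the feature vectors and class names are invented for illustration.

```python
import math

# Toy nearest-prototype classifier: a minimal sketch of the idea above --
# compare an input's features against class "prototypes" the system already
# knows. All values here are invented for illustration.
PROTOTYPES = {
    "sky":   [0.1, 0.2, 0.9],   # e.g. a typical RGB signature for sky
    "grass": [0.2, 0.8, 0.2],
    "brick": [0.7, 0.3, 0.2],
}

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(features):
    """Return the known class whose prototype lies closest to `features`."""
    return min(PROTOTYPES, key=lambda name: distance(PROTOTYPES[name], features))

print(classify([0.15, 0.25, 0.85]))  # prints "sky"
```

Real classifiers learn far richer features than three numbers, but the principle is the same: new input is placed relative to patterns the system was trained on.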
 
It is striking that the cameras in computer vision devices are not much better than simple 19th-century pinhole cameras. Much of the tracking side of computer vision rests on the “mean shift” algorithm, extended as CAMSHIFT (Continuously Adaptive Mean Shift), which is grounded in robust statistics and operates on probability distributions (Bradski, 2). To recognize complex patterns, tracking has to keep up with the camera's frame rate and stay sensitive to the amount of movement between frames. Computer-rendered graphics or game scenes with simple views (like a blue sky) are rendered much faster than complex views (like cities); since the final frame rate should not depend on the complexity of a particular 3D view, empirical observations are used to overcome this (Bradski, 8). Sets of neurons excite one another in response to contrasts, and higher-level networks aggregate these patterns into meta-patterns: this complementary process builds up the image together with the required descriptions. The development of computer vision is a collaboration between computer scientists, engineers, psychologists, neuroscientists, and philosophers, who jointly determine the working definition of our mind.
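The mean shift step at the heart of CAMSHIFT can be illustrated in one dimension. This is a hypothetical, minimal sketch only: real CAMSHIFT operates on 2-D colour-probability images and also adapts the search-window size, which this toy with weighted 1-D samples does not attempt.

```python
# Minimal 1-D sketch of a mean shift iteration: repeatedly re-center a search
# window on the weighted mean (centroid) of the probability mass inside it,
# until the center stops moving. Sample values below are invented.
def mean_shift(samples, weights, center, radius, tol=1e-6, max_iter=100):
    """Shift `center` toward the weighted mean of samples within `radius`."""
    for _ in range(max_iter):
        in_window = [(x, w) for x, w in zip(samples, weights)
                     if abs(x - center) <= radius]
        total = sum(w for _, w in in_window)
        if total == 0:
            break  # empty window: nothing to track
        new_center = sum(x * w for x, w in in_window) / total
        if abs(new_center - center) < tol:
            break  # converged
        center = new_center
    return center

# Probability mass is concentrated near x = 5; the window climbs toward it.
samples = [1.0, 4.5, 5.0, 5.5, 9.0]
weights = [0.1, 0.8, 1.0, 0.8, 0.1]
print(mean_shift(samples, weights, center=3.0, radius=2.0))  # converges to 5.0
```

In face tracking, the “samples” are pixel coordinates weighted by how likely each pixel's colour is to belong to the tracked object; the window therefore climbs toward the densest patch of object-coloured pixels.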

Computer vision is implemented today in self-driving cars, in factory robots, and in your personal smartphone, but the capabilities of these devices are limited. Labsix, a group of MIT students, recently published a research paper on this. They developed an algorithm designed to deceive image classifiers; from the resulting “errors” they derive new approaches to optimize computer vision, by analyzing how the system makes its decisions. Ordinary images are sabotaged by the adversarial algorithm: the pixels are changed only minimally, and the algorithm preserves exactly the right combination of sabotaged pixels throughout the process, yet these small changes cause the system to misread the image. There is plenty of research into countering adversarial examples, but computer vision will not be fully trusted until adversarial attacks are impossible, or at least hard to pull off (Snow, 2017).
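The principle of an adversarial example can be shown on a deliberately tiny scale. The sketch below is NOT Labsix's actual method, which targeted real deep networks; it is a hypothetical toy in the spirit of the “fast gradient sign” idea, where barely visible per-pixel changes flip the decision of an invented linear classifier.

```python
# Toy adversarial example: nudge each pixel by a tiny amount against the
# classifier's weights so the decision flips while the image barely changes.
# The weights, bias, and image values are all invented for illustration.
def sign(x):
    return (x > 0) - (x < 0)

weights = [0.5, -1.0, 2.0, -0.5]   # toy linear classifier: score > 0 = "cat"
bias = -0.1

def score(pixels):
    return sum(w * p for w, p in zip(weights, pixels)) + bias

image = [0.2, 0.1, 0.3, 0.4]       # classified as "cat" (score = 0.3 > 0)
eps = 0.1                          # small per-pixel change

# Move each pixel a step of size eps in the direction that lowers the score.
adversarial = [p - eps * sign(w) for p, w in zip(image, weights)]

print(score(image) > 0, score(adversarial) > 0)  # prints: True False
```

Each pixel moved by only 0.1, yet the classification flipped. Against deep networks the same idea works with perturbations small enough to be invisible to people, which is exactly the weakness the paragraph above describes.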

The challenge for computer vision today lies in developing towards the source of the stimulus. We know how our mind works and how to implement it in systems, but can we also trigger it to define an environment full of impulses on its own? The future of computer vision lies in integrating more of the specific, powerful features of the human brain, with the focus on abstract concepts such as context, attention, and intention (Coldewey, 2016). This is necessary before we can fully rely on the system. For now, the programming is limited to established patterns that interact, so the real spontaneity of situations remains incalculable.

References:

BMVA. 2016. The British Machine Vision Association and Society for Pattern Recognition. Accessed 29–01–2018. http://www.bmva.org/visionoverview

Bradski, Gary R. “Computer Vision Face Tracking For Use in a Perceptual User Interface.” CiteSeer Vol 17, Issue 1 (1998): 3–17.
 
Coldewey, Devin. “WTF Is Computer Vision?” TechCrunch, 13 November 2016. Accessed 29–01–2018. https://techcrunch.com/2016/11/13/wtf-is-computer-vision/

Snow, Jackie. “Computer Vision Algorithms Are Still Way Too Easy to Trick.” MIT Technology Review, 2017. Accessed 29–01–2018. https://www.technologyreview.com/the-download/609827/computer-vision-algorithms-are-still-way-too-easy-to-trick/