Touch (Not Vision) is the Foundation of Human Cognition

Photo by xandtor on Unsplash

A vast majority of AI researchers are looking in the wrong place when it comes to developing general intelligence. The missing link is to be found not in vision but in our sense of touch. Studying vision is not the real problem; the real problem is how we study vision. Research in image understanding assumes that the human mind captures the world like a camera. This is not even remotely true. As you read this text, you are busy understanding the meaning of the words, but you are ignoring the shape of each word. The mind takes snapshots of attention and integrates them into a whole. Inattentional blindness (did you see that gorilla?) demonstrates that humans can fail to see what is outside of their attention.

Previously, I discussed “Ego-motion in Self-Aware DL”, where I explored research into self-aware vision systems. All of the research mentioned in that discussion was based exclusively on binocular vision. In general, however, research in autonomous intelligence cannot ignore the perspective of the subjective self, and at the most basic level that self is the bodily self. Thus I proposed the following curriculum:

Here, touch is learned before perception. Perception is learned through the transfer of skills that come from touch. The cognitive mechanisms of touch greatly influence how our vision systems work. This may not be apparent, but our vision systems work in a manner that is very similar to touch. Previously, I proposed structuring a curriculum inspired by infant cognitive behavior. Interestingly enough, we have an extremely limited understanding of which touch behaviors an infant learns. Most cognitive milestones appear to be based on vision.

Previously, I also discussed the difference between human and Deep Learning vision. The optical illusion by Victoria Skye illustrates the peculiar behavior of our perception:

https://twitter.com/victoria1skye

The darker blue lines do not appear parallel in the illusion above, even though they are. One can argue that the reason we see it this way is that our vision works the same way as touch. It is as if we are touching this illusion with our eyes and recognizing the illusory gradient of the image. A similar illusion is found in our sense of hearing: the auditory illusion known as the Shepard Tone. We construct our world through a sequence of experiences.

Perhaps one reason researchers shy away from the sense of touch is its complexity relative to vision systems. We have overly simplified the sensory input of vision to a two-dimensional grid composed of pixels encoded in RGB. We don’t really have similar models for touch. Furthermore, we don’t have machines that capture touch (the way cameras capture images), and therefore we don’t have any data to use for training. This leads to a state of neglect in this area.

The human body has many sensory receptors that enable you to feel. These are located not only in the skin but also in muscles, joints, vessels, and organs. These receptors respond to light touch, pressure, stretching, warmth, cold, pain, and vibration. Collectively, they form a complex experience of your inner body and your environment.

Touch and pressure sensors are especially dense in your fingertips. This allows you to recognize the finer details of objects. There are stretch receptors in your muscles and joints that provide information about the location of your arms and legs. Stretch receptors are also found in internal organs such as your lungs (monitoring your breath), your stomach (feeling full) and your bladder. There are even pressure receptors in your arteries that allow the monitoring of your blood pressure.

Relatedly, a study of consciousness has created a map of 100 subjective feelings, each reflected by a sensation that is felt in different parts of the body:

http://www.pnas.org/content/115/37/9198

There is now a growing body of AI research that involves the bodily self. We can learn from these research methods to bootstrap our understanding of this extremely important aspect of cognition. Research on cognition and the bodily self is a first step that gives proper direction to the quest for general intelligence.

Jeff Hawkins and his team at Numenta have recently recognized the importance of touch. In recent talks, Hawkins narrates his inspiration:

To illustrate this concept, imagine touching a coffee cup with one finger. As you move your finger over the cup, you sense different parts of it. You might feel the lip, then the curve of the handle, then the flatness of the bottom. Each sensation you receive is processed relative to its location on the cup. The curved handle of the mug is always in the same relative position on the cup, it is not a feature relative to you. At one moment it might be on your left and another moment on your right, but it is always in the same location on the cup. If you were asked to reach into a box and identify this object by touching it with one finger, you probably couldn’t with a single touch. But if you continued to move your finger over the object, you would integrate more input, until you recognized with certainty that the only object containing this set of features at these locations is the coffee cup.
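The integration step in this thought experiment can be sketched as a process of elimination. This is only a toy illustration, not Numenta's model: the object names, the object-centred coordinates, and the feature labels below are all hypothetical, and the sketch assumes the location of each touch relative to the object is already known.

```python
from typing import Dict, List, Set, Tuple

Location = Tuple[int, int]         # hypothetical object-centred coordinates
ObjectModel = Dict[Location, str]  # feature sensed at each location

# Hypothetical toy models. Locations are relative to the object, not the finger.
OBJECTS: Dict[str, ObjectModel] = {
    "coffee cup": {(0, 2): "lip", (1, 1): "curved handle", (0, 0): "flat bottom"},
    "bowl": {(0, 2): "lip", (0, 0): "flat bottom"},
    "jug": {(0, 2): "lip", (1, 1): "curved handle", (0, 0): "rounded bottom"},
}


def recognize(touches: List[Tuple[Location, str]]) -> List[Set[str]]:
    """Return the set of objects still consistent after each successive touch."""
    candidates = set(OBJECTS)
    history = []
    for location, feature in touches:
        # Keep only objects that have this feature at this location.
        candidates = {
            name for name in candidates
            if OBJECTS[name].get(location) == feature
        }
        history.append(set(candidates))
    return history


touches = [((0, 2), "lip"), ((1, 1), "curved handle"), ((0, 0), "flat bottom")]
for step, remaining in enumerate(recognize(touches), start=1):
    print(f"after touch {step}: {sorted(remaining)}")
```

A single touch of the lip leaves all three toy objects in play. Only after the finger has also visited the handle and the bottom does the coffee cup remain as the sole candidate, which mirrors the point of the quoted passage.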

Numenta has come to the realization that they had been overlooking touch for far too long, and subsequently proposed a new model based on this insight:

Numenta has surprisingly decided to drop everything else that they were doing and are now focusing exclusively on this newer model of cognition. This is a good tell that this idea may have potential.

Vicarious, another AGI firm, described new embodied cognition research in “From Action to Abstraction: Learning Concepts through Sensorimotor Interactions”, where they explore a simple two-dimensional, visually impaired agent that explores its world by bumping into things. They have some very insightful conclusions:

We compared our active approach to a passive approach, using a CNN trained to detect whether a concept held, based on a static view of the entire environment. For concepts involving containment, the interactive approach clearly outperforms the CNN. For concepts involving distinguishing objects of differing shapes or spatial relations, we found that the CNN performed better in some cases and worse in others.
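To make the idea of testing containment through interaction concrete, here is a minimal sketch. It is not the method used in the Vicarious paper: the grid layouts, the `#` wall symbol, and the function name `is_contained` are all assumptions made for illustration. A blind agent probes its four neighbouring cells, treats a wall as a bump, and is judged contained if it can never reach the border of its world. The start cell is assumed to be free space.

```python
from collections import deque
from typing import List, Tuple

# Hypothetical toy worlds: '#' is a wall the agent bumps into, '.' is free space.
CLOSED_BOX = [
    "..........",
    "..#####...",
    "..#...#...",
    "..#...#...",
    "..#####...",
    "..........",
]
OPEN_BOX = [
    "..........",
    "..#####...",
    "..#...#...",
    "..#...#...",
    "..##.##...",
    "..........",
]


def is_contained(grid: List[str], start: Tuple[int, int]) -> bool:
    """Explore by bumping; contained means the border is never reached."""
    rows, cols = len(grid), len(grid[0])
    seen = {start}
    frontier = deque([start])
    while frontier:
        row, col = frontier.popleft()
        if row in (0, rows - 1) or col in (0, cols - 1):
            return False  # reached the edge of the world: not contained
        for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nxt = (row + d_row, col + d_col)
            if grid[nxt[0]][nxt[1]] == "#":
                continue  # bumped into a wall, cannot move there
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return True


print(is_contained(CLOSED_BOX, (2, 3)))  # agent starts inside the closed box
print(is_contained(OPEN_BOX, (2, 3)))    # same start, but the box has a gap
```

The agent never sees the whole environment. Its judgment about containment comes entirely from where it could and could not move, which is the kind of information a static image does not directly provide.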

There is clearly something about interacting with an environment that conveys an understanding that passive vision alone cannot achieve. Numenta’s research hints at a general approach that is inspired by recognizing an object through multiple touch sensors. Vicarious demonstrated that there is a significant difference between conceptual understanding that is based on touch and conceptual understanding that is based on vision.

A new paper from Georgia Tech explores a task that humans perform every day and that requires a sense of touch (see: “Learning to Dress”). This mundane task may not seem as impressive as becoming a Go world champion, but the methods learned in this research may be equally relevant, because this mundane task is extremely difficult to automate.

The difficulty begins with our lack of models for touch. Furthermore, the clothing needs to be simulated, and navigating its contours and topology via haptic sensors appears to be extremely challenging. This problem is so different from other Deep Learning problems that its achievement is quite surprising.

Touch is indeed undiscovered territory, and any new research on this topic will likely have an outsized impact on the development of general intelligence. Vision research is an overgrazed area, and most of the low-hanging fruit has been picked. However, the recognition that vision is like touch should give researchers newer ideas that have yet to be tried. The word ‘feel’ in our human vocabulary is associated with our sense of touch rather than our other senses. This should be enough of a tell that our understanding of the world is based on our feel of the world rather than our perception of the world. Passive vision is a problem that does not fully capture the mechanisms of human understanding.

O’Regan writes in “Why Red Doesn’t Sound Like a Bell”:

The idea is to take the stance that feel is not something that happens to us, it is a thing we do. The quality of a sensory feel is the quality of our way of interacting with the environment. As a quality, it is something abstract and therefore not something that can be generated in any way, let alone by the brain. The role of the brain in sensory feel is not to generate the feel, but to enable the modes of interaction with the environment whose quality corresponds to the feel.

At the foundation of human cognition is how we interact with our world. Perception is not passive; rather, it is an active conversation, and this is fundamentally how we can ‘feel’ the world.
