Mapping the Human Visual System to CNNs
Have you ever wondered how a CNN relates to the human visual system, or to the brain?
If so, here is my attempt to answer that question, based on my study, understanding, and mapping of the concepts.
In this blog I will try to connect the human brain with the building blocks of a CNN: edge detection, max pooling, ReLU, data augmentation, dropout, and a CNN variant, the Residual Network (ResNet).
Hope you enjoy it!!!
Before diving into the mapping of the concepts, let's learn about the two Nobel Prize-winning scientists who laid the foundation for the field of computer vision.
Research in Sensory Processing (1960s and 1970s)
Dr. Hubel and Dr. Wiesel worked in the area of sensory processing. They inserted a micro-electrode into the primary visual cortex of a partially anesthetized cat, so that it could not move, and showed the cat images of lines at different angles.
Through the micro-electrode they found that some neurons fired rapidly when watching lines at specific angles, while other neurons responded best to lines at different angles. Some of these neurons responded differently to light and dark patterns, while others detected motion in a certain direction.
This work laid the groundwork for the concepts behind CNNs.
Where Is the Visual Cortex Located in the Human Brain?
The visual cortex is the part of the cerebral cortex of the brain that processes visual information. The visual nerves from the eyes run straight to the primary visual cortex. Based on its structural and functional characteristics, it is divided into different areas, as shown in the following picture:
Visual Cortex: Functions
Visual information is passed from one cortical area to another, and each cortical area is more specialized than the last. The neurons in a specific area respond only to specific stimuli.
Some of them with their functions are as follows:
- Primary Visual Cortex or V1: It preserves the spatial location of visual information, i.e. the orientation of edges and lines. It is the first area to receive signals from what the eyes have captured.
- Secondary Visual Cortex or V2: It receives strong feed-forward connections from V1 and sends strong connections to V3, V4, and V5. It also sends strong feedback connections to V1. Its function is to collect the spatial frequency, size, color, and shape of the object.
- Third Visual Cortex or V3: It receives inputs from V2. It helps in processing global motion and gives a complete visual representation.
- V4: It also receives inputs from V2. It recognizes simple geometric shapes and contributes to object recognition. It is not tuned for complex objects such as human faces.
- Middle Temporal (MT) Visual Area or V5: It detects the speed and direction of moving visual objects, i.e. motion perception. It also detects the motion of complex visual features. It receives direct connections from V1.
- Dorsomedial (DM) Area or V6: It detects wide-field and self-motion stimulation. Like V5, it also receives direct connections from V1. It is extremely sharply tuned to the orientation of visual contours.
Mapping to CNN
The visual cortex areas above act like the layers of a CNN. Let's consider scenarios such as edge detection, face detection, and invariance detection (i.e. detecting rotated faces, or large and small faces).
Edge Detection: By convolving an image with a Sobel kernel, we can detect edges. Look at the following image:
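To make the convolution concrete, here is a minimal NumPy sketch of Sobel edge detection on a toy image; the image values and the hand-rolled convolution loop are illustrative only (real code would use an optimized library routine):

```python
import numpy as np

# Standard 3x3 Sobel kernels for horizontal gradients (vertical edges).
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T  # transpose detects horizontal edges

def convolve2d(image, kernel):
    """Valid-mode sliding-window filtering, as a CNN layer does."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image: dark left half, bright right half -> one vertical edge.
img = np.zeros((5, 5))
img[:, 3:] = 1.0

gx = convolve2d(img, SOBEL_X)  # responds strongly at the edge, zero elsewhere
```

The filter output is large only where the intensity changes, which is exactly the "line at a specific angle" response Hubel and Wiesel observed in V1 neurons.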
Max Pooling: It is used to detect where objects are located in the image, based on the output of each cluster of neurons in the previous layer. The face is detected wherever it appears; the detection does not depend on the location of the face in the image.
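A small sketch of 2x2 max pooling with NumPy (the feature-map values are made up for illustration):

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """Keep only the strongest activation in each window."""
    h, w = feature_map.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = feature_map[i * stride:i * stride + size,
                                    j * stride:j * stride + size].max()
    return out

fmap = np.array([[1, 3, 2, 0],
                 [4, 6, 1, 1],
                 [0, 2, 9, 5],
                 [1, 0, 3, 7]], dtype=float)

pooled = max_pool(fmap)  # -> [[6, 2], [2, 9]]
```

Each output cell reports only "the strongest response in this region", so a feature shifted slightly within the window still produces the same output, which is where the small translation invariance comes from.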
ReLU (Rectified Linear Unit): The human brain never stops learning; it always learns from observations and experiences, i.e. the inputs it receives from the sensory organs are utilized at some point or another, but the learning never becomes "zero". To add this feature to neural networks, ReLU is used. The activation function is f(x) = max(0, x). For any activation function we must be able to take its derivative, and with ReLU we can do that almost everywhere; the derivative at zero, however, is not defined. Because the output is zero for all negative inputs, we can run into the problem of dead activations: the gradient is zero, so the weights never change and no learning happens. In humans this does not happen often. To tackle this problem, the concept of Leaky ReLU is used.
Leaky ReLU: The function is f(x) = x if x > 0, else 0.01x. With this we avoid the problem of dead states, so the network can continue to learn; however, it can still face the problem of vanishing gradients, since the slope on the negative side is very small.
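Both activations are one-liners in NumPy; this sketch shows how Leaky ReLU keeps a small signal alive where plain ReLU outputs exactly zero:

```python
import numpy as np

def relu(x):
    """f(x) = max(0, x): negative inputs are clamped to zero."""
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Negative inputs keep a small slope alpha instead of dying."""
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, 0.0, 3.0])
# relu(x)       -> [ 0.0,  0.0, 3.0]   (the -2.0 unit is "dead")
# leaky_relu(x) -> [-0.02, 0.0, 3.0]   (a small gradient path survives)
```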
Data Augmentation: We humans can recognize a face even if it is inverted, rotated, flipped, reflected, or skewed. Using data augmentation, we can turn a single image into several different images and use the newly formed images to train the CNN. After that, the CNN will be able to handle invariance in the data, such as rotated faces, large and small faces, flipped faces, etc. (i.e. objects will be recognized even if they are not in their original position).
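A minimal sketch of this idea: one image becomes several training images via flips and a rotation. The `augment` helper is hypothetical; real pipelines also apply random crops, scaling, and color jitter.

```python
import numpy as np

def augment(image):
    """Generate simple flipped/rotated variants of one image."""
    return {
        "original": image,
        "horizontal_flip": np.fliplr(image),   # mirror left-right
        "vertical_flip": np.flipud(image),     # mirror top-bottom
        "rotate_90": np.rot90(image),          # counter-clockwise quarter turn
    }

img = np.arange(9).reshape(3, 3)
variants = augment(img)  # one sample becomes four training images
```

Training on all four variants teaches the network that the underlying object is the same regardless of orientation.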
Dropout: Do all the neurons in our brain fire to learn something? The answer is "no". It is not necessary that they all fire in every phase of learning. Some neurons may stay inactive in one phase of learning and become active in another, or vice versa. This gives the neurons the capability of independent learning. To bring this into networks, the concept of dropout was introduced. When dropout is applied with probability p, randomly selected nodes/neurons are dropped from that epoch of the learning process, along with their incoming and outgoing edges. It is widely used to avoid over-fitting in the network.
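A sketch of "inverted" dropout, the common formulation: each unit is zeroed with probability p during training, and the survivors are rescaled by 1/(1-p) so the expected activation stays the same at test time.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

def dropout(activations, p=0.5, training=True):
    """Zero each unit with probability p; rescale survivors by 1/(1-p)."""
    if not training:
        return activations  # at inference, all neurons participate
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

a = np.ones(10)
out = dropout(a, p=0.5)  # each entry is either 0.0 (dropped) or 2.0 (kept, rescaled)
```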
Residual Network (ResNet): As we have seen, V5 and V6 receive direct connections from V1; the Residual Network works in the same way. It uses skip connections that jump over one or more layers. This is done to avoid the problem of vanishing gradients.
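The skip connection is literally an addition of the block's input to its output. A toy single-layer sketch (a real ResNet block uses convolutions, batch normalization, and ReLU; the tanh and the weight matrix here are illustrative stand-ins):

```python
import numpy as np

def residual_block(x, weight, activation=np.tanh):
    """Toy residual block: output = activation(W @ x) + x.
    The '+ x' is the skip connection that lets the signal (and the
    gradient) bypass the transformed path."""
    return activation(weight @ x) + x

x = np.array([1.0, 2.0])
W = np.zeros((2, 2))           # degenerate layer that learns nothing
out = residual_block(x, W)     # input still passes through unchanged
```

Even when the weighted path contributes nothing, the identity path carries the input forward, which is why very deep residual networks can still train: gradients flow through the skip just as V5 and V6 receive signals directly from V1.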