Developing an intuition for better understanding of convolutional neural networks
The Human Eye
The best visual tool that people have known for most of history is the human eye. It is the primary tool our brain uses to perceive the world and make sense of it. While the basics of the human eye are explainable by optics, how that raw stimulus evokes imagery in our brain confused and fascinated people for a long time. In the late 1950s, two neurophysiologists, David Hubel and Torsten Wiesel, began experimenting by inserting electrodes into the visual cortex of a cat and observing individual neurons.
They discovered that each neuron was tuned to respond to only one kind of stimulus (say, a straight vertical line), and only when it was moving in one particular direction (or held stationary). Further, they observed that the firing of the neurons changed as the orientation of the stimulus was changed. This understanding of each neuron as looking out for a different stimulus in a different orientation was later used in building convolutional neural networks.
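To make the idea of an orientation-tuned neuron concrete, here is a minimal sketch in plain Python. The two 3x3 kernels, the 5x5 test image, and the helper names (`correlate2d`, `peak_response`) are all made up for illustration; they are not taken from Hubel and Wiesel's work or from any library. Each kernel plays the role of one "neuron" that responds strongly to a bar in its preferred orientation and stays silent otherwise.

```python
def correlate2d(image, kernel):
    """Slide `kernel` over `image` (no padding) and sum the elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical "neurons": one prefers vertical bars, the other horizontal bars.
VERTICAL = [[-1, 2, -1],
            [-1, 2, -1],
            [-1, 2, -1]]
HORIZONTAL = [[-1, -1, -1],
              [ 2,  2,  2],
              [-1, -1, -1]]

# A 5x5 image containing a single vertical bar in the middle column.
vertical_bar = [[0, 0, 1, 0, 0] for _ in range(5)]


def peak_response(image, kernel):
    """Strongest response of one detector anywhere in the image."""
    return max(max(row) for row in correlate2d(image, kernel))


vertical_response = peak_response(vertical_bar, VERTICAL)      # 6
horizontal_response = peak_response(vertical_bar, HORIZONTAL)  # 0
print(vertical_response, horizontal_response)
```

The vertical detector fires strongly on the vertical bar, while the horizontal detector gives no response at all, which loosely mirrors a neuron that only reacts to its preferred orientation.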
Convolution: the function
Before anything else, one needs to understand why this function is used in neural networks. To bridge that gap in understanding, one first needs to realise that processing in the brain is mostly hierarchical. As mentioned above, the visual cortex has a bunch of neurons that are tuned to detect very specific stimuli. But how do these outputs make sense together? For example, how can two neurons, one detecting a horizontal line and the other a vertical line, combine their observations to indicate the presence of a curved one? The answer lies in the hierarchical organization. Initially, the primary visual cortex (V1) gets all the activations from the neurons detecting their stimuli; these are then sent to a different set of neurons in the same cortex to detect higher-level features. Basically, the information captured by the first ‘layer’ of neurons is passed to the next ‘layer’ to detect more complex features, one of which could be curves.
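The two-layer idea can be sketched as a toy example in plain Python. Everything here is an assumption made for illustration: the hand-written kernels, the L-shaped test image (used as a simple stand-in for a curve), the helper names, and the threshold of 6. A real network would learn its weights rather than have them written by hand. The first 'layer' holds a vertical and a horizontal detector; the second 'layer' is a single unit that fires only when both of them are active.

```python
def correlate2d(image, kernel):
    """Slide `kernel` over `image` (no padding) and sum the elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# First-layer detectors (hypothetical, hand-picked weights).
KERNELS = {
    "vertical": [[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]],
    "horizontal": [[-1, -1, -1], [2, 2, 2], [-1, -1, -1]],
}


def layer1(image):
    """Strongest rectified response of each first-layer detector."""
    responses = {}
    for name, kernel in KERNELS.items():
        feature_map = correlate2d(image, kernel)
        responses[name] = max(0, max(max(row) for row in feature_map))
    return responses


def layer2_corner(responses, threshold=6):
    """Second-layer unit: active only if both first-layer detectors fire."""
    return max(0, responses["vertical"] + responses["horizontal"] - threshold)


# A plain vertical bar, and an L shape made of a vertical and a horizontal stroke.
vertical_bar = [[0, 0, 1, 0, 0] for _ in range(5)]
l_shape = [
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]

corner_for_bar = layer2_corner(layer1(vertical_bar))  # 0: only one detector fires
corner_for_l = layer2_corner(layer1(l_shape))         # 6: both detectors fire
print(corner_for_bar, corner_for_l)
```

The second-layer unit never looks at the pixels directly. It only sees the outputs of the first layer, which is the sense in which the more complex feature is built from the simpler ones.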