Introduction to Convolutional Neural Networks

Meghna Asthana PhD MSc DIC
Published in Analytics Vidhya · Feb 28, 2020 · 2 min read

Convolutional Neural Network for MNIST Handwritten Digits Classification

Neural networks find their inspiration in the biological neuron, the fundamental unit of the nervous system of living beings. The first attempt to model the biological neuron was the Perceptron, a linear model for binary classification. It consists of an input layer with associated weights, which are summed and sent to a step function with a fixed threshold, typically a Heaviside step function (conventionally taking the value 0.5 at the threshold itself). The net input to a neuron is the sum of the weights on its connections multiplied by the activations incoming on those connections. A bias term is added in every layer to account for a constant offset. The final output of the neuron is the value of the activation function applied to the net input. [1]
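The forward pass described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names and the hand-picked AND-gate weights are our own assumptions.

```python
import numpy as np

def heaviside(z, threshold=0.0):
    # Step function: fires (outputs 1) once the net input reaches the threshold.
    return 1 if z >= threshold else 0

def perceptron(x, w, b):
    # Net input: weights on the connections times the incoming activations, plus bias.
    net = np.dot(w, x) + b
    # Final output: the step activation applied to the net input.
    return heaviside(net)

# Example: hand-picked weights that make the perceptron compute logical AND.
w = np.array([1.0, 1.0])
b = -1.5
print(perceptron(np.array([1, 1]), w, b))  # 1
print(perceptron(np.array([1, 0]), w, b))  # 0
```

Because the decision boundary is linear, a single perceptron can represent AND or OR but not XOR, which is what motivated multi-layer networks.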

The Convolutional Network is an architecture suited to two-dimensional array data and finds inspiration in its biological counterpart [2]: the architecture involves processing units with identical weight vectors and local receptive fields arranged in a spatial array. Its hierarchical structure encompasses alternating convolution and subsampling layers, which are analogous to the simple and complex cells in the primary visual cortex [3]. CNNs perform mappings between spatially / temporally distributed arrays in arbitrary dimensions and are generally characterized by the following constraints [1]:

  1. Translation invariance: spatial translation of the input has no effect on the neural weights, since the same filter is applied at every position
  2. Local connectivity: each node connects only to nodes in a spatially local region
  3. Progressive decrease in spatial resolution: resolution is gradually reduced as the number of features increases
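The first constraint can be seen directly in a one-dimensional convolution: because one shared filter slides over the whole input, shifting the input simply shifts the output, while the weights themselves stay fixed. The following sketch (illustrative values, our own function name) demonstrates this shift property.

```python
import numpy as np

def conv1d_valid(x, k):
    # Slide the same filter k over every position of x (no padding).
    n = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(n)])

x = np.array([0., 0., 1., 2., 1., 0., 0., 0.])  # a small bump in the signal
k = np.array([1., 0., -1.])                     # one shared edge-like filter

y = conv1d_valid(x, k)
y_shifted = conv1d_valid(np.roll(x, 1), k)      # same input, shifted right by one

# The response shifts along with the input; the weights never change.
print(np.allclose(y[:-1], y_shifted[1:]))  # True
```

Strictly speaking, the convolution layer is shift-equivariant (the output moves with the input); the subsequent pooling stages are what make the overall response approximately shift-invariant.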

A classic CNN encompasses alternating layers of convolution and pooling. The convolution layers extract patterns located in particular regions of the image. This is achieved by computing the inner product of an arbitrary convolving filter with every region of the image, yielding a feature map that is passed through a non-linear function; the resulting activations are processed further in the pooling layer. The most commonly used pooling functions are average- and max-pooling, which take the arithmetic mean and the maximum of the elements in a pooling region, respectively. The alternating convolution and pooling layers extract different features at each stage. The non-linearity can be chosen as tanh, logistic, softmax or ReLU. The final layer is a fully connected layer which outputs one unit per class in a recognition task [3].
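One convolution → ReLU → max-pooling stage, as described above, can be sketched in plain NumPy. The 28×28 input matches an MNIST digit; the 3×3 filter and the 2×2 pooling window are illustrative assumptions, and a real network would learn many such filters by backpropagation.

```python
import numpy as np

def conv2d_valid(img, kernel):
    # Inner product of the filter with every region of the image -> feature map.
    kh, kw = kernel.shape
    out_h, out_w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    # Non-linear function applied to the feature map.
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    # Max-pooling: keep the maximum of each non-overlapping size x size region.
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

img = np.random.rand(28, 28)    # an MNIST-sized grayscale image
kernel = np.random.randn(3, 3)  # one arbitrary convolving filter

fmap = max_pool(relu(conv2d_valid(img, kernel)))
print(fmap.shape)  # (13, 13): 28 -> 26 after the 3x3 convolution, halved by pooling
```

Stacking several such stages and flattening the final feature maps into a fully connected layer gives the classic architecture used for MNIST digit classification.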

[1] Tan, Y.H. and Chan, C.S., 2019. Phrase-based image caption generator with hierarchical LSTM network. Neurocomputing, 333, pp.86–100.

[2] Patterson, J., 2017. Deep Learning. 1st ed. O'Reilly Media, Inc.

[3] Boden, M., 2002. A guide to recurrent neural networks and backpropagation. the Dallas project.

