LeNet Architecture

Introduction to the concept used in Digit Recognition

Published in Analytics Vidhya, Sep 21, 2020

LeNet was a major breakthrough in image recognition. It is one of the earliest convolutional neural networks, introduced by Yann LeCun and his colleagues; the best-known version, LeNet-5, is described in their 1998 paper "Gradient-Based Learning Applied to Document Recognition". LeCun developed the model to recognize handwritten digits, such as those in the ZIP codes handled by the US Postal Service.

Components of LeNet Structure:

1. Input image of a defined size

2. Convolutional layers, each using a kernel (filter) of a given size, stride, and padding

3. A chosen number of filters in each convolutional layer

4. Average pooling layers with a given kernel size, stride, and padding

Convolutional Layer 1

LeNet is built from alternating convolutional and average pooling layers. The input is a 32*32 grey-scale image of a handwritten digit. We convolve it with 6 kernels (filters) of size 5*5, using padding 0 and stride 1, which produces a convolutional layer of size 28*28*6.

Input image = 32*32

Kernel size = 5*5

No of kernels = 6

Padding = 0

Stride = 1

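So size of conv1 = floor((n + 2p - f) / s) + 1 = floor((32 + 0 - 5) / 1) + 1 = 28, where n is the input size, f the kernel size, p the padding, and s the stride.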

Hence the conv1 = 28*28*6
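The size calculation above can be sketched as a small helper. This is a minimal illustration, assuming the standard output-size formula floor((n + 2p - f) / s) + 1; the function name conv_output_size is made up for this example and is not from the original article.

```python
def conv_output_size(n: int, f: int, p: int = 0, s: int = 1) -> int:
    """Output width/height for an n x n input, f x f kernel, padding p, stride s.

    Implements floor((n + 2p - f) / s) + 1 using integer (floor) division.
    """
    return (n + 2 * p - f) // s + 1


# Convolutional layer 1: 32*32 input, 5*5 kernel, padding 0, stride 1.
conv1 = conv_output_size(n=32, f=5, p=0, s=1)
print(conv1)  # 28
```

The same helper applies to the pooling layers, since a pooling window slides over the input in the same way a convolution kernel does.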

Average Pooling 1

Now we apply average pooling with a 2*2 kernel and stride 2. Each 2*2 block of pixels is replaced by its average, which shrinks every feature map from 28*28 to 14*14.

Input image = 28*28

Kernel size = 2*2

Stride = 2

So size of avg-pool 1 = floor((n + 2p - f) / s) + 1

= floor((28 + 0 - 2) / 2) + 1

= 13 + 1

= 14

Hence the avg-pool 1 size = 14*14*6
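To make the pooling step concrete, here is a minimal pure-Python sketch of 2*2 average pooling with stride 2. The helper name avg_pool2d and the 4*4 sample values are invented for illustration; a real model would use a library layer instead.

```python
def avg_pool2d(image, size=2, stride=2):
    """Average-pool a 2D list of numbers with a size x size window."""
    out_h = (len(image) - size) // stride + 1
    out_w = (len(image[0]) - size) // stride + 1
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Collect the values covered by the window at this position.
            window = [
                image[i * stride + di][j * stride + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


# A made-up 4*4 feature map; each 2*2 block collapses to its mean.
feature_map = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [9, 11, 10, 12],
    [13, 15, 14, 16],
]
pooled = avg_pool2d(feature_map)
print(pooled)  # [[4.0, 5.0], [12.0, 13.0]]
```

Applied to one 28*28 feature map, the same function returns a 14*14 result, which matches the size worked out above.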

Convolutional Layer 2

Now we apply a second convolutional layer with 16 kernels of size 5*5 and stride 1. This reduces each feature map to 10*10, so the output is 10*10*16.

Input image = 14*14

Kernel size = 5*5

No of kernels = 16

Padding = 0

Stride = 1

So size of conv2 = floor((n + 2p - f) / s) + 1 = floor((14 + 0 - 5) / 1) + 1 = 10

Hence the conv2 = 10*10*16

Average Pooling 2

Now we apply average pooling again with a 2*2 kernel and stride 2. This shrinks every feature map from 10*10 to 5*5.

Input image = 10*10

Kernel size = 2*2

Stride = 2

So size of avg-pool 2 = floor((n + 2p - f) / s) + 1

= floor((10 + 0 - 2) / 2) + 1

= 4 + 1 = 5

Hence the avg-pool-2 size = 5*5*16

Inputs to fully connected layer 1

After the second average pooling, we flatten (unroll) the 5*5*16 output into a single vector of 5 * 5 * 16 = 400 values. These 400 values are the inputs to fully connected layer 1.
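The whole chain of sizes, from the 32*32 input down to the 400 flattened values, can be checked with a short script. This is only a sketch: the LAYERS table and the output_size helper are names invented here, and the layer settings are the ones listed in the sections above.

```python
def output_size(n, f, p=0, s=1):
    """floor((n + 2p - f) / s) + 1 for a square input and kernel."""
    return (n + 2 * p - f) // s + 1


# (name, kernel size, stride, output channels); None means "keep the channels",
# which is what pooling does.
LAYERS = [
    ("conv1", 5, 1, 6),
    ("avg-pool 1", 2, 2, None),
    ("conv2", 5, 1, 16),
    ("avg-pool 2", 2, 2, None),
]

size, channels = 32, 1  # 32*32 grey-scale input
shapes = []
for name, kernel, stride, out_channels in LAYERS:
    size = output_size(size, kernel, p=0, s=stride)
    if out_channels is not None:
        channels = out_channels
    shapes.append((name, size, size, channels))
    print(f"{name}: {size}*{size}*{channels}")

# Unrolling the last feature maps gives the inputs to the first connected layer.
flattened = size * size * channels
print(flattened)  # 400
```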

Fully connected layers 2 and 3, and the softmax output

Fully connected layer 1 turns the 400 inputs into 120 outputs, which are the inputs to fully connected layer 2. Layer 2 produces 84 outputs, which feed fully connected layer 3, the output layer. A softmax over its 10 outputs gives one probability for each of the 10 digits from 0 to 9, and the digit with the highest probability is the predicted class.
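As a rough illustration of the softmax step, the sketch below converts 10 raw output scores into probabilities and picks the most likely digit. The scores are made-up numbers, not outputs of a trained model, and the helper is written from the textbook definition of softmax.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for the digits 0..9 (index = digit).
scores = [0.1, -1.2, 0.3, 0.0, 0.5, -0.7, 0.2, -0.4, 0.9, 3.0]
probs = softmax(scores)

# The predicted class is the digit with the highest probability.
predicted_digit = max(range(10), key=lambda d: probs[d])
print(predicted_digit)  # 9
```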

MNIST is a dataset of handwritten digit images. It contains 70,000 images, of which 60,000 are reserved for training the model and 10,000 for testing it. We build a LeNet model and train it on MNIST to classify the digits.

First we load the MNIST dataset, which is available through Keras. The data comes already split into training and test sets, with the images and their output labels (classes) stored as NumPy arrays.
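One detail worth sketching is that MNIST images are 28*28, while the architecture above expects 32*32 inputs. A common way to bridge the gap is to zero-pad each image by 2 pixels on every side; that choice, the scaling to [0, 1], and the helper names below are assumptions for illustration rather than steps taken from the original article.

```python
import numpy as np


def prepare_images(images: np.ndarray) -> np.ndarray:
    """Scale 28*28 uint8 digits to [0, 1], zero-pad to 32*32, add a channel axis."""
    scaled = images.astype("float32") / 255.0
    # Pad only the two spatial axes, 2 pixels on each side: 28 + 2 + 2 = 32.
    padded = np.pad(scaled, ((0, 0), (2, 2), (2, 2)), mode="constant")
    return padded[..., np.newaxis]


def load_mnist_for_lenet():
    """Download MNIST through Keras and prepare it (needs TensorFlow; not called here)."""
    from tensorflow import keras

    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    return (prepare_images(x_train), y_train), (prepare_images(x_test), y_test)


# Quick check on a fake batch of three all-white images.
batch = np.full((3, 28, 28), 255, dtype=np.uint8)
prepared = prepare_images(batch)
print(prepared.shape)  # (3, 32, 32, 1)
```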

We then define the LeNet architecture as a Keras Sequential model, adding the layers described above in order. The model classifies each input image by predicting its label.
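A minimal sketch of such a model is shown below. The layer sizes follow the walkthrough above, but the tanh activations, the Adam optimizer, and the sparse categorical cross-entropy loss are assumptions made for this example, since the article does not state which ones it used.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_lenet() -> keras.Model:
    """LeNet-style Sequential model for 32*32 grey-scale inputs and 10 classes."""
    return keras.Sequential([
        keras.Input(shape=(32, 32, 1)),
        # conv1: 6 kernels of 5*5, stride 1, no padding -> 28*28*6
        layers.Conv2D(6, kernel_size=5, strides=1, padding="valid", activation="tanh"),
        # avg-pool 1: 2*2, stride 2 -> 14*14*6
        layers.AveragePooling2D(pool_size=2, strides=2),
        # conv2: 16 kernels of 5*5, stride 1, no padding -> 10*10*16
        layers.Conv2D(16, kernel_size=5, strides=1, padding="valid", activation="tanh"),
        # avg-pool 2: 2*2, stride 2 -> 5*5*16
        layers.AveragePooling2D(pool_size=2, strides=2),
        # Unroll 5*5*16 into 400 values.
        layers.Flatten(),
        layers.Dense(120, activation="tanh"),
        layers.Dense(84, activation="tanh"),
        # One softmax output per digit 0..9.
        layers.Dense(10, activation="softmax"),
    ])


model = build_lenet()
model.compile(
    optimizer="adam",                        # assumed optimizer
    loss="sparse_categorical_crossentropy",  # assumes integer labels 0..9
    metrics=["accuracy"],
)
model.summary()
```

Given image arrays of shape (N, 32, 32, 1) and integer labels, training would then be a call to model.fit with a chosen number of epochs, and model.predict returns the 10 softmax probabilities for each image.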

We train the model on the training images and their labels, and use the test images to evaluate it. Training for more epochs generally raises the accuracy, up to the point where it levels off.

After training, we pass a test image to the model to check its prediction. Here we pass a test image of the digit 9, and the class predicted by the model is 9.