LeNet: The Deep Learning Model That Revolutionized Image Recognition

Neha Purohit
𝐀𝐈 𝐦𝐨𝐧𝐤𝐬.𝐢𝐨
Nov 4, 2023

In the realm of deep learning, LeNet-5 holds a significant place as one of the pioneering convolutional neural networks (CNNs). Developed by Yann LeCun and his colleagues in the late 1990s, LeNet-5 revolutionized the field of computer vision and laid the foundation for many modern deep learning architectures. In this article, we will explore the key concepts behind LeNet-5, its architecture, and its impact on the field of deep learning.

CNNs (Convolutional Neural Networks) are popular for image-related tasks because they learn hierarchical features, provide a degree of translation invariance, and rely on parameter sharing and local receptive fields. They are a subtype of feed-forward neural networks whose neurons respond to localized regions of the input, making them effective for large-scale image processing. Specialized layers such as convolutional and pooling layers make them efficient for image analysis, pre-trained models are widely available, and weight sharing keeps the number of parameters small.
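
To make the weight-sharing point concrete, here is a minimal sketch (PyTorch is my assumption here, used purely for illustration) comparing a small convolutional layer with a fully connected layer that produces an output of the same size:

```python
import torch.nn as nn

# Six 5x5 filters over a 1-channel 32x32 image produce a 28x28x6 output.
conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)

# A dense layer mapping the same input to the same output size, with no weight sharing.
dense = nn.Linear(32 * 32, 28 * 28 * 6)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(conv))   # 156 parameters: 6 * (5*5 + 1)
print(count(dense))  # roughly 4.8 million parameters
```

The convolutional layer needs orders of magnitude fewer parameters because the same small filters are reused at every spatial location.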

1. Background and Motivation:

Before the advent of LeNet-5, traditional neural networks struggled to process complex visual data efficiently. LeCun recognized this limitation and proposed a solution: convolutional layers that allow the network to learn local patterns and spatial hierarchies. This architectural breakthrough paved the way for the development of LeNet-5.

ANNs (Artificial Neural Networks) are more general and can handle a wider range of data types, making the choice between CNNs and ANNs dependent on the specific problem and data.

Source: https://www.philschmid.de/getting-started-with-cnn-by-calculating-lenet-layer-manually

2. Architecture:

LeNet-5 consists of seven layers, including three convolutional layers, two subsampling (pooling) layers, and two fully connected layers. Each convolutional layer applies a set of learnable filters to the input data, extracting important features. The subsampling layers reduce the spatial dimensions, aiding in translation invariance. The fully connected layers act as the classifier, making predictions based on the extracted features.

LeNet stands out as one of the most renowned CNN models. When we mention “LeNet,” we are typically referring to LeNet-5, a simple and well-known convolutional neural network that played a pivotal role in advancing the field of deep learning. The development and naming of LeNet-5 were the result of extensive research spanning from 1988 onwards.

In 1989, Yann LeCun and his team at Bell Labs were pioneers in applying the backpropagation algorithm to practical applications. They recognized that network generalization could be improved by imposing constraints drawn from the task’s domain. Their innovative approach involved training a convolutional neural network with backpropagation to read handwritten digits, and they applied it successfully to recognize handwritten ZIP code numbers provided by the US Postal Service. This achievement laid the foundation for what would later be known as LeNet.

Source: Yann LeCun

During the same year, LeCun also presented a study on a small handwritten digit recognition problem. He demonstrated that, despite the problem being linearly separable, single-layer networks exhibited poor generalization capabilities. However, by employing shift-invariant feature detectors within a multi-layered, constrained network, the model performed exceptionally well. This work underscored the significance of minimizing the number of free parameters in neural networks to enhance their generalization abilities.

3. Training and Optimization:

LeNet-5 utilizes the backpropagation algorithm to update the weights and biases of the network during training. The loss function used is typically the cross-entropy loss, and optimization techniques like gradient descent are employed to minimize this loss. LeNet-5 also introduced the concept of weight sharing, where multiple weight parameters are tied together, reducing the model’s complexity and improving generalization.
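
A hedged sketch of the training procedure described above, using PyTorch as an assumed framework; the tiny stand-in model and synthetic batch exist only so the loop runs end to end:

```python
import torch
import torch.nn as nn

# Toy stand-ins so the example is self-contained; in practice `model` would be
# LeNet-5 and the loader would yield MNIST image/label batches.
model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 10))
train_loader = [(torch.randn(8, 1, 32, 32), torch.randint(0, 10, (8,)))]

criterion = nn.CrossEntropyLoss()                          # cross-entropy loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # plain gradient descent

for images, labels in train_loader:
    optimizer.zero_grad()              # clear gradients from the previous step
    logits = model(images)             # forward pass
    loss = criterion(logits, labels)   # measure prediction error
    loss.backward()                    # backpropagation computes gradients
    optimizer.step()                   # gradient-descent weight update
```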

To grasp the proposed LeNet model, note that it incorporates the fundamental components of a convolutional neural network: convolutional layers, pooling layers, and fully connected layers. This was a foundational contribution to the later development of convolutional neural networks. In the diagram below, which features input image data with dimensions of 32x32 pixels, LeNet-5 comprises a total of seven layers, and every layer apart from the input has learnable parameters. “C1, C3, C5” denote convolutional layers, “S2, S4” subsampling layers, and “F6” the fully connected layer.

Source:https://www.philschmid.de/getting-started-with-cnn-by-calculating-lenet-layer-manually

Layer C1 is a convolutional layer with six 5x5 convolution kernels, producing feature maps of size 28x28. The 32x32 input is slightly larger than the digits themselves, so the distinctive strokes of each character stay well within the kernels’ receptive fields and no information is lost at the image borders.

Layer S2 serves as the subsampling or pooling layer, generating six feature maps, each measuring 14x14. In this layer, every cell within each feature map establishes connections with 2x2 neighborhoods within the corresponding feature map from layer C1.
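
For readers who want to verify these numbers, the spatial sizes follow from the standard output-size formula output = (input + 2·padding − kernel) / stride + 1. A quick sketch (the helper name is just illustrative):

```python
# Worked size arithmetic for C1 and S2 (5x5 kernels, no padding, stride 1,
# followed by non-overlapping 2x2 pooling).
def conv_out(size, kernel, stride=1, padding=0):
    return (size + 2 * padding - kernel) // stride + 1

c1 = conv_out(32, 5)   # (32 - 5) + 1 = 28 -> six 28x28 feature maps
s2 = c1 // 2           # 2x2 average pooling halves each dimension -> 14x14
print(c1, s2)          # prints: 28 14
```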

4. Impact and Legacy:

LeNet-5 made significant contributions to the field of deep learning, particularly in the domain of handwritten digit recognition. It achieved remarkable results on benchmark datasets like MNIST, showcasing the potential of CNNs in image classification tasks. LeNet-5’s success inspired subsequent advancements in deep learning architecture, leading to the development of more complex and powerful models.

Layer C3 is a convolutional layer consisting of 16 convolution kernels, each 5x5. The first six C3 feature maps take their input from contiguous subsets of three feature maps in S2, the next six from contiguous subsets of four, and the following three from discontinuous subsets of four. The last feature map receives input from all of the feature maps in S2.

Layer S4 closely resembles S2, maintaining a size of 2x2 and generating 16 feature maps of dimensions 5x5.

Layer C5 operates as a convolutional layer, featuring 120 convolution kernels, each sized at 5x5. Each cell within C5 establishes connections with the 5x5 neighborhood across all 16 feature maps in S4. It’s worth noting that because the feature map size in S4 is also 5x5, the output size of C5 is reduced to 1x1. This results in a complete connection between S4 and C5.

C5 is labeled a convolutional layer rather than a fully connected one because, if the input to LeNet-5 were made larger while the structure remained unchanged, the output of C5 would extend beyond 1x1, which distinguishes it from a fully connected layer.
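
To see why that distinction matters, here is a tiny sketch (PyTorch assumed, used only for illustration): the same 120 kernels of size 5x5 collapse a 5x5 input to 1x1, but would leave a larger spatial output on a larger input:

```python
import torch
import torch.nn as nn

c5 = nn.Conv2d(16, 120, kernel_size=5)       # C5: 120 kernels of size 5x5
print(c5(torch.zeros(1, 16, 5, 5)).shape)    # torch.Size([1, 120, 1, 1])
print(c5(torch.zeros(1, 16, 8, 8)).shape)    # torch.Size([1, 120, 4, 4]) on a larger input
```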

The F6 layer is fully connected to C5 and produces 84 output units.

LeNet-5 layers (a code sketch follows the list):

  1. Convolution #1. Input = 32x32x1. Output = 28x28x6 (conv2d)
  2. SubSampling #1. Input = 28x28x6. Output = 14x14x6. SubSampling here is simply average pooling (avg_pool)
  3. Convolution #2. Input = 14x14x6. Output = 10x10x16 (conv2d)
  4. SubSampling #2. Input = 10x10x16. Output = 5x5x16 (avg_pool)
  5. Fully Connected #1. Input = 5x5x16. Output = 120
  6. Fully Connected #2. Input = 120. Output = 84
  7. Output layer. Input = 84. Output = 10
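
A minimal sketch of the layer list above, written in PyTorch as an assumption (the original LeNet-5 was not implemented this way, and details such as the partial C3 connectivity and the original scaled activations are simplified here):

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),    # Convolution #1: 32x32x1 -> 28x28x6
            nn.Tanh(),
            nn.AvgPool2d(2),                   # SubSampling #1: 28x28x6 -> 14x14x6
            nn.Conv2d(6, 16, kernel_size=5),   # Convolution #2: 14x14x6 -> 10x10x16
            nn.Tanh(),
            nn.AvgPool2d(2),                   # SubSampling #2: 10x10x16 -> 5x5x16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(5 * 5 * 16, 120),        # Fully Connected #1
            nn.Tanh(),
            nn.Linear(120, 84),                # Fully Connected #2
            nn.Tanh(),
            nn.Linear(84, num_classes),        # Output layer
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = LeNet5()
print(model(torch.zeros(1, 1, 32, 32)).shape)  # torch.Size([1, 10])
```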

Building on these ideas, in 1990 LeCun and his collaborators published a paper revisiting the use of backpropagation networks for handwritten digit recognition. Their approach involved minimal data preprocessing, and the model was meticulously tailored to this specific task and heavily constrained. The input data consisted of images, each featuring a single digit, and tests on postal code data from the US Postal Service demonstrated remarkable performance, with only a 1% error rate and approximately a 9% rejection rate.

Their research efforts persisted for the subsequent four years, leading to the development of the MNIST database in 1994. LeNet-1, however, proved to be insufficient for this database, prompting the training of a new neural network, LeNet-4, to tackle the task. A year later, the collective at AT&T Bell Labs introduced LeNet-5 and conducted a comprehensive review of various methods for handwritten character recognition in a paper. They used standard handwritten digits as benchmark tasks for comparison, and the results highlighted that the latest network surpassed other models in performance.

By 1998, Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner were able to showcase practical applications of neural networks, including systems for online recognition of handwritten characters and models capable of processing millions of checks daily.

This research achieved remarkable success and piqued the interest of scholars in the field of neural networks. Although contemporary best-performing neural network architectures differ from the original LeNet, this model served as a foundational point of reference for numerous subsequent neural network designs, offering valuable inspiration to the field.

5. Extensions and Variations:

Over the years, researchers have built upon LeNet-5, introducing various modifications and extensions to enhance its performance. These include adding more layers, using different activation functions, implementing regularization techniques, and incorporating advanced optimization algorithms. These variations have extended LeNet-5’s applications to areas such as object detection and image segmentation.

LeNet did not use ImageNet for the simple reason that ImageNet, as we know it today, did not exist at the time of LeNet’s development.

ImageNet, with its vast collection of labeled images across numerous categories, emerged much later: the dataset was released in 2009 and played a pivotal role in the advancement of deep learning and CNNs. The “ImageNet moment” in deep learning history came when deep, large CNN models such as AlexNet achieved exceptional accuracy on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). This event marked a significant shift in the field of computer vision.

LeNet and ImageNet are notable in their own right, with LeNet being an early CNN model that demonstrated the potential of CNNs for character recognition and simple image tasks, while ImageNet served as a benchmark dataset for large-scale image classification and object recognition. The utilization of ImageNet in deep learning research and competition came several years after the development of LeNet.

One more concept worth understanding alongside CNNs is dropout. Dropout is a regularization technique used in modern CNNs and commonly added to LeNet-style networks, although the original LeNet-5 predates it. It works by randomly deactivating a fraction of neurons in the fully connected layers during training. This simulates an ensemble effect, reduces overfitting, and encourages the network to learn more robust, generalized features, improving the model’s ability to perform well on unseen data.
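
A hedged sketch of adding dropout to the fully connected part of a LeNet-style classifier (again assuming PyTorch; as noted above, the original 1998 LeNet-5 did not use dropout):

```python
import torch
import torch.nn as nn

classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(5 * 5 * 16, 120),
    nn.Tanh(),
    nn.Dropout(p=0.5),   # randomly zeroes half of the activations during training
    nn.Linear(120, 84),
    nn.Tanh(),
    nn.Dropout(p=0.5),
    nn.Linear(84, 10),
)

x = torch.randn(1, 16, 5, 5)
classifier.train()   # dropout active: a different subset of neurons is dropped each pass
print(classifier(x))
classifier.eval()    # dropout disabled at inference: all neurons are used
print(classifier(x))
```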

Conclusion:

LeNet-5’s impact on the field of deep learning cannot be overstated. Its introduction of convolutional layers and weight sharing laid the foundation for modern CNNs, revolutionizing computer vision tasks. While subsequent models have surpassed LeNet-5’s performance, it remains a crucial milestone in the history of deep learning. Understanding LeNet-5 not only provides insights into its architecture but also enables us to appreciate the advancements that have followed and the potential for further innovation in the field.

Stay tuned for more on this topic next week.

My other blogs:

CNNs: The Secret Sauce to AI’s Success (Part I)

CNNs: The Secret Sauce to AI’s Success (Part II)

Mastering MNIST with ANN: Secret to hand-written digit recognization

Loss Function| The Secret Ingredient to Building High-Performance AI Models

Optimizers: The Secret Sauce of Deep Learning

Activation Functions: The Hidden Heroes of Neural Networks

The Future of Neural Networks May Lie in the Co-existence of Neurons and Perceptrons

Unleashing The Power Of GPT-4: A Quantum Leap In AI Language Models

If you enjoy reading stories like these and want to support my writing, please consider following and liking. I’ll cover most deep learning topics in this series.
