ImageNet Classification with Deep Convolutional Neural Networks
Introduction
Convolutional neural networks (CNNs) are a class of artificial neural network particularly well suited to image recognition tasks. CNNs learn to identify patterns in images by applying multiple layers of filters to the image in a hierarchical manner.
In their 2012 paper, Krizhevsky et al. proposed a new CNN architecture that achieved state-of-the-art results on the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC). The ILSVRC is a competition that challenges participants to develop algorithms that can accurately classify images into 1000 different categories.
The purpose of Krizhevsky et al.’s study was to develop a CNN architecture that could achieve high accuracy on the ILSVRC without overfitting. Overfitting is a problem that occurs when a model learns the training data too well and is unable to generalize to new data.
Procedures
Krizhevsky et al. trained their CNN on the ILSVRC subset of the ImageNet dataset, which consists of roughly 1.2 million labeled training images. The network has eight learned layers: five convolutional layers followed by three fully connected layers.
The convolutional layers learn to identify patterns in the images by using filters. Each filter is a small matrix of weights that is slid across the image; at each position, the filter is multiplied element-wise with the image patch beneath it and the products are summed. The output of a convolutional layer is a feature map, which records how strongly each filter responded at each location in the image.
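The sliding-filter operation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the 2×2 edge filter and the tiny image below are made-up examples, and real CNN layers apply many filters at once across multiple channels.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a filter over an image, recording its response at each position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # Element-wise product of the filter and the patch under it, summed.
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A vertical-edge filter: responds where intensity changes from left to right.
edge_filter = np.array([[1.0, -1.0],
                        [1.0, -1.0]])

# A tiny image that is dark on the left and bright on the right.
image = np.array([[0.0, 0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0, 1.0]])

feature_map = conv2d(image, edge_filter)
# The feature map peaks (in magnitude) along the middle column, where the edge is.
```

Note that the same small filter is reused at every position, which is what makes convolutional layers far more parameter-efficient than fully connected ones for images.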
The fully connected layers combine the feature maps from the convolutional layers to produce a prediction for the image category.
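That combination step can be sketched as a flattened feature vector passed through a dense layer and a softmax. The sizes here are toy values chosen for illustration; the actual network maps 4096 hidden units to 1000 ImageNet classes.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Convert raw scores into a probability distribution over categories."""
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

# Hypothetical sizes: 256 flattened feature-map values, 10 categories.
features = rng.standard_normal(256)      # stand-in for flattened feature maps
W = rng.standard_normal((10, 256)) * 0.01  # learned weights (random here)
b = np.zeros(10)                         # learned biases

logits = W @ features + b                # one score per category
probs = softmax(logits)                  # scores -> probabilities summing to 1
predicted_class = int(np.argmax(probs))  # the network's prediction
```

The softmax output is what the top-1 and top-5 error rates in the Results section are computed from: top-5 counts a prediction as correct if the true label is among the five highest-probability categories.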
To prevent overfitting, Krizhevsky et al. used a number of techniques, including:
- Dropout: Dropout randomly sets the output of each hidden neuron to zero with probability 0.5 during training. This prevents neurons from co-adapting and reduces overfitting.
- Data augmentation: Data augmentation artificially increases the size of the training set by creating new images from existing ones, for example by cropping, mirroring, and altering pixel intensities. This improves the network’s ability to generalize to new data.
- L2 regularization: L2 regularization (weight decay) adds a penalty to the loss function for large weights. This discourages overly complex models.
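Each of the three techniques above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's code: the dropout probability (0.5) and weight-decay coefficient (0.0005) match the paper, but the image and weight sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p=0.5):
    """Zero each activation with probability p; scale survivors so the
    expected value is unchanged (inverted dropout)."""
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

def augment(image, crop=24):
    """Create a new training image: take a random crop, maybe mirror it."""
    h, w = image.shape
    y = rng.integers(0, h - crop + 1)
    x = rng.integers(0, w - crop + 1)
    patch = image[y:y + crop, x:x + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]           # horizontal flip
    return patch

def l2_penalty(weights, lam=5e-4):
    """Penalty added to the loss; grows with the squared size of the weights."""
    return lam * np.sum(weights ** 2)

acts = rng.standard_normal(100)
dropped = dropout(acts)                  # about half the activations are zeroed
img = rng.random((32, 32))
crop_img = augment(img)                  # a 24x24 crop, possibly mirrored
W = rng.standard_normal((10, 256))
penalty = l2_penalty(W)                  # added to the classification loss
```

At test time dropout is switched off; because of the `1/(1-p)` scaling during training, no further adjustment is needed when all neurons are active.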
Results
Krizhevsky et al.’s CNN achieved a top-5 error rate of 15.3% on the ILSVRC-2012 test set, a significant improvement over the 26.2% achieved by the second-best entry. On the ILSVRC-2010 test set, the network achieved top-1 and top-5 error rates of 37.5% and 17.0%, respectively, also well ahead of the previous state of the art.
Conclusion
Krizhevsky et al. concluded that their CNN architecture was able to achieve high accuracy on the ILSVRC without overfitting due to the use of dropout, data augmentation, and L2 regularization. They also found that the depth of the network was important for achieving high performance.
Personal Notes
I am impressed by the results of Krizhevsky et al.’s study. Their CNN architecture was able to achieve a significant improvement over the previous state-of-the-art results on the ILSVRC. This study showed that CNNs are capable of achieving high accuracy on large-scale image recognition tasks.
I believe that CNNs have the potential to revolutionize many industries, including healthcare, transportation, and security. CNNs can be used to develop new algorithms for diagnosing diseases, detecting objects in images, and identifying individuals.