DLOA (Part-14)-AlexNet CNN and Implementation

Dewansh Singh
Published in Learn AI With Me
8 min read · May 8, 2023

Hey readers, hope you are all doing well, safe, and sound. I hope you have already read the previous blog, which briefly discussed LeNet CNN and its Python implementation. If you haven’t, you can go through this link. In this blog, we’ll discuss AlexNet CNN, how it works, and its implementation.

Introduction

AlexNet is a convolutional neural network (CNN) that was developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton in 2012. It was designed for image classification tasks and won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012 with a top-5 error rate of 15.3%, significantly outperforming previous methods.

AlexNet consists of 8 layers: 5 convolutional layers and 3 fully connected layers.

It also uses a number of techniques such as data augmentation, dropout, and ReLU activation functions that were relatively new at the time but have since become standard practices in CNN design.

AlexNet CNN Architecture

Working of AlexNet CNN

The working of AlexNet can be divided into two phases: Training and Inference.

Training:

During the training phase, AlexNet takes a batch of images as input and applies a series of convolutional and pooling operations to extract features from the images. The features are then fed into a series of fully connected layers that classify the image into one of 1000 categories.

Before training, the weights of the network are initialized randomly. During training, the weights are adjusted using backpropagation and stochastic gradient descent (SGD) to minimize the difference between the predicted and actual labels. The training process continues until the difference between the predicted and actual labels is small enough or the maximum number of iterations is reached.

In addition to standard training techniques such as SGD and backpropagation, AlexNet uses several techniques that were novel at the time to improve performance. One such technique is data augmentation, which generates new training examples by applying random transformations such as rotation, scaling, and flipping to the original images. Another technique is dropout, which randomly drops neurons during training to prevent overfitting.
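To make these ideas concrete, here is a minimal sketch of what data augmentation can look like with Keras preprocessing layers; the specific transformations and parameters are illustrative choices, not the ones used in the original AlexNet:

from tensorflow import keras
from tensorflow.keras import layers

# A small augmentation pipeline (sketch): random flips, rotations, and zooms.
# These layers are only active during training and act as the identity at inference time.
data_augmentation = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.05),
    layers.RandomZoom(0.1),
])

# Example usage: `images` is assumed to be a float tensor of shape (batch, 227, 227, 3).
# augmented = data_augmentation(images, training=True)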

Inference:

During the inference phase, AlexNet takes a single image as input and applies the same series of convolutional and pooling operations as during training to extract features from the image. The features are then fed into the fully connected layers, which generate a probability distribution over the 1000 categories.

The category with the highest probability is taken as the predicted label for the image. If the top-5 predictions include the actual label, the prediction is considered correct. Otherwise, it is considered incorrect.
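To make the top-5 rule concrete, here is a small sketch (not part of the original post) that checks whether the true class index of a single image appears among the five highest-probability predictions; model, image, and true_label are assumed to already exist:

import tensorflow as tf

# `image` is assumed to be a preprocessed tensor of shape (227, 227, 3),
# and `true_label` an integer class index in the range [0, 999].
probs = model.predict(tf.expand_dims(image, axis=0))[0]  # shape: (1000,)

# Indices of the five highest-probability classes.
top5 = tf.math.top_k(probs, k=5).indices.numpy()

top1_correct = (top5[0] == true_label)
top5_correct = true_label in top5
print("Top-1 correct:", top1_correct, "| Top-5 correct:", top5_correct)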

AlexNet achieves high accuracy on the ImageNet dataset by using a deep architecture with multiple layers of convolutional and pooling operations to extract high-level features from the images. It also uses techniques such as data augmentation, dropout, and ReLU activation functions to prevent overfitting and improve performance.

Architecture of AlexNet CNN

The architecture of AlexNet can be summarized as follows:

Explained AlexNet CNN Architecture
  1. Input layer: The input layer takes as input a 227x227x3 RGB image.
  2. Convolutional layer 1: The first convolutional layer has 96 filters of size 11x11 with a stride of 4 pixels. This is followed by a ReLU activation function and local response normalization (LRN) to normalize the outputs of the neurons.
  3. Max pooling layer 1: The first max pooling layer follows the first convolutional layer and has a pool size of 3x3 with a stride of 2 pixels.
  4. Convolutional layer 2: The second convolutional layer has 256 filters of size 5x5 with a stride of 1 pixel. This is followed by a ReLU activation function and LRN.
  5. Max pooling layer 2: The second max pooling layer follows the second convolutional layer and has a pool size of 3x3 with a stride of 2 pixels.
  6. Convolutional layer 3: The third convolutional layer has 384 filters of size 3x3 with a stride of 1 pixel. This is followed by a ReLU activation function.
  7. Convolutional layer 4: The fourth convolutional layer has 384 filters of size 3x3 with a stride of 1 pixel. This is followed by a ReLU activation function.
  8. Convolutional layer 5: The fifth convolutional layer has 256 filters of size 3x3 with a stride of 1 pixel. This is followed by a ReLU activation function.
  9. Max pooling layer 3: The third max pooling layer follows the fifth convolutional layer and has a pool size of 3x3 with a stride of 2 pixels.
  10. Fully connected layer 1: The first fully connected layer has 4096 neurons and is followed by a ReLU activation function and dropout.
  11. Fully connected layer 2: The second fully connected layer has 4096 neurons and is followed by a ReLU activation function and dropout.
  12. Fully connected layer 3: The third fully connected layer has 1000 neurons (corresponding to the 1000 ImageNet classes) and is followed by a softmax activation function.
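The spatial sizes implied by this list follow from the standard formula output = floor((input - kernel + 2*padding) / stride) + 1. Here is a small sketch that traces the feature-map size through the convolutional stages; the padding values for layers 2 to 5 are the usual "same"-style paddings assumed for AlexNet, not something stated explicitly in the list above:

def out_size(n, kernel, stride, pad=0):
    # Spatial output size of a conv/pool layer: floor((n - kernel + 2*pad) / stride) + 1
    return (n - kernel + 2 * pad) // stride + 1

n = 227                        # input height/width
n = out_size(n, 11, 4)         # conv1, 11x11 stride 4    -> 55
n = out_size(n, 3, 2)          # pool1, 3x3 stride 2      -> 27
n = out_size(n, 5, 1, pad=2)   # conv2, 5x5 padded        -> 27
n = out_size(n, 3, 2)          # pool2                    -> 13
n = out_size(n, 3, 1, pad=1)   # conv3-conv5, 3x3 padded  -> 13
n = out_size(n, 3, 2)          # pool3                    -> 6
print(n, "x", n, "x 256 =", n * n * 256)  # 6 x 6 x 256 = 9216 features into FC1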

Implementation

In this implementation, the input shape is defined as (227, 227, 3), which corresponds to a 227x227 RGB image. The model consists of 8 layers, including 5 convolutional layers and 3 fully connected layers. The convolutional layers and the first two fully connected layers use ReLU activation functions, while the final fully connected layer uses a softmax activation function. Dropout is used in the fully connected layers to prevent overfitting.

To train the model, you would need to load a dataset of images and their corresponding labels and call the fit() function of the model object. The fit() function takes as input the training data, validation data (optional), the number of epochs, and batch size, among other arguments.

Here is a step-by-step explanation of the implementation:

  • Import the necessary libraries:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
  • Define the input shape of the network:
# Define the input shape of the network
input_shape = (227, 227, 3)

Here, input_shape is a tuple that specifies the shape of the input images to the network. In this case, the images are 227x227 RGB images, so the shape is (227, 227, 3).

  • Define the AlexNet model:
# Define the AlexNet model
model = keras.Sequential(
    [
        # Layer 1
        layers.Conv2D(96, kernel_size=(11, 11), strides=(4, 4), activation='relu', input_shape=input_shape),
        layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2)),
        layers.BatchNormalization(),

        # Layer 2
        layers.Conv2D(256, kernel_size=(5, 5), padding='same', activation='relu'),
        layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2)),
        layers.BatchNormalization(),

        # Layer 3
        layers.Conv2D(384, kernel_size=(3, 3), padding='same', activation='relu'),

        # Layer 4
        layers.Conv2D(384, kernel_size=(3, 3), padding='same', activation='relu'),

        # Layer 5
        layers.Conv2D(256, kernel_size=(3, 3), padding='same', activation='relu'),
        layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2)),

        # Layer 6
        layers.Flatten(),
        layers.Dense(4096, activation='relu'),
        layers.Dropout(0.5),

        # Layer 7
        layers.Dense(4096, activation='relu'),
        layers.Dropout(0.5),

        # Layer 8
        layers.Dense(1000, activation='softmax'),
    ]
)

The model consists of 8 layers: 5 convolutional layers and 3 fully connected layers. The first two convolutional layers are each followed by a max pooling layer and a batch normalization layer (used here in place of the local response normalization from the original paper), and the fifth convolutional layer is followed by a max pooling layer. Dropout is used in the first two fully connected layers to prevent overfitting. The final layer is a fully connected layer with 1000 neurons and a softmax activation function, which classifies the input image into one of 1000 categories.

  • Compile the model:
# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

Here, we specify the optimizer to use (Adam), the loss function to use (categorical cross-entropy), and the metric to use for evaluation (accuracy).
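Since ImageNet performance is usually reported as a top-5 error rate, you could optionally track top-5 accuracy as well. This is a sketch using Keras’s built-in TopKCategoricalAccuracy metric, not part of the original snippet:

import tensorflow as tf

# Same compile call as above, with an extra top-5 accuracy metric.
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy', tf.keras.metrics.TopKCategoricalAccuracy(k=5, name='top5_accuracy')]
)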

  • Print the summary of the model:
# Print the summary of the model
model.summary()
  • Load the dataset:

To train the model, you would need to load a dataset of images and their corresponding labels. This can be done using the ImageDataGenerator class in Keras.
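As a minimal sketch of what that loading step might look like, assuming the images are organized into one sub-folder per class (the directory paths and augmentation settings below are placeholders, not from the original post):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescale pixel values and apply simple augmentation to the training images.
train_datagen = ImageDataGenerator(rescale=1.0 / 255, horizontal_flip=True)
val_datagen = ImageDataGenerator(rescale=1.0 / 255)

# 'path/to/train' and 'path/to/val' are placeholder directories;
# class_mode='categorical' yields one-hot labels to match categorical_crossentropy.
train_generator = train_datagen.flow_from_directory(
    'path/to/train', target_size=(227, 227), batch_size=128, class_mode='categorical')
val_generator = val_datagen.flow_from_directory(
    'path/to/val', target_size=(227, 227), batch_size=128, class_mode='categorical')

# The generators can then be passed directly to model.fit:
# model.fit(train_generator, validation_data=val_generator, epochs=100)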

  • Train the model:
  1. After defining and compiling the model, we can train it on the ImageNet dataset or any other dataset of our choice.
  2. Since the model has already been compiled with a loss function, optimizer, and evaluation metric, the remaining step is to call fit().
  3. The fit() method trains the model on the training set and validates it on the validation set.
  4. Here’s an example code snippet that demonstrates how to train the AlexNet model on a sample dataset (train_data, train_labels, val_data, and val_labels are placeholders for your own data):
# Train the model
model.fit(train_data, train_labels, batch_size=128, epochs=100, validation_data=(val_data, val_labels))

In the above code, the model (compiled earlier with the categorical_crossentropy loss function, the Adam optimizer, and the accuracy metric) is trained on the train_data and train_labels for 100 epochs with a batch size of 128. We also validate the model on the val_data and val_labels during training.

  • Evaluate the model:
  1. Once the model is trained, we can evaluate its performance on the test set using the evaluate() method.
  2. This method returns the loss value and evaluation metric(s) on the test set.
  3. Here’s an example code snippet that demonstrates how to evaluate the AlexNet model on a sample test set:
# Evaluate the model on test set
test_loss, test_acc = model.evaluate(test_data, test_labels)

# Print the test accuracy
print("Test accuracy:", test_acc)

In the above code, we use the evaluate() method to calculate the loss value and accuracy of the AlexNet model on the test_data and test_labels. We then print the test accuracy of the model.

This is how we can implement the AlexNet architecture in Python and train it on a dataset of our choice.

Full Code:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Define the input shape of the network
input_shape = (227, 227, 3)

# Define the AlexNet model
model = keras.Sequential(
    [
        # Layer 1
        layers.Conv2D(96, kernel_size=(11, 11), strides=(4, 4), activation='relu', input_shape=input_shape),
        layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2)),
        layers.BatchNormalization(),

        # Layer 2
        layers.Conv2D(256, kernel_size=(5, 5), padding='same', activation='relu'),
        layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2)),
        layers.BatchNormalization(),

        # Layer 3
        layers.Conv2D(384, kernel_size=(3, 3), padding='same', activation='relu'),

        # Layer 4
        layers.Conv2D(384, kernel_size=(3, 3), padding='same', activation='relu'),

        # Layer 5
        layers.Conv2D(256, kernel_size=(3, 3), padding='same', activation='relu'),
        layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2)),

        # Layer 6
        layers.Flatten(),
        layers.Dense(4096, activation='relu'),
        layers.Dropout(0.5),

        # Layer 7
        layers.Dense(4096, activation='relu'),
        layers.Dropout(0.5),

        # Layer 8
        layers.Dense(1000, activation='softmax'),
    ]
)

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Print the summary of the model
model.summary()

# Train the model
model.fit(train_data, train_labels, batch_size=128, epochs=100, validation_data=(val_data, val_labels))

# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(test_data, test_labels)

# Print the test accuracy
print("Test accuracy:", test_acc)

Conclusion

AlexNet is a landmark work in supervised learning and achieved excellent results.

It was also important for popularizing techniques such as dropout and data augmentation, which helped the network’s performance.

AlexNet’s influence on ConvNets continues today, with ideas such as ReLU activations and dropout now standard practice.

Achieving low classification error without overfitting is not easy, and AlexNet’s combination of techniques made it possible.

That’s it for now… I hope you liked my blog and learned about AlexNet CNN, how it works, and the example I used while implementing the code.

In the next blog, I will be discussing VGGNet CNN and its Implementation.

If you are enjoying my blogs, please share them with others as well.

Till then, stay tuned for the next blog…

***Next Blog***
