Training Deep Convolutional Neural Networks (DCNN) using VGGNet architecture
Introduction
The primary contribution of the VGGNet architecture (https://arxiv.org/pdf/1409.1556.pdf) was to show that a network built from very small (3 x 3) convolutional filters can be trained to greater depths (16–19 weight layers) and thereby achieve very high classification accuracy. The VGGNet architecture is built on two key components:
(1). All convolutional layers in VGGNet use very small convolutional filters of size 3 x 3
(2). The VGGNet architecture stacks multiple CONV => RELU layer sets, thereby increasing the network depth, before applying the MaxPool operation
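The benefit of stacking small filters can be checked with a little arithmetic. The sketch below is only an illustration: the helper names and the channel count of 64 are assumptions, not part of the original implementation. It compares three stacked 3 x 3 convolutions with a single 7 x 7 convolution that covers the same input window.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Effective receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel, channels, num_layers=1):
    """Weight count (biases ignored) for stacked kernel x kernel convolutions
    that keep `channels` input and output channels."""
    return num_layers * kernel * kernel * channels * channels


channels = 64  # illustrative channel count, not taken from the article

# Three stacked 3 x 3 layers see the same 7 x 7 window as one 7 x 7 layer ...
print(stacked_receptive_field(3))               # 7
# ... but need fewer weights: 27 * C^2 versus 49 * C^2.
print(conv_weights(3, channels, num_layers=3))  # 110592
print(conv_weights(7, channels))                # 200704
```

The stacked version also applies a non-linearity after every 3 x 3 layer, which a single large filter cannot do.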
In this article, I build a custom VGGNet implementation for classifying the CIFAR-10 dataset.
Layers of custom VGGNet Implementation
Where does BN really fit?
In most neural network implementations, BN is placed before the activation layer. In my custom VGGNet example, BN goes after the activation because I specifically want to ignore the negative-valued features: an activation function like ReLU zeroes out any activation below zero, so placing BN after ReLU computes the normalization statistics only on the non-negative values that ReLU lets through.
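To make the difference between the two orderings concrete, here is a minimal pure-Python sketch. It normalizes a single feature over a batch of random values and leaves out the learnable scale and shift of a real BN layer; the function names and the random input are assumptions made for illustration only.

```python
import random
import statistics


def relu(values):
    """Zero out every negative value."""
    return [max(0.0, v) for v in values]


def batch_norm(values, eps=1e-5):
    """Normalize one feature over a batch to zero mean and unit variance
    (the learnable scale and shift of a real BN layer are left out)."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


random.seed(0)
pre_activations = [random.gauss(0.0, 1.0) for _ in range(1000)]

bn_then_relu = relu(batch_norm(pre_activations))  # the more common ordering
relu_then_bn = batch_norm(relu(pre_activations))  # the ordering used in this article

# BN -> ReLU: the output is non-negative, so its mean is above zero.
print(round(statistics.fmean(bn_then_relu), 3))
# ReLU -> BN: the output passed to the next layer is centered around zero.
print(round(statistics.fmean(relu_then_bn), 3))
```

With ReLU first, the statistics are computed from the clipped values, and the next layer receives an input with roughly zero mean and unit variance.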
Pythonic implementation
Step 1: Import necessary packages
from keras.models import Sequential
from keras.layers import BatchNormalization
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Dense, Flatten, Dropout, Activation
from keras import backend as K
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import classification_report
from keras.optimizers import SGD
from keras.datasets import cifar10
import matplotlib.pyplot as plt
import numpy as np
Step 2: Custom VGGNet architecture
class CustomVGGNet:
    @staticmethod
    def build(width, height, depth, classes):
        model = Sequential()
        # channels-last input shape; BN normalizes over the last (channel) axis
        inputShape = (height, width, depth)
        chanDim = -1

        # first block: (CONV => RELU => BN) * 3 => POOL => DROPOUT, 32 filters
        model.add(Conv2D(32, (3, 3), padding="same", input_shape=inputShape))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(32, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(32, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
        model.add(Dropout(0.25))

        # second block: (CONV => RELU => BN) * 3 => POOL => DROPOUT, 64 filters
        model.add(Conv2D(64, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(64, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(64, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
        model.add(Dropout(0.25))

        # classifier head: FC => RELU => BN => DROPOUT => FC => SOFTMAX
        model.add(Flatten())
        model.add(Dense(1024))
        model.add(Activation("relu"))
        model.add(BatchNormalization())
        model.add(Dropout(0.25))
        model.add(Dense(classes))
        model.add(Activation("softmax"))
        return model
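As a sanity check on the architecture above, the feature-map shapes can be traced by hand. The following sketch does not build the Keras model; `trace_shapes` is a hypothetical helper that only reproduces the shape arithmetic for a 32 x 32 input, under the assumption that padding="same" leaves height and width unchanged and each 2 x 2 pooling step halves them.

```python
def trace_shapes(height=32, width=32, blocks=(32, 64), pool=2):
    """Follow the feature-map shape through the conv blocks.

    padding="same" keeps height and width unchanged, so only the
    2 x 2 max-pooling at the end of each block halves them.
    """
    shapes = []
    for filters in blocks:
        height, width = height // pool, width // pool
        shapes.append((height, width, filters))
    return shapes


shapes = trace_shapes()
flat = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
dense_params = flat * 1024 + 1024  # weights + biases of Dense(1024)

print(shapes)        # [(16, 16, 32), (8, 8, 64)]
print(flat)          # 4096
print(dense_params)  # 4195328
```

Most of the parameters therefore sit in the first fully connected layer, not in the convolutional blocks.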
Step 3: Load CIFAR-10 dataset and scale pixel intensities to [0,1]
# load the training and test data
((trainX, trainY), (testX, testY)) = cifar10.load_data()
# scale pixel intensities from [0, 255] to [0, 1]
trainX = trainX.astype("float")/255.0
testX = testX.astype("float")/255.0
Step 4: Convert output labels from integers to vectors and initialize the label names
le = LabelBinarizer()
trainY = le.fit_transform(trainY)
testY = le.transform(testY)
labels = ["airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"]
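For readers unfamiliar with one-hot vectors, the sketch below shows what the encoded labels look like. It is a pure-Python stand-in written for illustration, not the LabelBinarizer implementation, and the three sample labels are made up.

```python
def one_hot(labels, num_classes):
    """Turn integer class indices into one-hot rows."""
    return [[1 if i == label else 0 for i in range(num_classes)]
            for label in labels]


labels = ["airplane", "automobile", "bird", "cat", "deer",
          "dog", "frog", "horse", "ship", "truck"]

encoded = one_hot([6, 9, 3], num_classes=len(labels))  # made-up sample labels
for row in encoded:
    # the position of the 1 is the integer label, which indexes the label names
    print(row, labels[row.index(1)])
```

Each row has a single 1 at the index of its class, which matches the 10-way softmax output of the network.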
Step 5: Compile and train the CustomVGGNet model
opt = SGD(lr=0.01,momentum=0.9,nesterov=True,decay=0.001/5)
model = CustomVGGNet.build(width=32, height=32, depth=3, classes=10)
model.compile(loss="categorical_crossentropy", optimizer=opt,
              metrics=["accuracy"])
# model training
H = model.fit(trainX, trainY, validation_data=(testX,testY), batch_size=128, epochs=5)
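The decay argument passed to SGD shrinks the learning rate a little after every batch update. In the legacy Keras optimizers that accept lr and decay, the schedule is, to my understanding, lr / (1 + decay * iterations). The sketch below evaluates that formula for the settings used here, assuming the 50,000 CIFAR-10 training images; `decayed_lr` is a hypothetical helper, not a Keras function.

```python
import math


def decayed_lr(initial_lr, decay, iterations):
    """Time-based decay: lr / (1 + decay * iterations)."""
    return initial_lr / (1.0 + decay * iterations)


train_size = 50000  # CIFAR-10 training images
batch_size = 128
steps_per_epoch = math.ceil(train_size / batch_size)  # 391 updates per epoch

for epoch in range(1, 6):
    lr = decayed_lr(0.01, 0.001 / 5, epoch * steps_per_epoch)
    print(f"after epoch {epoch}: lr = {lr:.5f}")
```

Under these assumptions the learning rate falls from 0.01 to roughly 0.0072 over the five epochs, so the schedule is fairly gentle.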
Step 6: Model prediction
predictions = model.predict(testX, batch_size=128)
print(classification_report(testY.argmax(axis=1), predictions.argmax(axis=1), target_names=labels))
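The argmax calls above turn each row of class probabilities into a single predicted class index, and the report then summarizes precision and recall per class. The following pure-Python sketch shows the idea on made-up numbers; it is not how scikit-learn computes the report internally, and the helper names are hypothetical.

```python
def argmax(row):
    """Index of the largest value in a row of class probabilities."""
    return max(range(len(row)), key=row.__getitem__)


def precision_recall(y_true, y_pred, cls):
    """Precision and recall for one class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    predicted = sum(p == cls for p in y_pred)
    actual = sum(t == cls for t in y_true)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


# Made-up softmax outputs for four samples and three classes.
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.6, 0.3],
         [0.5, 0.3, 0.2],
         [0.2, 0.2, 0.6]]
y_true = [0, 1, 1, 2]
y_pred = [argmax(row) for row in probs]

print(y_pred)                                   # [0, 1, 0, 2]
print(precision_recall(y_true, y_pred, cls=0))  # (0.5, 1.0)
print(precision_recall(y_true, y_pred, cls=1))  # (1.0, 0.5)
```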
Note: Below is the classification report for 5 epochs. Accuracy reaches about 88% when training for 50 epochs.
Classification report (5 epochs):
Step 7: Accuracy and loss plot
plt.style.use("ggplot")
plt.figure()
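# x-axis must match the number of epochs actually recorded in the history
epochs = np.arange(0, len(H.history["loss"]))
plt.plot(epochs, H.history["loss"], label="train_loss")
plt.plot(epochs, H.history["val_loss"], label="val_loss")
# note: newer Keras versions store these under "accuracy" and "val_accuracy"
plt.plot(epochs, H.history["acc"], label="train_acc")
plt.plot(epochs, H.history["val_acc"], label="val_acc")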
plt.legend()
plt.show()
Is Batch Normalization (BN) really helpful?
The CustomVGGNet architecture trains faster without BN layers, but its accuracy is lower (~80%, compared to ~88% with BN). BN makes the network more stable and helps keep the model from overfitting. Without BN, the network tends to overfit the data, and validation accuracy saturates after 23 epochs.
Conclusion / Key takeaways
In this article, I have implemented a custom VGG architecture consisting of two sets of (CONV => RELU => BN) * 3 => MAXPOOL => DROPOUT layers, followed by FC => RELU => BN => DROPOUT => FC => SOFTMAX. Placing the BN layer after the activation layer has led to faster convergence of the model and higher accuracy than the same network without BN.