Training Deep Convolutional Neural Networks (DCNN) using VGGNet architecture

Sridevi Baskaran
Sep 9, 2019

Introduction

The primary contribution of the VGGNet architecture (https://arxiv.org/pdf/1409.1556.pdf) was to show that a network built from very small (3 x 3) convolutional filters can be trained to greater depths (16–19 weight layers) and thereby reach very high classification accuracy. The VGGNet architecture is built on two key components:

(1). All convolutional layers in VGGNet use very small convolutional filters of size 3 x 3.

(2). The VGGNet architecture stacks multiple CONV => RELU layer sets, increasing the network depth, before applying the max-pooling operation.
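To see why stacking small filters is attractive, the minimal sketch below compares three stacked 3 x 3 convolutions with a single 7 x 7 convolution that covers the same receptive field. The helper names and the choice of 64 channels are illustrative assumptions, and bias terms are ignored.

```python
def conv_weights(kernel_size, channels):
    """Weights in one conv layer with `channels` inputs and outputs (no bias)."""
    return kernel_size * kernel_size * channels * channels


def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


channels = 64  # illustrative channel count, not taken from the article
three_3x3 = 3 * conv_weights(3, channels)
one_7x7 = conv_weights(7, channels)

print(stacked_receptive_field(3, 3))  # 7
print(three_3x3, one_7x7)             # 110592 200704
```

Under these assumptions the stacked version sees the same 7 x 7 region with roughly half the weights, and it applies three non-linearities instead of one.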

In this article, I present a custom VGGNet implementation for classifying the CIFAR-10 dataset.

Layers of custom VGGNet Implementation

Where does BN really fit?

In most neural network implementations, batch normalization (BN) goes before the activation layer. In my custom VGGNet example, BN goes after the activation because I specifically want to ignore the negative-valued features: an activation function such as ReLU zeroes out any activation less than zero, so BN placed after ReLU is computed only on the rectified (non-negative) features.
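The difference between the two orderings can be illustrated on a single feature with plain Python. This is only a sketch: the batch values are made up, the helper functions are hypothetical, and BN's learnable scale and shift are fixed at 1 and 0.

```python
import math


def relu(values):
    """Zero out negative values."""
    return [max(0.0, v) for v in values]


def batch_norm(values, eps=1e-5):
    """Normalize one feature over a batch to zero mean and unit variance.

    The learnable scale (gamma) and shift (beta) are left at 1 and 0.
    """
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


# One feature observed across a toy batch of four samples (made-up values).
pre_activations = [-2.0, -1.0, 1.0, 4.0]

bn_then_relu = relu(batch_norm(pre_activations))  # the more common ordering
relu_then_bn = batch_norm(relu(pre_activations))  # the ordering used in this article

print(bn_then_relu)
print(relu_then_bn)
```

In the second ordering the batch statistics are computed from the rectified values, so the two negative inputs end up with the same normalized value. Note that the BN output is then centred on zero and passed to the next layer as is.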

Pythonic implementation

Step 1: Import necessary packages

# Model building blocks
from keras.models import Sequential
from keras.layers import BatchNormalization, Conv2D, MaxPooling2D
from keras.layers import Dense, Flatten, Dropout, Activation
from keras.optimizers import SGD
from keras.datasets import cifar10
# Label encoding and evaluation
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import classification_report
# Plotting
import matplotlib.pyplot as plt
import numpy as np

Step 2: Custom VGGNet architecture

class CustomVGGNet:
    @staticmethod
    def build(width, height, depth, classes):
        model = Sequential()
        # Channels-last input: (rows, columns, channels)
        inputShape = (height, width, depth)
        chanDim = -1

        # Block 1: (CONV => RELU => BN) * 3 => POOL => DROPOUT, 32 filters
        model.add(Conv2D(32, (3, 3), padding="same", input_shape=inputShape))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(32, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(32, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
        model.add(Dropout(0.25))

        # Block 2: (CONV => RELU => BN) * 3 => POOL => DROPOUT, 64 filters
        model.add(Conv2D(64, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(64, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(Conv2D(64, (3, 3), padding="same"))
        model.add(Activation("relu"))
        model.add(BatchNormalization(axis=chanDim))
        model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
        model.add(Dropout(0.25))

        # Fully connected head: FC => RELU => BN => DROPOUT
        model.add(Flatten())
        model.add(Dense(1024))
        model.add(Activation("relu"))
        model.add(BatchNormalization())
        model.add(Dropout(0.25))

        # Softmax classifier
        model.add(Dense(classes))
        model.add(Activation("softmax"))

        return model
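As a sanity check on the architecture, the sketch below traces the tensor shape through the network for a 32 x 32 x 3 CIFAR-10 image without needing Keras. The function trace_shapes is a hypothetical helper written for this illustration; it assumes "same" padding for the convolutions and 2 x 2 pooling with stride 2, as in the build method above.

```python
def trace_shapes(height, width, depth, classes):
    """Return (stage, shape) pairs for a two-block 32/64-filter network."""
    shapes = [("input", (height, width, depth))]
    filters = depth
    for block, filters in enumerate((32, 64), start=1):
        # Three 3x3 "same" convolutions keep height and width unchanged.
        shapes.append((f"block{block}_conv", (height, width, filters)))
        # 2x2 max pooling with stride 2 halves height and width.
        height, width = height // 2, width // 2
        shapes.append((f"block{block}_pool", (height, width, filters)))
    # Flatten collapses the last feature map into one vector.
    shapes.append(("flatten", (height * width * filters,)))
    shapes.append(("dense", (1024,)))
    shapes.append(("softmax", (classes,)))
    return shapes


for stage, shape in trace_shapes(32, 32, 3, 10):
    print(stage, shape)
```

The flattened vector has 8 * 8 * 64 = 4096 entries, so most of the model's weights sit in the 4096-to-1024 dense layer.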

Step 3: Load CIFAR-10 dataset and scale pixel intensities to [0,1]

# Load the data
((trainX, trainY), (testX, testY)) = cifar10.load_data()
# Scale pixel intensities to [0, 1]
trainX = trainX.astype("float") / 255.0
testX = testX.astype("float") / 255.0

Step 4: Convert output labels from integers to vectors and initialize the label names

le = LabelBinarizer()
trainY = le.fit_transform(trainY)
testY = le.transform(testY)
labels = ["airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"]
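For a multi-class problem such as this one, LabelBinarizer produces one-hot vectors. The plain-Python sketch below shows the idea for a single label; one_hot is a hypothetical helper, not part of scikit-learn.

```python
def one_hot(label, num_classes):
    """Return a list with 1 at index `label` and 0 elsewhere."""
    vector = [0] * num_classes
    vector[label] = 1
    return vector


labels = ["airplane", "automobile", "bird", "cat", "deer",
          "dog", "frog", "horse", "ship", "truck"]

encoded = one_hot(3, len(labels))
print(encoded)                   # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(labels[encoded.index(1)])  # cat
```

The position of the 1 is the class index, which is why argmax is later used to map predictions back to a label name.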

Step 5: Compile and train the CustomVGGNet model

# lr and decay are the Keras 2.x argument names; newer releases use learning_rate
opt = SGD(lr=0.01, momentum=0.9, nesterov=True, decay=0.001/5)
model = CustomVGGNet.build(width=32, height=32, depth=3, classes=10)
model.compile(loss="categorical_crossentropy", optimizer=opt,
              metrics=["accuracy"])
# Train the model
H = model.fit(trainX, trainY, validation_data=(testX, testY), batch_size=128, epochs=5)
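The decay argument deserves a closer look. In the Keras 2.x SGD optimizer, decay is applied after every batch update rather than every epoch, following lr = initial_lr / (1 + decay * iterations). The sketch below assumes that rule and the 50,000 training images of CIFAR-10; decayed_lr is a hypothetical helper, not a Keras function.

```python
import math


def decayed_lr(initial_lr, decay, iteration):
    """Time-based decay: initial_lr / (1 + decay * iteration)."""
    return initial_lr / (1.0 + decay * iteration)


initial_lr = 0.01
decay = 0.001 / 5
steps_per_epoch = math.ceil(50000 / 128)  # CIFAR-10 training set, batch size 128

# Learning rate at the start of each epoch, and after the fifth one.
for epoch in range(6):
    iteration = epoch * steps_per_epoch
    print(epoch, round(decayed_lr(initial_lr, decay, iteration), 6))
```

Under these assumptions the learning rate falls from 0.01 to roughly 0.0072 after five epochs (1,955 updates).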

Step 6: Model prediction

predictions = model.predict(testX, batch_size=128)
print(classification_report(testY.argmax(axis=1), predictions.argmax(axis=1), target_names=labels))
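classification_report prints per-class precision, recall and F1 score. As a rough sketch of what those numbers mean, the hypothetical helper below computes them for one class from integer labels; the labels are made up for illustration and are not results from this model.

```python
def precision_recall_f1(y_true, y_pred, cls):
    """Compute precision, recall and F1 for one class from integer labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)  # true positives
    fp = sum(1 for t, p in pairs if t != cls and p == cls)  # false positives
    fn = sum(1 for t, p in pairs if t == cls and p != cls)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up labels for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

print(precision_recall_f1(y_true, y_pred, 1))
```

Here class 1 is predicted three times and is correct twice, so its precision is 2/3, while both true class-1 samples are found, so its recall is 1.0.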

Note: Below is the classification report for 5 epochs. With 50 epochs, accuracy reaches roughly 88%.

Classification report (5 epochs):

Step 7: Accuracy and loss plot

plt.style.use("ggplot")
plt.figure()
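# One x value per trained epoch (5 here). Newer Keras versions name the
# accuracy keys "accuracy" and "val_accuracy" instead of "acc" and "val_acc".
epochs = np.arange(0, len(H.history["loss"]))
plt.plot(epochs, H.history["loss"], label="train_loss")
plt.plot(epochs, H.history["val_loss"], label="val_loss")
plt.plot(epochs, H.history["acc"], label="train_acc")
plt.plot(epochs, H.history["val_acc"], label="val_acc")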
plt.legend()
plt.show()

Is Batch Normalization (BN) really helpful?

The CustomVGGNet architecture trains faster without the BN layers, but its accuracy is lower (~80%, compared with ~88% with BN). BN makes the network more stable and helps keep the model from overfitting. Without BN, the network tends to overfit the data, and validation accuracy saturates after 23 epochs.

Conclusion / Key takeaways

In this article, I have implemented a custom VGG architecture consisting of two sets of (CONV => RELU => BN) * 3 => MAXPOOL => DROPOUT layers, followed by FC => RELU => BN => DROPOUT => FC => SOFTMAX. Placing the BN layer after the activation layer has led to faster convergence and relatively higher accuracy than the same model without BN.
