How to Create a Residual Network in TensorFlow and Keras

Ali Pourramezan Fard · Published in The Startup · Oct 7, 2020 · 6 min read
Figure 1: Sample residual network (model depicted using Netron).

The code, with an explanation, is available on GitHub.

Please clap if you like the post.

ResNet was first introduced by Kaiming He et al. [1]. If you are not familiar with residual networks and why they tend to improve the accuracy of a network, I recommend taking a look at the paper here.

While creating a Sequential model in TensorFlow and Keras is not too complex, creating a residual network involves a few extra steps. In this article, I show you how to create a residual network from scratch.

Summary:

  • Task type: classifying handwritten digits.
  • Dataset: THE MNIST DATABASE (available here).
  • Network Architecture: a small residual network shown in Figure 1.
  • Optimizer: Adam
  • Loss function: categorical_crossentropy

Code directory structure:

  • main.py
  • train.py
  • network_model.py

main.py

This file is the entry point of the application. As shown below, in main.py we create an object called trainer from the Train class, initializing it with the input_shape argument, which is [28, 28, 1], the size of the input images. The second argument, output_shape, is 10, the number of output classes we want to predict.

from train import Train  # Train is defined in train.py

if __name__ == '__main__':
    trainer = Train(input_shape=[28, 28, 1], output_shape=10)
    trainer.train_model()

train.py

This file contains a class called Train. Below is the constructor of the Train class, which takes the input_shape and output_shape arguments:

def __init__(self, input_shape, output_shape):
    """
    :param input_shape: shape of input images, default is 28 * 28 * 1
    :param output_shape: number of classes. For mnist_ds, default is 10
    """
    self.input_shape = input_shape
    self.output_shape = output_shape

We also have a method called train_model(), which we use to create the dataset and the model and then train the model; a rough sketch of its structure is shown below.
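The exact method body is in the GitHub repository; this sketch only mirrors the steps described in the rest of this section, and _prepare_dataset is a made-up helper name standing in for the data-loading code explained next:

def train_model(self):
    """Create the dataset and the model, then train and save the model."""
    # 1. load and preprocess MNIST (see "Loading dataset" below);
    #    _prepare_dataset is a hypothetical helper wrapping those steps
    (x_train, y_train), (x_val, y_val) = self._prepare_dataset()

    # 2. build the residual model defined in network_model.py
    net_model = NetworkModel()
    model_1 = net_model.sample_res_net_v0(input_shape=self.input_shape,
                                          output_shape=self.output_shape)

    # 3. compile, train, and save (explained in the following sections)
    model_1.compile(loss='categorical_crossentropy', optimizer='adam',
                    metrics=['accuracy'])
    history = model_1.fit(x_train, y_train, batch_size=50, epochs=20,
                          validation_data=(x_val, y_val))
    model_1.save('sample_res_net_v0.h5')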

Loading dataset:

Since the MNIST dataset is available as part of the Keras datasets, we can download it with the command below:

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

Now we have:

  • x_train: an array containing 60,000 grayscale 28*28-pixel images; accordingly, its shape is 60000*28*28. x_train is used as the input dataset.
  • y_train: an array containing 60,000 integers, used as the labels for the x_train images. Each item of y_train corresponds to the image at the same index in x_train.
  • x_test: an array containing 10,000 grayscale 28*28-pixel images that we later use to test the performance of our model.
  • y_test: like y_train, an array containing 10,000 integers used as the labels for x_test.

Now, to monitor the performance of our neural network during training, we create a validation set from the training set, using its last 5,000 items:

x_val = x_train[-5000:]
y_val = y_train[-5000:]
x_train = x_train[:-5000]
y_train = y_train[:-5000]

Now we need to add another dimension to our image arrays, x_train, x_val, and x_test, so that each image has the 3 dimensions of a real image: width, height, and the color channel, which is 1 here since our images are grayscale:

x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], x_train.shape[2], 1)
x_val = x_val.reshape(x_val.shape[0], x_val.shape[1], x_val.shape[2], 1)
x_test = x_test.reshape(x_test.shape[0], x_test.shape[1], x_test.shape[2], 1)
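Equivalently (just an alternative to the reshape calls above, not what the repository does), NumPy's expand_dims appends the channel axis in one step:

import numpy as np

# (N, 28, 28) -> (N, 28, 28, 1)
x_train = np.expand_dims(x_train, axis=-1)
x_val = np.expand_dims(x_val, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)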

And then we normalize the input images by dividing each pixel value by 255:

x_train = x_train/255.0
x_val = x_val/255.0
x_test = x_test/255.0

Finally, we need to convert our labels to categorical (one-hot) labels:

y_train = keras.utils.to_categorical(y_train, self.output_shape)
y_val = keras.utils.to_categorical(y_val, self.output_shape)
y_test = keras.utils.to_categorical(y_test, self.output_shape)
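To make the conversion concrete (a throwaway example, not part of the training code), to_categorical turns the integer label 3 into a one-hot vector of length 10:

from tensorflow.keras.utils import to_categorical

print(to_categorical(3, 10))
# [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]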

Creating Model:

We define the model in another file called network_model.py. As shown below, we create an object from the NetworkModel class and build the model using its sample_res_net_v0 method. We explain the model later in this tutorial.

net_model = NetworkModel()
model_1 = net_model.sample_res_net_v0(input_shape=self.input_shape, output_shape=self.output_shape)

Compiling and Training Model:

We compile the model as shown below:

from tensorflow.keras import losses
from tensorflow.keras.optimizers import Adam

model_1.compile(loss=losses.categorical_crossentropy,
                optimizer=Adam(),
                metrics=['accuracy'])

Since our task is multi-class classification, we use categorical_crossentropy as our loss function. If you are not familiar with the different loss functions in TensorFlow, I strongly recommend taking a look here. We use Adam as our optimizer, and the only metric we monitor is accuracy.
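As a side note (an equivalent alternative, not what this tutorial's repository uses): if you skip the to_categorical step and keep the integer labels, sparse_categorical_crossentropy gives the same behavior:

# same optimizer and metric, but for integer labels 0-9 instead of one-hot vectors
model_1.compile(loss='sparse_categorical_crossentropy',
                optimizer='adam',
                metrics=['accuracy'])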

Finally, we start training the model as shown below, for 20 epochs, and then save the model and its weights:

history = model_1.fit(x_train, y_train,
                      batch_size=50,
                      epochs=20,
                      validation_data=(x_val, y_val))
model_1.save('sample_res_net_v0.h5')  # save() requires a file path; this name is an assumption
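The saved model can later be restored for inference or further training (the file name matches the one assumed in the save() call above):

from tensorflow.keras.models import load_model

restored_model = load_model('sample_res_net_v0.h5')
restored_model.summary()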

Visualizing Accuracy and Loss Figures:

After training the model, the history object contains some valuable information that we can use to visualize both the accuracy and the loss of the model. Using the matplotlib library, we generate and save the figures:

import matplotlib.pyplot as plt

def show_figures(self, history):
    plt.plot(history['accuracy'])
    plt.plot(history['val_accuracy'])
    plt.title('model accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['train', 'validation'])
    plt.savefig('accuracy')

    plt.clf()  # start a fresh figure so the loss curves are not drawn over the accuracy plot
    plt.plot(history['loss'])
    plt.plot(history['val_loss'])
    plt.title('model loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'validation'])
    plt.savefig('loss')
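In train.py, after fit() returns, we pass this method the underlying dictionary (history.history holds one list per epoch for each metric):

self.show_figures(history.history)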

network_model.py

We create our residual network in the NetworkModel class, in a method called sample_res_net_v0:

# imports assumed at the top of network_model.py
from tensorflow.keras.layers import (Input, Conv2D, ReLU, BatchNormalization,
                                     add, GlobalAveragePooling2D, Dense)
from tensorflow.keras.models import Model

input = Input(shape=(input_shape[0], input_shape[1], input_shape[2]))

'''block 1'''
b1_cnv2d_1 = Conv2D(filters=16, kernel_size=(3, 3), strides=(2, 2), padding='same',
                    use_bias=False, name='b1_cnv2d_1', kernel_initializer='normal')(input)
b1_relu_1 = ReLU(name='b1_relu_1')(b1_cnv2d_1)
b1_bn_1 = BatchNormalization(epsilon=1e-3, momentum=0.999, name='b1_bn_1')(b1_relu_1)  # size: 14*14

b1_cnv2d_2 = Conv2D(filters=32, kernel_size=(1, 1), strides=(2, 2), padding='same',
                    use_bias=False, name='b1_cnv2d_2', kernel_initializer='normal')(b1_bn_1)
b1_relu_2 = ReLU(name='b1_relu_2')(b1_cnv2d_2)
b1_out = BatchNormalization(epsilon=1e-3, momentum=0.999, name='b1_out')(b1_relu_2)  # size: 7*7

'''block 2'''
b2_cnv2d_1 = Conv2D(filters=32, kernel_size=(1, 1), strides=(1, 1), padding='same',
                    use_bias=False, name='b2_cnv2d_1', kernel_initializer='normal')(b1_out)
b2_relu_1 = ReLU(name='b2_relu_1')(b2_cnv2d_1)
b2_bn_1 = BatchNormalization(epsilon=1e-3, momentum=0.999, name='b2_bn_1')(b2_relu_1)  # size: 7*7

b2_add = add([b1_out, b2_bn_1])  # residual connection 1: block-1 output + block-2 branch

b2_cnv2d_2 = Conv2D(filters=64, kernel_size=(3, 3), strides=(2, 2), padding='same',
                    use_bias=False, name='b2_cnv2d_2', kernel_initializer='normal')(b2_add)
b2_relu_2 = ReLU(name='b2_relu_2')(b2_cnv2d_2)
b2_out = BatchNormalization(epsilon=1e-3, momentum=0.999, name='b2_bn_2')(b2_relu_2)  # size: 4*4

'''block 3'''
b3_cnv2d_1 = Conv2D(filters=64, kernel_size=(1, 1), strides=(1, 1), padding='same',
                    use_bias=False, name='b3_cnv2d_1', kernel_initializer='normal')(b2_out)
b3_relu_1 = ReLU(name='b3_relu_1')(b3_cnv2d_1)
b3_bn_1 = BatchNormalization(epsilon=1e-3, momentum=0.999, name='b3_bn_1')(b3_relu_1)  # size: 4*4

b3_add = add([b2_out, b3_bn_1])  # residual connection 2: block-2 output + block-3 branch

b3_cnv2d_2 = Conv2D(filters=128, kernel_size=(3, 3), strides=(2, 2), padding='same',
                    use_bias=False, name='b3_cnv2d_2', kernel_initializer='normal')(b3_add)
b3_relu_2 = ReLU(name='b3_relu_2')(b3_cnv2d_2)
b3_out = BatchNormalization(epsilon=1e-3, momentum=0.999, name='b3_out')(b3_relu_2)  # size: 2*2

'''block 4'''
b4_avg_p = GlobalAveragePooling2D()(b3_out)
output = Dense(output_shape, name='model_output', activation='softmax',
               kernel_initializer='he_uniform')(b4_avg_p)

model = Model(input, output)

model_json = model.to_json()
with open("sample_res_net_v0.json", "w") as json_file:
    json_file.write(model_json)

model.summary()
return model

We have created a model with 4 main blocks, as shown in Figure 1, and used two residual connections in our network. The first one connects the output of the first block to the output of the second block:

b2_add = add([b1_out, b2_bn_1])

The second residual link connects the output of the second block to the output of the third block:

b3_add = add([b2_out, b3_bn_1])

You have the option to add or concatenate two layers, depending on your design. It is also possible to merge the outputs of BatchNormalization or Activation layers, and convolution layers can be linked to each other as well; however, keep in mind that the tensors feeding a residual add must all have the same shape (for concatenation, only the channel dimension may differ). A common trick for mismatched shapes is sketched right after this paragraph.
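For instance (a minimal sketch, not part of this tutorial's model): when the shortcut and the main branch end up with different shapes, a 1*1 convolution with a matching stride and filter count can project the shortcut before the add, which is the projection shortcut described in the ResNet paper [1]. Here x stands for some preceding feature map, say 14*14*32:

from tensorflow.keras.layers import Conv2D, Concatenate, add

# main branch halves the spatial size and doubles the channels: 14*14*32 -> 7*7*64
main = Conv2D(64, (3, 3), strides=(2, 2), padding='same')(x)

# project the shortcut so both tensors are 7*7*64 before the element-wise add
shortcut = Conv2D(64, (1, 1), strides=(2, 2), padding='same')(x)
merged = add([main, shortcut])

# concatenation stacks channels instead, so only the spatial sizes must match
merged_cat = Concatenate()([main, shortcut])  # 7*7*128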

Another important thing to pay attention to is that you cannot create a residual network using the Keras Sequential class. Instead, we need to use the Keras functional API. Following the functional API documentation, we first need to define the input of the network as follows:

input = Input(shape=(224, 224, 3))

In the above example, we have created an Input layer with shape 224*224*3, which is the size of the images we want to pass to the network. Then we can continue defining the other layers:

'''a 2D convolution layer'''
b1_cnv2d_1 = Conv2D(filters=16, kernel_size=(3, 3), strides=(2, 2), padding='same',
                    use_bias=False, name='b1_cnv2d_1', kernel_initializer='normal')(input)
'''a ReLU layer'''
b1_relu_1 = ReLU(name='b1_relu_1')(b1_cnv2d_1)
'''a BatchNormalization layer'''
b1_bn_1 = BatchNormalization(epsilon=1e-3, momentum=0.999, name='b1_bn_1')(b1_relu_1)
'''a Dense layer (b4_avg_p is the pooled tensor from the model defined earlier)'''
output = Dense(output_shape, name='model_output', activation='softmax',
               kernel_initializer='he_uniform')(b4_avg_p)

After defining all the layers, we need to create our Model:

model = Model(input, output)

The model needs an input layer as well as at least one output layer. As you can see, defining models with the Keras functional API is straightforward, and at the same time powerful and flexible.

Finally, you can save your model to a JSON file, and print a summary of it using the following lines of code:

model_json = model.to_json()

with open("sample_res_net_v0.json", "w") as json_file:
    json_file.write(model_json)

model.summary()
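If you only keep the JSON file, the (untrained) architecture can be rebuilt later with model_from_json; the weights have to be saved and loaded separately:

from tensorflow.keras.models import model_from_json

with open("sample_res_net_v0.json", "r") as json_file:
    restored = model_from_json(json_file.read())
restored.summary()  # same architecture, freshly initialized weights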

Conclusion

In this article, I have shown how to create a residual network using the Keras functional API. For a better understanding, I have also uploaded the complete source code to GitHub here.


Reference

[1] He, Kaiming, et al. “Deep residual learning for image recognition.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
