Hyperparameter Tuning: Fixing High Bias (Underfitting) in Neural Networks

Sanskar Hasija
Jul 18, 2021 · 6 min read

Quick methods to decrease high bias (underfitting) problems in neural networks.

Hyper-Parameter Tuning

Introduction

In this blog, we will go through some methods and techniques to fix the problem of high bias (underfitting) in neural networks. High bias is a common problem faced while training a neural network: it arises when neither the training accuracy nor the test accuracy is adequately high. This usually indicates that the model has not learnt the input-output mapping well and therefore also fails to generalize to the cross-validation or test set.
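As a rough, hypothetical illustration (the accuracies and the 0.95 / 0.05 thresholds below are made up for demonstration, not taken from a real run), high bias and high variance can be told apart by comparing training and test accuracy:

# Hypothetical accuracies, just to illustrate the diagnostic
train_acc, test_acc = 0.87, 0.86

if train_acc < 0.95:                 # problem-specific threshold for "adequately high"
    print("High bias: the model underfits even the training data.")
elif train_acc - test_acc > 0.05:    # large gap between train and test accuracy
    print("High variance: the model overfits the training data.")
else:
    print("The model fits and generalizes reasonably well.")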

We will check the effect of various factors on training accuracy step by step in this blog.

Imports and Preprocessing

We will start by importing the TensorFlow, NumPy and Matplotlib libraries and initializing some hyperparameters such as the number of epochs, the learning rate and the optimizer.

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

tf.random.set_seed(1)                        # make runs reproducible
EPOCHS = 20                                  # number of training epochs
LR = 0.001                                   # learning rate
OPT = tf.keras.optimizers.Adam(LR)           # default optimizer
plt.style.use('fivethirtyeight')
plt.rcParams["figure.figsize"] = (8, 5)

We will be using the famous MNIST dataset for the demonstration. MNIST contains 70,000 grayscale images of handwritten digits, split into 60,000 training images and 10,000 test images. All the images are of shape (28, 28).

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train / 255
x_test = x_test / 255

We can load this dataset directly through the TensorFlow library; it already comes split into training and test subsets. We then normalize the images by scaling the pixel values to the range [0, 1].

MODEL DESIGN

We will start by building a simple neural network with no hidden layers, just an input and an output layer.

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=x_train.shape[1:]),
    tf.keras.layers.Dense(10, activation="softmax")])

model.compile(optimizer=OPT,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

We will compile this model using sparse categorical cross-entropy as loss and set the metrics to accuracy.

EFFECT OF INCREASING DATA

We will train the above-defined model twice, but with different amounts of data. To demonstrate the effect of data on high bias, we will define a new subset that contains only 50% of the training data.

(x_train_partial, y_train_partial) = (x_train[:30000], y_train[:30000])

The new (x_train_partial, y_train_partial) dataset has 30,000 images, compared to the 60,000 images in the full training set. After training the model on both datasets, we can plot training accuracy against the number of epochs to check the effect of increasing the amount of data; a minimal sketch of these two runs is shown below.
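The post does not show the two training runs themselves, so here is a minimal sketch of how they can be compared. The helper name build_baseline is ours, not from the original code; the model is recreated so each run starts from fresh weights and its own optimizer state.

def build_baseline():
    # Same architecture as the model defined above
    m = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=x_train.shape[1:]),
        tf.keras.layers.Dense(10, activation="softmax")])
    m.compile(optimizer=tf.keras.optimizers.Adam(LR),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
    return m

full_history = build_baseline().fit(x_train, y_train, epochs=EPOCHS)
partial_history = build_baseline().fit(x_train_partial, y_train_partial, epochs=EPOCHS)

plt.plot(full_history.history["accuracy"], label="full training set")
plt.plot(partial_history.history["accuracy"], label="50% of training set")
plt.xlabel("Epoch")
plt.ylabel("Training accuracy")
plt.legend()
plt.show()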

Effect of Data

As the above figure shows, increasing the amount of training data does not help in fixing the problem of high bias.

EFFECT OF INCREASING HIDDEN LAYERS

Now we will increase the number of hidden layers in our network and verify the effect on the training accuracy of our model. We will train four different models with 1, 2, 3 and 5 hidden layers respectively. The architectures of all four models are as follows:

one_layer_model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=x_train.shape[1:]),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")])

two_layers_model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=x_train.shape[1:]),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(20, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")])

three_layers_model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=x_train.shape[1:]),
    tf.keras.layers.Dense(20, activation="relu"),
    tf.keras.layers.Dense(40, activation="relu"),
    tf.keras.layers.Dense(20, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")])

five_layers_model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=x_train.shape[1:]),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(20, activation="relu"),
    tf.keras.layers.Dense(40, activation="relu"),
    tf.keras.layers.Dense(20, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")])

After training all of the above models on the full training set for 20 epochs, we get the following comparison of training accuracies:

Effect of hidden layers

It is clearly visible that increasing the number of hidden layers increases the training accuracy. For the MNIST dataset, a choice of 3 hidden layers seems to give the best results.

We will now use this three-hidden-layer network as our reference and check the effect of increasing the number of nodes in its layers.

EFFECT OF NUMBER OF UNITS(NODES) IN HIDDEN LAYERS

We will now increase the number of nodes in the layers of the previously trained three-hidden-layer network. A common practice is to arrange the layer widths in descending order from the first hidden layer to the last. We will train two models for this demonstration: the first with a small number of units per layer, and the second with a larger number.

small_units_model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=x_train.shape[1:]),
    tf.keras.layers.Dense(80, activation="relu"),
    tf.keras.layers.Dense(40, activation="relu"),
    tf.keras.layers.Dense(20, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")])

large_units_model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=x_train.shape[1:]),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")])

We have set the number of units in the second model to powers of 2, which is a common default choice when sizing the layers of a neural network.
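To see how much extra capacity the wider model has, one quick check (not part of the original post) is to compare the parameter counts of the two models:

# The wider model has several times more trainable parameters
print("small model parameters:", small_units_model.count_params())
print("large model parameters:", large_units_model.count_params())

# Or inspect the full layer-by-layer breakdown
small_units_model.summary()
large_units_model.summary()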

After training both models on the full training set for 20 epochs, we get the following comparison of training accuracies:

Effect of units

The number of units clearly has a large impact on training accuracy: as we increase the number of units in each layer, the accuracy also increases. In the above example, the training accuracy rose from 93% to more than 99% as we increased both the number of layers and the number of units per layer.

EFFECT OF BATCH NORMALIZATION

Next, we will check the effect of adding batch normalization layers on fixing high bias. We will use the previous best model as a reference for verifying the effect of batch normalization.

bn_model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=x_train.shape[1:]),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")])

We have added a pair of BatchNormalization layers between the hidden layers. We will now train this model and compare its accuracy with our previous best model.

Effect of Batch Normalization

It is clearly visible that adding batch normalization does not help in increasing the training accuracy, and hence does not reduce high bias. Batch normalization mainly helps with reducing high variance, i.e. with the problem of overfitting.

EFFECT OF DROPOUTS

Lastly, we will check the effect of dropout layers on fixing the problem of high bias. We will add two Dropout layers to our previous best model.

dropout_model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=x_train.shape[1:]),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")])

We have added two Dropout layers between the hidden layers with dropout probabilities of 0.3 and 0.2 respectively. We will now train this model and compare its accuracy with our previous best model.

It is clearly visible that adding dropout layers in between our hidden layers does not help in increasing the training accuracy.

CONCLUSION

After training the same data on multiple models with different hyperparameters, we can conclude that the following changes can help us in fixing high bias:

  • Increasing the number of hidden layers.
  • Increasing the number of hidden units.
  • Training for a higher number of epochs.
  • Trying different network architectures.

Also, the following changes do not have much impact on high bias:

  • Increasing the amount of training data.
  • Adding batch normalization layers.
  • Adding dropout layers.

Although the above changes do not have a huge impact on fixing the problem of underfitting, they certainly help in reducing high variance (overfitting).

I hope you all enjoyed this quick blog! Next week I will discuss hyperparameter tuning methods for fixing the problem of high variance.

The code for all the models and graphs in this blog can be accessed here — https://github.com/sanskar-hasija/Hyperparameter-Tuning
