What is Overfitting and Underfitting, and how to deal with them step by step?

Koushik
5 min read · Sep 20, 2023


In machine learning, it is common to see a model's accuracy on the validation data peak after training for a number of epochs and then stagnate or start decreasing.

Overfitting

It is critical to understand how to deal with overfitting. Although high accuracy on the training set is often attainable, what you actually want is to construct models that generalise effectively to a test set (or unseen data). When the model trains for too long (more iterations than required) on the sample data, or when the model is too complex, it can start to learn the “noise,” or irrelevant information, within the dataset. The model then memorises this noise and fits too closely to the training set; it becomes “overfitted” and is unable to generalise well to new data.

Underfitting

Underfitting is the complete opposite of overfitting. It happens when the model has not trained for long enough, or when the input variables are not significant enough (more data is needed) to determine a meaningful relationship between the input and output variables, so the training data still has untapped potential. This indicates that the network did not recognise the relevant patterns in the training data.

What is common in both cases:

In both cases the model is unable to identify the dominant trend in the training dataset and, as a result, generalises poorly to unseen data. The failure modes differ, though: in contrast to overfitted models, underfitted models have high bias and low variance in their predictions. When fitting a model, the goal is to locate the “sweet spot” between underfitting and overfitting so that the dominant trend is captured and carries over to new datasets.
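As a quick illustration (a minimal sketch with NumPy polynomial fits, not part of the original walkthrough; the data and degrees are made-up assumptions), fitting noisy samples of a simple curve with polynomials of increasing degree makes the sweet spot visible: a degree that is too low underfits, while a very high degree typically drives training error down while test error climbs back up.

import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a simple underlying trend: y = sin(x) + noise
x_train = np.sort(rng.uniform(0, 3, 30))
y_train = np.sin(x_train) + rng.normal(0, 0.1, 30)
x_test = np.sort(rng.uniform(0, 3, 30))
y_test = np.sin(x_test) + rng.normal(0, 0.1, 30)

for degree in (1, 4, 15):  # underfit, reasonable fit, overfit
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree={degree:2d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")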

Methods to trade off between overfitting and underfitting:

  • Reduce the complexity of the ANN architecture: Using a simple architecture in place of a highly complex neural network helps to get rid of overfitting. An unnecessarily complex structure gives the network enough capacity to memorise the training data and can also hamper its ability to learn the underlying pattern.
  • Early stopping: This technique aims to stop training before the model begins to learn about its own noise. This strategy runs the danger of prematurely stopping the training process, which would cause underfitting, the opposite issue. The ultimate objective is to find the “sweet spot” between underfitting and overfitting.
  • Train with more data: Expanding the training set to include more data can increase the accuracy of the model by providing more opportunities to parse out the dominant relationship among the input and output variables. That said, this is a more effective method when clean, relevant data is injected into the model. Otherwise, you could just continue to add more complexity to the model, causing it to overfit.
  • Data augmentation: While it is better to inject clean, relevant data into your training data, sometimes noisy data is added to make a model more stable. However, this method should be done sparingly.
  • Regularization: Regularization optimizes a model by penalizing complex models, thereby minimizing both loss and complexity; this forces a neural network to be simpler. The most popular forms are L1 and L2 regularization. L1 regularization, also used in lasso regression, adds the “absolute value of magnitude” of the coefficients as a penalty term to the loss function. L2 regularization, also used in ridge regression, adds the “squared magnitude” of the coefficients as the penalty term to the loss function.
  • Dropout: Dropout is a powerful technique used in machine learning to prevent overfitting and improve overall model performance. It does this by randomly “dropping” neurons from the input and hidden layers during training. It often works better than L1/L2 regularization alone and can also be combined with a max-norm constraint on the weights, which can give a further boost over using dropout by itself. A compact Keras sketch of early stopping, L1/L2 regularization and dropout follows this list.
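The sketch below is a minimal illustration, not taken from the original post: the layer sizes, penalty strengths, dropout rate and patience value are assumptions chosen only to show where each technique plugs in.

from tensorflow import keras
from tensorflow.keras import layers, regularizers

def make_regularised_mlp(input_dim, num_classes):
    return keras.Sequential([
        keras.Input(shape=(input_dim,)),
        # L2 (ridge-style) penalty on the weights of a hidden layer
        layers.Dense(64, activation="relu",
                     kernel_regularizer=regularizers.l2(1e-4)),
        # L1 (lasso-style) penalty on another hidden layer
        layers.Dense(64, activation="relu",
                     kernel_regularizer=regularizers.l1(1e-5)),
        # Dropout randomly zeroes 30% of the activations during training
        layers.Dropout(0.3),
        layers.Dense(num_classes, activation="softmax"),
    ])

# Early stopping halts training once validation loss stops improving
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True
)
# model.fit(x_train, y_train, validation_split=0.2, epochs=100, callbacks=[early_stop])

Tuning the penalty strengths, the dropout rate and the patience moves the model along the underfitting/overfitting trade-off.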

Let’s see an example:

#Load Libraries

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import pandas as pd

Load the images for training and split them into a training and a validation set

image_size = (180, 180)
batch_size = 128

train_ds, val_ds = tf.keras.utils.image_dataset_from_directory(
    "/kaggle/input/dataset/training_set",
    validation_split=0.2,
    subset="both",
    seed=1337,
    image_size=image_size,
    batch_size=batch_size,
)

Create model


def make_model(input_shape, num_classes):
    inputs = keras.Input(shape=input_shape)

    # Entry block
    x = layers.Rescaling(1.0 / 255)(inputs)
    x = layers.Conv2D(128, 3, strides=2, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)

    previous_block_activation = x  # Set aside residual

    for size in [256, 512, 728]:
        x = layers.Activation("relu")(x)
        x = layers.SeparableConv2D(size, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)

        x = layers.Activation("relu")(x)
        x = layers.SeparableConv2D(size, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)

        x = layers.MaxPooling2D(3, strides=2, padding="same")(x)

        # Project residual
        residual = layers.Conv2D(size, 1, strides=2, padding="same")(
            previous_block_activation
        )
        x = layers.add([x, residual])  # Add back residual
        previous_block_activation = x  # Set aside next residual

    x = layers.SeparableConv2D(1024, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)

    x = layers.GlobalAveragePooling2D()(x)
    if num_classes == 2:
        activation = "sigmoid"
        units = 1
    else:
        activation = "softmax"
        units = num_classes

    x = layers.Dropout(0.5)(x)  # Only needed when the model overfits
    outputs = layers.Dense(units, activation=activation)(x)
    return keras.Model(inputs, outputs)


model = make_model(input_shape=image_size + (3,), num_classes=2)
keras.utils.plot_model(model, show_shapes=True)

Train model

epochs = 10

callbacks = [
    keras.callbacks.ModelCheckpoint("/kaggle/working/save_at_{epoch}.keras"),
]
model.compile(
    optimizer=keras.optimizers.Adam(1e-3),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
# Keep the History object so the accuracy curves can be inspected later
history = model.fit(
    train_ds,
    epochs=epochs,
    callbacks=callbacks,
    validation_data=val_ds,
)

The training output indicates that the model is overfitted: training accuracy keeps improving while validation accuracy stagnates or drops. So we will use data augmentation to reduce the overfitting problem, and along with it we also use dropout before the final dense layer.
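One way to see this (a small optional sketch, not part of the original post, using the History object captured above as history) is to plot training and validation accuracy per epoch:

import matplotlib.pyplot as plt

# history is the object returned by model.fit(...) above
plt.plot(history.history["accuracy"], label="training accuracy")
plt.plot(history.history["val_accuracy"], label="validation accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.show()

A widening gap between the two curves is the classic sign of overfitting.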

Using image data augmentation

When you don’t have a large image dataset, it’s a good practice to artificially introduce sample diversity by applying random yet realistic transformations to the training images, such as random horizontal flipping or small random rotations. This helps expose the model to different aspects of the training data while slowing down overfitting.

data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
    ]
)
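To sanity-check the transformations (a small optional sketch, not from the original post), you can display a few augmented variants of one training batch before wiring the augmentation into the pipeline:

import matplotlib.pyplot as plt

# Show nine augmented versions of the first image in one training batch
for images, _ in train_ds.take(1):
    plt.figure(figsize=(8, 8))
    for i in range(9):
        augmented = data_augmentation(images)
        plt.subplot(3, 3, i + 1)
        plt.imshow(np.array(augmented[0]).astype("uint8"))
        plt.axis("off")
    plt.show()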

Apply the augmentation to the training data

# Apply data_augmentation to the training images.
train_ds = train_ds.map(
    lambda img, label: (data_augmentation(img), label),
    num_parallel_calls=tf.data.AUTOTUNE,
)
# Prefetching samples in GPU memory helps maximize GPU utilization.
train_ds = train_ds.prefetch(tf.data.AUTOTUNE)
val_ds = val_ds.prefetch(tf.data.AUTOTUNE)

Run the model again and plot the output; it should look like the figure below.

[Figure: before and after augmentation and dropout]

There is plenty of scope to improve this model further. You can apply regularization to the hidden layers, increase the number of epochs to deal with underfitting, and use early stopping to guard against overfitting. Depending on your dataset, tune the hyperparameters and other inputs to reach the best fit; a hedged sketch of these changes follows.
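For example (a hedged sketch, not from the original post; the penalty strength and patience value are illustrative assumptions), the hidden separable-convolution blocks inside make_model could carry an L2 penalty, and an EarlyStopping callback could join the existing ModelCheckpoint:

from tensorflow import keras
from tensorflow.keras import layers, regularizers

def regularised_block(x, size, l2_strength=1e-4):
    # One hidden block of make_model with an L2 penalty on the conv weights
    x = layers.Activation("relu")(x)
    x = layers.SeparableConv2D(
        size, 3, padding="same",
        kernel_regularizer=regularizers.l2(l2_strength),
    )(x)
    x = layers.BatchNormalization()(x)
    return x

# Early stopping alongside the existing checkpoint callback
callbacks = [
    keras.callbacks.ModelCheckpoint("/kaggle/working/save_at_{epoch}.keras"),
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                  restore_best_weights=True),
]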

