Importance of “Callbacks” when training a Deep Learning Model.

Yash Wasalwar
𝐀𝐈 𝐦𝐨𝐧𝐤𝐬.𝐢𝐨
4 min read · Nov 17, 2022


Whenever the term “callback” appears, in everyday usage it simply means a return call, as simple as it gets. So to remember what “callbacks” mean in deep learning, I map the term to that everyday meaning. Bewildered?

Now let’s understand the meaning of callback in deep learning.

“Callbacks” are tailored utilities or functions executed at specific phases of the training procedure.

Disclaimer: This blog aims to help you understand the concept of callbacks. There will be some code snippets which you should find useful. If you cannot follow them right away, do not worry, learn as you go, in whichever framework you prefer, i.e. either TensorFlow or PyTorch. I am using TensorFlow.

Every time I think of a callback, I remind myself that it is like getting a return call at each and every step of training the deep learning model, or, put simply, at each epoch. Callbacks are helper functions that support our model in performing better during training and keep it from getting into a messy situation, which in the deep learning dictionary is called “OVERFITTING”.

They also help to debug your code, record logs, save checkpoints of each epoch, and so on.
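To make this concrete, here is a minimal sketch of a custom callback, assuming TensorFlow is installed and that model, x_train, etc. are defined elsewhere in your code; it just prints the validation loss at the end of every epoch. Every built-in callback discussed below hooks into training in the same way.

import tensorflow as tf

class LogValLoss(tf.keras.callbacks.Callback):
    # Keras calls this hook at the end of every epoch
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        print(f"Epoch {epoch + 1}: val_loss = {logs.get('val_loss')}")

# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=10, callbacks=[LogValLoss()])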

There are various kinds of callbacks available, such as:

  • Early Stopping
  • ReduceLROnPlateau
  • Model Checkpoint

and many more…

Specifically, I am discussing the ones I learned while training deep learning models. There are many more, which I am still learning on my own deep learning journey.

Let’s get started…

You can find several callbacks in TensorFlow under the “tf.keras.callbacks” module.

1. Early Stopping


This callback is employed frequently. It enables us to keep an eye on our metrics and halt training when the monitored metric stops improving. You can use this callback, for instance, to terminate training if accuracy does not improve by at least 0.07. To some extent, this helps prevent overfitting of the model.

tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                 min_delta=0,
                                 patience=0,
                                 verbose=0,
                                 mode='auto',
                                 baseline=None,
                                 restore_best_weights=False)

monitor: the name of the metric we want to monitor. For eg: ‘val_accuracy’, ‘val_loss’, etc.
min_delta: the minimum change in the monitored metric that counts as an improvement in an epoch.
patience: the number of epochs with no improvement to wait before stopping the training of the model.
verbose: whether or not to print logs.
mode: specifies whether the monitored metric should increase, decrease, or be inferred from its name; available values are ‘min’, ‘max’, or ‘auto’.
baseline: the baseline value for the monitored metric; training stops if the model does not show improvement over it.
restore_best_weights: if set to True, the model keeps the weights of the epoch with the best value of the monitored metric; otherwise, it keeps the weights of the last epoch.
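As a rough usage sketch (the model, data names, and threshold values here are placeholders rather than recommendations), the configured callback is simply passed to model.fit:

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',        # watch the validation loss
    min_delta=0.001,           # improvements smaller than this do not count
    patience=5,                # stop after 5 epochs without improvement
    restore_best_weights=True)

# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=100, callbacks=[early_stop])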

2. ReduceLROnPlateau


ReduceLROnPlateau is a scheduling strategy that lowers the learning rate when the monitored metric stops improving for longer than the patience value permits. As a result, the learning rate is kept at its current value as long as the metric keeps improving, and is decreased once the results begin to plateau.
The ReduceLROnPlateau scheduler is a good choice when you are unsure how your model will respond to your data.

tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss',
                                     factor=0.1,
                                     patience=10,
                                     verbose=0,
                                     mode='auto',
                                     min_delta=0.0001,
                                     cooldown=0,
                                     min_lr=0,
                                     **kwargs)

The majority of the parameters are similar to those of the EarlyStopping callback, such as monitor, verbose, patience, min_delta, and mode.

Let’s understand the remaining parameters:

factor: the factor by which the learning rate is reduced, following the equation “new learning rate = factor * old learning rate”.

min_lr: the lower bound on the learning rate; it will not be reduced below this value.
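Here is an illustrative configuration (the concrete numbers are just placeholders you would tune for your own problem, and the commented fit call assumes the usual model and data names):

reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss',  # watch the validation loss
    factor=0.5,          # halve the learning rate on a plateau
    patience=3,          # wait 3 stagnant epochs before reducing
    min_lr=1e-6)         # never go below this learning rate

# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=50, callbacks=[reduce_lr])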

3. Model Checkpoint


This callback lets us save the model periodically during training. It is extremely helpful for deep learning models that take a long time to train. While monitoring the training, the callback periodically saves model checkpoints based on the chosen metric.

tf.keras.callbacks.ModelCheckpoint(filepath,
                                   monitor='val_loss',
                                   verbose=0,
                                   save_best_only=False,
                                   save_weights_only=False,
                                   mode='auto',
                                   save_freq='epoch')

Let’s get to the parameters of this function.

filepath: path for saving the model.
monitor: name of the metrics for monitoring.
save_best_only: if True, only the latest best model according to the monitored metric is saved; it will not be overwritten by a worse one.
mode: specifies whether the monitored metric should increase, decrease, or be inferred from its name; available values are ‘min’, ‘max’, or ‘auto’.
save_weights_only: if True, only the weights of the models will be saved. Otherwise, the full model will be saved.
save_freq: if ‘epoch’, the model will be saved after every epoch. If an integer value is passed, the model will be saved after the integer number of batches (not to be confused with epochs).
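Putting it together, an illustrative setup might save the best weights while also using the two callbacks defined above; the file path and data names are placeholders for whatever your project uses:

checkpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath='best_model.h5',  # where to write the checkpoint
    monitor='val_loss',
    save_best_only=True,       # keep only the best-scoring checkpoint
    save_weights_only=False)   # save the full model, not just weights

# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=50, callbacks=[checkpoint, early_stop, reduce_lr])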

So these were some of the callback functions I use when training deep learning models. There are many other callbacks that you can explore (I am exploring too! ✌️) and use depending on your use case, to make your model perform better, which is the ultimate goal.

I hope this was a friendly introduction to callbacks.

If you are a total beginner, do not worry; take this as motivation, as there are still many things for me to learn too. If I can do it, so can you!

Happy deep learning! 🚀

Yash Wasalwar
𝐀𝐈 𝐦𝐨𝐧𝐤𝐬.𝐢𝐨

Ex-Research Intern @DRDO · Always learning · Loves to talk about Data Science and Life Experiences