How to implement an early stopper in detectron2 with DefaultTrainer

Sep 9, 2022

One of the biggest challenges in training neural networks is how long to train them.

The network may underfit the training data if it is trained for too few epochs, while too many epochs may lead to overfitting. We look for just the right number of epochs, the sweet spot, where the model performs at its best.

Traditional ways to choose the number of iterations or epochs include grid search and random search. But these methods do not interrupt the training process. They only give us the best number of epochs from the available search space. From the figure above, however, we can infer that the training should have stopped around the 74th epoch.

About the tutorial

The tutorial is for detection/segmentation tasks in detectron2.
We are using COCO evaluation metrics.
To execute early stop, we will be using bbox AP50 as our accuracy metric.
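For reference, detectron2's COCOEvaluator reports its results as a nested dictionary keyed by task (for example "bbox"), with each task holding metrics such as AP, AP50 and AP75. Here is a minimal sketch of pulling out the bbox AP50. The numbers are made up for illustration, and `get_bbox_ap50` is a hypothetical helper name, not part of detectron2.

```python
def get_bbox_ap50(results):
    """Return the bbox AP50 from a COCO-style results dict, or None if absent."""
    return results.get("bbox", {}).get("AP50")


# Made-up values, arranged in the shape a detection evaluation typically returns.
example_results = {
    "bbox": {"AP": 41.2, "AP50": 62.5, "AP75": 44.8, "APs": 23.1, "APm": 45.0, "APl": 54.3},
}

print(get_bbox_ap50(example_results))  # 62.5
```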

The main idea

We test our model after a reasonable number of iterations. The evaluation returns a variety of metrics.

We take the AP50 metric and check whether current_AP50 ≤ prev_AP50.
If the condition is satisfied, it means the performance is either decreasing or has gone flat, so we stop the trainer.
But stopping right away is too harsh.

We shall give our model a second chance to see if the performance gets back on track.
We will include a patience parameter that controls how many evaluations we wait before terminating the training process.
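To make the idea concrete, here is a small sketch that replays a list of AP50 scores and reports at which evaluation training would stop. It assumes that patience counts consecutive evaluations with current_AP50 ≤ prev_AP50 and that the counter resets as soon as the score improves. The function name and the scores are invented for the example.

```python
def first_stop_index(ap50_history, patience):
    """Return the index of the evaluation at which training would stop, or None.

    An evaluation is "bad" when its AP50 is <= the AP50 of the previous
    evaluation. Training stops after `patience` bad evaluations in a row.
    """
    bad_evals = 0
    for i in range(1, len(ap50_history)):
        if ap50_history[i] <= ap50_history[i - 1]:
            bad_evals += 1
            if bad_evals >= patience:
                return i
        else:
            bad_evals = 0  # performance recovered, so reset the counter
    return None


history = [40.0, 45.0, 44.0, 46.0, 45.5, 45.5, 45.0]  # made-up AP50 values
print(first_stop_index(history, patience=1))  # 2
print(first_stop_index(history, patience=3))  # 6
```

With patience=1 the single dip at index 2 already ends training, although the score recovers at index 3. With patience=3 the model survives that dip and only stops after three flat or falling evaluations in a row.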

Implementation of early stopper in detectron2

Some tutorials out there modify the backend of detectron2 or write their own training loop from scratch.
Good news! We have a simpler way to get the job done, by using hooks and overriding a few methods.

Steps:
— Define early stopping function
— Override the hooks
— Register EvalHook

Libraries

1. Define early stopping function

— Evaluate model and get AP50 for bbox

— Save the model if it is the first evaluation

We have provided an early stopping function below. You can write your own as well, but it should return either None or True, where True is the signal to terminate training.
Note: To use this early stopping function, you must have registered your test data in the DatasetCatalog as “custom_test”.
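As a rough sketch of a function that follows this contract (it is an illustration, not necessarily the same as the provided one), the example below keeps its state in a closure. `make_early_stopper`, `evaluate` and `save_model` are hypothetical names: `evaluate` stands for whatever runs the COCO evaluation on "custom_test" and returns the results dictionary, and `save_model` stands for your checkpointing call. Saving again whenever AP50 beats the best score so far is an assumption on top of the "save on first evaluation" step.

```python
def make_early_stopper(evaluate, save_model, patience=2):
    """Build an early stopping function that returns True (stop) or None (continue).

    evaluate:   callable returning COCO-style results, e.g. {"bbox": {"AP50": 61.3}}
    save_model: callable that writes a checkpoint (hypothetical, supply your own)
    patience:   number of consecutive non-improving evaluations tolerated
    """
    state = {"prev_ap50": None, "best_ap50": None, "bad_evals": 0}

    def early_stop():
        current_ap50 = evaluate()["bbox"]["AP50"]
        prev_ap50 = state["prev_ap50"]
        state["prev_ap50"] = current_ap50

        # Save on the first evaluation and (assumption) on every new best score.
        if state["best_ap50"] is None or current_ap50 > state["best_ap50"]:
            state["best_ap50"] = current_ap50
            save_model()

        if prev_ap50 is None:
            return None  # first evaluation, nothing to compare against yet
        if current_ap50 <= prev_ap50:
            state["bad_evals"] += 1
            return True if state["bad_evals"] >= patience else None
        state["bad_evals"] = 0  # improvement, so reset the patience counter
        return None

    return early_stop


# Usage with stand-in callables and made-up scores.
scores = iter([50.0, 55.0, 54.0, 53.0])
saved = []
stopper = make_early_stopper(
    evaluate=lambda: {"bbox": {"AP50": next(scores)}},
    save_model=lambda: saved.append("checkpoint"),
    patience=2,
)
decisions = [stopper() for _ in range(4)]
print(decisions)   # [None, None, None, True]
print(len(saved))  # 2
```

In a real detectron2 setup, `evaluate` would typically wrap an evaluation run on the registered "custom_test" dataset, for example through COCOEvaluator and inference_on_dataset.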

2. Override the hooks

EvalHook
DefaultTrainer

The following 3 gists are all you need:
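As a rough illustration of how these pieces can fit together (a sketch under stated assumptions, not the original gists), the code below swaps DefaultTrainer's built-in EvalHook for one whose evaluation function raises an exception when the early stopping check returns True. `StopTraining`, `with_early_stopping`, `build_trainer` and `early_stop_check` are hypothetical names. Here the check receives the evaluation results as an argument. The sketch assumes a recent detectron2 release, single-process training, cfg.DATASETS.TEST set to ("custom_test",), and cfg.TEST.EVAL_PERIOD greater than zero.

```python
class StopTraining(Exception):
    """Raised from the evaluation step to leave the training loop early."""


def with_early_stopping(evaluate, early_stop_check):
    """Wrap `evaluate` so that a True result from the check aborts training."""

    def eval_function():
        results = evaluate()
        if early_stop_check(results) is True:
            # Raising here skips EvalHook's logging of this last evaluation.
            raise StopTraining("early stopping criterion met")
        return results

    return eval_function


def build_trainer(cfg, early_stop_check):
    """Return a DefaultTrainer whose periodic evaluation can stop training."""
    # Imported here so the helpers above can be used without detectron2 installed.
    from detectron2.engine import DefaultTrainer, hooks
    from detectron2.evaluation import COCOEvaluator

    class EarlyStoppingTrainer(DefaultTrainer):
        @classmethod
        def build_evaluator(cls, cfg, dataset_name):
            return COCOEvaluator(dataset_name, output_dir=cfg.OUTPUT_DIR)

        def build_hooks(self):
            hook_list = super().build_hooks()

            def evaluate():
                return self.test(self.cfg, self.model)

            new_hook = hooks.EvalHook(
                self.cfg.TEST.EVAL_PERIOD,
                with_early_stopping(evaluate, early_stop_check),
            )
            # Replace the default EvalHook, keeping its position in the hook list.
            return [new_hook if isinstance(h, hooks.EvalHook) else h for h in hook_list]

    return EarlyStoppingTrainer(cfg)


def train(cfg, early_stop_check):
    """Train until max_iter or until the check asks to stop."""
    trainer = build_trainer(cfg, early_stop_check)
    trainer.resume_or_load(resume=False)
    try:
        trainer.train()
    except StopTraining:
        print("Training stopped early.")
    return trainer
```

Raising an exception is only one way to leave the loop. It keeps the sketch short and avoids touching detectron2 internals, at the cost of the exception being logged as a training error before we catch it.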