CutMix Augmentation in Python
Improving classification and object detection
The paper "CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features", published at ICCV 2019, proposed the CutMix augmentation strategy to improve deep learning-based classification and localization.
In this augmentation technique, patches are cut and pasted among training images, and the ground truth labels are mixed in proportion to the area of the patches. Because it makes efficient use of training pixels while retaining the regularization effect of regional dropout, the paper reports that CutMix consistently outperforms state-of-the-art augmentation strategies on CIFAR and ImageNet classification tasks, as well as on the ImageNet weakly-supervised localization task.
The paper also shows that CutMix improves model robustness against input corruptions and improves out-of-distribution detection performance.
In the following table, we can see the performance boost obtained by adopting CutMix augmentation.
The code used in this article is available in my Kaggle notebook.
Let’s start!
Loading Data
To show the result of CutMix augmentation, we will use a set of wheat field images showing wheat heads. These images are taken from Kaggle’s Global Wheat Detection competition dataset.
Generating a Batch of Data
Now, we will create a set or “batch” of 4 images. Let’s assume that this batch is a mini-batch produced by a dataloader during the training of a deep neural network model. In general, when we are solving a classification problem, the batch loaded by a dataloader also includes the image labels. Here, we will randomly assign each image a label from the set {0, 1, 2} and then one-hot encode these labels as {[1,0,0], [0,1,0], [0,0,1]}. Please follow this to learn more about dataloaders in PyTorch.
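A minimal sketch of building such a batch with NumPy is shown below. It assumes random pixel arrays as stand-ins for the wheat images, and the names `images`, `labels`, and `one_hot_labels` are illustrative rather than taken from the notebook.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

batch_size, height, width, num_classes = 4, 256, 256, 3

# Stand-in for real images: random RGB pixels in [0, 255].
# In practice these would come from a dataloader.
images = rng.integers(0, 256, size=(batch_size, height, width, 3), dtype=np.uint8)

# Assign each image a random class index from {0, 1, 2}.
labels = rng.integers(0, num_classes, size=batch_size)

# One-hot encode: row i of the identity matrix is the encoding of class i.
one_hot_labels = np.eye(num_classes, dtype=np.float32)[labels]

print(labels)
print(one_hot_labels)
```

Indexing the identity matrix with the label array is a compact way to one-hot encode without a loop.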
Generating Random Bounding Box
Here is a function to generate a random bounding box in an image:
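A minimal sketch of such a function, assuming NumPy and the box-size convention from the CutMix paper, where each side of the box is scaled by sqrt(1 - lambda) so that the box covers roughly a (1 - lambda) fraction of the image. The name `rand_bbox`, its signature, and the optional `rng` argument are illustrative and may differ from the notebook.

```python
import numpy as np

def rand_bbox(height, width, lam, rng=None):
    """Sample a random box covering roughly (1 - lam) of an image.

    Returns (x1, y1, x2, y2) in pixel coordinates, clipped to the image.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Scaling both sides by sqrt(1 - lam) gives a box area of
    # about (1 - lam) * height * width before clipping.
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h = int(height * cut_ratio)
    cut_w = int(width * cut_ratio)

    # The box centre is sampled uniformly over the image.
    cy = int(rng.integers(0, height))
    cx = int(rng.integers(0, width))

    # Clip the box so it stays inside the image borders.
    y1 = max(cy - cut_h // 2, 0)
    y2 = min(cy + cut_h // 2, height)
    x1 = max(cx - cut_w // 2, 0)
    x2 = min(cx + cut_w // 2, width)
    return x1, y1, x2, y2
```

Because the box is clipped at the borders, its actual area can be smaller than the nominal (1 - lambda) fraction when the centre falls near an edge.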
Now, let’s test the function by generating a random bounding box and cropping an image using that bounding box.
CutMix Image Generation
In the CutMix algorithm, for each image in the batch, a random region (in our case, a random bounding box region) is replaced with a patch from another training image.
- The parameter lambda (the variable lam in the code), which determines the size of the bounding box, is stochastically sampled from a Beta distribution.
- The label of each resultant augmented image is estimated as a weighted sum of the original label and the label of the image from which the modified patch is borrowed, where the weights are lambda and (1-lambda) respectively.
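The two points above can be sketched as a single NumPy function. This is an illustration under stated assumptions, not the notebook's code: the name `cutmix_batch` is hypothetical, images are assumed to be laid out as (N, H, W, C), one box is shared by the whole batch, each image is paired with a partner through a random permutation, and `alpha` is the Beta distribution parameter (1.0 gives a uniform lambda).

```python
import numpy as np

def cutmix_batch(images, labels, alpha=1.0, rng=None):
    """Apply CutMix to a batch.

    images: array of shape (N, H, W, C); labels: one-hot array of shape (N, K).
    Returns the mixed images, the mixed labels, and the lambda actually used.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, height, width = images.shape[:3]

    lam = rng.beta(alpha, alpha)   # mixing ratio sampled from Beta(alpha, alpha)
    perm = rng.permutation(n)      # image i borrows its patch from image perm[i]

    # Random box covering roughly (1 - lam) of the image, clipped to the borders.
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h, cut_w = int(height * cut_ratio), int(width * cut_ratio)
    cy, cx = int(rng.integers(0, height)), int(rng.integers(0, width))
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, height)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, width)

    # Paste the partner's patch into a copy of each image.
    mixed_images = images.copy()
    mixed_images[:, y1:y2, x1:x2, :] = images[perm, y1:y2, x1:x2, :]

    # Recompute lam from the clipped box so the label weights
    # match the pixel area that was actually replaced.
    lam = 1.0 - (y2 - y1) * (x2 - x1) / (height * width)

    # Weighted sum of the original label and the partner's label.
    mixed_labels = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_images, mixed_labels, lam
```

Each row of the mixed labels still sums to 1, so the result can be used directly with a soft-label cross-entropy loss. Recomputing lambda after clipping is an optional refinement; skipping it keeps the raw Beta sample as the label weight.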
The original images:
The augmented images:
The original labels of the images were:
[0 0 1]
[0 1 0]
[0 0 1]
[1 0 0]
After CutMix augmentation the updated labels are:
[0. 0. 1. ]
[0. 0.54814243 0.45185757]
[0.45185757 0. 0.54814243]
[0.54814243 0.45185757 0. ]
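As a sanity check, the updated labels can be reproduced from the original ones. The sketch below assumes lambda = 0.54814243 (read off the printed values) and a pairing of images inferred from the output; the `partner` array is a hypothetical reconstruction, since the actual shuffle used in the run is not shown.

```python
import numpy as np

lam = 0.54814243  # value read off the printed labels

original = np.array([[0, 0, 1],
                     [0, 1, 0],
                     [0, 0, 1],
                     [1, 0, 0]], dtype=float)

# Inferred pairing (an assumption): image i borrows its patch from image partner[i].
partner = np.array([2, 0, 3, 1])

# Weighted sum of each image's own label and its partner's label.
mixed = lam * original + (1.0 - lam) * original[partner]
print(np.round(mixed, 8))
```

The first image keeps the label [0, 0, 1] because, under this pairing, its partner belongs to the same class, so mixing the two labels changes nothing.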
You can now use CutMix augmentation when training your own deep neural network model.
For more details, please follow the references. If you have any questions, please put them in the comment section.