A Quick Augmentation for a Quick Learning!

Arbeon · Published in ARBEON Co., Ltd. · Jan 10, 2023

Hello! My name is Theo, and I’m in charge of establishing the vision deep learning system in Arbeon’s AI Team.

Today, I’ll be talking about “data augmentation.” Training on large-scale data is often slowed down by bottlenecks in the input pipeline, among other issues, and I’d like to share some of the methods I’ve tried to resolve them!

I hope this short article will be helpful for those who are in charge of AI training…

And let us jump right in! 😉

Table of Contents

  1. The Necessity of Vision Deep Learning Data Augmentation
  2. torchvision.transforms vs. Kornia
  3. Training Through ResNet and Kornia
  4. Conclusion

1. The Necessity of Vision Deep Learning Data Augmentation

Recently, as our team proceeded with large-scale training, a number of issues emerged. We found that one of the causes was a bottleneck in the data-augmentation stage of the input pipeline. To resolve it, we first migrated from torchvision to Albumentations, which improved the speed, but we still wanted training to run faster. That’s when we found the Kornia API, which we’d like to introduce today. Just as CUDA is used to accelerate image processing, Kornia uses CUDA for data augmentation, and with it we were able to greatly relieve the bottleneck.

In this article, we’ll give a brief explanation of why data augmentation is necessary, along with a simple hands-on code example. We hope this helps readers build a more diverse training environment and plan an efficient training strategy.

1–1 Why do we need data augmentation?

The transformer models (e.g., BERT and its successors) that are commonly used these days have grown to scales of over 100B parameters, and optimizing that many parameters requires correspondingly more data. The transformer models trending among vision models (e.g., ViT) likewise require huge datasets; if the amount of training data falls below a certain point, it is difficult to see a meaningful performance gain over a ResNet.

Moreover, a lack of diversity in a deep learning model’s training data can result in overfitting, and the model’s generalization ability can no longer be guaranteed. These problems can be addressed by collecting more data to feed the model. And since training on more data improves the model’s inference performance, applying data augmentation, which effectively multiplies the data you already have, can bring even better results.

A great deal of research on data augmentation is in progress these days, with Mixup and Cutout being some of the notable works. As these show, augmentation does not merely prevent overfitting by increasing the amount of data; by transforming the data it also makes the training task harder and thereby raises the model’s inference performance.
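To make the idea concrete, here is a minimal sketch of Mixup in plain PyTorch. It is an illustration rather than the reference implementation: the function name mixup_batch and the alpha value are our own choices.

import torch
import torch.nn.functional as F

def mixup_batch(images, labels, alpha=0.2):
    # Draw a mixing coefficient from Beta(alpha, alpha) and blend each
    # sample with a randomly chosen partner from the same batch.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    index = torch.randperm(images.size(0), device=images.device)
    mixed = lam * images + (1.0 - lam) * images[index]
    return mixed, labels, labels[index], lam

# Typical use inside a training step (model and optimizer assumed to exist):
# mixed, y_a, y_b, lam = mixup_batch(images, labels)
# logits = model(mixed)
# loss = lam * F.cross_entropy(logits, y_a) + (1 - lam) * F.cross_entropy(logits, y_b)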

1–2 The results of data augmentation for computer vision

Examples of the data augmentation results PyTorch can produce are shown in the torchvision transforms documentation (see References), where you can also find many illustrated examples.

As those examples show, the image size, color, and placement can all be changed, securing diversity in the data.
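As a rough illustration (the transform choices and magnitudes below are arbitrary, not taken from the documentation), a torchvision pipeline that changes size, color, and placement might look like this:

import torchvision.transforms as T

# Illustrative pipeline: add diversity by changing size, color, and placement.
augment = T.Compose([
    T.Resize((224, 224)),                                          # size
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),   # color
    T.RandomHorizontalFlip(p=0.5),                                 # placement
    T.RandomRotation(degrees=15),                                  # placement
    T.ToTensor(),
])
# augmented = augment(pil_image)  # pil_image: a PIL image loaded elsewhere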

1–3 Points to keep in mind when applying data augmentation

No matter how flawless the augmentation itself is, the end results may fall short of expectations if you carelessly apply the same transforms everyone else is using. For instance, if you are working on face detection and an augmentation changes the placement of facial features enough to alter the facial structure, there is a high chance it will cause problems at inference time.

It wouldn’t be an exaggeration to say that deep learning begins and ends with data. Satisfactory results and performance improvements can be achieved by applying data augmentation appropriately, after sufficient consideration and review of the data.
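As one illustration of tailoring augmentation to the task (the specific transforms and magnitudes are our own assumptions, not recommendations from any particular source), a deliberately mild pipeline for face-like data might keep only structure-preserving changes:

import torchvision.transforms as T

# Small color changes and a horizontal flip are kept, while large rotations
# and vertical flips that would distort facial structure are left out.
face_safe_augment = T.Compose([
    T.ColorJitter(brightness=0.2, contrast=0.2),
    T.RandomHorizontalFlip(p=0.5),   # faces are roughly left-right symmetric
    T.ToTensor(),
])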

2. torchvision.transforms vs. Kornia

2–1 Differences between torchvision.transforms and Kornia

Kornia is another API for PyTorch, built for data augmentation and computer vision processing. When large-scale training augments data with torchvision.transforms, which processes images one at a time on the CPU, bottlenecks can occur. To resolve this, we will use the Kornia API, which uses the GPU to process whole batches of images at once.

The benchmark table in the Kornia documentation shows that this performance advantage grows with the batch size.

It also provides many more features than torchvision.transforms does, and the extra speed allows for more effective training and inference. With that in mind, the goal of this article is to train a toy dataset using Kornia.
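The practical difference is where the work happens: torchvision.transforms is applied per image inside the Dataset on the CPU, while Kornia’s augmentation modules accept an entire batch of tensors and can run on the GPU. Here is a minimal sketch of the Kornia side (the particular transforms and values are arbitrary):

import torch
import kornia.augmentation as K

# Kornia augmentations are nn.Modules, so they can be moved to the GPU
# and applied to an entire batch of tensors at once.
aug = torch.nn.Sequential(
    K.RandomHorizontalFlip(p=0.5),
    K.ColorJitter(0.2, 0.2, 0.2, 0.1),
).cuda()

batch = torch.rand(100, 3, 32, 32, device="cuda")  # a whole batch on the GPU
with torch.no_grad():
    out = aug(batch)  # augmented on the GPU, same shape as the input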

3. Training Through ResNet and Kornia

3–1 Packages to install beforehand

Training was run on Windows with Python 3.8.x and CUDA 11.6 enabled, inside a Jupyter Notebook.

pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
pip install kornia
pip install timm
pip install numpy
pip install tqdm

3–2 Training code

import torch
import torchvision
from torch.utils.data import DataLoader
import torch.nn as nn
from torchvision.models import resnet18
from torchvision.datasets import CIFAR10
from torch.optim import Adam
from torch.nn.functional import cross_entropy
from timm.utils import AverageMeter
from kornia import image_to_tensor
from kornia.augmentation import RandomAffine, RandomHorizontalFlip, RandomVerticalFlip
import numpy as np
from tqdm import tqdm

The APIs to be used in training are imported above.

model = resnet18(False).cuda()
optim = Adam(model.parameters())
losses = AverageMeter()

The model to be used for training is imported from torchvision. Optimization is done with Adam, and the AverageMeter() provided by timm is used to log the loss.
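As a small aside (the values here are made up purely for illustration), AverageMeter simply keeps a running average of whatever is passed to update(), and its .avg attribute is what we print at the end of the training loop below:

from timm.utils import AverageMeter  # already imported above

meter = AverageMeter()
meter.update(2.0)   # avg == 2.0
meter.update(4.0)   # avg == 3.0
print(meter.avg)    # 3.0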

class DataAugmentation(nn.Module):
    def __init__(self):
        super().__init__()
        self.transforms = nn.Sequential(
            RandomAffine([0.0, 359.9]),
            RandomHorizontalFlip(p=1),
            RandomVerticalFlip(p=1)
        )

    @torch.no_grad()
    def forward(self, x):
        x_out = self.transforms(x)
        return x_out


class Preprocess(nn.Module):
    """Module to perform pre-processing using Kornia on torch tensors."""

    @torch.no_grad()  # disable gradients for efficiency
    def forward(self, x):
        x_tmp = np.array(x)  # HxWxC
        x_out = image_to_tensor(x_tmp, keepdim=True)  # CxHxW
        return x_out.float() / 255.0

Unlike the transform pipeline provided in torchvision, the augmentation here will run on CUDA later in the training loop, so the dataset’s transform (Preprocess) only converts the image to a tensor. The image could also be resized at this stage if needed.

DataAugmentation is the class that performs the data transformation; it inherits from nn.Module, just like the neural network itself, so that it can run on CUDA in PyTorch.

transform = Preprocess()
data = CIFAR10('data', transform=transform, download=True)
data_load = DataLoader(data, batch_size=100)
tran = DataAugmentation()

for image, label in tqdm(data_load):
    image, label = image.cuda(), label.cuda()
    # Starts running here to utilize CUDA
    image = tran(image)
    optim.zero_grad()

    output = model(image)
    loss = cross_entropy(output, label)
    loss.backward()
    losses.update(loss.item())  # log the scalar value, not the graph-attached tensor
    optim.step()
print(losses.avg)

Run this for a number of epochs and you will see the loss decrease, confirming that the model trains successfully!
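For completeness, here is a straightforward multi-epoch extension of the loop above; nothing in it is Kornia-specific, and the epoch count is arbitrary:

num_epochs = 5  # arbitrary; adjust to your budget
for epoch in range(num_epochs):
    losses = AverageMeter()          # reset the running average each epoch
    for image, label in tqdm(data_load):
        image, label = image.cuda(), label.cuda()
        image = tran(image)          # Kornia augmentation on the GPU

        optim.zero_grad()
        output = model(image)
        loss = cross_entropy(output, label)
        loss.backward()
        optim.step()
        losses.update(loss.item())
    print(f"epoch {epoch}: avg loss {losses.avg:.4f}")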

4. Conclusion

We saw that the larger the batch size, the more effective GPU-side augmentation becomes. However, alongside this speed advantage, we need to consider the possibility of a sharp increase in GPU resource usage.

Setting those concerns aside, Kornia is still a good option to consider for large-scale training. There is much more to it than the functions used in the example above, as it offers many additional augmentation techniques. Now that we have learned how to use Kornia through a simple example, putting it to real use should meaningfully shorten the time needed to train on your datasets.

Thank you!

References

https://kornia.readthedocs.io/en/latest/augmentation.html
https://tutorials.pytorch.kr/beginner/blitz/cifar10_tutorial.html
https://pytorch.org/vision/stable/transforms.html
https://arxiv.org/pdf/1710.09412.pdf
https://arxiv.org/pdf/1708.04552.pdf
