
MuarAugment: Easiest Way to SOTA Data Augmentation

All images are created by the author.

I wanted an easy way to get a state-of-the-art image augmentation pipeline with no manual iteration, no separate models to train and no thinking. To provide that, I created MuarAugment (Model Uncertainty- And Randomness-based Augmentation), a GPU-supported Python package built on PyTorch, Albumentations and Kornia.

This article guides you through incorporating MuarAugment into your computer vision pipelines.

There are a few resources you can use to master MuarAugment, including Colab tutorials that demonstrate it. Most of the material in this article comes from those tutorials.

  • Here is an end-to-end classification task using MuAugment, MuarAugment’s model uncertainty-based algorithm, on Colab. (1)
  • Here is a template for using MuarAugment’s implementation of RandAugment on Colab. (2)
  • Also see my previous Medium post for a survey of automatic data augmentation methods, the motivation for MuarAugment, and a walkthrough of the RandAugment and MuarAugment algorithms.
  • The MuarAugment GitHub contains the code and additional resources (give a star if and only if you find MuarAugment useful).

High-level Walkthrough

MuarAugment adapts two leading pipeline search algorithms, RandAugment [1] and the model uncertainty-based augmentation scheme of [2] (called MuAugment here), and modifies them to work batch-wise on the GPU. Kornia [3] handles the batch-wise transformations and Albumentations the item-wise ones.
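To make "batch-wise, on the GPU" concrete, here is a minimal sketch in plain PyTorch. It is not Kornia’s or MuarAugment’s API: `batch_random_hflip` and `batch_random_brightness` are hypothetical helpers written only for illustration. Each one draws an independent random decision per image but applies it to the whole (B, C, H, W) batch in a single vectorized call, so the work stays on whichever device the batch lives on instead of looping over images in Python.

```python
import torch


def batch_random_hflip(images: torch.Tensor, p: float = 0.5) -> torch.Tensor:
    """Horizontally flip each image in a (B, C, H, W) batch with probability p."""
    # One Bernoulli(p) draw per image, shaped (B, 1, 1, 1) to broadcast over C, H, W.
    flip_mask = torch.rand(images.shape[0], 1, 1, 1, device=images.device) < p
    # Mirror the whole batch along the width axis, then keep the flipped
    # version only for the images selected by the mask.
    return torch.where(flip_mask, images.flip(-1), images)


def batch_random_brightness(images: torch.Tensor, max_delta: float = 0.2) -> torch.Tensor:
    """Add a per-image brightness offset in [-max_delta, max_delta]; pixels assumed in [0, 1]."""
    delta = (torch.rand(images.shape[0], 1, 1, 1, device=images.device) * 2 - 1) * max_delta
    return (images + delta).clamp(0.0, 1.0)


device = "cuda" if torch.cuda.is_available() else "cpu"
batch = torch.rand(8, 3, 32, 32, device=device)  # stand-in for a real image batch
out = batch_random_brightness(batch_random_hflip(batch))
print(out.shape)  # torch.Size([8, 3, 32, 32])
```

Libraries such as Kornia package many transforms in this batched form, whereas item-wise libraries such as Albumentations operate on one NumPy image at a time, typically inside the dataset's `__getitem__`.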

Let’s get into MuarAugment’s use.

# You can install MuarAugment via pip:
!pip install MuarAugment


MuAugment is the heavy-duty automatic data augmentation policy in MuarAugment. It is simpler than GAN-based automatic data augmentation pipelines, but more complex than RandAugment or manual augmentation.

MuAugment’s validation loss is puzzlingly unstable during training (its training loss decreases monotonically), but it ends up performing significantly better than the competing methods in the run recorded below.

Record of training vanilla ResNet50s on the Kaggle Plant Dataset.

To use MuAugment, simply modify the training logic and train as usual.

In PyTorch Lightning:

In pure PyTorch:
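As a rough illustration of what the modified training logic does, here is a minimal pure-PyTorch sketch of the uncertainty-based selection described in [2]: generate several randomly augmented candidates per image, score them with the current model, and train only on the candidates with the highest loss. It is not MuarAugment’s API. The helpers `select_hardest` and `mu_augment_step`, the parameters `n_candidates` and `n_selected`, and the noise-based `augment` stand-in (a real pipeline would compose random transforms, RandAugment-style) are all hypothetical, and the cross-entropy loss assumes a classification task.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def select_hardest(model, candidates, labels, n_selected):
    """Keep the n_selected highest-loss candidates per image.

    candidates: (B, C, ch, H, W), i.e. C augmented versions of each of B images.
    Returns images of shape (B * n_selected, ch, H, W) and their labels.
    """
    B, C = candidates.shape[:2]
    flat = candidates.flatten(0, 1)            # (B * C, ch, H, W)
    flat_labels = labels.repeat_interleave(C)  # label of each candidate
    was_training = model.training
    model.eval()  # score without updating e.g. BatchNorm running stats (a design choice)
    with torch.no_grad():
        losses = F.cross_entropy(model(flat), flat_labels, reduction="none").view(B, C)
    model.train(was_training)
    top = losses.topk(n_selected, dim=1).indices                   # (B, n_selected)
    rows = torch.arange(B, device=candidates.device).unsqueeze(1)  # (B, 1)
    chosen = candidates[rows, top]                                 # (B, n_selected, ch, H, W)
    return chosen.flatten(0, 1), labels.repeat_interleave(n_selected)


def mu_augment_step(model, optimizer, images, labels, augment, n_candidates=4, n_selected=2):
    """One training step on the hardest augmented candidates; returns the loss."""
    # Each call to `augment` must be random so that the candidates differ.
    candidates = torch.stack([augment(images) for _ in range(n_candidates)], dim=1)
    x, y = select_hardest(model, candidates, labels, n_selected)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()


def augment(x):
    # Hypothetical stand-in for a random augmentation policy: additive noise.
    return (x + 0.1 * torch.randn_like(x)).clamp(0.0, 1.0)


torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
images = torch.rand(4, 3, 8, 8)
labels = torch.randint(0, 10, (4,))
loss = mu_augment_step(model, optimizer, images, labels, augment)
```

Because every image contributes `n_selected` training examples, the effective batch size grows by that factor, and scoring the candidates costs an extra forward pass over B × `n_candidates` images per step.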

See the Colab notebook tutorial on MuAugment (1) for more detail.


MuarAugment also provides a straightforward implementation of RandAugment, a simple but effective automatic data augmentation policy, using Albumentations:
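To show what the RandAugment policy itself boils down to, here is a minimal hand-rolled sketch in NumPy rather than MuarAugment’s `AlbumentationsRandAugment`: it has just two hyperparameters, samples N operations uniformly at random, and applies each at one shared magnitude M. The operation list, the magnitude-to-strength mappings, the 0-10 magnitude scale, and the names `rand_augment` and `OPS` are assumptions made for illustration; RandAugment [1] uses a larger set of transforms.

```python
import random

import numpy as np


# Each op maps (uint8 image of shape (H, W, C), magnitude in [0, 1]) -> uint8 image.
def identity(img, m):
    return img


def brightness(img, m):
    # Scale pixel values by 1 +/- 0.9 * m (direction chosen at random).
    factor = 1.0 + random.choice([-1, 1]) * 0.9 * m
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)


def solarize(img, m):
    # Invert pixels at or above a threshold that falls from 256 toward 0 as m grows.
    threshold = 256 - int(256 * m)
    return np.where(img.astype(np.int32) >= threshold, 255 - img, img).astype(np.uint8)


def posterize(img, m):
    # Zero out the low `shift` bits of each channel; shift grows from 0 to 4 with m.
    shift = int(round(4 * m))
    return ((img >> shift) << shift).astype(np.uint8)


def translate_x(img, m):
    # Shift horizontally by up to 30% of the width, filling the gap with zeros.
    shift = int(round(0.3 * m * img.shape[1])) * random.choice([-1, 1])
    out = np.zeros_like(img)
    if shift > 0:
        out[:, shift:] = img[:, :-shift]
    elif shift < 0:
        out[:, :shift] = img[:, -shift:]
    else:
        out[:] = img
    return out


OPS = [identity, brightness, solarize, posterize, translate_x]


def rand_augment(img, n=2, m=9):
    """Apply n ops drawn uniformly (with replacement) at shared magnitude m on a 0-10 scale."""
    magnitude = m / 10.0
    for op in random.choices(OPS, k=n):
        img = op(img, magnitude)
    return img


random.seed(0)
img = np.random.default_rng(0).integers(0, 256, size=(32, 32, 3)).astype(np.uint8)
out = rand_augment(img, n=2, m=9)
print(out.shape, out.dtype)  # (32, 32, 3) uint8
```

Because the policy has only two knobs, N and M, tuning it amounts to a small grid search rather than a separate search phase, which is what makes RandAugment cheap compared with learned augmentation policies.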

See the Colab notebook tutorial (2) for more detail on AlbumentationsRandAugment.


I will use MuarAugment whenever I train neural nets that take images as input, using MuAugment for image classification and RandAugment for other tasks. (Currently, I am using AlbumentationsRandAugment in a generative model composed of transformers.) I hope you’ll find this package as useful as I do.

Papers Referenced

1. Cubuk, Ekin et al. “RandAugment: Practical data augmentation with no separate search,” 2019, arXiv.
2. Wu, Sen et al. “On the Generalization Effects of Linear Transformations in Data Augmentation,” 2020, arXiv.
3. Riba, Edgar et al. “Kornia: an Open Source Differentiable Computer Vision Library for PyTorch,” 2019, arXiv.




Adam Mehdi


Creative in science & tech. I think about AI & epistemology. Math & CE Undergrad, Assistant Researcher at Dartmouth.
