Image Augmentations with the Albumentations Python Library - Part 1

Dr. Nimrita Koul
20 min read · Aug 8, 2024
Image Source Credits: https://github.com/albumentations-team/albumentations_examples/

Deep neural networks need good-quality data in large quantities

Deep learning models need good-quality training data in large quantities to perform well. Obtaining good-quality labelled training data in sufficient quantities is hard. This is particularly true for image data, especially in domains like health care, where there may be legal restrictions on sharing patients' data or where the available data is too costly.

Computer vision tasks like image classification, object detection, and semantic segmentation require human labour to label images, draw bounding boxes around objects, or label individual pixels in order to create labelled image datasets.

Image augmentation is the process of creating new training images by making small changes to the available ones: changing the brightness or contrast of an image, cropping out a portion of it, flipping or rotating the original image, and so on. By applying these transformations to the original labelled images, the number of labelled images can be increased. More labelled data lowers the probability of the model overfitting.

Albumentations

Albumentations is a Python library for fast and flexible image augmentations. Albumentations efficiently implements a rich variety of image transform operations that are optimized for performance, and does so while providing a concise, yet powerful image augmentation interface for different computer vision tasks, including object classification, segmentation, and detection.

Installation of albumentations

pip install -U albumentations

Code Examples

Let us look at code examples that use the functionality of this library.

All the code below is adapted from the examples in the original Albumentations GitHub repository and API documentation.

Step 1: Importing albumentations and other required libraries in your Jupyter Notebook

import albumentations as A
import matplotlib.pyplot as plt
import cv2

Step 2. Define an augmentation pipeline.

To define an augmentation pipeline, you need to create an instance of the Compose class. As an argument to the Compose class, you need to pass a list of augmentations you want to apply. A call to Compose will return a transform function that will perform image augmentation.

transform = A.Compose([
    A.RandomCrop(width=256, height=256),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
])

To create an augmentation, you create an instance of the required augmentation class and pass augmentation parameters to it. A.RandomCrop receives two parameters, height and width. A.RandomCrop(width=256, height=256) means that A.RandomCrop will take an input image, extract a random patch with size 256 by 256 pixels from it and then pass the result to the next augmentation in the pipeline (in this case to A.HorizontalFlip).

A.HorizontalFlip in this example has one parameter named p. p is a special parameter that is supported by almost all augmentations. It controls the probability of applying the augmentation. p=0.5 means that with a probability of 50%, the transform will flip the image horizontally, and with a probability of 50%, the transform won’t modify the input image.

A.RandomBrightnessContrast in the example also has one parameter, p. With a probability of 20%, this augmentation will change the brightness and contrast of the image received from A.HorizontalFlip. And with a probability of 80%, it will keep the received image unchanged.

Step 3. Read images from the disk.

To pass an image to the augmentation pipeline, you need to read it from the disk. The pipeline expects to receive an image in the form of a NumPy array. If it is a color image, it should have three channels in the following order: Red, Green, Blue (so a regular RGB image).

To read images from the disk, you can use OpenCV — a popular library for image processing. It supports a lot of input formats and is installed along with Albumentations since Albumentations utilizes that library under the hood for a lot of augmentations.

import matplotlib.pyplot as plt

# Helper to display an image that is already in RGB order
def cv2_imshow(img):
    plt.imshow(img)
    plt.show()

image = cv2.imread("lung_mri.jpg")
# OpenCV reads images in BGR order, so convert to RGB once here
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
cv2_imshow(image)
Image Source: https://www.nih.gov/sites/default/files/news-events/news-releases/2019/20191001-nhlbi-lung-th.jpg

To pass an image to the augmentation pipeline you need to call the transform function created by a call to A.Compose at Step 2. In the image argument to that function, you need to pass an image that you want to augment.

#transform will return a dictionary with a single key image. 
#Value at that key will contain an augmented image.
transformed = transform(image=image)
transformed_image = transformed["image"]
cv2_imshow(transformed_image)
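Because most augmentations are applied probabilistically and re-sample their parameters on every call, passing the same image through the pipeline several times produces different outputs. A minimal sketch, reusing the transform, image, and cv2_imshow helper defined above:

# Each call re-samples the augmentation parameters, so every output can differ
for _ in range(3):
    augmented = transform(image=image)["image"]
    cv2_imshow(augmented)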

Let us see some more transformations applied to the above lung MRI image.

import os
import random
import albumentations as A
import cv2
import numpy as np
from matplotlib import pyplot as plt
from skimage.color import label2rgb


BOX_COLOR = (255, 0, 0) # Red
TEXT_COLOR = (255, 255, 255) # White

#function to visualize bounding box
#The visualization function is based on
#https://github.com/facebookresearch/Detectron/blob/master/detectron/utils/vis.py
def visualize_bbox(img, bbox, color=BOX_COLOR, thickness=2, **kwargs):
    x_min, y_min, w, h = bbox
    x_min, x_max, y_min, y_max = int(x_min), int(x_min + w), int(y_min), int(y_min + h)
    cv2.rectangle(img, (x_min, y_min), (x_max, y_max), color=color, thickness=thickness)
    return img

#function to display title text
def visualize_titles(img, bbox, title, font_thickness=2, font_scale=0.35, **kwargs):
    x_min, y_min = bbox[:2]
    x_min = int(x_min)
    y_min = int(y_min)
    ((text_width, text_height), _) = cv2.getTextSize(title, cv2.FONT_HERSHEY_SIMPLEX,
                                                     font_scale, font_thickness)
    cv2.rectangle(img, (x_min, y_min - int(1.3 * text_height)),
                  (x_min + text_width, y_min), BOX_COLOR, -1)
    cv2.putText(img, title, (x_min, y_min - int(0.3 * text_height)),
                cv2.FONT_HERSHEY_SIMPLEX, font_scale, TEXT_COLOR,
                font_thickness, lineType=cv2.LINE_AA)
    return img

#function to apply transforms and display transformed images
def augment_and_show(aug, image, mask=None, bboxes=[], categories=[],
                     category_id_to_name=[], filename=None,
                     font_scale_orig=0.35, font_scale_aug=0.35,
                     show_title=True, **kwargs):

    if mask is None:
        augmented = aug(image=image, bboxes=bboxes, category_ids=categories)
    else:
        augmented = aug(image=image, mask=mask, bboxes=bboxes, category_ids=categories)

    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image_aug = cv2.cvtColor(augmented['image'], cv2.COLOR_BGR2RGB)

    for bbox in bboxes:
        visualize_bbox(image, bbox, **kwargs)

    for bbox in augmented['bboxes']:
        visualize_bbox(image_aug, bbox, **kwargs)

    if show_title:
        for bbox, cat_id in zip(bboxes, categories):
            visualize_titles(image, bbox, category_id_to_name[cat_id],
                             font_scale=font_scale_orig, **kwargs)
        for bbox, cat_id in zip(augmented['bboxes'], augmented['category_ids']):
            visualize_titles(image_aug, bbox, category_id_to_name[cat_id],
                             font_scale=font_scale_aug, **kwargs)

    if mask is None:
        f, ax = plt.subplots(1, 2, figsize=(16, 8))

        ax[0].imshow(image)
        ax[0].set_title('Original image')

        ax[1].imshow(image_aug)
        ax[1].set_title('Augmented image')
    else:
        f, ax = plt.subplots(2, 2, figsize=(16, 16))

        if len(mask.shape) != 3:
            mask = label2rgb(mask, bg_label=0)
            mask_aug = label2rgb(augmented['mask'], bg_label=0)
        else:
            mask = cv2.cvtColor(mask, cv2.COLOR_BGR2RGB)
            mask_aug = cv2.cvtColor(augmented['mask'], cv2.COLOR_BGR2RGB)

        ax[0, 0].imshow(image)
        ax[0, 0].set_title('Original image')

        ax[0, 1].imshow(image_aug)
        ax[0, 1].set_title('Augmented image')

        ax[1, 0].imshow(mask, interpolation='nearest')
        ax[1, 0].set_title('Original mask')

        ax[1, 1].imshow(mask_aug, interpolation='nearest')
        ax[1, 1].set_title('Augmented mask')

    f.tight_layout()

    if filename is not None:
        f.savefig(filename)

    if mask is None:
        return augmented['image'], None, augmented['bboxes']

    return augmented['image'], augmented['mask'], augmented['bboxes']

# helper function to list the files in a directory
def find_in_dir(dirname):
    return [os.path.join(dirname, fname) for fname in sorted(os.listdir(dirname))]

image = cv2.imread('lung_mri.jpg')


random.seed(42)

bbox_params = A.BboxParams(format='coco', label_fields=['category_ids'])

light = A.Compose([
    A.RandomBrightnessContrast(p=1),
    A.RandomGamma(p=1),
    A.CLAHE(p=1),
], p=1, bbox_params=bbox_params)

medium = A.Compose([
    A.CLAHE(p=1),
    A.HueSaturationValue(hue_shift_limit=20, sat_shift_limit=50, val_shift_limit=50, p=1),
], p=1, bbox_params=bbox_params)

strong = A.Compose([
    A.RGBShift(p=1),
    A.Blur(p=1),
    A.GaussNoise(p=1),
    A.ElasticTransform(p=1),
], p=1, bbox_params=bbox_params)



r = augment_and_show(light, image)

r = augment_and_show(medium, image)
r = augment_and_show(strong, image)

All pixel-level transforms implemented in the Albumentations library

These transforms operate at the pixel level on the input image and leave any other input targets, such as masks, bounding boxes, or keypoints, unchanged (a short sketch illustrating this follows the list below).

  • AdvancedBlur
  • Blur
  • CLAHE
  • ChannelDropout
  • ChannelShuffle
  • ChromaticAberration
  • ColorJitter
  • Defocus
  • Downscale
  • Emboss
  • Equalize
  • FDA
  • FancyPCA
  • FromFloat
  • GaussNoise
  • GaussianBlur
  • GlassBlur
  • HistogramMatching
  • HueSaturationValue
  • ISONoise
  • ImageCompression
  • InvertImg
  • MedianBlur
  • MotionBlur
  • MultiplicativeNoise
  • Normalize
  • PixelDistributionAdaptation
  • PlanckianJitter
  • Posterize
  • RGBShift
  • RandomBrightnessContrast
  • RandomFog
  • RandomGamma
  • RandomGravel
  • RandomRain
  • RandomShadow
  • RandomSnow
  • RandomSunFlare
  • RandomToneCurve
  • RingingOvershoot
  • Sharpen
  • Solarize
  • Spatter
  • Superpixels
  • TemplateTransform
  • TextImage
  • ToFloat
  • ToGray
  • ToRGB
  • ToSepia
  • UnsharpMask
  • ZoomBlur
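To see that a pixel-level transform leaves the other targets untouched, here is a minimal sketch; the image and mask below are hypothetical placeholder arrays created only for illustration:

import numpy as np
import albumentations as A

# Placeholder image and mask used only for this illustration
image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
mask = np.zeros((256, 256), dtype=np.uint8)
mask[64:192, 64:192] = 1

pixel_transform = A.Compose([A.RandomBrightnessContrast(p=1)])
result = pixel_transform(image=image, mask=mask)

# The image changes, but the mask comes back unchanged
print(np.array_equal(result["mask"], mask))  # True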

Spatial-level transforms

The spatial-level transforms and the targets they support (image, mask, bounding boxes, keypoints) are listed in the Albumentations documentation [2]. If you try to apply a spatial-level transform to an unsupported target, Albumentations will raise an error.
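A spatial-level transform, in contrast, is applied consistently to every supported target. A minimal sketch with the same kind of placeholder arrays as above:

import numpy as np
import albumentations as A

# Placeholder image and mask used only for this illustration
image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
mask = np.zeros((256, 256), dtype=np.uint8)
mask[64:192, 32:128] = 1

flip = A.Compose([A.HorizontalFlip(p=1)])
flipped = flip(image=image, mask=mask)

# Image and mask are flipped together along the width axis
print(np.array_equal(flipped["mask"], mask[:, ::-1]))  # True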

Mixing-level transforms

These transforms mix several images into one.

Let us now see some of the above transformations in action.

## Simple Augmentations
import random
import cv2
from matplotlib import pyplot as plt
import albumentations as A


def visualize(image):
    plt.figure(figsize=(10, 10))
    plt.axis('off')
    plt.imshow(image)


image = cv2.imread('dog_image.png')
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
visualize(image)

Image Source Credits: https://github.com/albumentations-team/albumentations_examples/

Define a single augmentation, pass the image to it, and receive the augmented image. We fix the random seed for visualization purposes so that the augmentation always produces the same result. In a real computer vision pipeline, you shouldn’t fix the random seed before applying a transform to the image because, in that case, the pipeline will always output the same image. The purpose of image augmentation is to use different transformations each time.

transform = A.HorizontalFlip(p=0.5)
random.seed(7)
augmented_image = transform(image=image)['image']
visualize(augmented_image)
transform = A.ShiftScaleRotate(p=0.5)
random.seed(7)
augmented_image = transform(image=image)['image']
visualize(augmented_image)
#Define an augmentation pipeline using Compose, pass the image to it and receive the augmented image

transform = A.Compose([
    A.CLAHE(),
    A.RandomRotate90(),
    A.Transpose(),
    A.ShiftScaleRotate(shift_limit=0.0625, scale_limit=0.50, rotate_limit=45, p=.75),
    A.Blur(blur_limit=3),
    A.OpticalDistortion(),
    A.GridDistortion(),
    A.HueSaturationValue(),
])
random.seed(42)
augmented_image = transform(image=image)['image']
visualize(augmented_image)

transform = A.Compose([
    A.RandomRotate90(),
    A.Flip(),
    A.Transpose(),
    A.GaussNoise(),
    A.OneOf([
        A.MotionBlur(p=.2),
        A.MedianBlur(blur_limit=3, p=0.1),
        A.Blur(blur_limit=3, p=0.1),
    ], p=0.2),
    A.ShiftScaleRotate(shift_limit=0.0625, scale_limit=0.2, rotate_limit=45, p=0.2),
    A.OneOf([
        A.OpticalDistortion(p=0.3),
        A.GridDistortion(p=.1),
    ], p=0.2),
    A.OneOf([
        A.CLAHE(clip_limit=2),
        A.RandomBrightnessContrast(),
    ], p=0.3),
    A.HueSaturationValue(p=0.3),
])
random.seed(42)
augmented_image = transform(image=image)['image']
visualize(augmented_image)

Weather-related transformations implemented in Albumentations

import random 
import cv2
from matplotlib import pyplot as plt
import albumentations as A


def visualize(image):
    # image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    plt.figure(figsize=(20, 10))
    plt.axis('off')
    plt.imshow(image)

image = cv2.imread('weather_example.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
visualize(image)

RandomRain

As before, we fix the random seed only so that the visualizations are reproducible; in a real pipeline you would let each call draw different parameters.

transform = A.Compose(
    [A.RandomRain(brightness_coefficient=0.9, drop_width=1, blur_value=5, p=1)],
)
# random.seed(7)
transformed = transform(image=image)
visualize(transformed['image'])

# RandomSnow
transform = A.Compose(
    [A.RandomSnow(brightness_coeff=2.5, snow_point_lower=0.3, snow_point_upper=0.5, p=1)],
)
random.seed(7)
transformed = transform(image=image)
visualize(transformed['image'])

# RandomSunFlare
transform = A.Compose(
    [A.RandomSunFlare(flare_roi=(0, 0, 1, 0.5), angle_lower=0.5, p=1)],
)
random.seed(7)
transformed = transform(image=image)
visualize(transformed['image'])

# RandomShadow
transform = A.Compose(
    [A.RandomShadow(num_shadows_lower=1, num_shadows_upper=1, shadow_dimension=5, shadow_roi=(0, 0.5, 1, 1), p=1)],
)
random.seed(7)
transformed = transform(image=image)
visualize(transformed['image'])

# RandomFog
transform = A.Compose(
    [A.RandomFog(fog_coef_lower=0.7, fog_coef_upper=0.8, alpha_coef=0.1, p=1)],
)
random.seed(7)
transformed = transform(image=image)
visualize(transformed['image'])

Working with non-8-bit images in Albumentations

For example, 16-bit TIFF images, which are common in satellite imagery. The following technique can be applied to any non-8-bit images (24-bit, 32-bit, etc.).

import random
import cv2
from matplotlib import pyplot as plt
import albumentations as A


# The visualize function is a bit different for 16-bit TIFF images
import numpy as np

def visualize(image):
    # Scale values to the [0, 1] range so matplotlib can display the image
    image = image / np.max(image)
    plt.figure(figsize=(10, 10))
    plt.axis('off')
    plt.imshow(image)


#Read the 16-bit TIFF image from the disk
# The image is taken from http://www.brucelindbloom.com/index.html?ReferenceImages.html
# © Bruce Justin Lindbloom
image = cv2.imread('DeltaE_16bit_gamma2.2.tif', cv2.IMREAD_UNCHANGED)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
visualize(image)



transform = A.Compose([
    # Convert the 16-bit image to float32 in the [0, 1] range before augmenting
    A.ToFloat(max_value=65535.0),

    A.RandomRotate90(),
    A.Flip(),
    A.OneOf([
        A.MotionBlur(p=0.2),
        A.MedianBlur(blur_limit=3, p=0.1),
        A.Blur(blur_limit=3, p=0.1),
    ], p=0.2),
    A.ShiftScaleRotate(shift_limit=0.0625, scale_limit=0.2,
                       rotate_limit=45, p=0.2),
    A.OneOf([
        A.OpticalDistortion(p=0.3),
        A.GridDistortion(p=0.1),
    ], p=0.2),
    A.HueSaturationValue(hue_shift_limit=20, sat_shift_limit=0.1,
                         val_shift_limit=0.1, p=0.3),

    # Convert back to the original 16-bit value range
    A.FromFloat(max_value=65535.0),
])

random.seed(7)
augmented = transform(image=image)
visualize(augmented['image'])

Augmenting text within document images (Overlay elements)

The pipeline processes document images and associated JSON files that describe text lines and their bounding boxes within the images (see https://github.com/danaaubakirova/doc-augmentation).

import json

import cv2
import numpy as np
from matplotlib import pyplot as plt
from PIL import ImageDraw, ImageFont, Image

import albumentations as A


def visualize(image):
    plt.figure(figsize=(20, 10))
    plt.axis('off')
    plt.imshow(image)

def load_rgb(image_path):
    image = cv2.imread(image_path)
    return cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

font_path = "LiberationSerif-Regular.ttf"
image = load_rgb("docs.png")

with open("text.json") as f:
    labels = json.load(f)

visualize(image)

transform = A.Compose([A.OverlayElements(p=1)])


def render_text(bbox_shape, text, font):
    bbox_height, bbox_width = bbox_shape

    # Create an empty RGB image with the size of the bounding box
    bbox_img = Image.new("RGB", (bbox_width, bbox_height), color="white")
    draw = ImageDraw.Draw(bbox_img)

    # Draw the text in red
    draw.text((0, 0), text, fill="red", font=font)

    return np.array(bbox_img)




# Pick 10 random text lines whose boxes will receive newly rendered text
bbox_indices_to_update = np.random.choice(range(len(labels["text"])), 10)
labels.keys()

image_height, image_width = image.shape[:2]
num_channels = image.shape[2] if len(image.shape) == 3 else 1

metadata = []
for index in bbox_indices_to_update:
    selected_bbox = labels["bbox"][index]

    # You may apply any transforms you want to the text, such as
    # random deletion, swapping words, applying synonyms, etc.
    text = labels["text"][index]

    left, top, width_norm, height_norm = selected_bbox

    bbox_height = int(image_height * height_norm)
    bbox_width = int(image_width * width_norm)

    font = ImageFont.truetype(font_path, int(0.90 * bbox_height))

    overlay_image = render_text((bbox_height, bbox_width), text, font)

    metadata += [
        {
            "image": overlay_image,
            "bbox": (left, top, left + width_norm, top + height_norm)
        }
    ]

transformed = transform(image=image, overlay_metadata=metadata)
visualize(transformed["image"])

transform_complex = A.Compose([
    A.OverlayElements(p=1),
    A.RandomCrop(p=1, height=1024, width=1024),
    A.PlanckianJitter(p=1),
    A.Affine(p=1),
])
transformed = transform_complex(image=image, overlay_metadata=metadata)
visualize(transformed["image"])

Bounding Box and Keypoint Rotation with Albumentations

from typing import List

import albumentations as A
import cv2
import matplotlib.pyplot as plt
import numpy as np


def visualize(image: np.ndarray, keypoints: List[List[float]],
              bboxes: List[List[float]]) -> np.ndarray:
    overlay = image.copy()
    for kp in keypoints:
        cv2.circle(overlay, (int(kp[0]), int(kp[1])), 20,
                   (0, 200, 200), thickness=2, lineType=cv2.LINE_AA)

    for box in bboxes:
        cv2.rectangle(overlay, (int(box[0]), int(box[1])),
                      (int(box[2]), int(box[3])), (200, 0, 0), thickness=2)

    return overlay


def main() -> None:
    image = cv2.imread("image_1.jpg")

    keypoints = cv2.goodFeaturesToTrack(
        cv2.cvtColor(image, cv2.COLOR_BGR2GRAY), maxCorners=100,
        qualityLevel=0.5, minDistance=5
    ).squeeze(1)

    bboxes = [(kp[0] - 10, kp[1] - 10, kp[0] + 10, kp[1] + 10) for kp in keypoints]

    disp_image = visualize(image, keypoints, bboxes)
    plt.figure(figsize=(10, 10))
    plt.imshow(cv2.cvtColor(disp_image, cv2.COLOR_BGR2RGB))
    plt.tight_layout()
    plt.show()

    aug = A.Compose(
        [A.ShiftScaleRotate(scale_limit=0.1, shift_limit=0.2,
                            rotate_limit=10, always_apply=True)],
        bbox_params=A.BboxParams(format="pascal_voc", label_fields=["bbox_labels"]),
        keypoint_params=A.KeypointParams(format="xy"),
    )

    for _i in range(10):
        data = aug(image=image, keypoints=keypoints, bboxes=bboxes,
                   bbox_labels=np.ones(len(bboxes)))

        aug_image = data["image"]
        aug_image = visualize(aug_image, data["keypoints"], data["bboxes"])

        plt.figure(figsize=(10, 10))
        plt.imshow(cv2.cvtColor(aug_image, cv2.COLOR_BGR2RGB))
        plt.tight_layout()
        plt.show()


if __name__ == "__main__":
    main()

Drawing Bounding Boxes

import random
import cv2
from matplotlib import pyplot as plt
import albumentations as A

The visualization function is based on https://github.com/facebookresearch/Detectron/blob/master/detectron/utils/vis.py

BOX_COLOR = (255, 0, 0) # Red
TEXT_COLOR = (255, 255, 255) # White


def visualize_bbox(img, bbox, class_name, color=BOX_COLOR, thickness=2):
    """Visualizes a single bounding box on the image"""
    x_min, y_min, w, h = bbox
    x_min, x_max, y_min, y_max = int(x_min), int(x_min + w), int(y_min), int(y_min + h)

    cv2.rectangle(img, (x_min, y_min), (x_max, y_max), color=color, thickness=thickness)

    ((text_width, text_height), _) = cv2.getTextSize(class_name, cv2.FONT_HERSHEY_SIMPLEX, 0.35, 1)
    cv2.rectangle(img, (x_min, y_min - int(1.3 * text_height)),
                  (x_min + text_width, y_min), BOX_COLOR, -1)
    cv2.putText(
        img,
        text=class_name,
        org=(x_min, y_min - int(0.3 * text_height)),
        fontFace=cv2.FONT_HERSHEY_SIMPLEX,
        fontScale=0.35,
        color=TEXT_COLOR,
        lineType=cv2.LINE_AA,
    )
    return img


def visualize(image, bboxes, category_ids, category_id_to_name):
    img = image.copy()
    for bbox, category_id in zip(bboxes, category_ids):
        class_name = category_id_to_name[category_id]
        img = visualize_bbox(img, bbox, class_name)
    plt.figure(figsize=(12, 12))
    plt.axis('off')
    plt.imshow(img)

For this example we will use an image from the COCO dataset that has two associated bounding boxes. The image is available at http://cocodataset.org/#explore?id=386298

image = cv2.imread('000000386298.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

Define two bounding boxes with coordinates and class labels

Coordinates for those bounding boxes are declared using the coco format. Each bounding box is described using four values [x_min, y_min, width, height]. For the detailed description of different formats for bounding boxes coordinates, please refer to the documentation article about bounding boxes — https://albumentations.ai/docs/getting_started/bounding_boxes_augmentation/.
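As a quick illustration of the format (this small helper is not part of the original example), converting a coco box to pascal_voc coordinates only requires adding the width and height to the top-left corner:

def coco_to_pascal_voc(bbox):
    # coco [x_min, y_min, width, height] -> pascal_voc [x_min, y_min, x_max, y_max]
    x_min, y_min, w, h = bbox
    return [x_min, y_min, x_min + w, y_min + h]

print(coco_to_pascal_voc([5.66, 138.95, 147.09, 164.88]))
# [5.66, 138.95, 152.75, 303.83]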

bboxes = [[5.66, 138.95, 147.09, 164.88], [366.7, 80.84, 132.8, 181.84]]
category_ids = [17, 18]

# We will use the mapping from category_id to the class name
# to visualize the class label for the bounding box on the image
category_id_to_name = {17: 'cat', 18: 'dog'}


# Visualize the original image with bounding boxes
visualize(image, bboxes, category_ids, category_id_to_name)

Define an augmentation pipeline

To make an augmentation pipeline that works with bounding boxes, you need to pass an instance of BboxParams to Compose. In BboxParams you need to specify the format of coordinates for bounding boxes and optionally a few other parameters. For the detailed description of BboxParams please refer to the documentation article about bounding boxes — https://albumentations.ai/docs/getting_started/bounding_boxes_augmentation/.

transform = A.Compose(
    [A.HorizontalFlip(p=0.5)],
    bbox_params=A.BboxParams(format='coco', label_fields=['category_ids']),
)

random.seed(7)
transformed = transform(image=image, bboxes=bboxes, category_ids=category_ids)
visualize(
    transformed['image'],
    transformed['bboxes'],
    transformed['category_ids'],
    category_id_to_name,
)

transform = A.Compose(
    [A.ShiftScaleRotate(p=0.5)],
    bbox_params=A.BboxParams(format='coco', label_fields=['category_ids']),
)

random.seed(7)
transformed = transform(image=image, bboxes=bboxes, category_ids=category_ids)
visualize(
    transformed['image'],
    transformed['bboxes'],
    transformed['category_ids'],
    category_id_to_name,
)

# Define a more complex augmentation pipeline
transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.ShiftScaleRotate(p=0.5),
    A.RandomBrightnessContrast(p=0.3),
    A.RGBShift(r_shift_limit=30, g_shift_limit=30, b_shift_limit=30, p=0.3),
],
    bbox_params=A.BboxParams(format='coco', label_fields=['category_ids']),
)

random.seed(7)
transformed = transform(image=image, bboxes=bboxes, category_ids=category_ids)
visualize(
    transformed['image'],
    transformed['bboxes'],
    transformed['category_ids'],
    category_id_to_name,
)

min_area and min_visibility parameters

The size of bounding boxes can change if you apply spatial augmentations, for example, when you crop a part of an image or when you resize it.

The min_area and min_visibility parameters control what Albumentations should do with augmented bounding boxes whose size has changed.

min_area is a value in pixels. If the area of a bounding box after augmentation becomes smaller than min_area, Albumentations will drop that box. So the returned list of augmented bounding boxes won’t contain that bounding box.

min_visibility is a value between 0 and 1. If the ratio of the bounding box area after augmentation to its area before augmentation becomes smaller than min_visibility, Albumentations will drop that box. So if the augmentation process cuts off most of a bounding box, that box won’t be present in the returned list of augmented bounding boxes.

Define an augmentation pipeline with the default values for min_area and min_visibility

If you don’t pass the min_area and min_visibility parameters, Albumentations will use 0 as a default value for them.

transform = A.Compose(
    [A.CenterCrop(height=280, width=280, p=1)],
    bbox_params=A.BboxParams(format='coco', label_fields=['category_ids']),
)

transformed = transform(image=image, bboxes=bboxes, category_ids=category_ids)
visualize(
    transformed['image'],
    transformed['bboxes'],
    transformed['category_ids'],
    category_id_to_name,
)
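To actually see boxes being dropped, you can pass non-default values to BboxParams. The following sketch reuses the image, bboxes, category_ids, and visualize helper defined above; the threshold values are arbitrary and chosen only for illustration:

transform = A.Compose(
    [A.CenterCrop(height=280, width=280, p=1)],
    bbox_params=A.BboxParams(format='coco', min_area=4500, min_visibility=0.4,
                             label_fields=['category_ids']),
)

transformed = transform(image=image, bboxes=bboxes, category_ids=category_ids)
# Boxes whose remaining area is below min_area pixels, or whose visible fraction
# falls below min_visibility, are removed from transformed['bboxes']
visualize(
    transformed['image'],
    transformed['bboxes'],
    transformed['category_ids'],
    category_id_to_name,
)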

Chromatic aberration

import cv2
from matplotlib import pyplot as plt

import albumentations as A

def visualize(image):
    plt.figure(figsize=(10, 5))
    plt.axis('off')
    plt.imshow(image)

def load_rgb(image_path):
    image = cv2.imread(image_path)
    return cv2.cvtColor(image, cv2.COLOR_BGR2RGB)


img_path = "alina_rossoshanska.jpeg"
img = load_rgb(img_path)
visualize(img)

# Red-blue mode
transform = A.Compose([A.ChromaticAberration(mode="red_blue",
                                             primary_distortion_limit=0.5,
                                             secondary_distortion_limit=0.1, p=1)])

plt.figure(figsize=(15, 10))

num_images = 12

# Generate several augmented copies and plot them in a grid of subplots
for i in range(num_images):
    transformed_image = transform(image=img)["image"]
    plt.subplot(4, 3, i + 1)
    plt.imshow(transformed_image)
    plt.axis('off')

plt.tight_layout()
plt.show()

D4 Transform

Geometric transforms are the most widely used augmentations, mainly because they do not take the data outside of the original data distribution and because they make intuitive sense.

The D4 transform maps the original image to one of 8 states:

  • e — identity, the original image
  • r90 — rotation by 90 degrees
  • r180 — rotation by 180 degrees, which is equal to v * h = h * v
  • r270 — rotation by 270 degrees
  • v — vertical flip
  • hvt — reflection across the anti-diagonal, which is equal to t * v * h or t * r180
  • h — horizontal flip
  • t — reflection across the main diagonal

The same transform could be represented more conveniently as:

A.Compose([A.HorizontalFlip(p=0.5), A.RandomRotate90(p=1)])

This transform is useful in situations where the imagery has no preferred orientation:

For example, medical images or top-view drone and satellite imagery. It works for images, masks, keypoints, and bounding boxes.

import json
import hashlib
import random
import numpy as np
import cv2
from matplotlib import pyplot as plt
import albumentations as A



BOX_COLOR = (255, 0, 0)
TEXT_COLOR = (255, 255, 255)
KEYPOINT_COLOR = (0, 255, 0)

def visualize_bbox(img, bbox, class_name, bbox_color=BOX_COLOR, thickness=1):
    """Visualizes a single bounding box on the image"""
    x_min, y_min, x_max, y_max = (int(x) for x in bbox)
    cv2.rectangle(img, (x_min, y_min), (x_max, y_max),
                  color=bbox_color, thickness=thickness)

    ((text_width, text_height), _) = cv2.getTextSize(class_name,
                                                     cv2.FONT_HERSHEY_SIMPLEX, 0.35, 1)
    cv2.rectangle(img, (x_min, y_min - int(1.3 * text_height)),
                  (x_min + text_width, y_min), bbox_color, -1)
    cv2.putText(
        img,
        text=class_name,
        org=(x_min, y_min - int(0.3 * text_height)),
        fontFace=cv2.FONT_HERSHEY_SIMPLEX,
        fontScale=0.35,
        color=TEXT_COLOR,
        lineType=cv2.LINE_AA,
    )
    return img

def vis_keypoints(image, keypoints, color=KEYPOINT_COLOR, diameter=3):
    image = image.copy()
    for (x, y) in keypoints:
        cv2.circle(image, (int(x), int(y)), diameter, color, -1)
    return image

def visualize_one(image, bboxes, keypoints, category_ids, category_id_to_name, mask):
    # Create a copy of the image to draw on
    img = image.copy()

    # Apply each bounding box and corresponding category ID
    for bbox, category_id in zip(bboxes, category_ids):
        class_name = category_id_to_name[category_id]
        img = visualize_bbox(img, bbox, class_name)

    # Apply keypoints if provided
    if keypoints:
        img = vis_keypoints(img, keypoints)

    # Setup plot
    fig, ax = plt.subplots(1, 2, figsize=(6, 3))

    # Show the image with annotations
    ax[0].imshow(img)
    ax[0].axis('off')

    # Show the mask
    ax[1].imshow(mask, cmap='gray')
    ax[1].axis('off')

    plt.tight_layout()
    plt.show()



def visualize(images, bboxes_list, keypoints_list,
              category_ids_list, category_id_to_name, masks):
    if len(images) != 8:
        raise ValueError("This function is specifically designed to handle exactly 8 images.")

    num_rows = 4
    num_cols = 4

    fig, axs = plt.subplots(num_rows, num_cols, figsize=(20, 20))

    for idx, (image, bboxes, keypoints, category_ids, mask) in enumerate(zip(images,
                                                                             bboxes_list,
                                                                             keypoints_list,
                                                                             category_ids_list,
                                                                             masks)):
        img = image.copy()

        # Process each image: draw bounding boxes and keypoints
        for bbox, category_id in zip(bboxes, category_ids):
            class_name = category_id_to_name[category_id]
            img = visualize_bbox(img, bbox, class_name)

        if keypoints:
            img = vis_keypoints(img, keypoints)

        # Calculate subplot indices
        row_index = (idx * 2) // num_cols  # Each image/mask pair occupies two adjacent columns
        col_index_image = (idx * 2) % num_cols  # Image at even column index
        col_index_mask = (idx * 2 + 1) % num_cols  # Mask at odd column index, right after the image

        # Plot the processed image
        img_ax = axs[row_index, col_index_image]
        img_ax.imshow(img)
        img_ax.axis('off')

        # Plot the corresponding mask
        mask_ax = axs[row_index, col_index_mask]
        mask_ax.imshow(mask, cmap='gray')
        mask_ax.axis('off')

    plt.tight_layout()
    plt.show()

with open("road_labels.json") as f:
labels = json.load(f)

bboxes = labels["bboxes"]
keypoints = labels["keypoints"]

bgr_image = cv2.imread("road.jpeg")
image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)

mask = cv2.imread("road.png", 0)

# In this example we use only one class, so category_ids is a list with
# one entry (the same value) per bounding box
category_ids = [1] * len(labels["bboxes"])
category_id_to_name = {1: "car"}

visualize_one(image, bboxes, keypoints, category_ids, category_id_to_name, mask)

transform = A.Compose([
    A.CenterCrop(height=512, width=256, p=1),
    A.D4(p=1)],
    bbox_params=A.BboxParams(format='pascal_voc',
                             label_fields=['category_ids']),
    keypoint_params=A.KeypointParams(format='xy'))

transformed = transform(image=image, bboxes=bboxes,
                        category_ids=category_ids, keypoints=keypoints, mask=mask)

def get_hash(image):
    image_bytes = image.tobytes()
    hash_md5 = hashlib.md5()
    hash_md5.update(image_bytes)
    return hash_md5.hexdigest()

transformations_dict = {}

for _ in range(80):
    transformed = transform(image=image, bboxes=bboxes,
                            category_ids=category_ids, keypoints=keypoints, mask=mask)
    image_hash = get_hash(transformed["image"])

    if image_hash in transformations_dict:
        transformations_dict[image_hash]['count'] += 1
    else:
        transformations_dict[image_hash] = {
            "count": 1,
            "transformed": transformed
        }

# The transform generates all 8 possible variants with the same probability, including the identity transform
len(transformations_dict)

for key in transformations_dict:
    print(key, transformations_dict[key]["count"])
transformed_list = [value["transformed"] for value in transformations_dict.values()]



images = [x["image"] for x in transformed_list]
masks = [x["mask"] for x in transformed_list]
bboxes_list = [x["bboxes"] for x in transformed_list]
keypoints_list = [x["keypoints"] for x in transformed_list]


category_ids_list = [[1] * len(x["bboxes"]) for x in transformed_list]
category_id_to_name = {1: "car"}


visualize(images, bboxes_list, keypoints_list,
          category_ids_list, category_id_to_name, masks)

Morphological Transform

import random

import cv2
from matplotlib import pyplot as plt

import albumentations as A

def visualize(image):
    plt.figure(figsize=(10, 5))
    plt.axis('off')
    plt.imshow(image)

def load_rgb(image_path):
    image = cv2.imread(image_path)
    return cv2.cvtColor(image, cv2.COLOR_BGR2RGB)


img_path = "scan.jpeg"
img = load_rgb(img_path)
visualize(img)
#Dilation expands the white (foreground) regions in a binary or grayscale image.
transform = A.Compose([A.Morphological(p=1, scale=(2, 3), operation='dilation')], p=1)
transformed = transform(image=img)
visualize(transformed["image"])
#Erosion shrinks the white (foreground) regions in a binary or grayscale image.
transform = A.Compose([A.Morphological(p=1, scale=(2, 3), operation='erosion')], p=1)
transformed = transform(image=img)
visualize(transformed["image"])

Domain Adaptation Transform

Domain adaptation transforms perform a kind of style transfer without using a neural network. The resulting images do not look as polished, but they can be generated on the fly in a reasonable time.

import random
from pathlib import Path

import cv2
import numpy as np
from matplotlib import pyplot as plt

import albumentations as A

def visualize(image):
    plt.figure(figsize=(10, 5))
    plt.axis('off')
    plt.imshow(image)

def load_rgb(image_path):
    image = cv2.imread(image_path)
    return cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

image1 = load_rgb("park1.png")
image2 = load_rgb("rain2.png")

visualize(image1)

visualize(image2)

Histogram matching

This process adjusts the pixel values of an input image to align its histogram with that of a reference image. When dealing with multi-channel images, this alignment occurs separately for each channel, provided that both the input and the reference image have an identical number of channels.

transform = A.Compose([A.HistogramMatching(reference_images=[image2],
                                           read_fn=lambda x: x,
                                           p=1,
                                           blend_ratio=(0.3, 0.3))], p=1)

transformed = transform(image=image1)["image"]
visualize(transformed)

Fourier Domain Adaptation (FDA)

Fourier Domain Adaptation (FDA) performs simple “style transfer” in the context of unsupervised domain adaptation (UDA). FDA manipulates the frequency components of images to reduce the domain gap between source and target datasets, effectively adapting images from one domain to closely resemble those from another without altering their semantic content.

This transform is particularly beneficial in scenarios where the training (source) and testing (target) images come from different distributions, such as synthetic versus real images, or day versus night scenes. Unlike traditional domain adaptation methods that may require complex adversarial training, FDA achieves domain alignment by swapping the low-frequency components of the Fourier transform between the source and target images. This technique has been shown to improve the performance of models on the target domain, particularly for tasks like semantic segmentation, without additional training for domain invariance.

transform = A.Compose([A.FDA(reference_images=[image1],
                             read_fn=lambda x: x, p=1, beta_limit=(0.2, 0.2))], p=1)
transformed = transform(image=image1)["image"]
visualize(transformed)

PixelDistribution

This transform performs pixel-level domain adaptation by aligning the pixel value distribution of an input image with that of a reference image. It fits a simple statistical transform (such as PCA, StandardScaler, or MinMaxScaler) to both the original and the reference image, transforms the original image with the transform fitted on it, and then applies the inverse transformation using the transform fitted on the reference image. The result is an adapted image that retains the original content while mimicking the pixel value distribution of the reference domain.

The process can be visualized as two main steps:

  • Adjusting the original image to a standard distribution space using a selected transform.
  • Moving the adjusted image into the distribution space of the reference image by applying the inverse of the transform fitted on the reference image.

This technique is especially useful in scenarios where images from different domains (e.g., synthetic vs. real images, day vs. night scenes) need to be harmonized for better consistency or performance in image processing tasks.

transform = A.Compose([A.PixelDistributionAdaptation(reference_images=[image1],
                                                     read_fn=lambda x: x, p=1,
                                                     transform_type="pca")], p=1)

transformed = transform(image=image1)["image"]
visualize(transformed)
beta = 0.1

height, width = image1.shape[:2]
border = int(np.floor(min(height, width) * beta))
center_y, center_x = height // 2, width // 2

# Define region for amplitude substitution
y1, y2 = center_y - border, center_y + border

x1, x2 = center_x - border, center_x + border

x1, x2, y1, y2

(205, 307, 205, 307)

def low_freq_mutate_np(amp_src, amp_trg, L, h, w):
    b = int(np.floor(min(h, w) * L))
    c_h, c_w = h // 2, w // 2
    h1, h2 = max(0, c_h - b), min(c_h + b, h - 1)
    w1, w2 = max(0, c_w - b), min(c_w + b, w - 1)
    amp_src[h1:h2, w1:w2] = amp_trg[h1:h2, w1:w2]
    return amp_src

def fourier_domain_adaptation(src_img, trg_img, beta=0.1):
    assert src_img.shape == trg_img.shape, "Source and target images must have the same shape."
    src_img = src_img.astype(np.float32)
    trg_img = trg_img.astype(np.float32)

    height, width, num_channels = src_img.shape

    # Prepare container for the output image
    src_in_trg = np.zeros_like(src_img)

    for c in range(num_channels):
        # Perform FFT on each channel
        fft_src = np.fft.fft2(src_img[:, :, c])
        fft_trg = np.fft.fft2(trg_img[:, :, c])

        # Shift the zero frequency component to the center
        fft_src_shifted = np.fft.fftshift(fft_src)
        fft_trg_shifted = np.fft.fftshift(fft_trg)

        # Extract amplitude and phase
        amp_src, pha_src = np.abs(fft_src_shifted), np.angle(fft_src_shifted)
        amp_trg = np.abs(fft_trg_shifted)

        # Mutate the amplitude part of the source with the target
        mutated_amp = low_freq_mutate_np(amp_src.copy(), amp_trg, beta, height, width)

        # Combine the mutated amplitude with the original phase
        fft_src_mutated = np.fft.ifftshift(mutated_amp * np.exp(1j * pha_src))

        # Perform inverse FFT
        src_in_trg_channel = np.fft.ifft2(fft_src_mutated)

        # Store the result in the corresponding channel of the output image
        src_in_trg[:, :, c] = np.real(src_in_trg_channel)

    return np.clip(src_in_trg, 0, 255)

visualize(fourier_domain_adaptation(image2, image1, 0.01).astype(np.uint8))

RandomGridShuffle

This transformation divides the image into a grid and then permutes these grid cells based on a random mapping.

It could be useful when only micro features are important for the model, and memorizing the global structure could be harmful.

For example:

  • Identifying the type of cell phone used to take a picture based on micro artifacts generated by phone post-processing algorithms, rather than the semantic features of the photo.
  • Identifying stress, glucose, hydration levels based on skin images.

import random
import numpy as np
import cv2
from matplotlib import pyplot as plt
import albumentations as A
import json

KEYPOINT_COLOR = (0, 255, 0)

def vis_keypoints(image, keypoints, color=KEYPOINT_COLOR, diameter=3):
    image = image.copy()
    for (x, y) in keypoints:
        cv2.circle(image, (int(x), int(y)), diameter, color, -1)
    return image

def visualize(image, mask, keypoints):
    # Create a copy of the image to draw on
    img = image.copy()

    # Apply keypoints if provided
    if keypoints:
        img = vis_keypoints(img, keypoints)

    # Setup plot
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))

    # Show the image with annotations
    ax[0].imshow(img)
    ax[0].axis('off')

    # Show the mask
    ax[1].imshow(mask, cmap='gray')
    ax[1].axis('off')

    plt.tight_layout()
    plt.show()




with open("road_labels.json") as f:
labels = json.load(f)

keypoints = labels["keypoints"]

bgr_image = cv2.imread("road.jpeg")
image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)

mask = cv2.imread("road.png", 0)

visualize(image, mask, keypoints)

transform = A.Compose([A.RandomGridShuffle(grid=(2, 2), p=1)],
                      keypoint_params=A.KeypointParams(format='xy'))
transformed = transform(image=image, keypoints=keypoints, mask=mask)
visualize(transformed["image"], transformed["mask"], transformed["keypoints"])

transform = A.Compose([A.RandomGridShuffle(grid=(3, 3), p=1)],
                      keypoint_params=A.KeypointParams(format='xy'))
transformed = transform(image=image, keypoints=keypoints, mask=mask)
visualize(transformed["image"], transformed["mask"], transformed["keypoints"])

transform = A.Compose([A.RandomGridShuffle(grid=(5, 7), p=1)],
                      keypoint_params=A.KeypointParams(format='xy'))
transformed = transform(image=image, keypoints=keypoints, mask=mask)
visualize(transformed["image"], transformed["mask"], transformed["keypoints"])

Sources:

[1]. https://github.com/albumentations-team/albumentations_examples/blob/main/README.md

[2]. https://github.com/albumentations-team/albumentations?tab=readme-ov-file#list-of-augmentations

[3]. https://albumentations.ai/docs/api_reference/augmentations/
