YOLOv9: 1 Channel Training

Train YOLO on One Channel

Oliver Lövström
Internet of Technology
Mar 30, 2024

This step-by-step guide works for the YOLO models in the Ultralytics library, including YOLOv8 and YOLOv9. Let's get started!

If you're working with grayscale images, the model doesn't need three input channels. By default, YOLO doesn't support single-channel input for training, so let's update the code. We'll be working directly in the Ultralytics source code, so clone the GitHub repository and install it in editable mode so that your changes are picked up:

git clone https://github.com/ultralytics/ultralytics.git
cd ultralytics
pip install -e .
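Why is one channel enough? With the default color read, a grayscale image is loaded as three identical channels, and a convolution over three identical planes equals a single-plane convolution whose kernel is the sum of the three per-channel kernels. The small numpy check below illustrates this; conv2d_valid is a hypothetical helper written for this example, not part of Ultralytics.

```python
import numpy as np

def conv2d_valid(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Naive 'valid' cross-correlation. x: (C, H, W), w: (C, kH, kW) -> (H-kH+1, W-kW+1)."""
    _, h, wid = x.shape
    _, kh, kw = w.shape
    out = np.zeros((h - kh + 1, wid - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Sum over all channels and kernel positions at this output location
            out[i, j] = np.sum(x[:, i:i + kh, j:j + kw] * w)
    return out

rng = np.random.default_rng(0)
gray = rng.random((1, 8, 8))               # one grayscale plane
gray_as_rgb = np.repeat(gray, 3, axis=0)   # the same image stored as 3 identical planes
w3 = rng.random((3, 3, 3))                 # a 3-input-channel 3x3 kernel
w1 = w3.sum(axis=0, keepdims=True)         # equivalent 1-input-channel kernel

out3 = conv2d_valid(gray_as_rgb, w3)
out1 = conv2d_valid(gray, w1)
print(np.allclose(out3, out1))  # True
```

In other words, the two extra input channels only add input data and first-layer weights; they carry no extra information.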

One Channel Training

First, modify the load_image() function in ultralytics/data/base.py:

def load_image(self, i, rect_mode=True):
    """Loads 1 image from dataset index 'i', returns (im, resized hw)."""
    im, f, fn = self.ims[i], self.im_files[i], self.npy_files[i]
    if im is None:  # not cached in RAM
        if fn.exists():  # load npy
            try:
                im = np.load(fn)
            except Exception as e:
                LOGGER.warning(f"{self.prefix}WARNING ⚠️ Removing corrupt *.npy image file {fn} due to: {e}")
                Path(fn).unlink(missing_ok=True)
                # im = cv2.imread(f)  Replace with the code below
                im = cv2.imread(f, cv2.IMREAD_GRAYSCALE)
        else:
            # im = cv2.imread(f)  Replace with the code below
            im = cv2.imread(f, cv2.IMREAD_GRAYSCALE)
        if im is None:
            raise FileNotFoundError(f"Image Not Found {f}")
    ...
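With cv2.IMREAD_GRAYSCALE, cv2.imread returns a 2-D (H, W) array instead of an (H, W, 3) BGR array, so any code that indexes a channel axis has to be adapted. The numpy sketch below illustrates the shape change; bgr_to_gray is a hypothetical helper that approximates the conversion with the BT.601 luma weights used by OpenCV's BGR-to-gray conversion (a grayscale read may decode slightly differently, depending on the codec).

```python
import numpy as np

def bgr_to_gray(im_bgr: np.ndarray) -> np.ndarray:
    """Approximate OpenCV's BGR->gray conversion: Y = 0.299 R + 0.587 G + 0.114 B."""
    b = im_bgr[..., 0].astype(np.float64)
    g = im_bgr[..., 1].astype(np.float64)
    r = im_bgr[..., 2].astype(np.float64)
    gray = 0.114 * b + 0.587 * g + 0.299 * r
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

im_bgr = np.full((480, 640, 3), 200, dtype=np.uint8)  # shape of a default cv2.imread(f)
im_gray = bgr_to_gray(im_bgr)                         # shape of a grayscale read
print(im_bgr.shape, im_gray.shape)  # (480, 640, 3) (480, 640)

h0, w0 = im_gray.shape[:2]  # height/width lookups via shape[:2] still work on 2-D arrays
```

Size logic that only reads im.shape[:2], such as taking the original height and width, keeps working unchanged.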

Next, modify ultralytics/data/dataset.py, starting with the ClassificationDataset class. Add self.ch to __init__() and modify __getitem__():

def __init__(self, root, args, augment=False, prefix=""):
    super().__init__(root=root)
    self.ch = 1  # Add this line of code
    ...


def __getitem__(self, i):
    """Returns subset of data and targets corresponding to given indices."""
    f, j, fn, im = self.samples[i]
    if self.cache_ram and im is None:
        # im = self.samples[i][3] = cv2.imread(f)  Replace with the code below
        im = self.samples[i][3] = cv2.imread(f, cv2.IMREAD_GRAYSCALE)
    elif self.cache_disk:
        if not fn.exists():
            # Cache the grayscale image as well, so the .npy file matches a fresh read
            np.save(fn.as_posix(), cv2.imread(f, cv2.IMREAD_GRAYSCALE), allow_pickle=False)
        im = np.load(fn)
    else:
        # im = cv2.imread(f)  Replace with the code below
        im = cv2.imread(f, cv2.IMREAD_GRAYSCALE)
    # im = Image.fromarray(cv2.cvtColor(im, cv2.COLOR_BGR2RGB))  Replace with the code below
    # (COLOR_BGR2RGB raises an error on 1-channel input; a 2-D array becomes a grayscale PIL image)
    im = Image.fromarray(im)
    sample = self.torch_transforms(im)
    return {"img": sample, "cls": j}
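The disk cache stores whatever array the reader returned and, on later runs, loads the .npy file without reading the image again. That is why the np.save call should use the grayscale read too, and why .npy files cached by an earlier color run should be deleted: load_image, shown earlier, also prefers an existing .npy file. The sketch below mimics this with numpy only; load_cached, read_color and read_gray are hypothetical stand-ins, not Ultralytics functions.

```python
import tempfile
from pathlib import Path

import numpy as np

def load_cached(fn: Path, read_image) -> np.ndarray:
    """Mimics a disk cache: write the .npy once, then always load it back."""
    if not fn.exists():
        np.save(fn.as_posix(), read_image(), allow_pickle=False)
    return np.load(fn)

def read_color() -> np.ndarray:  # hypothetical stand-in for cv2.imread(f)
    return np.zeros((32, 32, 3), dtype=np.uint8)

def read_gray() -> np.ndarray:  # hypothetical stand-in for cv2.imread(f, cv2.IMREAD_GRAYSCALE)
    return np.zeros((32, 32), dtype=np.uint8)

with tempfile.TemporaryDirectory() as d:
    fn = Path(d) / "img.npy"
    first = load_cached(fn, read_color)   # cache written by an earlier, unmodified run
    second = load_cached(fn, read_gray)   # new reader, but the stale cache is returned
    print(first.shape, second.shape)      # (32, 32, 3) (32, 32, 3)
    fn.unlink()                           # delete the stale cache...
    third = load_cached(fn, read_gray)    # ...and it is rebuilt in grayscale
    print(third.shape)                    # (32, 32)
```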

Still in ultralytics/data/dataset.py, go to the YOLODataset class and its build_transforms() method. Add one line of code to disable training augmentation:

def build_transforms(self, hyp=None):
    """Builds and appends transforms to the list."""
    self.augment = False  # Add this line of code
    ...
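Setting self.augment = False makes build_transforms fall back to plain letterbox resizing instead of the full augmentation pipeline. Several of those augmentations assume an (H, W, 3) array; for example, the mosaic canvas is allocated with img.shape[2] channels, and the HSV jitter converts BGR to HSV with OpenCV. The numpy sketch below contrasts the two cases; letterbox_pad is a hypothetical, simplified stand-in (padding only, no resizing) for Ultralytics' LetterBox transform.

```python
import numpy as np

def letterbox_pad(im: np.ndarray, size: int, pad_value: int = 114) -> np.ndarray:
    """Centre an (H, W) or (H, W, C) image on a size x size canvas; no resizing."""
    h, w = im.shape[:2]
    top, left = (size - h) // 2, (size - w) // 2
    # Keep any trailing channel axis; a 2-D image gets a 2-D canvas
    canvas = np.full((size, size, *im.shape[2:]), pad_value, dtype=im.dtype)
    canvas[top:top + h, left:left + w] = im
    return canvas

gray = np.zeros((480, 640), dtype=np.uint8)

# Padding only touches the first two axes, so it works on 2-D input
print(letterbox_pad(gray, 640).shape)  # (640, 640)

# A mosaic-style canvas sized with img.shape[2] does not
raised = False
try:
    s = 640
    mosaic = np.full((s * 2, s * 2, gray.shape[2]), 114, dtype=np.uint8)
except IndexError:
    raised = True  # a 2-D array has no channel axis
print(raised)  # True
```

The trade-off is that training runs without any augmentation, which may cost some accuracy, especially on small datasets.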

In ultralytics/data/augment.py, find the Format class and update its _format_img() method:

def _format_img(self, img):
    """Format the image for YOLO from Numpy array to PyTorch tensor."""
    # Update the lines in this if-statement
    if len(img.shape) < 3:
        # Grayscale (H, W): add a leading channel axis -> (1, H, W)
        img = img.reshape([1, *img.shape])
        img = np.ascontiguousarray(img)
        img = torch.from_numpy(img)
        return img
    img = img.transpose(2, 0, 1)
    img = np.ascontiguousarray(img[::-1] if random.uniform(0, 1) > self.bgr else img)
    img = torch.from_numpy(img)
    return img
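PyTorch convolutions expect batches shaped (N, C, H, W). The modified _format_img gives a 2-D grayscale array a leading channel axis, so a batch of such images has C = 1 and matches a model built with ch: 1. Here is a numpy-only sketch of the same shape logic; to_chw is a hypothetical name, and torch.from_numpy and the random BGR/RGB flip are left out so it runs without PyTorch.

```python
import numpy as np

def to_chw(img: np.ndarray) -> np.ndarray:
    """Numpy-only sketch of the modified _format_img (no torch conversion, no channel flip)."""
    if img.ndim < 3:  # grayscale (H, W) -> (1, H, W)
        return np.ascontiguousarray(img.reshape(1, *img.shape))
    return np.ascontiguousarray(img.transpose(2, 0, 1))  # color (H, W, 3) -> (3, H, W)

gray = np.zeros((640, 640), dtype=np.uint8)
color = np.zeros((640, 640, 3), dtype=np.uint8)
print(to_chw(gray).shape, to_chw(color).shape)  # (1, 640, 640) (3, 640, 640)

batch = np.stack([to_chw(gray) for _ in range(4)])  # roughly what batching produces
print(batch.shape)  # (4, 1, 640, 640)
```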

Update YAML

Both the training (dataset) configuration file and the model configuration file must include ch: 1.

Training configuration:

# train.yaml
ch: 1 # Add ch: 1
path: /path/to/data
train: train
val: val

names:
  0: hand

Model configuration:

# yolov9c.yaml
nc: 80
ch: 1

backbone:
- [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
- [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
- [-1, 1, RepNCSPELAN4, [256, 128, 64, 1]] # 2
- [-1, 1, ADown, [256]] # 3-P3/8
- [-1, 1, RepNCSPELAN4, [512, 256, 128, 1]] # 4
- [-1, 1, ADown, [512]] # 5-P4/16
- [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]] # 6
- [-1, 1, ADown, [512]] # 7-P5/32
- [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]] # 8
- [-1, 1, SPPELAN, [512, 256]] # 9

head:
- [-1, 1, nn.Upsample, [None, 2, 'nearest']]
- [[-1, 6], 1, Concat, [1]] # cat backbone P4
- [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]] # 12

- [-1, 1, nn.Upsample, [None, 2, 'nearest']]
- [[-1, 4], 1, Concat, [1]] # cat backbone P3
- [-1, 1, RepNCSPELAN4, [256, 256, 128, 1]] # 15 (P3/8-small)

- [-1, 1, ADown, [256]]
- [[-1, 12], 1, Concat, [1]] # cat head P4
- [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]] # 18 (P4/16-medium)

- [-1, 1, ADown, [512]]
- [[-1, 9], 1, Concat, [1]] # cat head P5
- [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]] # 21 (P5/32-large)

- [[15, 18, 21], 1, Detect, [nc]] # DDetect(P3, P4, P5)
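The ch key matters mainly for layer 0: the model builder uses it as the first layer's input-channel count, and every later layer takes its input channels from the output of the layer before it. The sketch below mimics that bookkeeping for the first few backbone layers, using the [c1, c2, ...] argument layout that appears in the training log. FIRST_LAYERS and resolve_args are hypothetical names; the sketch ignores width scaling and multi-input layers such as Concat, and it is not Ultralytics' actual parse_model.

```python
# Hypothetical, simplified channel bookkeeping for sequential (-1) layers only.
FIRST_LAYERS = [
    ("Conv", [64, 3, 2]),
    ("Conv", [128, 3, 2]),
    ("RepNCSPELAN4", [256, 128, 64, 1]),
    ("ADown", [256]),
]

def resolve_args(layers, ch):
    """Prepend each layer's input channels (c1) to its YAML args; args[0] is c2."""
    channels = [ch]  # channels[-1] = output channels of the previous layer
    resolved = []
    for name, args in layers:
        c1, c2 = channels[-1], args[0]
        resolved.append((name, [c1, c2, *args[1:]]))
        channels.append(c2)
    return resolved

for name, args in resolve_args(FIRST_LAYERS, ch=1):
    print(name, args)
# Conv [1, 64, 3, 2]
# Conv [64, 128, 3, 2]
# RepNCSPELAN4 [128, 256, 128, 64, 1]
# ADown [256, 256]
```

Changing ch only changes the first entry of the first layer's arguments; everything after it is unaffected.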

Running Training

Now start the training procedure:

from ultralytics import YOLO
model = YOLO("yolov9c.yaml")
model.train(data="train.yaml", epochs=3)

If training starts without errors, the model summary printed at startup should show a single input channel for the first layer:

from  n  params  module                            arguments
  -1  1     704  ultralytics.nn.modules.conv.Conv  [1, 64, 3, 2]
  -1  1   73984  ultralytics.nn.modules.conv.Conv  [64, 128, 3, 2]

If the first Conv layer's arguments start with 1, as in [1, 64, 3, 2] above, the model was built with a single input channel and everything works as expected.
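The params column gives a second check. Assuming Ultralytics' Conv block is a bias-free Conv2d followed by BatchNorm2d (whose running statistics are buffers, not trainable parameters), a k x k layer has c1 * c2 * k * k + 2 * c2 parameters. conv_bn_params is a hypothetical helper for this arithmetic:

```python
def conv_bn_params(c1: int, c2: int, k: int) -> int:
    """Parameters of a bias-free k x k convolution plus BatchNorm (weight + bias per channel)."""
    return c1 * c2 * k * k + 2 * c2

print(conv_bn_params(1, 64, 3))    # 704   (first layer, ch: 1)
print(conv_bn_params(3, 64, 3))    # 1856  (first layer, default 3 channels)
print(conv_bn_params(64, 128, 3))  # 73984 (second layer, unchanged)
```

So 704 in the first row is consistent with one input channel; a three-channel model would report 1856 for the same layer.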

